The “What’s New in Python” series of essays takes tours through the most
important changes between major Python versions. They are a “must read” for
anyone wishing to stay up-to-date after a new release.
This article explains the new features in Python 3.2 as compared to 3.1. It
focuses on a few highlights and gives a few examples. For full details, see the
Misc/NEWS file.
In the past, extension modules built for one Python version were often
not usable with other Python versions. Particularly on Windows, every
feature release of Python required rebuilding all extension modules that
one wanted to use. This requirement was the result of the free access to
Python interpreter internals that extension modules could use.
With Python 3.2, an alternative approach becomes available: extension
modules which restrict themselves to a limited API (by defining
Py_LIMITED_API) cannot use many of the internals, but are constrained
to a set of API functions that are promised to be stable for several
releases. As a consequence, extension modules built for 3.2 in that
mode will also work with 3.3, 3.4, and so on. Extension modules that
make use of details of memory structures can still be built, but will
need to be recompiled for every feature release.
A new module for command line parsing, argparse, was introduced to
overcome the limitations of optparse which did not provide support for
positional arguments (not just options), subcommands, required options and other
common patterns of specifying and validating options.
This module has already had widespread success in the community as a
third-party module. Being more fully featured than its predecessor, the
argparse module is now the preferred module for command-line processing.
The older module is still being kept available because of the substantial amount
of legacy code that depends on it.
Here’s an annotated example parser showing features like limiting results to a
set of choices, specifying a metavar in the help screen, validating that one
or more positional arguments is present, and making a required option:
import argparse
parser = argparse.ArgumentParser(
            description='Manage servers',             # main description for help
            epilog='Tested on Solaris and Linux')     # displayed after help
parser.add_argument('action',                         # argument name
            choices=['deploy', 'start', 'stop'],      # three allowed values
            help='action on each target')             # help msg
parser.add_argument('targets',
            metavar='HOSTNAME',                       # var name used in help msg
            nargs='+',                                # require one or more targets
            help='url for target machines')           # help msg explanation
parser.add_argument('-u', '--user',                   # -u or --user option
            required=True,                            # make it a required argument
            help='login as user')
Example of calling the parser on a command string:
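For instance, the parser defined above might be exercised like this (the
hostnames are made up for illustration):

>>> cmd = 'deploy sneezy.example.com sleepy.example.com -u skycaptain'
>>> result = parser.parse_args(cmd.split())
>>> result.action
'deploy'
>>> result.targets
['sneezy.example.com', 'sleepy.example.com']
>>> result.user
'skycaptain'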
Example of the parser’s automatically generated help:
>>> parser.parse_args('-h'.split())
usage: manage_cloud.py [-h] -u USER
                       {deploy,start,stop} HOSTNAME [HOSTNAME ...]

Manage servers

positional arguments:
  {deploy,start,stop}   action on each target
  HOSTNAME              url for target machines

optional arguments:
  -h, --help            show this help message and exit
  -u USER, --user USER  login as user

Tested on Solaris and Linux
An especially nice argparse feature is the ability to define subparsers,
each with their own argument patterns and help displays:
import argparse
parser = argparse.ArgumentParser(prog='HELM')
subparsers = parser.add_subparsers()
parser_l = subparsers.add_parser('launch', help='Launch Control') # first subgroup
parser_l.add_argument('-m', '--missiles', action='store_true')
parser_l.add_argument('-t', '--torpedos', action='store_true')
parser_m = subparsers.add_parser('move', help='Move Vessel', # second subgroup
aliases=('steer', 'turn')) # equivalent names
parser_m.add_argument('-c', '--course', type=int, required=True)
parser_m.add_argument('-s', '--speed', type=int, default=0)
$ ./helm.py --help # top level help (launch and move)
$ ./helm.py launch --help # help for launch options
$ ./helm.py launch --missiles # set missiles=True and torpedos=False
$ ./helm.py steer --course 180 --speed 5 # set movement parameters
PEP 391: Dictionary Based Configuration for Logging
The logging module provided two kinds of configuration: one style with
function calls for each option, and another style driven by an external file
saved in a ConfigParser format. Those options did not provide the flexibility
to create configurations from JSON or YAML files, nor did they support
incremental configuration, which is needed for specifying logger options from a
command line.
To support a more flexible style, the module now offers
logging.config.dictConfig() for specifying logging configuration with
plain Python dictionaries. The configuration options include formatters,
handlers, filters, and loggers. Here’s a working example of a configuration
dictionary:
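A minimal sketch of such a dictionary (the formatter and handler names here
are illustrative; real configurations are usually larger):

import logging, logging.config

DICT_CONFIG = {
    'version': 1,
    'formatters': {
        'brief': {'format': '%(levelname)-8s %(name)s: %(message)s'},
    },
    'handlers': {
        'console': {
            'class': 'logging.StreamHandler',
            'formatter': 'brief',
            'level': 'INFO',
        },
    },
    'root': {
        'handlers': ['console'],
        'level': 'DEBUG',
    },
}

logging.config.dictConfig(DICT_CONFIG)      # apply the configuration
logging.info('Configured from a plain dictionary')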
Code for creating and managing concurrency is being collected in a new top-level
namespace, concurrent. Its first member is a futures package which provides
a uniform high-level interface for managing threads and processes.
The design for concurrent.futures was inspired by the
java.util.concurrent package. In that model, a running call and its result
are represented by a Future object that abstracts
features common to threads, processes, and remote procedure calls. That object
supports status checks (running or done), timeouts, cancellations, adding
callbacks, and access to results or exceptions.
The primary offering of the new module is a pair of executor classes for
launching and managing calls. The goal of the executors is to make it easier to
use existing tools for making parallel calls. They save the effort needed to
setup a pool of resources, launch the calls, create a results queue, add
time-out handling, and limit the total number of threads, processes, or remote
procedure calls.
Ideally, each application should share a single executor across multiple
components so that process and thread limits can be centrally managed. This
solves the design challenge that arises when each component has its own
competing strategy for resource management.
Both classes share a common interface with three methods:
submit() for scheduling a callable and
returning a Future object;
map() for scheduling many asynchronous calls
at a time, and shutdown() for freeing
resources. The class is a context manager and can be used in a
with statement to assure that resources are automatically released
when currently pending futures are done executing.
A simple example of ThreadPoolExecutor is the
launch of four parallel threads for copying files:
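A sketch along those lines (the file names are illustrative):

import concurrent.futures, shutil

with concurrent.futures.ThreadPoolExecutor(max_workers=4) as e:
    e.submit(shutil.copy, 'src1.txt', 'dest1.txt')
    e.submit(shutil.copy, 'src2.txt', 'dest2.txt')
    e.submit(shutil.copy, 'src3.txt', 'dest3.txt')
    e.submit(shutil.copy, 'src4.txt', 'dest4.txt')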
Python’s scheme for caching bytecode in .pyc files did not work well in
environments with multiple Python interpreters. If one interpreter encountered
a cached file created by another interpreter, it would recompile the source and
overwrite the cached file, thus losing the benefits of caching.
The issue of “pyc fights” has become more pronounced as it has become
commonplace for Linux distributions to ship with multiple versions of Python.
These conflicts also arise with CPython alternatives such as Unladen Swallow.
To solve this problem, Python’s import machinery has been extended to use
distinct filenames for each interpreter. Instead of Python 3.2 and Python 3.3 and
Unladen Swallow each competing for a file called “mymodule.pyc”, they will now
look for “mymodule.cpython-32.pyc”, “mymodule.cpython-33.pyc”, and
“mymodule.unladen10.pyc”. And to prevent all of these new files from
cluttering source directories, the pyc files are now collected in a
“__pycache__” directory stored under the package directory.
Aside from the filenames and target directories, the new scheme has a few
aspects that are visible to the programmer:
Imported modules now have a __cached__ attribute which stores the name
of the actual file that was imported:
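For example (the path shown is from one illustrative installation):

>>> import collections
>>> collections.__cached__
'c:/py32/lib/__pycache__/collections.cpython-32.pyc'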
The tag that is unique to each interpreter is accessible from the imp
module:
>>> import imp
>>> imp.get_tag()
'cpython-32'
Scripts that try to deduce source filename from the imported file now need to
be smarter. It is no longer sufficient to simply strip the “c” from a ”.pyc”
filename. Instead, use the new functions in the imp module:
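For example, imp.cache_from_source() and imp.source_from_cache() convert
between the two naming conventions (paths are illustrative):

>>> import imp
>>> imp.cache_from_source('c:/py32/lib/collections.py')
'c:/py32/lib/__pycache__/collections.cpython-32.pyc'
>>> imp.source_from_cache('c:/py32/lib/__pycache__/collections.cpython-32.pyc')
'c:/py32/lib/collections.py'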
The py_compile and compileall modules have been updated to
reflect the new naming convention and target directory. The command-line
invocation of compileall has new options: -i for
specifying a list of files and directories to compile and -b which causes
bytecode files to be written to their legacy location rather than
__pycache__.
The importlib.abc module has been updated with new abstract base
classes for loading bytecode files. The obsolete
ABCs, PyLoader and
PyPycLoader, have been deprecated (instructions on how
to stay Python 3.1 compatible are included with the documentation).
The PYC repository directory allows multiple bytecode cache files to be
co-located. This PEP implements a similar mechanism for shared object files by
giving them a common directory and distinct names for each version.
The common directory is “pyshared” and the file names are made distinct by
identifying the Python implementation (such as CPython, PyPy, Jython, etc.), the
major and minor version numbers, and optional build flags (such as “d” for
debug, “m” for pymalloc, “u” for wide-unicode). For an arbitrary package “foo”,
you may see these files when the distribution package is installed:
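An illustrative listing (the exact tags depend on which interpreters and
build options are present):

foo.cpython-32m.so      # CPython 3.2, pymalloc build
foo.cpython-32mu.so     # CPython 3.2, pymalloc, wide-unicode build
foo.cpython-33m.so      # CPython 3.3, pymalloc build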
In Python itself, the tags are accessible from functions in the sysconfig
module:
>>> import sysconfig
>>> sysconfig.get_config_var('SOABI')    # find the version tag
'cpython-32mu'
>>> sysconfig.get_config_var('SO')       # find the full filename extension
'.cpython-32mu.so'
PEP 3333: Python Web Server Gateway Interface v1.0.1
This informational PEP clarifies how bytes/text issues are to be handled by the
WSGI protocol. The challenge is that string handling in Python 3 is most
conveniently handled with the str type even though the HTTP protocol
is itself bytes oriented.
The PEP differentiates so-called native strings that are used for
request/response headers and metadata versus byte strings which are used for
the bodies of requests and responses.
The native strings are always of type str but are restricted to code
points from U+0000 through U+00FF which are translatable to bytes using
Latin-1 encoding. These strings are used for the keys and values in the
environment dictionary and for response headers and statuses in the
start_response() function. They must follow RFC 2616 with respect to
encoding. That is, they must either be ISO-8859-1 characters or use
RFC 2047 MIME encoding.
For developers porting WSGI applications from Python 2, here are the salient
points:
If the app already used strings for headers in Python 2, no change is needed.
If instead, the app encoded output headers or decoded input headers, then the
headers will need to be re-encoded to Latin-1. For example, an output header
that was encoded to utf-8 using h.encode('utf-8') now needs to be converted
from bytes to native strings using h.encode('utf-8').decode('latin-1').
Values yielded by an application or sent using the write() method
must be byte strings. The start_response() function and environ
must use native strings. The two cannot be mixed.
For server implementers writing CGI-to-WSGI pathways or other CGI-style
protocols, users must be able to access the environment using native strings
even though the underlying platform may have a different convention. To bridge
this gap, the wsgiref module has a new function,
wsgiref.handlers.read_environ() for transcoding CGI variables from
os.environ into native strings and returning a new dictionary.
See also
PEP 3333 - Python Web Server Gateway Interface v1.0.1
Some smaller changes made to the core Python language are:
String formatting for format() and str.format() gained new
capabilities for the format character #. Previously, for integers in
binary, octal, or hexadecimal, it caused the output to be prefixed with ‘0b’,
‘0o’, or ‘0x’ respectively. Now it can also handle floats, complex, and
Decimal, causing the output to always have a decimal point even when no digits
follow it.
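For example:

>>> format(20, '#x')       # integers: the base prefix is added
'0x14'
>>> format(5.0, '#.0f')    # floats: the decimal point is retained
'5.'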
(Suggested by Mark Dickinson and implemented by Eric Smith in issue 7094.)
There is also a new str.format_map() method that extends the
capabilities of the existing str.format() method by accepting arbitrary
mapping objects. This new method makes it possible to use string
formatting with any of Python’s many dictionary-like objects such as
defaultdict, Shelf,
ConfigParser, or dbm. It is also useful with
custom dict subclasses that normalize keys before look-up or that
supply a __missing__() method for unknown keys:
>>> import shelve
>>> d = shelve.open('tmp.shl')
>>> 'The {project_name} status is {status} as of {date}'.format_map(d)
'The testing project status is green as of February 15, 2011'

>>> class LowerCasedDict(dict):
...     def __getitem__(self, key):
...         return dict.__getitem__(self, key.lower())
...
>>> lcd = LowerCasedDict(part='widgets', quantity=10)
>>> 'There are {QUANTITY} {Part} in stock'.format_map(lcd)
'There are 10 widgets in stock'

>>> class PlaceholderDict(dict):
...     def __missing__(self, key):
...         return '<{}>'.format(key)
...
>>> 'Hello {name}, welcome to {location}'.format_map(PlaceholderDict())
'Hello <name>, welcome to <location>'
(Suggested by Raymond Hettinger and implemented by Eric Smith in
issue 6081.)
The interpreter can now be started with a quiet option, -q, to prevent
the copyright and version information from being displayed in the interactive
mode. The option can be introspected using the sys.flags attribute:
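For example:

$ python -q
>>> import sys
>>> sys.flags.quiet
1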
The hasattr() function works by calling getattr() and detecting
whether an exception is raised. This technique allows it to detect methods
created dynamically by __getattr__() or __getattribute__() which
would otherwise be absent from the class dictionary. Formerly, hasattr
would catch any exception, possibly masking genuine errors. Now, hasattr
has been tightened to only catch AttributeError and let other
exceptions pass through:
>>> class A:
...     @property
...     def f(self):
...         return 1 // 0
...
>>> a = A()
>>> hasattr(a, 'f')
Traceback (most recent call last):
  ...
ZeroDivisionError: integer division or modulo by zero
(Discovered by Yury Selivanov and fixed by Benjamin Peterson; issue 9666.)
The str() of a float or complex number is now the same as its
repr(). Previously, the str() form was shorter but that just
caused confusion and is no longer needed now that the shortest possible
repr() is displayed by default:
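For example:

>>> import math
>>> repr(math.pi)
'3.141592653589793'
>>> str(math.pi)
'3.141592653589793'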
(Proposed and implemented by Mark Dickinson; issue 9337.)
memoryview objects now have a release() method
and they also now support the context manager protocol. This allows timely
release of any resources that were acquired when requesting a buffer from the
original object.
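For example:

>>> with memoryview(b'abcdefgh') as v:
...     print(v.tolist())
...
[97, 98, 99, 100, 101, 102, 103, 104]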
Previously it was illegal to delete a name from the local namespace if it
occurs as a free variable in a nested block:
def outer(x):
    def inner():
        return x
    inner()
    del x
This is now allowed. Remember that the target of an except clause
is cleared, so this code, which used to work with Python 2.6 and raised a
SyntaxError with Python 3.1, now works again:
def f():
    def print_error():
        print(e)
    try:
        something
    except Exception as e:
        print_error()
        # implicit "del e" here
The internal structsequence tool now creates subclasses of tuple.
This means that C structures like those returned by os.stat(),
time.gmtime(), and sys.version_info now work like a
named tuple and now work with functions and methods that
expect a tuple as an argument. This is a big step forward in making the C
structures as flexible as their pure Python counterparts:
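For example (version details are build-specific):

>>> import sys
>>> isinstance(sys.version_info, tuple)
True
>>> 'Version %d.%d.%d %s(%d)' % sys.version_info
'Version 3.2.0 final(0)'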
(Suggested by Barry Warsaw and implemented by Philip Jenvey in issue 7301.)
A new warning category, ResourceWarning, has been added. It is
emitted when potential issues with resource consumption or cleanup
are detected. It is silenced by default in normal release builds but
can be enabled through the means provided by the warnings
module, or on the command line.
A ResourceWarning is issued at interpreter shutdown if the
gc.garbage list isn’t empty, and if gc.DEBUG_UNCOLLECTABLE is
set, all uncollectable objects are printed. This is meant to make the
programmer aware that their code contains object finalization issues.
A ResourceWarning is also issued when a file object is destroyed
without having been explicitly closed. While the deallocator for such an
object ensures it closes the underlying operating system resource
(usually, a file descriptor), the delay in deallocating the object could
produce various issues, especially under Windows. Here is an example
of enabling the warning from the command line:
$ python -q -Wdefault
>>> f = open("foo", "wb")
>>> del f
__main__:1: ResourceWarning: unclosed file <_io.BufferedWriter name='foo'>
range objects now support index and count methods. This is part
of an effort to make more objects fully implement the
collections.Sequence abstract base class. As a result, the
language will have a more uniform API. In addition, range objects
now support slicing and negative indices, even with values larger than
sys.maxsize. This makes range more interoperable with lists:
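For example:

>>> range(0, 100, 2).count(10)
1
>>> range(0, 100, 2).index(10)
5
>>> range(0, 100, 2)[5]
10
>>> range(0, 100, 2)[0:5]
range(0, 10, 2)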
The callable() builtin function from Py2.x was resurrected. It provides
a concise, readable alternative to using an abstract base class in an
expression like isinstance(x, collections.Callable):
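For example:

>>> callable(max)
True
>>> callable(20)
False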
Python’s import mechanism can now load modules installed in directories with
non-ASCII characters in the path name. This solved an aggravating problem
with home directories for users with non-ASCII characters in their usernames.
(Required extensive work by Victor Stinner in issue 9425.)
Python’s standard library has undergone significant maintenance efforts and
quality improvements.
The biggest news for Python 3.2 is that the email package and the
mailbox and nntplib modules now work correctly with the bytes/text model
in Python 3. For the first time, there is correct handling of messages with
mixed encodings.
Throughout the standard library, there has been more careful attention to
encodings and text versus bytes issues. In particular, interactions with the
operating system are now better able to exchange non-ASCII data using the
Windows MBCS encoding, locale-aware encodings, or UTF-8.
Another significant win is the addition of substantially better support for
SSL connections and security certificates.
In addition, more classes now implement a context manager to support
convenient and reliable resource clean-up using a with statement.
The usability of the email package in Python 3 has been mostly fixed by
the extensive efforts of R. David Murray. The problem was that emails are
typically read and stored in the form of bytes rather than str
text, and they may contain multiple encodings within a single email. So, the
email package had to be extended to parse and generate email messages in bytes
format.
Given bytes input to the model, get_payload()
will by default decode a message body that has a
Content-Transfer-Encoding of 8bit using the charset
specified in the MIME headers and return the resulting string.
Given bytes input to the model, Generator will
convert message bodies that have a Content-Transfer-Encoding of
8bit to instead have a 7bit Content-Transfer-Encoding.
Headers with unencoded non-ASCII bytes are deemed to be RFC 2047-encoded
using the unknown-8bit character set.
A new class BytesGenerator produces bytes as output,
preserving any unchanged non-ASCII data that was present in the input used to
build the model, including message bodies with a
Content-Transfer-Encoding of 8bit.
The smtplib SMTP class now accepts a byte string
for the msg argument to the sendmail() method,
and a new method, send_message() accepts a
Message object and can optionally obtain the
from_addr and to_addrs addresses directly from the object.
The functools module includes a new decorator for caching function
calls. functools.lru_cache() can save repeated queries to an external
resource whenever the results are expected to be the same.
For example, adding a caching decorator to a database query function can save
database accesses for popular searches:
>>> import functools
>>> @functools.lru_cache(maxsize=300)
... def get_phone_number(name):
...     c = conn.cursor()
...     c.execute('SELECT phonenumber FROM phonelist WHERE name=?', (name,))
...     return c.fetchone()[0]
The functools.wraps() decorator now adds a __wrapped__ attribute
pointing to the original callable function. This allows wrapped functions to
be introspected. It also copies __annotations__ if defined. And now
it also gracefully skips over missing attributes such as __doc__ which
might not be defined for the wrapped callable.
In the above example, the cache can be removed by recovering the original
function:
>>> get_phone_number = get_phone_number.__wrapped__    # uncached function
To help write classes with rich comparison methods, a new decorator
functools.total_ordering() will use existing equality and inequality
methods to fill in the remaining methods.
For example, supplying __eq__ and __lt__ will enable
total_ordering() to fill-in __le__, __gt__ and __ge__:
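A sketch of such a class (the comparison key is illustrative):

from functools import total_ordering

@total_ordering
class Student:
    def __init__(self, firstname, lastname):
        self.firstname = firstname
        self.lastname = lastname
    def __eq__(self, other):
        return ((self.lastname.lower(), self.firstname.lower()) ==
                (other.lastname.lower(), other.firstname.lower()))
    def __lt__(self, other):
        return ((self.lastname.lower(), self.firstname.lower()) <
                (other.lastname.lower(), other.firstname.lower()))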
The collections.Counter class now has two forms of in-place
subtraction, the existing -= operator for saturating subtraction and the new
subtract() method for regular subtraction. The
former is suitable for multisets
which only have positive counts, and the latter is more suitable for use cases
that allow negative counts:
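For example:

>>> from collections import Counter
>>> tally = Counter(dogs=5, cats=3)
>>> tally -= Counter(dogs=2, cats=8)    # saturating subtraction
>>> tally
Counter({'dogs': 3})

>>> tally = Counter(dogs=5, cats=3)
>>> tally.subtract(dogs=2, cats=8)      # regular subtraction
>>> tally
Counter({'dogs': 3, 'cats': -5})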
The collections.OrderedDict class has a new method
move_to_end() which takes an existing key and
moves it to either the first or last position in the ordered sequence.
The default is to move an item to the last position. This is equivalent to
renewing an entry with od[k] = od.pop(k).
A fast move-to-end operation is useful for resequencing entries. For example,
an ordered dictionary can be used to track order of access by aging entries
from the oldest to the most recently accessed.
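For example:

>>> from collections import OrderedDict
>>> d = OrderedDict.fromkeys('abcde')
>>> d.move_to_end('b')                  # same effect as d['b'] = d.pop('b')
>>> ''.join(d)
'acdeb'
>>> d.move_to_end('b', last=False)      # move 'b' to the front
>>> ''.join(d)
'bacde'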
The threading module has a new Barrier
synchronization class for making multiple threads wait until all of them have
reached a common barrier point. Barriers are useful for making sure that a task
with multiple preconditions does not run until all of the predecessor tasks are
complete.
Barriers can work with an arbitrary number of threads. This is a generalization
of a Rendezvous which
is defined for only two threads.
Implemented as a two-phase cyclic barrier, Barrier objects
are suitable for use in loops. The separate filling and draining phases
assure that all threads get released (drained) before any one of them can loop
back and re-enter the barrier. The barrier fully resets after each cycle.
Example of using barriers:
from threading import Barrier, Thread

def get_votes(site):
    ballots = conduct_election(site)
    all_polls_closed.wait()        # do not count until all polls are closed
    totals = summarize(ballots)
    publish(site, totals)

all_polls_closed = Barrier(len(sites))
for site in sites:
    Thread(target=get_votes, args=(site,)).start()
In this example, the barrier enforces a rule that votes cannot be counted at any
polling site until all polls are closed. Notice how a solution with a barrier
is similar to one with threading.Thread.join(), but the threads stay alive
and continue to do work (summarizing ballots) after the barrier point is
crossed.
If any of the predecessor tasks can hang or be delayed, a barrier can be created
with an optional timeout parameter. Then if the timeout period elapses before
all the predecessor tasks reach the barrier point, all waiting threads are
released and a BrokenBarrierError exception is raised:
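A sketch extending the election example (conduct_election, summarize,
publish, lockbox, and the timeout computation are placeholders):

from threading import Barrier, BrokenBarrierError, Thread

def get_votes(site):
    ballots = conduct_election(site)
    try:
        all_polls_closed.wait()
    except BrokenBarrierError:
        lockbox.put(ballots)            # seal the ballots for later handling
    else:
        totals = summarize(ballots)
        publish(site, totals)

all_polls_closed = Barrier(len(sites), timeout=seconds_until_midnight)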
In this example, the barrier enforces a more robust rule. If some election
sites do not finish before midnight, the barrier times out and the ballots are
sealed and deposited in a queue for later handling.
The datetime module has a new type timezone that
implements the tzinfo interface by returning a fixed UTC
offset and timezone name. This makes it easier to create timezone-aware
datetime objects:
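For example (timestamps are illustrative):

>>> from datetime import datetime, timezone
>>> datetime.now(timezone.utc)
datetime.datetime(2010, 12, 8, 21, 4, 2, 923754, tzinfo=datetime.timezone.utc)
>>> datetime.strptime("01/01/2000 12:00 +0000", "%d/%m/%Y %H:%M %z")
datetime.datetime(2000, 1, 1, 12, 0, tzinfo=datetime.timezone.utc)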
Also, timedelta objects can now be multiplied by
float and divided by float and int objects.
And timedelta objects can now divide one another.
The datetime.date.strftime() method is no longer restricted to years
after 1900. The new supported year range is from 1000 to 9999 inclusive.
Whenever a two-digit year is used in a time tuple, the interpretation has been
governed by time.accept2dyear. The default is True which means that
for a two-digit year, the century is guessed according to the POSIX rules
governing the %y strptime format.
Starting with Py3.2, use of the century guessing heuristic will emit a
DeprecationWarning. Instead, it is recommended that
time.accept2dyear be set to False so that large date ranges
can be used without guesswork:
>>> import time, warnings
>>> warnings.resetwarnings()     # remove the default warning filters
>>> time.accept2dyear = True     # guess whether 11 means 11 or 2011
>>> time.asctime((11, 1, 1, 12, 34, 56, 4, 1, 0))
Warning (from warnings module):
  ...
DeprecationWarning: Century info guessed for a 2-digit year.
'Fri Jan  1 12:34:56 2011'
>>> time.accept2dyear = False    # use the full range of allowable dates
>>> time.asctime((11, 1, 1, 12, 34, 56, 4, 1, 0))
'Fri Jan  1 12:34:56 11'
Several functions now have significantly expanded date ranges. When
time.accept2dyear is false, the time.asctime() function will
accept any year that fits in a C int, while the time.mktime() and
time.strftime() functions will accept the full range supported by the
corresponding operating system functions.
The math module's new expm1() function computes e**x-1 for small values of x
without incurring the loss of precision that usually accompanies the subtraction
of nearly equal quantities:
>>> from math import expm1
>>> expm1(0.013671875)    # more accurate way to compute e**x-1 for a small x
0.013765762467652909
The new erf() and erfc() functions compute the error function and its
complement:

>>> from math import erf, erfc, sqrt
>>> erf(1.0/sqrt(2.0))    # portion of normal distribution within 1 standard deviation
0.682689492137086
>>> erfc(1.0/sqrt(2.0))   # portion of normal distribution outside 1 standard deviation
0.31731050786291404
>>> erf(1.0/sqrt(2.0)) + erfc(1.0/sqrt(2.0))
1.0
The gamma() function is a continuous extension of the factorial
function. See http://en.wikipedia.org/wiki/Gamma_function for details. Because
the function is related to factorials, it grows large even for small values of
x, so there is also a lgamma() function for computing the natural
logarithm of the gamma function:
>>> from math import gamma, lgamma
>>> gamma(7.0)        # six factorial
720.0
>>> lgamma(801.0)     # log(800 factorial)
4551.950730698041
The io.BytesIO class has a new method, getbuffer(), which
provides functionality similar to memoryview(). It creates an editable
view of the data without making a copy. The buffer’s random access and support
for slice notation are well-suited to in-place editing:
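A small sketch of the idea, combining getbuffer() with the new release()
method:

>>> import io
>>> stream = io.BytesIO(b'hello world')
>>> view = stream.getbuffer()      # editable view of the data, no copy made
>>> view[0:5] = b'HELLO'           # in-place slice assignment
>>> view.release()                 # free the buffer before reading the stream
>>> stream.getvalue()
b'HELLO world'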
When writing a __repr__() method for a custom container, it is easy to
forget to handle the case where a member refers back to the container itself.
Python’s builtin objects such as list and set handle
self-reference by displaying ”...” in the recursive part of the representation
string.
To help write such __repr__() methods, the reprlib module has a new
decorator, recursive_repr(), for detecting recursive calls to
__repr__() and substituting a placeholder string instead:
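For example:

>>> from reprlib import recursive_repr
>>> class MyList(list):
...     @recursive_repr()
...     def __repr__(self):
...         return '<' + '|'.join(map(repr, self)) + '>'
...
>>> m = MyList('abc')
>>> m.append(m)
>>> m.append('x')
>>> print(m)
<'a'|'b'|'c'|...|'x'>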
In addition to dictionary-based configuration described above, the
logging package has many other improvements.
The logging documentation has been augmented by a basic tutorial, an advanced tutorial, and a cookbook of
logging recipes. These documents are the fastest way to learn about logging.
The logging.basicConfig() set-up function gained a style argument to
support three different types of string formatting. It defaults to “%” for
traditional %-formatting, can be set to “{” for the new str.format() style, or
can be set to “$” for the shell-style formatting provided by
string.Template. The following three configurations are equivalent:
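The three styles might look like this (each call is shown as an alternative;
in a real program only the first basicConfig() call takes effect):

>>> import logging
>>> logging.basicConfig(style='%', format="%(name)s -> %(levelname)s: %(message)s")
>>> logging.basicConfig(style='{', format="{name} -> {levelname} {message}")
>>> logging.basicConfig(style='$', format="$name -> $levelname: $message")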
If no configuration is set-up before a logging event occurs, there is now a
default configuration using a StreamHandler directed to
sys.stderr for events of WARNING level or higher. Formerly, an
event occurring before a configuration was set-up would either raise an
exception or silently drop the event depending on the value of
logging.raiseExceptions. The new default handler is stored in
logging.lastResort.
The use of filters has been simplified. Instead of creating a
Filter object, the predicate can be any Python callable that
returns True or False.
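For example (the logger name and predicate are illustrative):

import logging

logger = logging.getLogger('my.app')
# Any callable taking a LogRecord and returning True/False now works as a filter.
logger.addFilter(lambda record: 'noisy' not in record.getMessage())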
There were a number of other improvements that add flexibility and simplify
configuration. See the module documentation for a full listing of changes in
Python 3.2.
The csv module now supports a new dialect, unix_dialect,
which applies quoting for all fields and a traditional Unix style with '\n' as
the line terminator. The registered dialect name is unix.
The csv.DictWriter has a new method,
writeheader() for writing-out an initial row to document
the field names:
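For example:

>>> import csv, sys
>>> w = csv.DictWriter(sys.stdout, ['name', 'dept'])
>>> w.writeheader()
name,dept
>>> w.writerows([
...     {'name': 'tom', 'dept': 'accounting'},
...     {'name': 'susan', 'dept': 'sales'}])
tom,accounting
susan,sales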
There is a new and slightly mind-blowing tool
ContextDecorator that is helpful for creating a
context manager that does double duty as a function decorator.
As a convenience, this new functionality is used by
contextmanager() so that no extra effort is needed to support
both roles.
The basic idea is that both context managers and function decorators can be used
for pre-action and post-action wrappers. Context managers wrap a group of
statements using a with statement, and function decorators wrap a
group of statements enclosed in a function. So, occasionally there is a need to
write a pre-action or post-action wrapper that can be used in either role.
For example, it is sometimes useful to wrap functions or groups of statements
with a logger that can track the time of entry and time of exit. Rather than
writing both a function decorator and a context manager for the task, the
contextmanager() provides both capabilities in a single
definition:
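For example, a definition along these lines (a sketch that logs at INFO
level):

from contextlib import contextmanager
import logging

logging.basicConfig(level=logging.INFO)

@contextmanager
def track_entry_and_exit(name):
    logging.info('Entering: %s', name)
    yield
    logging.info('Exiting: %s', name)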
Formerly, this would have only been usable as a context manager:
with track_entry_and_exit('widget loader'):
    print('Some time consuming activity goes here')
    load_widget()
Now, it can be used as a decorator as well:
@track_entry_and_exit('widget loader')
def activity():
    print('Some time consuming activity goes here')
    load_widget()
Trying to fulfill two roles at once places some limitations on the technique.
Context managers normally have the flexibility to return an argument usable by
a with statement, but there is no parallel for function decorators.
In the above example, there is not a clean way for the track_entry_and_exit
context manager to return a logging instance for use in the body of enclosed
statements.
Mark Dickinson crafted an elegant and efficient scheme for assuring that
different numeric datatypes will have the same hash value whenever their actual
values are equal (issue 8188):
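For example:

>>> from fractions import Fraction
>>> from decimal import Decimal
>>> assert hash(Fraction(3, 2)) == hash(1.5) == \
...        hash(Decimal("1.5")) == hash(complex(1.5, 0))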
Some of the hashing details are exposed through a new attribute,
sys.hash_info, which describes the bit width of the hash value, the
prime modulus, the hash values for infinity and nan, and the multiplier
used for the imaginary part of a number:
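For example, on one illustrative 64-bit build:

>>> import sys
>>> sys.hash_info
sys.hash_info(width=64, modulus=2305843009213693951, inf=314159, nan=0, imag=1000003)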
An early decision to limit the inter-operability of various numeric types has
been relaxed. It is still unsupported (and ill-advised) to have implicit
mixing in arithmetic expressions such as Decimal('1.1')+float('1.1')
because the latter loses information in the process of constructing the binary
float. However, since an existing floating-point value can be converted losslessly
to either a decimal or rational representation, it makes sense to add them to
the constructor and to support mixed-type comparisons.
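For example:

>>> from decimal import Decimal
>>> from fractions import Fraction
>>> Decimal(1.1)               # the constructors now accept floats losslessly
Decimal('1.100000000000000088817841970012523233890533447265625')
>>> Fraction(1.1)
Fraction(2476979795053773, 2251799813685248)
>>> Decimal(1.1) == 1.1        # mixed-type comparisons are now supported
True
>>> Decimal('1.1') == 1.1      # ... and correctly detect the inexact float
False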
Another useful change for the decimal module is that the
Context.clamp attribute is now public. This is useful in creating
contexts that correspond to the decimal interchange formats specified in IEEE
754 (see issue 8540).
(Contributed by Mark Dickinson and Raymond Hettinger.)
The ftplib.FTP class now supports the context manager protocol to
unconditionally consume socket.error exceptions and to close the FTP
connection when done:
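A sketch (hostname illustrative; directory listing output elided):

from ftplib import FTP

with FTP('ftp.example.org') as ftp:
    ftp.login()
    ftp.dir()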
The FTP_TLS class now accepts a context parameter, which is a
ssl.SSLContext object allowing bundling SSL configuration options,
certificates and private keys into a single (potentially long-lived) structure.
The select module now exposes a new, constant attribute,
PIPE_BUF, which gives the minimum number of bytes which are
guaranteed not to block when select.select() says a pipe is ready
for writing.
>>> import select
>>> select.PIPE_BUF
512
(Available on Unix systems. Patch by Sébastien Sablé in issue 9862)
The gzip module also gains the compress() and
decompress() functions for easier in-memory compression and
decompression. Keep in mind that text needs to be encoded as bytes
before compressing and decompressing:
>>> import gzip
>>> s = 'Three shall be the number thou shalt count, '
>>> s += 'and the number of the counting shall be three'
>>> b = s.encode()                        # convert to utf-8
>>> len(b)
89
>>> c = gzip.compress(b)
>>> len(c)
77
>>> gzip.decompress(c).decode()[:42]      # decompress and convert to text
'Three shall be the number thou shalt count'
Also, the zipfile.ZipExtFile class was reworked internally to represent
files stored inside an archive. The new implementation is significantly faster
and can be wrapped in a io.BufferedReader object for more speedups. It
also solves an issue where interleaved calls to read and readline gave the
wrong results.
The TarFile class can now be used as a context manager. In
addition, its add() method has a new option, filter,
that controls which files are added to the archive and allows the file metadata
to be edited.
The new filter option replaces the older, less flexible exclude parameter
which is now deprecated. If specified, the optional filter parameter needs to
be a keyword argument. The user-supplied filter function accepts a
TarInfo object and returns an updated
TarInfo object, or if it wants the file to be excluded, the
function can return None:
>>> import tarfile, glob
>>> def myfilter(tarinfo):
...     if tarinfo.isfile():             # only save real files
...         tarinfo.uname = 'monty'      # redact the user name
...         return tarinfo
...
>>> with tarfile.open(name='myarchive.tar.gz', mode='w:gz') as tf:
...     for filename in glob.glob('*.txt'):
...         tf.add(filename, filter=myfilter)
...     tf.list()
-rw-r--r-- monty/501        902 2011-01-26 17:59:11 annotations.txt
-rw-r--r-- monty/501        123 2011-01-26 17:59:11 general_questions.txt
-rw-r--r-- monty/501       3514 2011-01-26 17:59:11 prion.txt
-rw-r--r-- monty/501        124 2011-01-26 17:59:11 py_todo.txt
-rw-r--r-- monty/501       1399 2011-01-26 17:59:11 semaphore_notes.txt
(Proposed by Tarek Ziadé and implemented by Lars Gustäbel in issue 6856.)
The hashlib module has two new constant attributes listing the hashing
algorithms guaranteed to be present in all implementations and those available
on the current implementation:
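For example:

>>> import hashlib
>>> sorted(hashlib.algorithms_guaranteed)
['md5', 'sha1', 'sha224', 'sha256', 'sha384', 'sha512']
>>> 'sha256' in hashlib.algorithms_available    # a larger, platform-dependent set
True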
The ast module has a general-purpose tool for safely
evaluating expression strings using the Python literal
syntax. The ast.literal_eval() function serves as a secure alternative to
the builtin eval() function which is easily abused. Python 3.2 adds
bytes and set literals to the list of supported types:
strings, bytes, numbers, tuples, lists, dicts, sets, booleans, and None.
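For example (the dictionary display order may vary):

>>> from ast import literal_eval
>>> request = "{'req': 3, 'func': 'pow', 'args': (2, 0.5)}"
>>> literal_eval(request)
{'req': 3, 'func': 'pow', 'args': (2, 0.5)}
>>> request = "os.system('do something harmful')"
>>> literal_eval(request)
Traceback (most recent call last):
  ...
ValueError: malformed node or string: <_ast.Call object at 0x...>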
Different operating systems use various encodings for filenames and environment
variables. The os module provides two new functions,
fsencode() and fsdecode(), for encoding and decoding
filenames:
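For example, on a machine whose filesystem encoding is UTF-8:

>>> import os
>>> filename = 'Sehenswürdigkeiten'
>>> os.fsencode(filename)
b'Sehensw\xc3\xbcrdigkeiten'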
Some operating systems allow direct access to encoded bytes in the
environment. If so, the os.supports_bytes_environ constant will be
true.
For direct access to encoded environment variables (if available),
use the new os.getenvb() function or use os.environb
which is a bytes version of os.environ.
The shutil.copytree() function has two new options:

ignore_dangling_symlinks: when symlinks=False (meaning that the function
copies the file pointed to by a symlink, not the symlink itself), this
option silences the error raised if the file doesn't exist.

copy_function: a callable that will be used to copy files;
shutil.copy2() is used by default.
(Contributed by Tarek Ziadé.)
In addition, the shutil module now supports archiving operations for zipfiles, uncompressed tarfiles, gzipped tarfiles,
and bzipped tarfiles. And there are functions for registering additional
archiving file formats (such as xz compressed tarfiles or custom formats).
The principal functions are make_archive() and
unpack_archive(). By default, both operate on the current
directory (which can be set by os.chdir()) and on any sub-directories.
The archive filename needs to be specified with a full pathname. The archiving
step is non-destructive (the original files are left unchanged).
>>> import os, shutil, pprint
>>> os.chdir('mydata')                                    # change to the source directory
>>> f = shutil.make_archive('/var/backup/mydata', 'zip')  # archive the current directory
>>> f                                                     # show the name of archive
'/var/backup/mydata.zip'
>>> os.chdir('tmp')                                       # change to an unpacking directory
>>> shutil.unpack_archive('/var/backup/mydata.zip')       # recover the data
>>> pprint.pprint(shutil.get_archive_formats())           # display known formats
[('bztar', "bzip2'ed tar-file"),
 ('gztar', "gzip'ed tar-file"),
 ('tar', 'uncompressed tar file'),
 ('zip', 'ZIP file')]
>>> shutil.register_archive_format(                       # register a new archive format
...     name='xz',
...     function=xz.compress,                             # callable archiving function
...     extra_args=[('level', 8)],                        # arguments to the function
...     description='xz compression')
Socket objects now have a detach() method which puts
the socket into closed state without actually closing the underlying file
descriptor. The latter can then be reused for other purposes.
(Added by Antoine Pitrou; issue 8524.)
socket.create_connection() now supports the context manager protocol
to unconditionally consume socket.error exceptions and to close the
socket when done.
(Contributed by Giampaolo Rodolà; issue 9794.)
The ssl module added a number of features to satisfy common requirements
for secure (encrypted, authenticated) internet connections:
A new class, SSLContext, serves as a container for persistent
SSL data, such as protocol settings, certificates, private keys, and various
other options. It includes a wrap_socket() method for creating
an SSL socket from an SSL context.
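A client-side sketch (file and host names are illustrative):

import socket, ssl

context = ssl.SSLContext(ssl.PROTOCOL_SSLv23)
context.load_verify_locations('ca-bundle.pem')   # hypothetical CA file
context.verify_mode = ssl.CERT_REQUIRED

sock = socket.create_connection(('www.example.com', 443))
secure_sock = context.wrap_socket(sock)          # the context can be reused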
A new function, ssl.match_hostname(), supports server identity
verification for higher-level protocols by implementing the rules of HTTPS
(from RFC 2818) which are also suitable for other protocols.
The ssl.wrap_socket() constructor function now takes a ciphers
argument. The ciphers string lists the allowed encryption algorithms using
the format described in the OpenSSL documentation.
When linked against recent versions of OpenSSL, the ssl module now
supports the Server Name Indication extension to the TLS protocol, allowing
multiple “virtual hosts” using different certificates on a single IP port.
This extension is only supported in client mode, and is activated by passing
the server_hostname argument to ssl.SSLContext.wrap_socket().
Various options have been added to the ssl module, such as
OP_NO_SSLv2 which disables the insecure and obsolete SSLv2
protocol.
The extension now loads all the OpenSSL ciphers and digest algorithms. If
some SSL certificates cannot be verified, they are reported as an “unknown
algorithm” error.
The nntplib module has a revamped implementation with better bytes and
text semantics as well as more practical APIs. These improvements break
compatibility with the nntplib version in Python 3.1, which was partly
dysfunctional in itself.
There were a number of small API improvements in the http.client module.
The old-style HTTP 0.9 simple responses are no longer supported and the strict
parameter is deprecated in all classes.
The HTTPConnection and
HTTPSConnection classes now have a source_address
parameter for a (host, port) tuple indicating where the HTTP connection is made
from.
Support for certificate checking and HTTPS virtual hosts were added to
HTTPSConnection.
The request() method on connection objects
allowed an optional body argument so that a file object could be used
to supply the content of the request. Conveniently, the body argument now
also accepts an iterable object so long as it includes an explicit
Content-Length header. This extended interface is much more flexible than
before.
To establish an HTTPS connection through a proxy server, there is a new
set_tunnel() method that sets the host and
port for HTTP Connect tunneling.
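A sketch (hostnames are illustrative):

import http.client

conn = http.client.HTTPSConnection('proxy.example.com', 8080)
conn.set_tunnel('www.python.org', 443)    # CONNECT through the proxy
conn.request('HEAD', '/')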
To match the behavior of http.server, the HTTP client library now also
encodes headers with ISO-8859-1 (Latin-1) encoding. It was already doing that
for incoming headers, so now the behavior is consistent for both incoming and
outgoing traffic. (See work by Armin Ronacher in issue 10980.)
The unittest module has a number of improvements supporting test discovery for
packages, easier experimentation at the interactive prompt, new testcase
methods, improved diagnostic messages for test failures, and better method
names.
The command-line call python -m unittest can now accept file paths
instead of module names for running specific tests (issue 10620). The new
test discovery can find tests within packages, locating any test importable
from the top-level directory. The top-level directory can be specified with
the -t option, a pattern for matching files with -p, and a directory to
start discovery with -s:
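For example (the directory and pattern are illustrative):

$ python -m unittest discover -s my_proj_dir -p 'test_*.py' -t .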
Another new method, assertCountEqual() is used to
compare two iterables to determine if their element counts are equal (whether
the same elements are present with the same number of occurrences regardless
of order):
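For example:

import unittest

class OrderIndifferentTest(unittest.TestCase):
    def test_anagram(self):
        # Passes: same letters with the same multiplicities, order ignored.
        self.assertCountEqual('algorithm', 'logarithm')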
A principal feature of the unittest module is an effort to produce meaningful
diagnostics when a test fails. When possible, the failure is recorded along
with a diff of the output. This is especially helpful for analyzing log files
of failed test runs. However, since diffs can sometimes be voluminous, there is
a new maxDiff attribute that sets maximum length of
diffs displayed.
In addition, the method names in the module have undergone a number of clean-ups.
For example, assertRegex() is the new name for
assertRegexpMatches() which was misnamed because the
test uses re.search(), not re.match(). Other methods using
regular expressions are now named using short form “Regex” in preference to
“Regexp” – this matches the names used in other unittest implementations,
matches Python’s old name for the re module, and it has unambiguous
camel-casing.
(Contributed by Raymond Hettinger and implemented by Ezio Melotti.)
To improve consistency, some long-standing method aliases are being
deprecated in favor of the preferred names:
Likewise, the TestCase.fail* methods deprecated in Python 3.1 are expected
to be removed in Python 3.3. Also see the Deprecated aliases section in
the unittest documentation.
The assertDictContainsSubset() method was deprecated
because it was misimplemented with the arguments in the wrong order. This
created hard-to-debug optical illusions where tests like
TestCase().assertDictContainsSubset({'a': 1, 'b': 2}, {'a': 1}) would fail.
The integer methods in the random module now do a better job of producing
uniform distributions. Previously, they computed selections with
int(n*random()) which had a slight bias whenever n was not a power of two.
Now, multiple selections are made from a range up to the next power of two and a
selection is kept only when it falls within the range 0 <= x < n. The
functions and methods affected are randrange(),
randint(), choice(), shuffle() and
sample().
The poplib.POP3_SSL class now accepts a context parameter, which is a
ssl.SSLContext object allowing bundling SSL configuration options,
certificates and private keys into a single (potentially long-lived)
structure.
asyncore.dispatcher now provides a handle_accepted(sock, addr) method that
is called when a connection has actually been established with a new remote
endpoint. It is meant as a replacement for the old handle_accept() method
and saves the user from having to call accept() directly.
(Contributed by Rodolpho Eckhardt and Nick Coghlan, issue 10220.)
To support lookups without the possibility of activating a dynamic attribute,
the inspect module has a new function, getattr_static().
Unlike hasattr(), this is a true read-only search, guaranteed not to
change state while it is searching:
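A sketch (do_expensive_lookup is a placeholder that never runs here):

>>> import inspect
>>> class Lazy:
...     @property
...     def attr(self):
...         return do_expensive_lookup()    # would run on a normal lookup
...
>>> inspect.getattr_static(Lazy(), 'attr')  # returns the descriptor untouched
<property object at 0x...>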
The pydoc module now provides a much-improved Web server interface, as
well as a new command-line option -b to automatically open a browser window
to display that server:
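For example:

$ python -m pydoc -b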
The dis module gained two new functions for inspecting code,
code_info() and show_code(). Both provide detailed code
object information for the supplied function, method, source code string or code
object. The former returns a string and the latter prints it:
>>> import dis, random
>>> dis.show_code(random.choice)
Name:              choice
Filename:          /Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/random.py
Argument count:    2
Kw-only arguments: 0
Number of locals:  3
Stack size:        11
Flags:             OPTIMIZED, NEWLOCALS, NOFREE
Constants:
   0: 'Choose a random element from a non-empty sequence.'
   1: 'Cannot choose from an empty sequence'
Names:
   0: _randbelow
   1: len
   2: ValueError
   3: IndexError
Variable names:
   0: self
   1: seq
   2: i
In addition, the dis() function now accepts string arguments
so that the common idiom dis(compile(s, '', 'eval')) can be shortened
to dis(s):
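A quick illustration (the disassembly output is elided here):

>>> from dis import dis
>>> dis('x + 1')    # disassembles the expression directly from a string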
Taken together, these improvements make it easier to explore how CPython is
implemented and to see for yourself what the language syntax does
under-the-hood.
The new sysconfig module makes it straightforward to discover
installation paths and configuration variables that vary across platforms and
installations.
The module offers simple access functions for platform and version
information:
get_platform() returning values like linux-i586 or
macosx-10.6-ppc.
It also provides access to the paths and variables corresponding to one of
seven named schemes used by distutils. Those include posix_prefix,
posix_home, posix_user, nt, nt_user, os2, os2_home:
get_paths() makes a dictionary containing installation paths
for the current installation scheme.
get_config_vars() returns a dictionary of platform specific
variables.
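For example (values shown are from one illustrative installation):

>>> import sysconfig
>>> sysconfig.get_platform()
'linux-i586'
>>> sysconfig.get_python_version()
'3.2'
>>> sysconfig.get_paths('posix_prefix')['stdlib']
'/usr/local/lib/python3.2'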
There is also a convenient command-line interface:
C:\Python32>python -m sysconfig
Platform: "win32"
Python version: "3.2"
Current installation scheme: "nt"

Paths:
        data = "C:\Python32"
        include = "C:\Python32\Include"
        platinclude = "C:\Python32\Include"
        platlib = "C:\Python32\Lib\site-packages"
        platstdlib = "C:\Python32\Lib"
        purelib = "C:\Python32\Lib\site-packages"
        scripts = "C:\Python32\Scripts"
        stdlib = "C:\Python32\Lib"

Variables:
        BINDIR = "C:\Python32"
        BINLIBDEST = "C:\Python32\Lib"
        EXE = ".exe"
        INCLUDEPY = "C:\Python32\Include"
        LIBDEST = "C:\Python32\Lib"
        SO = ".pyd"
        VERSION = "32"
        abiflags = ""
        base = "C:\Python32"
        exec_prefix = "C:\Python32"
        platbase = "C:\Python32"
        prefix = "C:\Python32"
        projectbase = "C:\Python32"
        py_version = "3.2"
        py_version_nodot = "32"
        py_version_short = "3.2"
        srcdir = "C:\Python32"
        userbase = "C:\Documents and Settings\Raymond\Application Data\Python"
The configparser module was modified to improve usability and
predictability of the default parser and its supported INI syntax. The old
ConfigParser class was removed in favor of SafeConfigParser
which has in turn been renamed to ConfigParser. Support
for inline comments is now turned off by default and section or option
duplicates are not allowed in a single configuration source.
Config parsers gained a new API based on the mapping protocol:
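A sketch of the mapping-style access (section and option names are
illustrative):

>>> from configparser import ConfigParser
>>> parser = ConfigParser()
>>> parser.read_string("""
... [DEFAULT]
... monty = python
...
... [phrases]
... the = best
... """)
>>> parser['phrases']['monty']    # mapping access, with DEFAULT fallback
'python'
>>> 'phrases' in parser
True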
The new API is implemented on top of the classical API, so custom parser
subclasses should be able to use it without modifications.
The INI file structure accepted by config parsers can now be customized. Users
can specify alternative option/value delimiters and comment prefixes, change the
name of the DEFAULT section or switch the interpolation syntax.
There is support for pluggable interpolation including an additional interpolation
handler ExtendedInterpolation:
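A sketch of the extended ${section:option} syntax (names are illustrative):

>>> from configparser import ConfigParser, ExtendedInterpolation
>>> parser = ConfigParser(interpolation=ExtendedInterpolation())
>>> parser.read_string("""
... [common]
... home = /home/monty
...
... [paths]
... data = ${common:home}/data
... """)
>>> parser['paths']['data']
'/home/monty/data'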
A number of smaller features were also introduced, like support for specifying
encoding in read operations, specifying fallback values for get-functions, or
reading directly from dictionaries and strings.
Also, the urlencode() function is now much more flexible,
accepting either a string or bytes type for the query argument. If it is a
string, then the safe, encoding, and error parameters are sent to
quote_plus() for encoding:
>>> import urllib.parse
>>> urllib.parse.urlencode([
...     ('type', 'telenovela'),
...     ('name', '¿Dónde Está Elisa?')],
...     encoding='latin-1')
'type=telenovela&name=%BFD%F3nde+Est%E1+Elisa%3F'
As detailed in Parsing ASCII Encoded Bytes, all the urllib.parse
functions now accept ASCII-encoded byte strings as input, so long as they are
not mixed with regular strings. If ASCII-encoded byte strings are given as
parameters, the return types will also be ASCII-encoded byte strings:
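For example (the result type and field values are shown schematically):

>>> import urllib.parse
>>> urllib.parse.urlsplit(b'http://www.python.org:80/about/')
SplitResultBytes(scheme=b'http', netloc=b'www.python.org:80', path=b'/about/', query=b'', fragment=b'')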
Thanks to a concerted effort by R. David Murray, the mailbox module has
been fixed for Python 3.2. The challenge was that mailbox had been originally
designed with a text interface, but email messages are best represented with
bytes because various parts of a message may have different encodings.
The solution harnessed the email package’s binary support for parsing
arbitrary email messages. In addition, the solution required a number of API
changes.
As expected, the add() method for
mailbox.Mailbox objects now accepts binary input.
StringIO and text file input are deprecated. Also, string input
will fail early if non-ASCII characters are used. Previously it would fail when
the email was processed in a later step.
There is also support for binary output. The get_file()
method now returns a file in the binary mode (where it used to incorrectly set
the file to text-mode). There is also a new get_bytes()
method that returns a bytes representation of a message corresponding
to a given key.
It is still possible to get non-binary output using the old API’s
get_string() method, but that approach
is not very useful. Instead, it is best to extract messages from
a Message object or to load them from binary input.
(Contributed by R. David Murray, with efforts from Steffen Daode Nurpmeso and an
initial patch by Victor Stinner in issue 9124.)
The demonstration code for the turtle module was moved from the Demo
directory to the main library. It includes over a dozen sample scripts with
lively displays. Being on sys.path, it can now be run directly
from the command-line:
$ python -m turtledemo
(Moved from the Demo directory by Alexander Belopolsky in issue 10199.)
The mechanism for serializing execution of concurrently running Python threads
(generally known as the GIL or Global Interpreter Lock) has
been rewritten. Among the objectives were more predictable switching
intervals and reduced overhead due to lock contention and the number of
ensuing system calls. The notion of a “check interval” to allow thread
switches has been abandoned and replaced by an absolute duration expressed in
seconds. This parameter is tunable through sys.setswitchinterval().
It currently defaults to 5 milliseconds.
Additional details about the implementation can be read from a python-dev
mailing-list message
(however, “priority requests” as exposed in this message have not been kept
for inclusion).
(Contributed by Antoine Pitrou.)
Regular and recursive locks now accept an optional timeout argument to their
acquire() method. (Contributed by Antoine Pitrou;
issue 7316.)
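A sketch of the timeout form:

import threading

lock = threading.Lock()
# Wait up to half a second; acquire() returns False if the timeout elapses.
if lock.acquire(timeout=0.5):
    try:
        pass    # ... critical section ...
    finally:
        lock.release()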
Regular and recursive lock acquisitions can now be interrupted by signals on
platforms using Pthreads. This means that Python programs that deadlock while
acquiring locks can be successfully killed by repeatedly sending SIGINT to the
process (by pressing Ctrl+C in most shells).
(Contributed by Reid Kleckner; issue 8844.)
A number of small performance enhancements have been added:
Python’s peephole optimizer now recognizes patterns such as x in {1, 2, 3} as
being a test for membership in a set of constants. The optimizer recasts the
set as a frozenset and stores the pre-built constant.
Now that the speed penalty is gone, it is practical to start writing
membership tests using set-notation. This style is both semantically clear
and operationally fast:
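For example (handle and name are placeholders):

extension = name.rpartition('.')[2]
if extension in {'xml', 'html', 'xhtml', 'ws'}:
    handle(name)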
(Patch and additional tests contributed by Dave Malcolm; issue 6690).
Serializing and unserializing data using the pickle module is now
several times faster.
(Contributed by Alexandre Vassalotti, Antoine Pitrou
and the Unladen Swallow team in issue 9410 and issue 3873.)
The Timsort algorithm used in
list.sort() and sorted() now runs faster and uses less memory
when called with a key function. Previously, every element of
a list was wrapped with a temporary object that remembered the key value
associated with each element. Now, two arrays of keys and values are
sorted in parallel. This saves the memory consumed by the sort wrappers,
and it saves time lost to delegating comparisons.
JSON decoding performance is improved and memory consumption is reduced
whenever the same string is repeated for multiple keys. Also, JSON encoding
now uses the C speedups when the sort_keys argument is true.
(Contributed by Antoine Pitrou in issue 7451 and by Raymond Hettinger and
Antoine Pitrou in issue 10314.)
Recursive locks (created with the threading.RLock() API) now benefit
from a C implementation which makes them as fast as regular locks, and between
10x and 15x faster than their previous pure Python implementation.
The fast-search algorithm in stringlib is now used by the split(),
splitlines() and replace() methods on
bytes, bytearray and str objects. Likewise, the
algorithm is also used by rfind(), rindex(), rsplit() and
rpartition().
String to integer conversions now work two “digits” at a time, reducing the
number of division and modulo operations.
(issue 6713 by Gawain Bolton, Mark Dickinson, and Victor Stinner.)
There were several other minor optimizations. Set differencing now runs faster
when one operand is much larger than the other (patch by Andress Bennetts in
issue 8685). The array.repeat() method has a faster implementation
(issue 1569291 by Alexander Belopolsky). The BaseHTTPRequestHandler
has more efficient buffering (issue 3709 by Andrew Schaaf). The
operator.attrgetter() function has been sped-up (issue 10160 by
Christos Georgiou). And ConfigParser loads multi-line arguments a bit
faster (issue 7113 by Łukasz Langa).
Python has been updated to Unicode 6.0.0. The update to the standard adds
over 2,000 new characters including emoji
symbols which are important for mobile phones.
In addition, the updated standard has altered the character properties for two
Kannada characters (U+0CF1, U+0CF2) and one New Tai Lue numeric character
(U+19DA), making the former eligible for use in identifiers while disqualifying
the latter. For more information, see Unicode Character Database Changes.
Support was added for cp720 Arabic DOS encoding (issue 1616979).
MBCS encoding no longer ignores the error handler argument. In the default
strict mode, it raises a UnicodeDecodeError when it encounters an
undecodable byte sequence and a UnicodeEncodeError for an unencodable
character.
The MBCS codec supports 'strict' and 'ignore' error handlers for
decoding, and 'strict' and 'replace' for encoding.
To emulate Python 3.1 MBCS encoding, select the 'ignore' handler for decoding
and the 'replace' handler for encoding.
On Mac OS X, Python decodes command line arguments with 'utf-8' rather than
the locale encoding.
By default, tarfile uses 'utf-8' encoding on Windows (instead of
'mbcs') and the 'surrogateescape' error handler on all operating
systems.
A table of quick links has been added to the top of lengthy sections such as
Built-in Functions. In the case of itertools, the links are
accompanied by tables of cheatsheet-style summaries to provide an overview and
memory jog without having to read all of the docs.
In some cases, the pure Python source code can be a helpful adjunct to the
documentation, so many modules now feature quick links to the latest
version of the source code. For example, the functools module
documentation has a quick link at the top labeled:
The datetime module now has an auxiliary implementation in pure Python.
No functionality was changed. This just provides an easier-to-read alternate
implementation.
(Contributed by Alexander Belopolsky in issue 9528.)
The unmaintained Demo directory has been removed. Some demos were
integrated into the documentation, some were moved to the Tools/demo
directory, and others were removed altogether.
After the 3.2 release, there are plans to switch to Mercurial as the primary
repository. This distributed version control system should make it easier for
members of the community to create and share external changesets. See
PEP 385 for details.
Changes to Python’s build process and to the C API include:
The idle, pydoc and 2to3 scripts are now installed with a
version-specific suffix on make altinstall (issue 10679).
The C functions that access the Unicode Database now accept and return
characters from the full Unicode range, even on narrow unicode builds
(Py_UNICODE_TOLOWER, Py_UNICODE_ISDECIMAL, and others). A visible difference
in Python is that unicodedata.numeric() now returns the correct value
for large code points, and repr() may consider more characters as
printable.
(Reported by Bupjoe Lee and fixed by Amaury Forgeot D’Arc; issue 5127.)
Computed gotos are now enabled by default on supported compilers (which are
detected by the configure script). They can still be disabled selectively by
specifying --without-computed-gotos.
The option --with-wctype-functions was removed. The built-in unicode
database is now used for all functions.
(Contributed by Amaury Forgeot D’Arc; issue 9210.)
Hash values are now values of a new type, Py_hash_t, which is
defined to be the same size as a pointer. Previously they were of type long,
which on some 64-bit operating systems is still only 32 bits long. As a
result of this fix, set and dict can now hold more than
2**32 entries on builds with 64-bit pointers (previously, they could grow
to that size but their performance degraded catastrophically).
(Suggested by Raymond Hettinger and implemented by Benjamin Peterson;
issue 9778.)
A new macro Py_VA_COPY copies the state of the variable argument
list. It is equivalent to C99 va_copy but available on all Python platforms
(issue 2443).
PyEval_CallObject is now only available in macro form. The
function declaration, which was kept for backwards compatibility reasons, is
now removed – the macro was introduced in 1997 (issue 8276).
There is a new function PyErr_NewExceptionWithDoc() that is
like PyErr_NewException() but allows a docstring to be specified.
This lets C exceptions have the same self-documenting capabilities as
their pure Python counterparts (issue 7033).
When compiled with the --with-valgrind option, the pymalloc
allocator will be automatically disabled when running under Valgrind. This
gives improved memory leak detection when running under Valgrind, while taking
advantage of pymalloc at other times (issue 2422).
Removed the O? format from the PyArg_Parse functions. The format is no
longer used and it had never been documented (issue 8837).
There were a number of other small changes to the C-API. See the
Misc/NEWS file for a complete list.
Also, there were a number of updates to the Mac OS X build, see
Mac/BuildScript/README.txt for details. For users running a 32/64-bit
build, there is a known problem with the default Tcl/Tk on Mac OS X 10.6.
Accordingly, we recommend installing an updated alternative such as
ActiveState Tcl/Tk 8.5.9.
See http://www.python.org/download/mac/tcltk/ for additional details.
This section lists previously described changes and other bugfixes that may
require changes to your code:
The configparser module has a number of clean-ups. The major change is
to replace the old ConfigParser class with the long-standing preferred
alternative, SafeConfigParser. In addition there are a number of
smaller incompatibilities:
The interpolation syntax is now validated on
get() and
set() operations. In the default
interpolation scheme, only two tokens with percent signs are valid: %(name)s
and %%, the latter being an escaped percent sign.
The set() and
add_section() methods now verify that
values are actual strings. Formerly, unsupported types could be introduced
unintentionally.
Duplicate sections or options from a single source now raise either
DuplicateSectionError or
DuplicateOptionError. Formerly, duplicates would
silently overwrite a previous entry.
Inline comments are now disabled by default so that the ; character
can be safely used in values.
Comments now can be indented. Consequently, for ; or # to appear at
the start of a line in multiline values, it has to be interpolated. This
keeps comment prefix characters in values from being mistaken as comments.
"" is now a valid value and is no longer automatically converted to an
empty string. For empty strings, use "option=" in a line.
The nntplib module was reworked extensively, meaning that its APIs
are often incompatible with the 3.1 APIs.
bytearray objects can no longer be used as filenames; instead,
they should be converted to bytes.
The array.tostring() and array.fromstring() methods have been renamed to
array.tobytes() and array.frombytes() for clarity. The old names
have been deprecated. (See issue 8990.)
PyArg_Parse*() functions:
“t#” format has been removed: use “s#” or “s*” instead
“w” and “w#” formats have been removed: use “w*” instead
The PyCObject type, deprecated in 3.1, has been removed. To wrap
opaque C pointers in Python objects, the PyCapsule API should be used
instead; the new type has a well-defined interface for passing type-safety
information and a less complicated signature for calling a destructor.
The sys.setfilesystemencoding() function was removed because
it had a flawed design.
The random.seed() function and method now salt string seeds with an
sha512 hash function. To access the previous version of seed in order to
reproduce Python 3.1 sequences, set the version argument to 1, as in
random.seed(s, version=1).
The previously deprecated string.maketrans() function has been removed
in favor of the static methods bytes.maketrans() and
bytearray.maketrans(). This change solves the confusion around which
types were supported by the string module. Now, str,
bytes, and bytearray each have their own maketrans and
translate methods with intermediate translation tables of the appropriate
type.
The previously deprecated contextlib.nested() function has been removed
in favor of a plain with statement which can accept multiple
context managers. The latter technique is faster (because it is built-in),
and it does a better job finalizing multiple context managers when one of them
raises an exception:
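For instance, two files can be managed in a single with statement (a
minimal sketch; the filenames are hypothetical):

with open('mylog.txt') as infile, open('filtered.txt', 'w') as outfile:
    for line in infile:
        if '<critical>' in line:
            outfile.write(line)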
struct.pack() now only allows bytes for the s string pack code.
Formerly, it would accept text arguments and implicitly encode them to bytes
using UTF-8. This was problematic because it made assumptions about the
correct encoding and because a variable-length encoding can fail when writing
to a fixed-length segment of a structure.
Code such as struct.pack('<6sHHBBB', 'GIF87a', x, y) should be rewritten
to use bytes instead of text: struct.pack('<6sHHBBB', b'GIF87a', x, y).
(Discovered by David Beazley and fixed by Victor Stinner; issue 10783.)
The new, longer str() value on floats may break doctests which rely on
the old output format.
In subprocess.Popen, the default value for close_fds is now
True under Unix; under Windows, it is True if the three standard
streams are set to None, False otherwise. Previously, close_fds
was always False by default, which produced difficult-to-solve bugs
or race conditions when open file descriptors leaked into the child
process.
Support for legacy HTTP 0.9 has been removed from urllib.request
and http.client. Such support is still present on the server side
(in http.server).
Regular Python dictionaries iterate over key/value pairs in arbitrary order.
Over the years, a number of authors have written alternative implementations
that remember the order that the keys were originally inserted. Based on
the experiences from those implementations, a new
collections.OrderedDict class has been introduced.
The OrderedDict API is substantially the same as regular dictionaries
but will iterate over keys and values in a guaranteed order depending on
when a key was first inserted. If a new entry overwrites an existing entry,
the original insertion position is left unchanged. Deleting an entry and
reinserting it will move it to the end.
The standard library now supports use of ordered dictionaries in several
modules. The configparser module uses them by default. This lets
configuration files be read, modified, and then written back in their original
order. The _asdict() method for collections.namedtuple() now
returns an ordered dictionary with the values appearing in the same order as
the underlying tuple indices. The json module is being built-out with
an object_pairs_hook to allow OrderedDicts to be built by the decoder.
Support was also added for third-party tools like PyYAML.
PEP written by Armin Ronacher and Raymond Hettinger. Implementation
written by Raymond Hettinger.
PEP 378: Format Specifier for Thousands Separator
The built-in format() function and the str.format() method use
a mini-language that now includes a simple, non-locale aware way to format
a number with a thousands separator. That provides a way to humanize a
program’s output, improving its professional appearance and readability:
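For example (an illustrative session):

>>> format(1234567, ',d')
'1,234,567'
>>> '{:,.2f}'.format(1234567.89)
'1,234,567.89'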
Discussions are underway about how to specify alternative separators
like dots, spaces, apostrophes, or underscores. Locale-aware applications
should use the existing n format specifier which already has some support
for thousands separators.
See also
PEP 378 - Format Specifier for Thousands Separator
PEP written by Raymond Hettinger and implemented by Eric Smith and
Mark Dickinson.
Some smaller changes made to the core Python language are:
Directories and zip archives containing a __main__.py
file can now be executed directly by passing their name to the
interpreter. The directory/zipfile is automatically inserted as the
first entry in sys.path. (Suggestion and initial patch by Andy Chu;
revised patch by Phillip J. Eby and Nick Coghlan; issue 1739468.)
The int() type gained a bit_length method that returns the
number of bits necessary to represent its argument in binary:
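For example (an illustrative session):

>>> n = 37
>>> bin(n)
'0b100101'
>>> n.bit_length()
6
>>> (2**8).bit_length()
9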
The string.maketrans() function is deprecated and is replaced by new
static methods, bytes.maketrans() and bytearray.maketrans().
This change solves the confusion around which types were supported by the
string module. Now, str, bytes, and
bytearray each have their own maketrans and translate
methods with intermediate translation tables of the appropriate type.
Python now uses David Gay’s algorithm for finding the shortest floating
point representation that doesn’t change its value. This should help
mitigate some of the confusion surrounding binary floating point
numbers.
The significance is easily seen with a number like 1.1 which does not
have an exact equivalent in binary floating point. Since there is no exact
equivalent, an expression like float('1.1') evaluates to the nearest
representable value which is 0x1.199999999999ap+0 in hex or
1.100000000000000088817841970012523233890533447265625 in decimal. That
nearest value was and still is used in subsequent floating point
calculations.
What is new is how the number gets displayed. Formerly, Python used a
simple approach. The value of repr(1.1) was computed as
format(1.1, '.17g'), which evaluated to '1.1000000000000001'. The advantage of
using 17 digits was that it relied on IEEE-754 guarantees to assure that
eval(repr(1.1)) would round-trip exactly to its original value. The
disadvantage is that many people found the output to be confusing (mistaking
intrinsic limitations of binary floating point representation as being a
problem with Python itself).
The new algorithm for repr(1.1) is smarter and returns '1.1'.
Effectively, it searches all equivalent string representations (ones that
get stored with the same underlying float value) and returns the shortest
representation.
The new algorithm tends to emit cleaner representations when possible, but
it does not change the underlying values. So, it is still the case that
1.1 + 2.2 != 3.3 even though the representations may suggest otherwise.
The new algorithm depends on certain features in the underlying floating
point implementation. If the required features are not found, the old
algorithm will continue to be used. Also, the text pickle protocols
assure cross-platform portability by using the old algorithm.
(Contributed by Eric Smith and Mark Dickinson; issue 1580.)
Added a new module, tkinter.ttk for access to the Tk themed widget set.
The basic idea of ttk is to separate, to the extent possible, the code
implementing a widget’s behavior from the code implementing its appearance.
The long decimal result shows the actual binary fraction being
stored for 1.1. The fraction has many digits because 1.1 cannot
be exactly represented in binary.
(Contributed by Raymond Hettinger and Mark Dickinson.)
collections.namedtuple() now supports a keyword argument
rename which lets invalid fieldnames be automatically converted to
positional names in the form _0, _1, etc. This is useful when
the field names are being created by an external source such as a
CSV header, SQL field list, or user input:
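A sketch of the behavior (the field list here is made up; 'class' is a
keyword and the second 'id' is a duplicate, so both get renamed):

>>> from collections import namedtuple
>>> Row = namedtuple('Row', ['id', 'class', 'id'], rename=True)
>>> Row._fields
('id', '_1', '_2')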
The logging module now implements a simple logging.NullHandler
class for applications that are not using logging but are calling
library code that does. Setting up a null handler will suppress
spurious warnings such as “No handlers could be found for logger foo”:
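For example, a library can attach the null handler to its top-level
logger (the logger name 'foo' is a placeholder):

>>> import logging
>>> h = logging.NullHandler()
>>> logging.getLogger('foo').addHandler(h)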
The runpy module which supports the -m command line switch
now supports the execution of packages by looking for and executing
a __main__ submodule when a package name is supplied.
The unittest module now supports skipping individual tests or classes
of tests. It also supports marking a test as an expected failure, a test that
is known to be broken, but shouldn’t be counted as a failure on a
TestResult:
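A minimal sketch of the new decorators (the test names and the platform
condition are illustrative):

import sys
import unittest

class TestGizmo(unittest.TestCase):

    @unittest.skipUnless(sys.platform.startswith('win'), 'requires Windows')
    def test_gizmo_on_windows(self):
        ...

    @unittest.expectedFailure
    def test_gizmo_without_required_library(self):
        ...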
In addition, several new assertion methods were added including
assertSetEqual(), assertDictEqual(),
assertDictContainsSubset(), assertListEqual(),
assertTupleEqual(), assertSequenceEqual(),
assertRaisesRegexp(), assertIsNone(),
and assertIsNotNone().
(Contributed by Benjamin Peterson and Antoine Pitrou.)
The io module has three new constants for the seek()
method: SEEK_SET, SEEK_CUR, and SEEK_END.
The pickle module has been adapted for better interoperability with
Python 2.x when used with protocol 2 or lower. The reorganization of the
standard library changed the formal reference for many objects. For
example, __builtin__.set in Python 2 is called builtins.set in Python
3. This change confounded efforts to share data between different versions of
Python. But now when protocol 2 or lower is selected, the pickler will
automatically use the old Python 2 names for both loading and dumping. This
remapping is turned-on by default but can be disabled with the fix_imports
option:
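A sketch of both settings:

>>> import pickle
>>> s = {1, 2}
>>> # By default, protocol 2 writes the class under its Python 2 name,
>>> # __builtin__.set, so the result can be loaded by Python 2.x:
>>> data = pickle.dumps(s, protocol=2)
>>> # Disabling the remapping keeps the Python 3 name, builtins.set:
>>> data = pickle.dumps(s, protocol=2, fix_imports=False)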
An unfortunate but unavoidable side-effect of this change is that protocol 2
pickles produced by Python 3.1 won’t be readable with Python 3.0. The latest
pickle protocol, protocol 3, should be used when migrating data between
Python 3.x implementations, as it doesn’t attempt to remain compatible with
Python 2.x.
(Contributed by Alexandre Vassalotti and Antoine Pitrou, issue 6137.)
A new module, importlib was added. It provides a complete, portable,
pure Python reference implementation of the import statement and its
counterpart, the __import__() function. It represents a substantial
step forward in documenting and defining the actions that take place during
imports.
The new I/O library (as defined in PEP 3116) was mostly written in
Python and quickly proved to be a problematic bottleneck in Python 3.0.
In Python 3.1, the I/O library has been entirely rewritten in C and is
2 to 20 times faster depending on the task at hand. The pure Python
version is still available for experimentation purposes through
the _pyio module.
(Contributed by Amaury Forgeot d’Arc and Antoine Pitrou.)
Added a heuristic so that tuples and dicts containing only untrackable objects
are not tracked by the garbage collector. This can reduce the size of
collections and therefore the garbage collection overhead on long-running
programs, depending on their particular use of datatypes.
When the new configure option --with-computed-gotos is enabled
on compilers that support it (notably gcc, SunPro, and icc), the bytecode
evaluation loop is compiled with a new dispatch mechanism which gives
speedups of up to 20%, depending on the system, the compiler, and
the benchmark.
(Contributed by Antoine Pitrou along with a number of other participants,
issue 4753).
The decoding of UTF-8, UTF-16 and LATIN-1 is now two to four times
faster.
(Contributed by Antoine Pitrou and Amaury Forgeot d’Arc, issue 4868.)
The json module now has a C extension to substantially improve
its performance. In addition, the API was modified so that json works
only with str, not with bytes. That change makes the
module closely match the JSON specification
which is defined in terms of Unicode.
(Contributed by Bob Ippolito and converted to Py3.1 by Antoine Pitrou
and Benjamin Peterson; issue 4136.)
Unpickling now interns the attribute names of pickled objects. This saves
memory and allows pickles to be smaller.
(Contributed by Jake McGuire and Antoine Pitrou; issue 5084.)
Changes to Python’s build process and to the C API include:
Integers are now stored internally either in base 2**15 or in base
2**30, the base being determined at build time. Previously, they
were always stored in base 2**15. Using base 2**30 gives
significant performance improvements on 64-bit machines, but
benchmark results on 32-bit machines have been mixed. Therefore,
the default is to use base 2**30 on 64-bit machines and base 2**15
on 32-bit machines; on Unix, there’s a new configure option
--enable-big-digits that can be used to override this default.
Apart from the performance improvements this change should be invisible to
end users, with one exception: for testing and debugging purposes there’s a
new sys.int_info that provides information about the
internal format, giving the number of bits per digit and the size in bytes
of the C type used to store each digit:
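For example, on a typical 64-bit build:

>>> import sys
>>> sys.int_info
sys.int_info(bits_per_digit=30, sizeof_digit=4)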
Added PyCapsule as a replacement for the PyCObject API.
The principal difference is that the new type has a well defined interface
for passing type-safety information and a less complicated signature
for calling a destructor. The old type had a problematic API and is now
deprecated.
This section lists previously described changes and other bugfixes
that may require changes to your code:
The new floating point string representations can break existing doctests.
For example:
def e():
    '''Compute the base of natural logarithms.

    >>> e()
    2.7182818284590451
    '''
    return sum(1/math.factorial(x) for x in reversed(range(30)))

doctest.testmod()

**********************************************************************
Failed example:
    e()
Expected:
    2.7182818284590451
Got:
    2.718281828459045
**********************************************************************
The automatic name remapping in the pickle module for protocol 2 or lower can
make Python 3.1 pickles unreadable in Python 3.0. One solution is to use
protocol 3. Another solution is to set the fix_imports option to False.
See the discussion above for more details.
This article explains the new features in Python 3.0, compared to 2.6.
Python 3.0, also known as “Python 3000” or “Py3K”, is the first ever
intentionally backwards incompatible Python release. There are more
changes than in a typical release, and more that are important for all
Python users. Nevertheless, after digesting the changes, you’ll find
that Python really hasn’t changed all that much – by and large, we’re
mostly fixing well-known annoyances and warts, and removing a lot of
old cruft.
This article doesn’t attempt to provide a complete specification of
all new features, but instead tries to give a convenient overview.
For full details, you should refer to the documentation for Python
3.0, and/or the many PEPs referenced in the text. If you want to
understand the complete implementation and design rationale for a
particular feature, PEPs usually have more details than the regular
documentation; but note that PEPs usually are not kept up-to-date once
a feature has been fully implemented.
Due to time constraints this document is not as complete as it should
have been. As always for a new release, the Misc/NEWS file in the
source distribution contains a wealth of detailed information about
every small thing that was changed.
The print statement has been replaced with a print()
function, with keyword arguments to replace most of the special syntax
of the old print statement (PEP 3105). Examples:
Old:print"The answer is",2*2New:print("The answer is",2*2)Old:printx,# Trailing comma suppresses newlineNew:print(x,end=" ")# Appends a space instead of a newlineOld:print# Prints a newlineNew:print()# You must call the function!Old:print>>sys.stderr,"fatal error"New:print("fatal error",file=sys.stderr)Old:print(x,y)# prints repr((x, y))New:print((x,y))# Not the same as print(x, y)!
You can also customize the separator between items, e.g.:
print("There are <",2**32,"> possibilities!",sep="")
which produces:
There are <4294967296> possibilities!
Note:
The print() function doesn’t support the “softspace” feature of
the old print statement. For example, in Python 2.x,
print"A\n","B" would write "A\nB\n"; but in Python 3.0,
print("A\n","B") writes "A\nB\n".
Initially, you’ll find yourself typing the old print x
a lot in interactive mode. Time to retrain your fingers to type
print(x) instead!
When using the 2to3 source-to-source conversion tool, all
print statements are automatically converted to
print() function calls, so this is mostly a non-issue for
larger projects.
dict methods dict.keys(), dict.items() and
dict.values() return “views” instead of lists. For example,
this no longer works: k = d.keys(); k.sort(). Use k = sorted(d) instead (this works in Python 2.5 too and is just
as efficient).
Also, the dict.iterkeys(), dict.iteritems() and
dict.itervalues() methods are no longer supported.
map() and filter() return iterators. If you really need
a list, a quick fix is e.g. list(map(...)), but a better fix is
often to use a list comprehension (especially when the original code
uses lambda), or rewriting the code so it doesn’t need a
list at all. Particularly tricky is map() invoked for the
side effects of the function; the correct transformation is to use a
regular for loop (since creating a list would just be
wasteful).
range() now behaves like xrange() used to behave, except
it works with values of arbitrary size. The latter no longer
exists.
Python 3.0 has simplified the rules for ordering comparisons:
The ordering comparison operators (<, <=, >=, >)
raise a TypeError exception when the operands don’t have a
meaningful natural ordering. Thus, expressions like 1 < '', 0 > None
or len <= len are no longer valid, and e.g. None < None raises
TypeError instead of returning False. A corollary is that sorting a heterogeneous list
no longer makes sense – all the elements must be comparable to each
other. Note that this does not apply to the == and !=
operators: objects of different incomparable types always compare
unequal to each other.
builtin.sorted() and list.sort() no longer accept the
cmp argument providing a comparison function. Use the key
argument instead. N.B. the key and reverse arguments are now
“keyword-only”.
The cmp() function should be treated as gone, and the __cmp__()
special method is no longer supported. Use __lt__() for sorting,
__eq__() with __hash__(), and other rich comparisons as needed.
(If you really need the cmp() functionality, you could use the
expression (a > b) - (a < b) as the equivalent for cmp(a, b).)
PEP 0237: Essentially, long renamed to int.
That is, there is only one built-in integral type, named
int; but it behaves mostly like the old long type.
PEP 0238: An expression like 1/2 returns a float. Use
1//2 to get the truncating behavior. (The latter syntax has
existed for years, at least since Python 2.2.)
The sys.maxint constant was removed, since there is no
longer a limit to the value of integers. However, sys.maxsize
can be used as an integer larger than any practical list or string
index. It conforms to the implementation’s “natural” integer size
and is typically the same as sys.maxint in previous releases
on the same platform (assuming the same build options).
The repr() of a long integer doesn’t include the trailing L
anymore, so code that unconditionally strips that character will
chop off the last digit instead. (Use str() instead.)
Octal literals are no longer of the form 0720; use 0o720
instead.
Everything you thought you knew about binary data and Unicode has
changed.
Python 3.0 uses the concepts of text and (binary) data instead
of Unicode strings and 8-bit strings. All text is Unicode; however
encoded Unicode is represented as binary data. The type used to
hold text is str, the type used to hold data is
bytes. The biggest difference with the 2.x situation is
that any attempt to mix text and data in Python 3.0 raises
TypeError, whereas if you were to mix Unicode and 8-bit
strings in Python 2.x, it would work if the 8-bit string happened to
contain only 7-bit (ASCII) bytes, but you would get
UnicodeDecodeError if it contained non-ASCII values. This
value-specific behavior has caused numerous sad faces over the
years.
As a consequence of this change in philosophy, pretty much all code
that uses Unicode, encodings or binary data most likely has to
change. The change is for the better, as in the 2.x world there
were numerous bugs having to do with mixing encoded and unencoded
text. To be prepared in Python 2.x, start using unicode
for all unencoded text, and str for binary or encoded data
only. Then the 2to3 tool will do most of the work for you.
You can no longer use u"..." literals for Unicode text.
However, you must use b"..." literals for binary data.
As the str and bytes types cannot be mixed, you
must always explicitly convert between them. Use str.encode()
to go from str to bytes, and bytes.decode()
to go from bytes to str. You can also use
bytes(s, encoding=...) and str(b, encoding=...),
respectively.
All backslashes in raw string literals are interpreted literally.
This means that '\U' and '\u' escapes in raw strings are not
treated specially. For example, r'\u20ac' is a string of 6
characters in Python 3.0, whereas in 2.6, ur'\u20ac' was the
single “euro” character. (Of course, this change only affects raw
string literals; the euro character is '\u20ac' in Python 3.0.)
The built-in basestring abstract type was removed. Use
str instead. The str and bytes types
don’t have functionality enough in common to warrant a shared base
class. The 2to3 tool (see below) replaces every occurrence of
basestring with str.
Files opened as text files (still the default mode for open())
always use an encoding to map between strings (in memory) and bytes
(on disk). Binary files (opened with a b in the mode argument)
always use bytes in memory. This means that if a file is opened
using an incorrect mode or encoding, I/O will likely fail loudly,
instead of silently producing incorrect data. It also means that
even Unix users will have to specify the correct mode (text or
binary) when opening a file. There is a platform-dependent default
encoding, which on Unixy platforms can be set with the LANG
environment variable (and sometimes also with some other
platform-specific locale-related environment variables). In many
cases, but not all, the system default is UTF-8; you should never
count on this default. Any application reading or writing more than
pure ASCII text should probably have a way to override the encoding.
There is no longer any need for using the encoding-aware streams
in the codecs module.
Filenames are passed to and returned from APIs as (Unicode) strings.
This can present platform-specific problems because on some
platforms filenames are arbitrary byte strings. (On the other hand,
on Windows filenames are natively stored as Unicode.) As a
work-around, most APIs (e.g. open() and many functions in the
os module) that take filenames accept bytes objects
as well as strings, and a few APIs have a way to ask for a
bytes return value. Thus, os.listdir() returns a
list of bytes instances if the argument is a bytes
instance, and os.getcwdb() returns the current working
directory as a bytes instance. Note that when
os.listdir() returns a list of strings, filenames that
cannot be decoded properly are omitted rather than raising
UnicodeError.
Some system APIs like os.environ and sys.argv can
also present problems when the bytes made available by the system is
not interpretable using the default encoding. Setting the LANG
variable and rerunning the program is probably the best approach.
PEP 3138: The repr() of a string no longer escapes
non-ASCII characters. It still escapes control characters and code
points with non-printable status in the Unicode standard, however.
PEP 3120: The default source encoding is now UTF-8.
PEP 3131: Non-ASCII letters are now allowed in identifiers.
(However, the standard library remains ASCII-only with the exception
of contributor names in comments.)
The StringIO and cStringIO modules are gone. Instead,
import the io module and use io.StringIO or
io.BytesIO for text and data respectively.
See also the Unicode HOWTO, which was updated for Python 3.0.
PEP 3107: Function argument and return value annotations. This
provides a standardized way of annotating a function’s parameters
and return value. There are no semantics attached to such
annotations except that they can be introspected at runtime using
the __annotations__ attribute. The intent is to encourage
experimentation through metaclasses, decorators or frameworks.
PEP 3102: Keyword-only arguments. Named parameters occurring
after *args in the parameter list must be specified using
keyword syntax in the call. You can also use a bare * in the
parameter list to indicate that you don’t accept a variable-length
argument list, but you do have keyword-only arguments.
Keyword arguments are allowed after the list of base classes in a
class definition. This is used by the new convention for specifying
a metaclass (see next section), but can be used for other purposes
as well, as long as the metaclass supports it.
PEP 3104: nonlocal statement. Using nonlocal x
you can now assign directly to a variable in an outer (but
non-global) scope. nonlocal is a new reserved word.
PEP 3132: Extended Iterable Unpacking. You can now write things
like a, b, *rest = some_sequence. And even *rest, a = stuff. The
rest object is always a (possibly empty) list; the
right-hand side may be any iterable. Example:
(a, *rest, b) = range(5)
This sets a to 0, b to 4, and rest to [1,2,3].
Dictionary comprehensions: {k: v for k, v in stuff} means the
same thing as dict(stuff) but is more flexible. (This is
PEP 0274 vindicated. :-)
Set literals, e.g. {1, 2}. Note that {} is an empty
dictionary; use set() for an empty set. Set comprehensions are
also supported; e.g., {x for x in stuff} means the same thing as
set(stuff) but is more flexible.
New octal literals, e.g. 0o720 (already in 2.6). The old octal
literals (0720) are gone.
New binary literals, e.g. 0b1010 (already in 2.6), and
there is a new corresponding built-in function, bin().
Bytes literals are introduced with a leading b or B, and
there is a new corresponding built-in function, bytes().
The module-global __metaclass__ variable is no longer
supported. (It was a crutch to make it easier to default to
new-style classes without deriving every class from
object.)
List comprehensions no longer support the syntactic form
[... for var in item1, item2, ...]. Use
[... for var in (item1, item2, ...)] instead.
Also note that list comprehensions have different semantics: they
are closer to syntactic sugar for a generator expression inside a
list() constructor, and in particular the loop control
variables are no longer leaked into the surrounding scope.
The ellipsis (...) can be used as an atomic expression
anywhere. (Previously it was only allowed in slices.) Also, it
must now be spelled as .... (Previously it could also be
spelled as ..., by a mere accident of the grammar.)
Removed keyword: exec() is no longer a keyword; it remains as
a function. (Fortunately the function syntax was also accepted in
2.x.) Also note that exec() no longer takes a stream argument;
instead of exec(f) you can use exec(f.read()).
Integer literals no longer support a trailing l or L.
String literals no longer support a leading u or U.
The from module import * syntax is only
allowed at the module level, no longer inside functions.
The only acceptable syntax for relative imports is from .[module]
import name. All import forms not starting with . are
interpreted as absolute imports. (PEP 0328)
Since many users presumably make the jump straight from Python 2.5 to
Python 3.0, this section reminds the reader of new features that were
originally designed for Python 3.0 but that were back-ported to Python
2.6. The corresponding sections in What’s New in Python 2.6 should be
consulted for longer descriptions.
PEP 3101: Advanced String Formatting. Note: the 2.6 description mentions the
format() method for both 8-bit and Unicode strings. In 3.0,
only the str type (text strings with Unicode support)
supports this method; the bytes type does not. The plan is
to eventually make this the only API for string formatting, and to
start deprecating the % operator in Python 3.1.
PEP 3112: Byte Literals. The b"..." string literal notation (and its
variants like b'...', b"""...""", and br"...") now
produces a literal of type bytes.
PEP 3116: New I/O Library. The io module is now the standard way of
doing file I/O, and the initial values of sys.stdin,
sys.stdout and sys.stderr are now instances of
io.TextIOBase. The built-in open() function is now an
alias for io.open() and has additional keyword arguments
encoding, errors, newline and closefd. Also note that an
invalid mode argument now raises ValueError, not
IOError. The binary file object underlying a text file
object can be accessed as f.buffer (but beware that the
text object maintains a buffer of itself in order to speed up
the encoding and decoding operations).
Due to time constraints, this document does not exhaustively cover the
very extensive changes to the standard library. PEP 3108 is the
reference for the major changes to the library. Here’s a capsule
review:
Many old modules were removed. Some, like gopherlib (no
longer used) and md5 (replaced by hashlib), were
already deprecated by PEP 0004. Others were removed as a result
of the removal of support for various platforms such as Irix, BeOS
and Mac OS 9 (see PEP 0011). Some modules were also selected for
removal in Python 3.0 due to lack of use or because a better
replacement exists. See PEP 3108 for an exhaustive list.
The bsddb3 package was removed because its presence in the
core standard library has proved over time to be a particular burden
for the core developers due to testing instability and Berkeley DB’s
release schedule. However, the package is alive and well,
externally maintained at http://www.jcea.es/programacion/pybsddb.htm.
Some modules were renamed because their old name disobeyed
PEP 0008, or for various other reasons. Here’s the list:
Old Name             New Name
_winreg              winreg
ConfigParser         configparser
copy_reg             copyreg
Queue                queue
SocketServer         socketserver
markupbase           _markupbase
repr                 reprlib
test.test_support    test.support
A common pattern in Python 2.x is to have one version of a module
implemented in pure Python, with an optional accelerated version
implemented as a C extension; for example, pickle and
cPickle. This places the burden of importing the accelerated
version and falling back on the pure Python version on each user of
these modules. In Python 3.0, the accelerated versions are
considered implementation details of the pure Python versions.
Users should always import the standard version, which attempts to
import the accelerated version and falls back to the pure Python
version. The pickle / cPickle pair received this
treatment. The profile module is on the list for 3.1. The
StringIO module has been turned into a class in the io
module.
Some related modules have been grouped into packages, and usually
the submodule names have been simplified. The resulting new
packages are:
tkinter (all Tkinter-related modules except
turtle). The target audience of turtle doesn’t
really care about tkinter. Also note that as of Python
2.6, the functionality of turtle has been greatly enhanced.
Cleanup of the sys module: removed sys.exitfunc(),
sys.exc_clear(), sys.exc_type, sys.exc_value,
sys.exc_traceback. (Note that sys.last_type
etc. remain.)
Cleanup of the array.array type: the read() and
write() methods are gone; use fromfile() and
tofile() instead. Also, the 'c' typecode for array is
gone – use either 'b' for bytes or 'u' for Unicode
characters.
Cleanup of the operator module: removed
sequenceIncludes() and isCallable().
Cleanup of the thread module: acquire_lock() and
release_lock() are gone; use acquire() and
release() instead.
Cleanup of the random module: removed the jumpahead() API.
The new module is gone.
The functions os.tmpnam(), os.tempnam() and
os.tmpfile() have been removed in favor of the tempfile
module.
The tokenize module has been changed to work with bytes. The
main entry point is now tokenize.tokenize(), instead of
generate_tokens.
string.letters and its friends (string.lowercase and
string.uppercase) are gone. Use
string.ascii_letters etc. instead. (The reason for the
removal is that string.letters and friends had
locale-specific behavior, which is a bad idea for such
attractively-named global “constants”.)
Renamed module __builtin__ to builtins (removing the
underscores, adding an ‘s’). The __builtins__ variable
found in most global namespaces is unchanged. To modify a builtin,
you should use builtins, not __builtins__!
A new system for built-in string formatting operations replaces the
% string formatting operator. (However, the % operator is
still supported; it will be deprecated in Python 3.1 and removed
from the language at some later time.) Read PEP 3101 for the full
scoop.
The APIs for raising and catching exception have been cleaned up and
new powerful features added:
PEP 0352: All exceptions must be derived (directly or indirectly)
from BaseException. This is the root of the exception
hierarchy. This is not new as a recommendation, but the
requirement to inherit from BaseException is new. (Python
2.6 still allowed classic classes to be raised, and placed no
restriction on what you can catch.) As a consequence, string
exceptions are finally truly and utterly dead.
Almost all exceptions should actually derive from Exception;
BaseException should only be used as a base class for
exceptions that should only be handled at the top level, such as
SystemExit or KeyboardInterrupt. The recommended
idiom for handling all exceptions except for this latter category is
to use except Exception.
StandardError was removed.
Exceptions no longer behave as sequences. Use the args
attribute instead.
PEP 3109: Raising exceptions. You must now use raise Exception(args)
instead of raise Exception, args.
Additionally, you can no longer explicitly specify a traceback;
instead, if you have to do this, you can assign directly to the
__traceback__ attribute (see below).
PEP 3110: Catching exceptions. You must now use
except SomeException as variable instead
of except SomeException, variable. Moreover, the
variable is explicitly deleted when the except block
is left.
PEP 3134: Exception chaining. There are two cases: implicit
chaining and explicit chaining. Implicit chaining happens when an
exception is raised in an except or finally
handler block. This usually happens due to a bug in the handler
block; we call this a secondary exception. In this case, the
original exception (that was being handled) is saved as the
__context__ attribute of the secondary exception.
Explicit chaining is invoked with this syntax:
raise SecondaryException() from primary_exception
(where primary_exception is any expression that produces an
exception object, probably an exception that was previously caught).
In this case, the primary exception is stored on the
__cause__ attribute of the secondary exception. The
traceback printed when an unhandled exception occurs walks the chain
of __cause__ and __context__ attributes and prints a
separate traceback for each component of the chain, with the primary
exception at the top. (Java users may recognize this behavior.)
PEP 3134: Exception objects now store their traceback as the
__traceback__ attribute. This means that an exception
object now contains all the information pertaining to an exception,
and there are fewer reasons to use sys.exc_info() (though the
latter is not removed).
A few exception messages are improved when Windows fails to load an
extension module. For example, error code 193 is now %1 is not a
valid Win32 application. Strings now deal with non-English
locales.
!= now returns the opposite of ==, unless == returns
NotImplemented.
The concept of “unbound methods” has been removed from the language.
When referencing a method as a class attribute, you now get a plain
function object.
__getslice__(), __setslice__() and __delslice__()
were killed. The syntax a[i:j] now translates to
a.__getitem__(slice(i, j)) (or __setitem__() or
__delitem__(), when used as an assignment or deletion target,
respectively).
PEP 3114: the standard next() method has been renamed to
__next__().
The __oct__() and __hex__() special methods are removed
– oct() and hex() use __index__() now to convert
the argument to an integer.
Removed support for __members__ and __methods__.
The function attributes named func_X have been renamed to
use the __X__ form, freeing up these names in the function
attribute namespace for user-defined attributes. To wit,
func_closure, func_code, func_defaults,
func_dict, func_doc, func_globals,
func_name were renamed to __closure__,
__code__, __defaults__, __dict__,
__doc__, __globals__, __name__,
respectively.
PEP 3135: New super(). You can now invoke super()
without arguments and (assuming this is in a regular instance method
defined inside a class statement) the right class and
instance will automatically be chosen. With arguments, the behavior
of super() is unchanged.
PEP 3111: raw_input() was renamed to input(). That
is, the new input() function reads a line from
sys.stdin and returns it with the trailing newline stripped.
It raises EOFError if the input is terminated prematurely.
To get the old behavior of input(), use eval(input()).
A new built-in function next() was added to call the
__next__() method on an object.
The round() function rounding strategy and return type have
changed. Exact halfway cases are now rounded to the nearest even
result instead of away from zero. (For example, round(2.5) now
returns 2 rather than 3.) round(x[, n]) now
delegates to x.__round__([n]) instead of always returning a
float. It generally returns an integer when called with a single
argument and a value of the same type as x when called with two
arguments.
The net result of the 3.0 generalizations is that Python 3.0 runs the
pystone benchmark around 10% slower than Python 2.5. Most likely the
biggest cause is the removal of special-casing for small integers.
There’s room for improvement, but it will happen after 3.0 is
released!
For porting existing Python 2.5 or 2.6 source code to Python 3.0, the
best strategy is the following:
(Prerequisite:) Start with excellent test coverage.
Port to Python 2.6. This should be no more work than the average
port from Python 2.x to Python 2.(x+1). Make sure all your tests
pass.
(Still using 2.6:) Turn on the -3 command line switch.
This enables warnings about features that will be removed (or
change) in 3.0. Run your test suite again, and fix code that you
get warnings about until there are no warnings left, and all your
tests still pass.
Run the 2to3 source-to-source translator over your source code
tree. (See 2to3 - Automated Python 2 to 3 code translation for more on this tool.) Run the
result of the translation under Python 3.0. Manually fix up any
remaining issues, fixing problems until all tests pass again.
It is not recommended to try to write source code that runs unchanged
under both Python 2.6 and 3.0; you’d have to use a very contorted
coding style, e.g. avoiding print statements, metaclasses,
and much more. If you are maintaining a library that needs to support
both Python 2.6 and Python 3.0, the best approach is to modify step 3
above by editing the 2.6 version of the source code and running the
2to3 translator again, rather than editing the 3.0 version of the
source code.
This article explains the new features in Python 2.7. The final
release of 2.7 is currently scheduled for July 2010; the detailed
schedule is described in PEP 373.
Numeric handling has been improved in many ways, for both
floating-point numbers and for the Decimal class. There are
some useful additions to the standard library, such as a greatly
enhanced unittest module, the argparse module for
parsing command-line options, convenient ordered-dictionary and
Counter classes in the collections module, and many
other improvements.
Python 2.7 is planned to be the last of the 2.x releases, so we worked
on making it a good release for the long term. To help with porting
to Python 3, several new features from the Python 3.x series have been
included in 2.7.
This article doesn’t attempt to provide a complete specification of
the new features, but instead provides a convenient overview. For
full details, you should refer to the documentation for Python 2.7 at
http://docs.python.org. If you want to understand the rationale for
the design and implementation, refer to the PEP for a particular new
feature or the issue on http://bugs.python.org in which a change was
discussed. Whenever possible, “What’s New in Python” links to the
bug/patch item for each change.
Python 2.7 is intended to be the last major release in the 2.x series.
The Python maintainers are planning to focus their future efforts on
the Python 3.x series.
This means that 2.7 will remain in place for a long time, running
production systems that have not been ported to Python 3.x.
Two consequences of the long-term significance of 2.7 are:
It’s very likely the 2.7 release will have a longer period of
maintenance compared to earlier 2.x versions. Python 2.7 will
continue to be maintained while the transition to 3.x continues, and
the developers are planning to support Python 2.7 with bug-fix
releases beyond the typical two years.
A policy decision was made to silence warnings only of interest to
developers. DeprecationWarning and its
descendants are now ignored unless otherwise requested, preventing
users from seeing warnings triggered by an application. This change
was also made in the branch that will become Python 3.2. (Discussed
on stdlib-sig and carried out in issue 7319.)
In previous releases, DeprecationWarning messages were
enabled by default, providing Python developers with a clear
indication of where their code may break in a future major version
of Python.
However, there are increasingly many users of Python-based
applications who are not directly involved in the development of
those applications. DeprecationWarning messages are
irrelevant to such users, making them worry about an application
that’s actually working correctly and burdening application developers
with responding to these concerns.
You can re-enable display of DeprecationWarning messages by
running Python with the -Wdefault (short form:
-Wd) switch, or by setting the PYTHONWARNINGS
environment variable to "default" (or "d") before running
Python. Python code can also re-enable them
by calling warnings.simplefilter('default').
Much as Python 2.6 incorporated features from Python 3.0,
version 2.7 incorporates some of the new features
in Python 3.1. The 2.x series continues to provide tools
for migrating to the 3.x series.
A partial list of 3.1 features that were backported to 2.7:
The syntax for set literals ({1, 2, 3} is a mutable set).
Dictionary and set comprehensions ({i: i*2 for i in range(3)}).
Multiple context managers in a single with statement.
A new version of the io library, rewritten in C for performance.
The repr() of a float x is shorter in many cases: it’s now
based on the shortest decimal string that’s guaranteed to round back
to x. As in previous versions of Python, it’s guaranteed that
float(repr(x)) recovers x.
Float-to-string and string-to-float conversions are correctly rounded.
The round() function is also now correctly rounded.
The PyCapsule type, used to provide a C API for extension modules.
operator.isCallable() and operator.sequenceIncludes(),
which are not supported in 3.x, now trigger warnings.
The -3 switch now automatically
enables the -Qwarn switch that causes warnings
about using classic division with integers and long integers.
PEP 372: Adding an Ordered Dictionary to collections
Regular Python dictionaries iterate over key/value pairs in arbitrary order.
Over the years, a number of authors have written alternative implementations
that remember the order that the keys were originally inserted. Based on
the experiences from those implementations, 2.7 introduces a new
OrderedDict class in the collections module.
The OrderedDict API provides the same interface as regular
dictionaries but iterates over keys and values in a guaranteed order
depending on when a key was first inserted:
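For example (an illustrative session):

>>> from collections import OrderedDict
>>> d = OrderedDict([('first', 1), ('second', 2), ('third', 3)])
>>> d.items()
[('first', 1), ('second', 2), ('third', 3)]
>>> d['second'] = 4        # overwriting a key keeps its position
>>> d.items()
[('first', 1), ('second', 4), ('third', 3)]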
The popitem() method has an optional last
argument that defaults to True. If last is True, the most recently
added key is returned and removed; if it’s False, the
oldest key is selected:
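For example (continuing the kind of session shown above):

>>> od = OrderedDict([(x, None) for x in range(4)])
>>> od.popitem()             # last=True by default: LIFO
(3, None)
>>> od.popitem(last=False)   # the oldest entry
(0, None)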
Comparing two ordered dictionaries checks both the keys and values,
and requires that the insertion order was the same:
>>> od1 = OrderedDict([('first', 1),
...                    ('second', 2),
...                    ('third', 3)])
>>> od2 = OrderedDict([('third', 3),
...                    ('first', 1),
...                    ('second', 2)])
>>> od1 == od2
False
>>> # Move 'third' key to the end
>>> del od2['third']; od2['third'] = 3
>>> od1 == od2
True
Comparing an OrderedDict with a regular dictionary
ignores the insertion order and just compares the keys and values.
How does the OrderedDict work? It maintains a
doubly-linked list of keys, appending new keys to the list as they’re inserted.
A secondary dictionary maps keys to their corresponding list node, so
deletion doesn’t have to traverse the entire linked list and therefore
remains O(1).
The standard library now supports use of ordered dictionaries in several
modules.
The ConfigParser module uses them by default, meaning that
configuration files can now be read, modified, and then written back
in their original order.
The _asdict() method for
collections.namedtuple() now returns an ordered dictionary with the
values appearing in the same order as the underlying tuple indices.
The json module’s JSONDecoder class
constructor was extended with an object_pairs_hook parameter to
allow OrderedDict instances to be built by the decoder.
Support was also added for third-party tools like
PyYAML.
See also
PEP 372 - Adding an ordered dictionary to collections
PEP written by Armin Ronacher and Raymond Hettinger;
implemented by Raymond Hettinger.
PEP 378: Format Specifier for Thousands Separator
To make program output more readable, it can be useful to add
separators to large numbers, rendering them as
18,446,744,073,709,551,616 instead of 18446744073709551616.
The fully general solution for doing this is the locale module,
which can use different separators ("," in North America, "." in
Europe) and different grouping sizes, but locale is complicated
to use and unsuitable for multi-threaded applications where different
threads are producing output for different locales.
Therefore, a simple comma-grouping mechanism has been added to the
mini-language used by the str.format() method. When
formatting a floating-point number, simply include a comma between the
width and the precision:
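For example:

>>> '{:20,.2f}'.format(18446744073709551616.0)
'18,446,744,073,709,551,616.00'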
This mechanism is not adaptable at all; commas are always used as the
separator and the grouping is always into three-digit groups. The
comma-formatting mechanism isn’t as general as the locale
module, but it’s easier to use.
See also
PEP 378 - Format Specifier for Thousands Separator
PEP written by Raymond Hettinger; implemented by Eric Smith.
PEP 389: The argparse Module for Parsing Command Lines
The argparse module for parsing command-line arguments was
added as a more powerful replacement for the
optparse module.
This means Python now supports three different modules for parsing
command-line arguments: getopt, optparse, and
argparse. The getopt module closely resembles the C
library’s getopt() function, so it remains useful if you’re writing a
Python prototype that will eventually be rewritten in C.
optparse becomes redundant, but there are no plans to remove it
because there are many scripts still using it, and there’s no
automated way to update these scripts. (Making the argparse
API consistent with optparse’s interface was discussed but
rejected as too messy and difficult.)
In short, if you’re writing a new script and don’t need to worry
about compatibility with earlier versions of Python, use
argparse instead of optparse.
Here’s an example:
import argparse

parser = argparse.ArgumentParser(description='Command-line example.')

# Add optional switches
parser.add_argument('-v', action='store_true', dest='is_verbose',
                    help='produce verbose output')
parser.add_argument('-o', action='store', dest='output',
                    metavar='FILE',
                    help='direct output to FILE instead of stdout')
parser.add_argument('-C', action='store', type=int, dest='context',
                    metavar='NUM', default=0,
                    help='display NUM lines of added context')

# Allow any number of additional arguments.
parser.add_argument(nargs='*', action='store', dest='inputs',
                    help='input filenames (default is stdin)')

args = parser.parse_args()
print args.__dict__
Unless you override it, -h and --help switches
are automatically added, and produce neatly formatted output:
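For the parser above, the help text looks roughly like this (the script
name example.py is a placeholder, and exact spacing may differ):

$ python example.py -h
usage: example.py [-h] [-v] [-o FILE] [-C NUM] [inputs [inputs ...]]

Command-line example.

positional arguments:
  inputs      input filenames (default is stdin)

optional arguments:
  -h, --help  show this help message and exit
  -v          produce verbose output
  -o FILE     direct output to FILE instead of stdout
  -C NUM      display NUM lines of added context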
argparse has much fancier validation than optparse; you
can specify an exact number of arguments as an integer, 0 or more
arguments by passing '*', 1 or more by passing '+', or an
optional argument with '?'. A top-level parser can contain
sub-parsers to define subcommands that have different sets of
switches, as in svn commit, svn checkout, etc. You can
specify an argument’s type as FileType, which will
automatically open files for you and understands that '-' means
standard input or output.
See also
Upgrading optparse code - Part of the Python documentation, describing how to convert
code that uses optparse.
PEP 389 - argparse - New Command Line Parsing Module
PEP written and implemented by Steven Bethard.
PEP 391: Dictionary-Based Configuration For Logging
The logging module is very flexible; applications can define
a tree of logging subsystems, and each logger in this tree can filter
out certain messages, format them differently, and direct messages to
a varying number of handlers.
All this flexibility can require a lot of configuration. You can
write Python statements to create objects and set their properties,
but a complex set-up requires verbose but boring code.
logging also supports a fileConfig()
function that parses a file, but the file format doesn’t support
configuring filters, and it’s messier to generate programmatically.
Python 2.7 adds a dictConfig() function that
uses a dictionary to configure logging. There are many ways to
produce a dictionary from different sources: construct one with code;
parse a file containing JSON; or use a YAML parsing library if one is
installed.
The following example configures two loggers, the root logger and a
logger named “network”. Messages sent to the root logger will be
sent to the system log using the syslog protocol, and messages
to the “network” logger will be written to a network.log file
that will be rotated once the log reaches 1 MB.
import logging
import logging.config

configdict = {
    'version': 1,    # Configuration schema in use; must be 1 for now
    'formatters': {
        'standard': {
            'format': ('%(asctime)s %(name)-15s '
                       '%(levelname)-8s %(message)s')}},

    'handlers': {'netlog': {'backupCount': 10,
                            'class': 'logging.handlers.RotatingFileHandler',
                            'filename': '/logs/network.log',
                            'formatter': 'standard',
                            'level': 'INFO',
                            'maxBytes': 1024*1024},
                 'syslog': {'class': 'logging.handlers.SysLogHandler',
                            'formatter': 'standard',
                            'level': 'ERROR'}},

    # Specify all the subordinate loggers
    'loggers': {
        'network': {
            'handlers': ['netlog']}},

    # Specify properties of the root logger
    'root': {
        'handlers': ['syslog']},
}

# Set up configuration
logging.config.dictConfig(configdict)

# As an example, log two error messages
logger = logging.getLogger('/')
logger.error('Database not found')

netlogger = logging.getLogger('network')
netlogger.error('Connection failed')
Three smaller enhancements to the logging module, all
implemented by Vinay Sajip, are:
The SysLogHandler class now supports
syslogging over TCP. The constructor has a socktype parameter
giving the type of socket to use, either socket.SOCK_DGRAM
for UDP or socket.SOCK_STREAM for TCP. The default
protocol remains UDP.
Logger instances gained a getChild() method that retrieves a
descendant logger using a relative path. For example,
once you retrieve a logger by doing log = getLogger('app'),
calling log.getChild('network.listen') is equivalent to
getLogger('app.network.listen').
The LoggerAdapter class gained an isEnabledFor() method
that takes a level and returns whether the underlying logger would
process a message of that level of importance.
See also
PEP 391 - Dictionary-Based Configuration For Logging
The dictionary methods keys(), values(), and items()
are different in Python 3.x. They return an object called a view
instead of a fully materialized list.
It’s not possible to change the return values of keys(),
values(), and items() in Python 2.7 because too much code
would break. Instead the 3.x versions were added under the new names
viewkeys(), viewvalues(), and viewitems().
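An illustrative session (the iteration order shown assumes small
integer keys, which hash to themselves):

>>> d = dict((i, i*10) for i in range(3))
>>> d.viewkeys()
dict_keys([0, 1, 2])
>>> d.viewitems()
dict_items([(0, 0), (1, 10), (2, 20)])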
memoryview objects allow modifying the underlying object if
it’s a mutable object.
>>> m2[0] = 75          # m2 is a read-only view of an immutable str
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: cannot modify read-only memory
>>> b = bytearray(string.letters)  # Creating a mutable object
>>> b
bytearray(b'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ')
>>> mb = memoryview(b)
>>> mb[0] = '*'         # Assign to view, changing the bytearray.
>>> b[0:5]              # The bytearray has been changed.
bytearray(b'*bcde')
Some smaller changes made to the core Python language are:
The syntax for set literals has been backported from Python 3.x.
Curly brackets are used to surround the contents of the resulting
mutable set; set literals are
distinguished from dictionaries by not containing colons and values.
{} continues to represent an empty dictionary; use
set() for an empty set.
Dictionary and set comprehensions are another feature backported from
3.x, generalizing list/generator comprehensions to use
the literal syntax for sets and dictionaries.
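A short illustrative session showing both backported features (hypothetical example):

>>> {1, 2, 3}                      # a set literal
set([1, 2, 3])
>>> {x % 3 for x in range(10)}     # a set comprehension
set([0, 1, 2])
>>> {x: x ** 2 for x in (2, 4)}    # a dictionary comprehension
{2: 4, 4: 16}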
The with statement can now use multiple context managers
in one statement. Context managers are processed from left to right
and each one is treated as beginning a new with statement.
This means that:
with A() as a, B() as b:
    ... suite of statements ...
is equivalent to:
with A() as a:
    with B() as b:
        ... suite of statements ...
The contextlib.nested() function provides very similar
functionality, so it’s no longer necessary and has been deprecated.
Conversions between floating-point numbers and strings are
now correctly rounded on most platforms. These conversions occur
in many different places: str() on
floats and complex numbers; the float and complex
constructors;
numeric formatting; serializing and
deserializing floats and complex numbers using the
marshal, pickle
and json modules;
parsing of float and imaginary literals in Python code;
and Decimal-to-float conversion.
Related to this, the repr() of a floating-point number x
now returns a result based on the shortest decimal string that’s
guaranteed to round back to x under correct rounding (with
round-half-to-even rounding mode). Previously it gave a string
based on rounding x to 17 decimal digits.
The rounding library responsible for this improvement works on
Windows and on Unix platforms using the gcc, icc, or suncc
compilers. There may be a small number of platforms where correct
operation of this code cannot be guaranteed, so the code is not
used on such systems. You can find out which code is being used
by checking sys.float_repr_style, which will be short
if the new code is in use and legacy if it isn’t.
Implemented by Eric Smith and Mark Dickinson, using David Gay’s
dtoa.c library; issue 7117.
Conversions from long integers and regular integers to floating
point now round differently, returning the floating-point number
closest to the number. This doesn’t matter for small integers that
can be converted exactly, but for large numbers that will
unavoidably lose precision, Python 2.7 now approximates more
closely; previously the result could differ from the correctly
rounded value.
Integer division is also more accurate in its rounding behaviours. (Also
implemented by Mark Dickinson; issue 1811.)
Implicit coercion for complex numbers has been removed; the interpreter
will no longer ever attempt to call a __coerce__() method on complex
objects. (Removed by Meador Inge and Mark Dickinson; issue 5211.)
The str.format() method now supports automatic numbering of the replacement
fields. This makes using str.format() more closely resemble using
%s formatting:
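An illustrative pair of calls (hypothetical example; the second line mixes auto-numbering with a named field):

>>> '{}:{}:{}'.format(2009, 4, 'Sunday')
'2009:4:Sunday'
>>> '{}:{}:{day}'.format(2009, 4, day='Sunday')
'2009:4:Sunday'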
The auto-numbering takes the fields from left to right, so the first {...}
specifier will use the first argument to str.format(), the next
specifier will use the next argument, and so on. You can’t mix auto-numbering
and explicit numbering – either number all of your specifier fields or none
of them – but you can mix auto-numbering and named fields, as in the second
example above. (Contributed by Eric Smith; issue 5237.)
Complex numbers now correctly support usage with format(),
and default to being right-aligned.
Specifying a precision or comma-separation applies to both the real
and imaginary parts of the number, but a specified field width and
alignment is applied to the whole of the resulting 1.5+3j
output. (Contributed by Eric Smith; issue 1588 and issue 7988.)
The ‘F’ format code now always formats its output using uppercase characters,
so it will now produce ‘INF’ and ‘NAN’.
(Contributed by Eric Smith; issue 3382.)
A low-level change: the object.__format__() method now triggers
a PendingDeprecationWarning if it’s passed a format string,
because the __format__() method for object converts
the object to a string representation and formats that. Previously
the method silently applied the format string to the string
representation, but that could hide mistakes in Python code. If
you’re supplying formatting information such as an alignment or
precision, presumably you’re expecting the formatting to be applied
in some object-specific way. (Fixed by Eric Smith; issue 7994.)
The int() and long() types gained a bit_length
method that returns the number of bits necessary to represent
its argument in binary:
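For instance (illustrative session):

>>> n = 37
>>> bin(n)
'0b100101'
>>> n.bit_length()
6
>>> (2 ** 123).bit_length()
124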
(Contributed by Fredrik Johansson and Victor Stinner; issue 3439.)
The import statement will no longer try an absolute import
if a relative import (e.g. from .os import sep) fails. This
fixes a bug, but could possibly break certain import
statements that were only working by accident. (Fixed by Meador Inge;
issue 7902.)
It’s now possible for a subclass of the built-in unicode type
to override the __unicode__() method. (Implemented by
Victor Stinner; issue 1583863.)
When using @classmethod and @staticmethod to wrap
methods as class or static methods, the wrapper object now
exposes the wrapped function as its __func__ attribute.
(Contributed by Amaury Forgeot d’Arc, after a suggestion by
George Sakkis; issue 5982.)
When a restricted set of attributes was declared using __slots__,
deleting an unset attribute would not raise AttributeError
as you would expect. (Fixed by Benjamin Peterson; issue 7604.)
Two new encodings are now supported: “cp720”, used primarily for
Arabic text; and “cp858”, a variant of CP 850 that adds the euro
symbol. (CP720 contributed by Alexander Belchenko and Amaury
Forgeot d’Arc in issue 1616979; CP858 contributed by Tim Hatch in
issue 8016.)
The file object will now set the filename attribute
on the IOError exception when trying to open a directory
on POSIX platforms (noted by Jan Kaliszewski; issue 4764), and
now explicitly checks for and forbids writing to read-only file objects
instead of trusting the C library to catch and report the error
(fixed by Stefan Krah; issue 5677).
The Python tokenizer now translates line endings itself, so the
compile() built-in function now accepts code using any
line-ending convention. Additionally, it no longer requires that the
code end in a newline.
Extra parentheses in function definitions are illegal in Python 3.x,
meaning that you get a syntax error from def f((x)): pass. In
Python 3 warning mode (the -3 switch), Python 2.7 will now warn about this odd usage.
(Noted by James Lingard; issue 7362.)
It’s now possible to create weak references to old-style class
objects. New-style classes were always weak-referenceable. (Fixed
by Antoine Pitrou; issue 8268.)
When a module object is garbage-collected, the module’s dictionary is
now only cleared if no one else is holding a reference to the
dictionary (issue 7140).
A new environment variable, PYTHONWARNINGS,
allows controlling warnings. It should be set to a string
containing warning settings, equivalent to those
used with the -W switch, separated by commas.
(Contributed by Brian Curtin; issue 7301.)
For example, the following setting will print warnings every time
they occur, but turn warnings from the Cookie module into an
error. (The exact syntax for setting an environment variable varies
across operating systems and shells.)
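A sketch of such a setting under a Bourne-style shell (each comma-separated entry follows the -W option’s action:message:category:module:lineno format):

export PYTHONWARNINGS=all,error:::Cookie:0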
A new opcode was added to perform the initial setup for
with statements, looking up the __enter__() and
__exit__() methods. (Contributed by Benjamin Peterson.)
The garbage collector now performs better for one common usage
pattern: when many objects are being allocated without deallocating
any of them. This would previously take quadratic
time for garbage collection, but now the number of full garbage collections
is reduced as the number of objects on the heap grows.
The new logic only performs a full garbage collection pass when
the middle generation has been collected 10 times and when the
number of survivor objects from the middle generation exceeds 10% of
the number of objects in the oldest generation. (Suggested by Martin
von Löwis and implemented by Antoine Pitrou; issue 4074.)
The garbage collector tries to avoid tracking simple containers
which can’t be part of a cycle. In Python 2.7, this is now true for
tuples and dicts containing atomic types (such as ints, strings,
etc.). Transitively, a dict containing tuples of atomic types won’t
be tracked either. This helps reduce the cost of each
garbage collection by decreasing the number of objects to be
considered and traversed by the collector.
(Contributed by Antoine Pitrou; issue 4688.)
Long integers are now stored internally either in base 2**15 or in base
2**30, the base being determined at build time. Previously, they
were always stored in base 2**15. Using base 2**30 gives
significant performance improvements on 64-bit machines, but
benchmark results on 32-bit machines have been mixed. Therefore,
the default is to use base 2**30 on 64-bit machines and base 2**15
on 32-bit machines; on Unix, there’s a new configure option
--enable-big-digits that can be used to override this default.
Apart from the performance improvements this change should be
invisible to end users, with one exception: for testing and
debugging purposes there’s a new structseq sys.long_info that
provides information about the internal format, giving the number of
bits per digit and the size in bytes of the C type used to store
each digit:
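On a 64-bit build, for example, this typically shows:

>>> import sys
>>> sys.long_info
sys.long_info(bits_per_digit=30, sizeof_digit=4)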
Another set of changes made long objects a few bytes smaller: 2 bytes
smaller on 32-bit systems and 6 bytes on 64-bit.
(Contributed by Mark Dickinson; issue 5260.)
The division algorithm for long integers has been made faster
by tightening the inner loop, doing shifts instead of multiplications,
and fixing an unnecessary extra iteration.
Various benchmarks show speedups of between 50% and 150% for long
integer divisions and modulo operations.
(Contributed by Mark Dickinson; issue 5512.)
Bitwise operations are also significantly faster (initial patch by
Gregory Smith; issue 1087418).
The implementation of % checks for the left-side operand being
a Python string and special-cases it; this results in a 1-3%
performance increase for applications that frequently use %
with strings, such as templating libraries.
(Implemented by Collin Winter; issue 5176.)
List comprehensions with an if condition are compiled into
faster bytecode. (Patch by Antoine Pitrou, back-ported to 2.7
by Jeffrey Yasskin; issue 4715.)
Converting an integer or long integer to a decimal string was made
faster by special-casing base 10 instead of using a generalized
conversion function that supports arbitrary bases.
(Patch by Gawain Bolton; issue 6713.)
The split(), replace(), rindex(),
rpartition(), and rsplit() methods of string-like types
(strings, Unicode strings, and bytearray objects) now use a
fast reverse-search algorithm instead of a character-by-character
scan. This is sometimes faster by a factor of 10. (Added by
Florent Xicluna; issue 7462 and issue 7622.)
The pickle and cPickle modules now automatically
intern the strings used for attribute names, reducing memory usage
of the objects resulting from unpickling. (Contributed by Jake
McGuire; issue 5084.)
The cPickle module now special-cases dictionaries,
nearly halving the time required to pickle them.
(Contributed by Collin Winter; issue 5670.)
As in every release, Python’s standard library received a number of
enhancements and bug fixes. Here’s a partial list of the most notable
changes, sorted alphabetically by module name. Consult the
Misc/NEWS file in the source tree for a more complete list of
changes, or look through the Subversion logs for all the details.
The bdb module’s base debugging class Bdb
gained a feature for skipping modules. The constructor
now takes an iterable containing glob-style patterns such as
django.*; the debugger will not step into stack frames
from a module that matches one of these patterns.
(Contributed by Maru Newby after a suggestion by
Senthil Kumaran; issue 5142.)
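A minimal sketch using pdb.Pdb, which passes the patterns through to Bdb (the pattern list here is hypothetical):

import pdb

# The debugger will not step into frames from any module whose
# name matches one of these glob-style patterns.
debugger = pdb.Pdb(skip=['django.*'])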
The binascii module now supports the buffer API, so it can be
used with memoryview instances and other similar buffer objects.
(Backported from 3.x by Florent Xicluna; issue 7703.)
Updated module: the bsddb module has been updated from 4.7.2devel9
to version 4.8.4 of
the pybsddb package.
The new version features better Python 3.x compatibility, various bug fixes,
and adds several new BerkeleyDB flags and methods.
(Updated by Jesús Cea Avión; issue 8156. The pybsddb
changelog can be read at http://hg.jcea.es/pybsddb/file/tip/ChangeLog.)
The bz2 module’s BZ2File now supports the context
management protocol, so you can write with bz2.BZ2File(...) as f:.
(Contributed by Hagen Fürstenau; issue 3860.)
New class: the Counter class in the collections
module is useful for tallying data. Counter instances
behave mostly like dictionaries but return zero for missing keys instead of
raising a KeyError:
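A brief hypothetical session:

>>> from collections import Counter
>>> c = Counter('abracadabra')    # tally the letters
>>> c['a']
5
>>> c['z']                        # a missing key returns zero, not KeyError
0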
There are three additional Counter methods.
most_common() returns the N most common
elements and their counts. elements()
returns an iterator over the contained elements, repeating each
element as many times as its count.
subtract() takes an iterable and
subtracts one for each element instead of adding; if the argument is
a dictionary or another Counter, the counts are
subtracted.
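Continuing the hypothetical session above:

>>> c.most_common(1)
[('a', 5)]
>>> sorted(c.elements())[:6]      # each element repeated by its count
['a', 'a', 'a', 'a', 'a', 'b']
>>> c.subtract('aaa')
>>> c['a']
2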
New method: The deque data type now has a
count() method that returns the number of
contained elements equal to the supplied argument x, and a
reverse() method that reverses the elements
of the deque in-place. deque also exposes its maximum
length as the read-only maxlen attribute.
(Both features added by Raymond Hettinger.)
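A small illustrative session:

>>> from collections import deque
>>> d = deque('abcab', maxlen=10)
>>> d.count('a')
2
>>> d.reverse()
>>> d
deque(['b', 'a', 'c', 'b', 'a'], maxlen=10)
>>> d.maxlen
10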
The namedtuple class now has an optional rename parameter.
If rename is true, field names that are invalid because they’ve
been repeated or aren’t legal Python identifiers will be
renamed to legal names that are derived from the field’s
position within the list of fields:
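For instance (hypothetical field names):

>>> from collections import namedtuple
>>> T = namedtuple('T', ['abc', 'def', 'ghi', 'abc'], rename=True)
>>> T._fields     # 'def' is a keyword and the second 'abc' is a duplicate
('abc', '_1', 'ghi', '_3')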
Finally, the Mapping abstract base class now
returns NotImplemented if a mapping is compared to
another type that isn’t a Mapping.
(Fixed by Daniel Stutzbach; issue 8729.)
Constructors for the parsing classes in the ConfigParser module now
take an allow_no_value parameter, defaulting to false; if true,
options without values will be allowed. For example:
>>> import ConfigParser, StringIO
>>> sample_config = """
... [mysqld]
... user = mysql
... pid-file = /var/run/mysqld/mysqld.pid
... skip-bdb
... """
>>> config = ConfigParser.RawConfigParser(allow_no_value=True)
>>> config.readfp(StringIO.StringIO(sample_config))
>>> config.get('mysqld', 'user')
'mysql'
>>> print config.get('mysqld', 'skip-bdb')
None
>>> print config.get('mysqld', 'unknown')
Traceback (most recent call last):
  ...
NoOptionError: No option 'unknown' in section: 'mysqld'
Deprecated function: contextlib.nested(), which allows
handling more than one context manager with a single with
statement, has been deprecated, because the with statement
now supports multiple context managers.
The cookielib module now ignores cookies that have an invalid
version field, one that doesn’t contain an integer value. (Fixed by
John J. Lee; issue 3924.)
The copy module’s deepcopy() function will now
correctly copy bound instance methods. (Implemented by
Robert Collins; issue 1515.)
The ctypes module now always converts None to a C NULL
pointer for arguments declared as pointers. (Changed by Thomas
Heller; issue 4606.) The underlying libffi library has been updated to version
3.0.9, containing various fixes for different platforms. (Updated
by Matthias Klose; issue 8142.)
New method: the datetime module’s timedelta class
gained a total_seconds() method that returns the
number of seconds in the duration. (Contributed by Brian Quinlan; issue 5788.)
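For example:

>>> from datetime import timedelta
>>> timedelta(days=1, seconds=30).total_seconds()
86430.0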
New method: the Decimal class gained a
from_float() class method that performs an exact
conversion of a floating-point number to a Decimal.
This exact conversion strives for the
closest decimal approximation to the floating-point representation’s value;
the resulting decimal value will therefore still include the inaccuracy,
if any.
For example, Decimal.from_float(0.1) returns
Decimal('0.1000000000000000055511151231257827021181583404541015625').
(Implemented by Raymond Hettinger; issue 4796.)
Comparing instances of Decimal with floating-point
numbers now produces sensible results based on the numeric values
of the operands. Previously such comparisons would fall back to
Python’s default rules for comparing objects, which produced arbitrary
results based on their type. Note that you still cannot combine
Decimal and floating-point in other operations such as addition,
since you should be explicitly choosing how to convert between float and
Decimal.
(Fixed by Mark Dickinson; issue 2531.)
The constructor for Decimal now accepts
floating-point numbers (added by Raymond Hettinger; issue 8257)
and non-European Unicode characters such as Arabic-Indic digits
(contributed by Mark Dickinson; issue 6595).
Most of the methods of the Context class now accept integers
as well as Decimal instances; the only exceptions are the
canonical() and is_canonical()
methods. (Patch by Juan José Conti; issue 7633.)
When using Decimal instances with a string’s
format() method, the default alignment was previously
left-alignment. This has been changed to right-alignment, which is
more sensible for numeric types. (Changed by Mark Dickinson; issue 6857.)
Comparisons involving a signaling NaN value (or sNAN) now signal
InvalidOperation instead of silently returning a true or
false value depending on the comparison operator. Quiet NaN values
(or NaN) are now hashable. (Fixed by Mark Dickinson;
issue 7279.)
The difflib module now produces output that is more
compatible with modern diff/patch tools
through one small change, using a tab character instead of spaces as
a separator in the header giving the filename. (Fixed by Anatoly
Techtonik; issue 7585.)
The Distutils sdist command now always regenerates the
MANIFEST file, since even if the MANIFEST.in or
setup.py files haven’t been modified, the user might have
created some new files that should be included.
(Fixed by Tarek Ziadé; issue 8688.)
The doctest module’s IGNORE_EXCEPTION_DETAIL flag
will now ignore the name of the module containing the exception
being tested. (Patch by Lennart Regebro; issue 7490.)
The email module’s Message class will
now accept a Unicode-valued payload, automatically converting the
payload to the encoding specified by output_charset.
(Added by R. David Murray; issue 1368247.)
The Fraction class now accepts a single float or
Decimal instance, or two rational numbers, as
arguments to its constructor. (Implemented by Mark Dickinson;
rationals added in issue 5812, and float/decimal in
issue 8294.)
Ordering comparisons (<, <=, >, >=) between
fractions and complex numbers now raise a TypeError.
This fixes an oversight, making the Fraction match the other
numeric types.
New class: FTP_TLS in
the ftplib module provides secure FTP
connections using TLS encapsulation of authentication as well as
subsequent control and data transfers.
(Contributed by Giampaolo Rodola; issue 2054.)
The storbinary() method for binary uploads can now restart
uploads thanks to an added rest parameter (patch by Pablo Mouzo;
issue 6845.)
New class decorator: total_ordering() in the functools
module takes a class that defines an __eq__() method and one of
__lt__(), __le__(), __gt__(), or __ge__(),
and generates the missing comparison methods. Since the
__cmp__() method is being deprecated in Python 3.x,
this decorator makes it easier to define ordered classes.
(Added by Raymond Hettinger; issue 5479.)
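A minimal sketch, using a hypothetical Version class:

from functools import total_ordering

@total_ordering
class Version(object):
    # Only __eq__ and __lt__ are written out; the decorator
    # generates __le__, __gt__, and __ge__ from them.
    def __init__(self, number):
        self.number = number
    def __eq__(self, other):
        return self.number == other.number
    def __lt__(self, other):
        return self.number < other.number

print Version(1) <= Version(2)   # True, via the generated __le__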
New function: cmp_to_key() will take an old-style comparison
function that expects two arguments and return a new callable that
can be used as the key parameter to functions such as
sorted(), min() and max(), etc. The primary
intended use is to help with making code compatible with Python 3.x.
(Added by Raymond Hettinger.)
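An illustrative call, wrapping the built-in cmp() on string lengths:

>>> from functools import cmp_to_key
>>> sorted(['three', 'ten', 'one'],
...        key=cmp_to_key(lambda a, b: cmp(len(a), len(b))))
['ten', 'one', 'three']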
New function: the gc module’s is_tracked() returns
true if a given instance is tracked by the garbage collector, false
otherwise. (Contributed by Antoine Pitrou; issue 4688.)
The gzip module’s GzipFile now supports the context
management protocol, so you can write with gzip.GzipFile(...) as f:
(contributed by Hagen Fürstenau; issue 3860), and it now implements
the io.BufferedIOBase ABC, so you can wrap it with
io.BufferedReader for faster processing
(contributed by Nir Aides; issue 7471).
It’s also now possible to override the modification time
recorded in a gzipped file by providing an optional timestamp to
the constructor. (Contributed by Jacques Frechet; issue 4272.)
Files in gzip format can be padded with trailing zero bytes; the
gzip module will now consume these trailing bytes. (Fixed by
Tadek Pietraszek and Brian Curtin; issue 2846.)
New attribute: the hashlib module now has an algorithms
attribute containing a tuple naming the supported algorithms.
In Python 2.7, hashlib.algorithms contains
('md5', 'sha1', 'sha224', 'sha256', 'sha384', 'sha512').
(Contributed by Carl Chenet; issue 7418.)
The default HTTPResponse class used by the httplib module now
supports buffering, resulting in much faster reading of HTTP responses.
(Contributed by Kristján Valur Jónsson; issue 4879.)
The HTTPConnection and HTTPSConnection classes
now support a source_address parameter, a (host,port) 2-tuple
giving the source address that will be used for the connection.
(Contributed by Eldon Ziegler; issue 3972.)
The ihooks module now supports relative imports. Note that
ihooks is an older module for customizing imports,
superseded by the imputil module added in Python 2.0.
(Relative import support added by Neil Schemenauer.)
The imaplib module now supports IPv6 addresses.
(Contributed by Derek Morr; issue 1655.)
New function: the inspect module’s getcallargs()
takes a callable and its positional and keyword arguments,
and figures out which of the callable’s parameters will receive each argument,
returning a dictionary mapping argument names to their values. For example:
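A hypothetical session (items sorted for a deterministic display):

>>> from inspect import getcallargs
>>> def f(a, b=1, *pos, **named):
...     pass
...
>>> sorted(getcallargs(f, 1, 2, 3).items())
[('a', 1), ('b', 2), ('named', {}), ('pos', (3,))]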
Updated module: The io library has been upgraded to the version shipped with
Python 3.1. For 3.1, the I/O library was entirely rewritten in C
and is 2 to 20 times faster depending on the task being performed. The
original Python version was renamed to the _pyio module.
One minor resulting change: the io.TextIOBase class now
has an errors attribute giving the error setting
used for encoding and decoding errors (one of 'strict', 'replace',
'ignore').
The io.FileIO class now raises an OSError when passed
an invalid file descriptor. (Implemented by Benjamin Peterson;
issue 4991.) The truncate() method now preserves the
file position; previously it would change the file position to the
end of the new file. (Fixed by Pascal Chambon; issue 6939.)
New function: itertools.compress(data,selectors) takes two
iterators. Elements of data are returned if the corresponding
value in selectors is true:
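For instance:

>>> from itertools import compress
>>> list(compress('ABCDEF', [1, 0, 1, 0, 1, 1]))
['A', 'C', 'E', 'F']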
New function: itertools.combinations_with_replacement(iter,r)
returns all the possible r-length combinations of elements from the
iterable iter. Unlike combinations(), individual elements
can be repeated in the generated combinations:
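For instance:

>>> from itertools import combinations_with_replacement
>>> list(combinations_with_replacement('abc', 2))
[('a', 'a'), ('a', 'b'), ('a', 'c'), ('b', 'b'), ('b', 'c'), ('c', 'c')]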
Note that elements are treated as unique depending on their position
in the input, not their actual values.
The itertools.count() function now has a step argument that
allows incrementing by values other than 1. count() also
now allows keyword arguments, and using non-integer values such as
floats or Decimal instances. (Implemented by Raymond
Hettinger; issue 5032.)
Updated module: The json module was upgraded to version 2.0.9 of the
simplejson package, which includes a C extension that makes
encoding and decoding faster.
(Contributed by Bob Ippolito; issue 4136.)
To support the new collections.OrderedDict type, json.load()
now has an optional object_pairs_hook parameter that will be called
with any object literal that decodes to a list of pairs.
(Contributed by Raymond Hettinger; issue 5381.)
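A brief illustrative use, preserving key order while decoding (json.loads() accepts the same parameter):

>>> import json
>>> from collections import OrderedDict
>>> json.loads('{"b": 1, "a": 2}', object_pairs_hook=OrderedDict)
OrderedDict([(u'b', 1), (u'a', 2)])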
The mailbox module’s Maildir class now records the
timestamp on the directories it reads, and only re-reads them if the
modification time has subsequently changed. This improves
performance by avoiding unneeded directory scans. (Fixed by
A.M. Kuchling and Antoine Pitrou; issue 1607951, issue 6896.)
New functions: the math module gained
erf() and erfc() for the error function and the complementary error function,
expm1() which computes e**x-1 with more precision than
using exp() and subtracting 1,
gamma() for the Gamma function, and
lgamma() for the natural log of the Gamma function.
(Contributed by Mark Dickinson and nirinA raseliarison; issue 3366.)
The multiprocessing module’s Manager* classes
can now be passed a callable that will be called whenever
a subprocess is started, along with a set of arguments that will be
passed to the callable.
(Contributed by lekma; issue 5585.)
The Pool class, which controls a pool of worker processes,
now has an optional maxtasksperchild parameter. Worker processes
will perform the specified number of tasks and then exit, causing the
Pool to start a new worker. This is useful if tasks may leak
memory or other resources, or if some tasks will cause the worker to
become very large.
(Contributed by Charles Cazabon; issue 6963.)
The nntplib module now supports IPv6 addresses.
(Contributed by Derek Morr; issue 1664.)
New functions: the os module wraps the following POSIX system
calls: getresgid() and getresuid(), which return the
real, effective, and saved GIDs and UIDs;
setresgid() and setresuid(), which set
real, effective, and saved GIDs and UIDs to new values;
initgroups(), which initializes the group access list
for the current process. (GID/UID functions
contributed by Travis H.; issue 6508. Support for initgroups added
by Jean-Paul Calderone; issue 7333.)
The os.fork() function now re-initializes the import lock in
the child process; this fixes problems on Solaris when fork()
is called from a thread. (Fixed by Zsolt Cserna; issue 7242.)
In the os.path module, the normpath() and
abspath() functions now preserve Unicode; if their input path
is a Unicode string, the return value is also a Unicode string.
(normpath() fixed by Matt Giuca in issue 5827;
abspath() fixed by Ezio Melotti in issue 3426.)
The pydoc module now has help for the various symbols that Python
uses. You can now do help('<<') or help('@'), for example.
(Contributed by David Laban; issue 4739.)
The re module’s split(), sub(), and subn()
now accept an optional flags argument, for consistency with the
other functions in the module. (Added by Gregory P. Smith.)
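For example (illustrative):

>>> import re
>>> re.split('[a-z]+', 'AbcDef123GhI456', flags=re.IGNORECASE)
['', '123', '456']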
New function: run_path() in the runpy module
will execute the code at a provided path argument. path can be
the path of a Python source file (example.py), a compiled
bytecode file (example.pyc), a directory
(./package/), or a zip archive (example.zip). If a
directory or zip path is provided, it will be added to the front of
sys.path and the module __main__ will be imported. It’s
expected that the directory or zip contains a __main__.py;
if it doesn’t, some other __main__.py might be imported from
a location later in sys.path. This makes more of the machinery
of runpy available to scripts that want to mimic the way
Python’s command line processes an explicit path name.
(Added by Nick Coghlan; issue 6816.)
New function: in the shutil module, make_archive()
takes a filename, archive type (zip or tar-format), and a directory
path, and creates an archive containing the directory’s contents.
(Added by Tarek Ziadé.)
shutil‘s copyfile() and copytree()
functions now raise a SpecialFileError exception when
asked to copy a named pipe. Previously the code would treat
named pipes like a regular file by opening them for reading, and
this would block indefinitely. (Fixed by Antoine Pitrou; issue 3002.)
The signal module no longer re-installs the signal handler
unless this is truly necessary, which fixes a bug that could make it
impossible to catch the EINTR signal robustly. (Fixed by
Charles-Francois Natali; issue 8354.)
New functions: in the site module, three new functions
return various site- and user-specific paths.
getsitepackages() returns a list containing all
global site-packages directories,
getusersitepackages() returns the path of the user’s
site-packages directory, and
getuserbase() returns the value of the USER_BASE
environment variable, giving the path to a directory that can be used
to store data.
(Contributed by Tarek Ziadé; issue 6693.)
The site module now reports exceptions occurring
when the sitecustomize module is imported, and will no longer
catch and swallow the KeyboardInterrupt exception. (Fixed by
Victor Stinner; issue 3137.)
The socket module’s create_connection() function
gained a source_address parameter, a (host,port) 2-tuple
giving the source address that will be used for the connection.
(Contributed by Eldon Ziegler; issue 3972.)
The SocketServer module’s TCPServer class now
supports socket timeouts and disabling the Nagle algorithm.
The disable_nagle_algorithm class attribute
defaults to False; if overridden to be True,
new request connections will have the TCP_NODELAY option set to
prevent buffering many small sends into a single TCP packet.
The timeout class attribute can hold
a timeout in seconds that will be applied to the request socket; if
no request is received within that time, handle_timeout()
will be called and handle_request() will return.
(Contributed by Kristján Valur Jónsson; issue 6192 and issue 6267.)
Updated module: the sqlite3 module has been updated to
version 2.6.0 of the pysqlite package. Version 2.6.0 includes a number of bugfixes, and adds
the ability to load SQLite extensions from shared libraries.
Call the enable_load_extension(True) method to enable extensions,
and then call load_extension() to load a particular shared library.
(Updated by Gerhard Häring.)
The ssl module’s ssl.SSLSocket objects now support the
buffer API, which fixed a test suite failure (fix by Antoine Pitrou;
issue 7133) and automatically set
OpenSSL’s SSL_MODE_AUTO_RETRY, which will prevent an error
code being returned from recv() operations that trigger an SSL
renegotiation (fix by Antoine Pitrou; issue 8222).
The ssl.wrap_socket() constructor function now takes a
ciphers argument that’s a string listing the encryption algorithms
to be allowed; the format of the string is described
in the OpenSSL documentation.
(Added by Antoine Pitrou; issue 8322.)
Another change makes the extension load all of OpenSSL’s ciphers and
digest algorithms so that they’re all available. Some SSL
certificates couldn’t be verified, reporting an “unknown algorithm”
error. (Reported by Beda Kosata, and fixed by Antoine Pitrou;
issue 8484.)
The struct module will no longer silently ignore overflow
errors when a value is too large for a particular integer format
code (one of bBhHiIlLqQ); it now always raises a
struct.error exception. (Changed by Mark Dickinson;
issue 1523.) The pack() function will also
attempt to use __index__() to convert and pack non-integers
before trying the __int__() method or reporting an error.
(Changed by Mark Dickinson; issue 8300.)
New function: the subprocess module’s
check_output() runs a command with a specified set of arguments
and returns the command’s output as a string when the command runs without
error, or raises a CalledProcessError exception otherwise.
>>> subprocess.check_output(['df', '-h', '.'])
'Filesystem     Size   Used  Avail Capacity  Mounted on\n/dev/disk0s2    52G    49G   3.0G    94%    /\n'

>>> subprocess.check_output(['df', '-h', '/bogus'])
  ...
subprocess.CalledProcessError: Command '['df', '-h', '/bogus']' returned non-zero exit status 1
(Contributed by Gregory P. Smith.)
The subprocess module will now retry its internal system calls
on receiving an EINTR signal. (Reported by several people; final
patch by Gregory P. Smith in issue 1068268.)
New function: is_declared_global() in the symtable module
returns true for variables that are explicitly declared to be global,
false for ones that are implicitly global.
(Contributed by Jeremy Hylton.)
The syslog module will now use the value of sys.argv[0] as the
identifier instead of the previous default value of 'python'.
(Changed by Sean Reifschneider; issue 8451.)
The sys.version_info value is now a named tuple, with attributes
named major, minor, micro,
releaselevel, and serial. (Contributed by Ross
Light; issue 4285.)
sys.getwindowsversion() also returns a named tuple,
with attributes named major, minor, build,
platform, service_pack, service_pack_major,
service_pack_minor, suite_mask, and
product_type. (Contributed by Brian Curtin; issue 7766.)
The tarfile module’s default error handling has changed, to
no longer suppress fatal errors. The default error level was previously 0,
which meant that errors would only result in a message being written to the
debug log, but because the debug log is not activated by default,
these errors went unnoticed. The default error level is now 1,
which raises an exception if there’s an error.
(Changed by Lars Gustäbel; issue 7357.)
tarfile now supports filtering the TarInfo
objects being added to a tar file. When you call add(),
you may supply an optional filter argument
that’s a callable. The filter callable will be passed the
TarInfo for every file being added, and can modify and return it.
If the callable returns None, the file will be excluded from the
resulting archive. This is more powerful than the existing
exclude argument, which has therefore been deprecated.
(Added by Lars Gustäbel; issue 6856.)
The TarFile class also now supports the context manager protocol.
(Added by Lars Gustäbel; issue 7232.)
The wait() method of the threading.Event class
now returns the internal flag on exit. This means the method will usually
return true because wait() is supposed to block until the
internal flag becomes true. The return value will only be false if
a timeout was provided and the operation timed out.
(Contributed by Tim Lesher; issue 1674032.)
The Unicode database provided by the unicodedata module is
now used internally to determine which characters are numeric,
whitespace, or represent line breaks. The database also
includes information from the Unihan.txt data file (patch
by Anders Chrigström and Amaury Forgeot d’Arc; issue 1571184)
and has been updated to version 5.2.0 (updated by
Florent Xicluna; issue 8024).
The urlparse module’s urlsplit() now handles
unknown URL schemes in a fashion compliant with RFC 3986: if the
URL is of the form "<something>://...", the text before the
:// is treated as the scheme, even if it’s a made-up scheme that
the module doesn’t know about. This change may break code that
worked around the old behaviour. For example, Python 2.6.4 or 2.5
will return the following:
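A sketch of the difference (results coerced to plain tuples for display; reconstructed from the description above, so the exact output is an assumption):

>>> import urlparse
>>> tuple(urlparse.urlsplit('invented://host/filename?query'))   # 2.5 / 2.6.4
('invented', '', '//host/filename?query', '', '')
>>> tuple(urlparse.urlsplit('invented://host/filename?query'))   # 2.7
('invented', 'host', '/filename?query', '', '')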
New class: the WeakSet class in the weakref
module is a set that only holds weak references to its elements; elements
will be removed once there are no references pointing to them.
(Originally implemented in Python 3.x by Raymond Hettinger, and backported
to 2.7 by Michael Foord.)
The ElementTree library, xml.etree, no longer escapes
ampersands and angle brackets when outputting an XML processing
instruction (which looks like <?xml-stylesheet href="#style1"?>)
or comment (which looks like <!--comment-->).
(Patch by Neil Muller; issue 2746.)
The XML-RPC client and server, provided by the xmlrpclib and
SimpleXMLRPCServer modules, have improved performance by
supporting HTTP/1.1 keep-alive and by optionally using gzip encoding
to compress the XML being exchanged. The gzip compression is
controlled by the encode_threshold attribute of
SimpleXMLRPCRequestHandler, which contains a size in bytes;
responses larger than this will be compressed.
(Contributed by Kristján Valur Jónsson; issue 6267.)
The zipfile module’s ZipFile now supports the context
management protocol, so you can write with zipfile.ZipFile(...) as f:.
(Contributed by Brian Curtin; issue 5511.)
zipfile now also supports archiving empty directories and
extracts them correctly. (Fixed by Kuba Wieczorek; issue 4710.)
Reading files out of an archive is faster, and interleaving
read() and readline() now works correctly.
(Contributed by Nir Aides; issue 7610.)
The is_zipfile() function now
accepts a file object, in addition to the path names accepted in earlier
versions. (Contributed by Gabriel Genellina; issue 4756.)
The writestr() method now has an optional compress_type parameter
that lets you override the default compression method specified in the
ZipFile constructor. (Contributed by Ronald Oussoren;
issue 6003.)
Python 3.1 includes the importlib package, a re-implementation
of the logic underlying Python’s import statement.
importlib is useful for implementors of Python interpreters and
to users who wish to write new importers that can participate in the
import process. Python 2.7 doesn’t contain the complete
importlib package, but instead has a tiny subset that contains
a single function, import_module().
import_module(name,package=None) imports a module. name is
a string containing the module or package’s name. It’s possible to do
relative imports by providing a string that begins with a .
character, such as ..utils.errors. For relative imports, the
package argument must be provided and is the name of the package that
will be used as the anchor for
the relative import. import_module() both inserts the imported
module into sys.modules and returns the module object.
Here are some examples:
>>> from importlib import import_module
>>> anydbm = import_module('anydbm')            # Standard absolute import
>>> anydbm
<module 'anydbm' from '/p/python/Lib/anydbm.py'>
>>> # Relative import
>>> file_util = import_module('..file_util', 'distutils.command')
>>> file_util
<module 'distutils.file_util' from '/python/Lib/distutils/file_util.pyc'>
importlib was implemented by Brett Cannon and introduced in
Python 3.1.
The sysconfig module has been pulled out of the Distutils
package, becoming a new top-level module in its own right.
sysconfig provides functions for getting information about
Python’s build process: compiler switches, installation paths, the
platform name, and whether Python is running from its source
directory.
Some of the functions in the module are:
get_config_var() returns variables from Python’s
Makefile and the pyconfig.h file.
get_config_vars() returns a dictionary containing
all of the configuration variables.
get_path() returns the configured path for
a particular type of module: the standard library,
site-specific modules, platform-specific modules, etc.
is_python_build() returns true if you’re running a
binary from a Python source tree, and false otherwise.
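A hypothetical session (the returned values vary by platform and build; these are typical for an installed Linux Python):

>>> import sysconfig
>>> sysconfig.get_config_var('SO')
'.so'
>>> sysconfig.get_path('stdlib')
'/usr/lib/python2.7'
>>> sysconfig.is_python_build()
False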
Consult the sysconfig documentation for more details and for
a complete list of functions.
The Distutils package and sysconfig are now maintained by Tarek
Ziadé, who has also started a Distutils2 package (source repository at
http://hg.python.org/distutils2/) for developing a next-generation
version of Distutils.
Tcl/Tk 8.5 includes a set of themed widgets that re-implement basic Tk
widgets but have a more customizable appearance and can therefore more
closely resemble the native platform’s widgets. This widget
set was originally called Tile, but was renamed to Ttk (for “themed Tk”)
on being added to Tcl/Tk release 8.5.
The ttk module was written by Guilherme Polo and added in
issue 2983. An alternate version called Tile.py, written by
Martin Franklin and maintained by Kevin Walzer, was proposed for
inclusion in issue 2618, but the authors argued that Guilherme
Polo’s work was more comprehensive.
The unittest module was greatly enhanced; many
new features were added. Most of these features were implemented
by Michael Foord, unless otherwise noted. The enhanced version of
the module is downloadable separately for use with Python versions 2.4 to 2.6,
packaged as the unittest2 package, from
http://pypi.python.org/pypi/unittest2.
When used from the command line, the module can automatically discover
tests. It’s not as fancy as py.test or
nose, but provides a simple way
to run tests kept within a set of package directories. For example,
the following command will search the test/ subdirectory for
any importable test files named test*.py:
python -m unittest discover -s test
Consult the unittest module documentation for more details.
(Developed in issue 6001.)
The main() function supports some other new options:
-b or --buffer will buffer the standard output
and standard error streams during each test. If the test passes,
any resulting output will be discarded; on failure, the buffered
output will be displayed.
-c or --catch will cause the control-C interrupt
to be handled more gracefully. Instead of interrupting the test
process immediately, the currently running test will be completed
and then the partial results up to the interruption will be reported.
If you’re impatient, a second press of control-C will cause an immediate
interruption.
This control-C handler tries to avoid causing problems when the code
being tested or the tests being run have defined a signal handler of
their own, by noticing that a signal handler was already set and
calling it. If this doesn’t work for you, there’s a
removeHandler() decorator that can be used to mark tests that
should have the control-C handling disabled.
-f or --failfast makes
test execution stop immediately when a test fails instead of
continuing to execute further tests. (Suggested by Cliff Dyer and
implemented by Michael Foord; issue 8074.)
The progress messages now show ‘x’ for expected failures
and ‘u’ for unexpected successes when run in verbose mode.
(Contributed by Benjamin Peterson.)
Test cases can raise the SkipTest exception to skip a
test (issue 1034053).
The error messages for assertEqual(),
assertTrue(), and assertFalse()
failures now provide more information. If you set the
longMessage attribute of your TestCase classes to
True, both the standard error message and any additional message you
provide will be printed for failures. (Added by Michael Foord; issue 5663.)
The assertRaises() method now
returns a context handler when called without providing a callable
object to run. For example, you can write this:
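A minimal sketch of the context-manager form (hypothetical test case):

import unittest

class DictTests(unittest.TestCase):
    def test_missing_key(self):
        # The enclosed block must raise KeyError for the test to pass.
        with self.assertRaises(KeyError):
            {}['foo']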
Module- and class-level setup and teardown fixtures are now supported.
Modules can contain setUpModule() and tearDownModule()
functions. Classes can have setUpClass() and
tearDownClass() methods that must be defined as class methods
(using @classmethod or equivalent). These functions and
methods are invoked when the test runner switches to a test case in a
different module or class.
The methods addCleanup() and
doCleanups() were added.
addCleanup() lets you add cleanup functions that
will be called unconditionally (after setUp() if
setUp() fails, otherwise after tearDown()). This allows
for much simpler resource allocation and deallocation during tests
(issue 5679).
A number of new methods were added that provide more specialized
tests. Many of these methods were written by Google engineers
for use in their test suites; Gregory P. Smith, Michael Foord, and
GvR worked on merging them into Python’s version of unittest.
assertIs() and assertIsNot()
take two values and check whether the two values evaluate to the same object or not.
(Added by Michael Foord; issue 2578.)
assertMultiLineEqual() compares two strings, and if they’re
not equal, displays a helpful comparison that highlights the
differences in the two strings. This comparison is now used by
default when Unicode strings are compared with assertEqual().
assertRegexpMatches() and
assertNotRegexpMatches() check whether the
first argument is a string matching or not matching the regular
expression provided as the second argument (issue 8038).
assertRaisesRegexp() checks whether a particular exception
is raised, and then also checks that the string representation of
the exception matches the provided regular expression.
assertItemsEqual() tests whether two provided sequences
contain the same elements.
assertSetEqual() compares whether two sets are equal, and
only reports the differences between the sets in case of error.
Similarly, assertListEqual() and assertTupleEqual()
compare the specified types and explain any differences without necessarily
printing their full values; these methods are now used by default
when comparing lists and tuples using assertEqual().
More generally, assertSequenceEqual() compares two sequences
and can optionally check whether both sequences are of a
particular type.
assertDictEqual() compares two dictionaries and reports the
differences; it’s now used by default when you compare two dictionaries
using assertEqual(). assertDictContainsSubset() checks whether
all of the key/value pairs in first are found in second.
assertAlmostEqual() and assertNotAlmostEqual() test
whether first and second are approximately equal. This method
can either round their difference to an optionally-specified number
of places (the default is 7) and compare it to zero, or require
the difference to be smaller than a supplied delta value.
A new hook lets you extend the assertEqual() method to handle
new data types. The addTypeEqualityFunc() method takes a type
object and a function. The function will be used when both of the
objects being compared are of the specified type. This function
should compare the two objects and raise an exception if they don’t
match; it’s a good idea for the function to provide additional
information about why the two objects aren’t matching, much as the new
sequence comparison methods do.
unittest.main() now takes an optional exit argument. If
False, main() doesn’t call sys.exit(), allowing
main() to be used from the interactive interpreter.
(Contributed by J. Pablo Fernández; issue 3379.)
With all these changes, the unittest.py was becoming awkwardly
large, so the module was turned into a package and the code split into
several files (by Benjamin Peterson). This doesn’t affect how the
module is imported or used.
The version of the ElementTree library included with Python was updated to
version 1.3. Some of the new features are:
The various parsing functions now take a parser keyword argument
giving an XMLParser instance that will
be used. This makes it possible to override the file’s internal encoding:
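A brief sketch (hypothetical document; the encoding argument overrides whatever the XML declaration claims):

from xml.etree import ElementTree as ET

p = ET.XMLParser(encoding='utf-8')
t = ET.XML("<root><child/></root>", parser=p)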
Errors in parsing XML now raise a ParseError exception, whose
instances have a position attribute
containing a (line, column) tuple giving the location of the problem.
ElementTree’s code for converting trees to a string has been
significantly reworked, making it roughly twice as fast in many
cases. The ElementTree.write() and Element.write() methods now have a method parameter that can be
“xml” (the default), “html”, or “text”. HTML mode will output empty
elements as <empty></empty> instead of <empty/>, and text
mode will skip over elements and only output the text chunks. If
you set the tag attribute of an element to None but
leave its children in place, the element will be omitted when the
tree is written out, so you don’t need to do more extensive rearrangement
to remove a single element.
Namespace handling has also been improved. All xmlns:<whatever>
declarations are now output on the root element, not scattered throughout
the resulting XML. You can set the default namespace for a tree
by setting the default_namespace attribute and can
register new prefixes with register_namespace(). In XML mode,
you can use the true/false xml_declaration parameter to suppress the
XML declaration.
New Element method: extend() appends the items from a
sequence to the element’s children. Elements themselves behave like
sequences, so it’s easy to move children from one element to
another:
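For instance (illustrative session):

>>> from xml.etree import ElementTree as ET
>>> t = ET.XML('<list><item>1</item><item>2</item><item>3</item></list>')
>>> new = ET.XML('<root/>')
>>> new.extend(t)           # append t's children to the new element
>>> ET.tostring(new)
'<root><item>1</item><item>2</item><item>3</item></root>'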
New Element method: iter() yields the children of the
element as a generator. It’s also possible to write for child in elem: to loop over an element’s children. The existing method
getiterator() is now deprecated, as is getchildren()
which constructs and returns a list of children.
New Element method: itertext() yields all chunks of
text that are descendants of the element. For example:
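An illustrative session:

>>> from xml.etree import ElementTree as ET
>>> t = ET.XML('<list><item>1</item><item>2</item><item>3</item></list>')
>>> list(t.itertext())
['1', '2', '3']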
Deprecated: using an element as a Boolean (i.e., if elem:) would
return true if the element had any children, or false if there were
no children. This behaviour is confusing – None is false, but
so is a childless element? – so it will now trigger a
FutureWarning. In your code, you should be explicit: write
len(elem) != 0 if you’re interested in the number of children,
or elem is not None.
Fredrik Lundh develops ElementTree and produced the 1.3 version;
you can read his article describing 1.3 at
http://effbot.org/zone/elementtree-13-intro.htm.
(Florent Xicluna updated the version included with
Python, after discussions on python-dev and in issue 6472.)
Changes to Python’s build process and to the C API include:
The latest release of the GNU Debugger, GDB 7, can be scripted
using Python.
When you begin debugging an executable program P, GDB will look for
a file named P-gdb.py and automatically read it. Dave Malcolm
contributed a python-gdb.py that adds a number of
commands useful when debugging Python itself. For example,
py-up and py-down go up or down one Python stack frame,
which usually corresponds to several C stack frames. py-print
prints the value of a Python variable, and py-bt prints the
Python stack trace. (Added as a result of issue 8032.)
If you use the .gdbinit file provided with Python,
the “pyo” macro in the 2.7 version now works correctly when the thread being
debugged doesn’t hold the GIL; the macro now acquires it before printing.
(Contributed by Victor Stinner; issue 3632.)
Py_AddPendingCall() is now thread-safe, letting any
worker thread submit notifications to the main Python thread. This
is particularly useful for asynchronous IO operations.
(Contributed by Kristján Valur Jónsson; issue 4293.)
New function: PyCode_NewEmpty() creates an empty code object;
only the filename, function name, and first line number are required.
This is useful for extension modules that are attempting to
construct a more useful traceback stack. Previously such
extensions needed to call PyCode_New(), which had many
more arguments. (Added by Jeffrey Yasskin.)
New function: PyErr_NewExceptionWithDoc() creates a new
exception class, just as the existing PyErr_NewException() does,
but takes an extra char* argument containing the docstring for the
new exception class. (Added by ‘lekma’ on the Python bug tracker;
issue 7033.)
New function: PyFrame_GetLineNumber() takes a frame object
and returns the line number that the frame is currently executing.
Previously code would need to get the index of the bytecode
instruction currently executing, and then look up the line number
corresponding to that address. (Added by Jeffrey Yasskin.)
New functions: PyLong_AsLongAndOverflow() and
PyLong_AsLongLongAndOverflow() approximate a Python long
integer as a C long or long long.
If the number is too large to fit into
the output type, an overflow flag is set and returned to the caller.
(Contributed by Case Van Horsen; issue 7528 and issue 7767.)
New function: stemming from the rewrite of string-to-float conversion,
a new PyOS_string_to_double() function was added. The old
PyOS_ascii_strtod() and PyOS_ascii_atof() functions
are now deprecated.
New function: PySys_SetArgvEx() sets the value of
sys.argv and can optionally update sys.path to include the
directory containing the script named by sys.argv[0] depending
on the value of an updatepath parameter.
This function was added to close a security hole for applications
that embed Python. The old function, PySys_SetArgv(), would
always update sys.path, and sometimes it would add the current
directory. This meant that, if you ran an application embedding
Python in a directory controlled by someone else, attackers could
put a Trojan-horse module in the directory (say, a file named
os.py) that your application would then import and run.
If you maintain a C/C++ application that embeds Python, check
whether you’re calling PySys_SetArgv() and carefully consider
whether the application should be using PySys_SetArgvEx()
with updatepath set to false.
Security issue reported as CVE-2008-5983;
discussed in issue 5753, and fixed by Antoine Pitrou.
New macros: the Python header files now define the following macros:
Py_ISALNUM,
Py_ISALPHA,
Py_ISDIGIT,
Py_ISLOWER,
Py_ISSPACE,
Py_ISUPPER,
Py_ISXDIGIT,
and Py_TOLOWER, Py_TOUPPER.
All of these are analogous to the C
standard macros for classifying characters, but ignore the current
locale setting, because in
several places Python needs to analyze characters in a
locale-independent way. (Added by Eric Smith;
issue 5793.)
Removed function: PyEval_CallObject is now only available
as a macro. A function version was being kept around to preserve
ABI linking compatibility, but that was in 1997; it can certainly be
deleted by now. (Removed by Antoine Pitrou; issue 8276.)
New format codes: the PyString_FromFormat(),
PyString_FromFormatV(), and PyErr_Format() functions now
accept %lld and %llu format codes for displaying
C’s long long types.
(Contributed by Mark Dickinson; issue 7228.)
The complicated interaction between threads and process forking has
been changed. Previously, the child process created by
os.fork() might fail because the child is created with only a
single thread running, the thread performing the os.fork().
If other threads were holding a lock, such as Python’s import lock,
when the fork was performed, the lock would still be marked as
“held” in the new process. But in the child process nothing would
ever release the lock, since the other threads weren’t replicated,
and the child process would no longer be able to perform imports.
Python 2.7 acquires the import lock before performing an
os.fork(), and will also clean up any locks created using the
threading module. C extension modules that have internal
locks, or that call fork() themselves, will not benefit
from this clean-up.
The Py_Finalize() function now calls the internal
threading._shutdown() function; this prevents some exceptions from
being raised when an interpreter shuts down.
(Patch by Adam Olsen; issue 1722344.)
When using the PyMemberDef structure to define attributes
of a type, Python will no longer let you try to delete or set a
T_STRING_INPLACE attribute.
Global symbols defined by the ctypes module are now prefixed
with Py, or with _ctypes. (Implemented by Thomas
Heller; issue 3102.)
New configure option: the --with-system-expat switch allows
building the pyexpat module to use the system Expat library.
(Contributed by Arfrever Frehtes Taifersar Arahesis; issue 7609.)
New configure option: the
--with-valgrind option will now disable the pymalloc
allocator, which is difficult for the Valgrind memory-error detector
to analyze correctly.
Valgrind will therefore be better at detecting memory leaks and
overruns. (Contributed by James Henstridge; issue 2422.)
New configure option: you can now supply an empty string to
--with-dbmliborder= in order to disable all of the various
DBM modules. (Added by Arfrever Frehtes Taifersar Arahesis;
issue 6491.)
The configure script now checks for floating-point rounding bugs
on certain 32-bit Intel chips and defines a X87_DOUBLE_ROUNDING
preprocessor definition. No code currently uses this definition,
but it’s available if anyone wishes to use it.
(Added by Mark Dickinson; issue 2937.)
configure also now sets a LDCXXSHARED Makefile
variable for supporting C++ linking. (Contributed by Arfrever
Frehtes Taifersar Arahesis; issue 1222585.)
The build process now creates the necessary files for pkg-config
support. (Contributed by Clinton Roy; issue 3585.)
The build process now supports Subversion 1.7. (Contributed by
Arfrever Frehtes Taifersar Arahesis; issue 6094.)
Python 3.1 adds a new C datatype, PyCapsule, for providing a
C API to an extension module. A capsule is essentially the holder of
a C void* pointer, and is made available as a module attribute; for
example, the socket module’s API is exposed as socket.CAPI,
and unicodedata exposes ucnhash_CAPI. Other extensions
can import the module, access its dictionary to get the capsule
object, and then get the void* pointer, which will usually point
to an array of pointers to the module’s various API functions.
There is an existing data type already used for this,
PyCObject, but it doesn’t provide type safety. Evil code
written in pure Python could cause a segmentation fault by taking a
PyCObject from module A and somehow substituting it for the
PyCObject in module B. Capsules know their own name,
and getting the pointer requires providing the name:
void *vtable;
if (!PyCapsule_IsValid(capsule, "mymodule.CAPI")) {
PyErr_SetString(PyExc_ValueError, "argument type invalid");
return NULL;
}
vtable = PyCapsule_GetPointer(capsule, "mymodule.CAPI");
You are assured that vtable points to whatever you’re expecting.
If a different capsule was passed in, PyCapsule_IsValid() would
detect the mismatched name and return false. Refer to
Providing a C API for an Extension Module for more information on using these objects.
Python 2.7 now uses capsules internally to provide various
extension-module APIs, but the PyCObject_AsVoidPtr() function was
modified to handle capsules, preserving compile-time compatibility
with the CObject interface. Use of
PyCObject_AsVoidPtr() will signal a
PendingDeprecationWarning, which is silent by default.
Implemented in Python 3.1 and backported to 2.7 by Larry Hastings;
discussed in issue 5630.
The msvcrt module now contains some constants from
the crtassem.h header file:
CRT_ASSEMBLY_VERSION,
VC_ASSEMBLY_PUBLICKEYTOKEN,
and LIBRARIES_ASSEMBLY_NAME_PREFIX.
(Contributed by David Cournapeau; issue 4365.)
The _winreg module for accessing the registry now implements
the CreateKeyEx() and DeleteKeyEx() functions, extended
versions of previously-supported functions that take several extra
arguments. The DisableReflectionKey(),
EnableReflectionKey(), and QueryReflectionKey() functions were also
tested and documented.
(Implemented by Brian Curtin: issue 7347.)
The new _beginthreadex() API is used to start threads, and
the native thread-local storage functions are now used.
(Contributed by Kristján Valur Jónsson; issue 3582.)
The os.kill() function now works on Windows. The signal value
can be the constants CTRL_C_EVENT,
CTRL_BREAK_EVENT, or any integer. The first two constants
will send Control-C and Control-Break keystroke events to
subprocesses; any other value will use the TerminateProcess()
API. (Contributed by Miki Tebeka; issue 1220212.)
The os.listdir() function now correctly fails
for an empty path. (Fixed by Hirokazu Yamamoto; issue 5913.)
The mimetypes module will now read the MIME database from
the Windows registry when initializing.
(Patch by Gabriel Genellina; issue 4969.)
The path /Library/Python/2.7/site-packages is now appended to
sys.path, in order to share added packages between the system
installation and a user-installed copy of the same version.
(Changed by Ronald Oussoren; issue 4865.)
FreeBSD 7.1’s SO_SETFIB constant, used with
getsockopt()/setsockopt() to select an
alternate routing table, is now available in the socket
module. (Added by Kyle VanderBeek; issue 8235.)
Two benchmark scripts, iobench and ccbench, were
added to the Tools directory. iobench measures the
speed of the built-in file I/O objects returned by open()
while performing various operations, and ccbench is a
concurrency benchmark that tries to measure computing throughput,
thread switching latency, and IO processing bandwidth when
performing several tasks using a varying number of threads.
The Tools/i18n/msgfmt.py script now understands plural
forms in .po files. (Fixed by Martin von Löwis;
issue 5464.)
When importing a module from a .pyc or .pyo file
with an existing .py counterpart, the co_filename
attributes of the resulting code objects are overwritten when the
original filename is obsolete. This can happen if the file has been
renamed, moved, or is accessed through different paths. (Patch by
Ziga Seilnacht and Jean-Paul Calderone; issue 1180193.)
The regrtest.py script now takes a --randseed=
switch that takes an integer to be used as the random seed
for the -r option that executes tests in random order.
The -r option also reports the seed that was used.
(Added by Collin Winter.)
Another regrtest.py switch is -j, which
takes an integer specifying how many tests run in parallel. This
allows reducing the total runtime on multi-core machines.
This option is compatible with several other options, including the
-R switch which is known to produce long runtimes.
(Added by Antoine Pitrou, issue 6152.) This can also be used
with a new -F switch that runs selected tests in a loop
until they fail. (Added by Antoine Pitrou; issue 7312.)
When executed as a script, the py_compile.py module now
accepts '-' as an argument, which will read standard input for
the list of filenames to be compiled. (Contributed by Piotr
Ożarowski; issue 8233.)
This section lists previously described changes and other bugfixes
that may require changes to your code:
The range() function processes its arguments more
consistently; it will now call __int__() on non-float,
non-integer arguments that are supplied to it. (Fixed by Alexander
Belopolsky; issue 1533.)
The string format() method changed the default precision used
for floating-point and complex numbers from 6 decimal
places to 12, which matches the precision used by str().
(Changed by Eric Smith; issue 5920.)
Because of an optimization for the with statement, the special
methods __enter__() and __exit__() must belong to the object’s
type, and cannot be directly attached to the object’s instance. This
affects new-style classes (derived from object) and C extension
types. (issue 6101.)
Due to a bug in Python 2.6, the exc_value parameter to
__exit__() methods was often the string representation of the
exception, not an instance. This was fixed in 2.7, so exc_value
will be an instance as expected. (Fixed by Florent Xicluna;
issue 7853.)
When a restricted set of attributes were set using __slots__,
deleting an unset attribute would not raise AttributeError
as you would expect. (Fixed by Benjamin Peterson; issue 7604.)
In the standard library:
Operations with datetime instances that resulted in a year
falling outside the supported range didn’t always raise
OverflowError. Such errors are now checked more carefully
and will now raise the exception. (Reported by Mark Leander, patch
by Anand B. Pillai and Alexander Belopolsky; issue 7150.)
When using Decimal instances with a string’s
format() method, the default alignment was previously
left-alignment. This has been changed to right-alignment, which might
change the output of your programs.
(Changed by Mark Dickinson; issue 6857.)
Comparisons involving a signaling NaN value (or sNaN) now signal
InvalidOperation instead of silently returning a true or
false value depending on the comparison operator. Quiet NaN values
(or NaN) are now hashable. (Fixed by Mark Dickinson;
issue 7279.)
The ElementTree library, xml.etree, no longer escapes
ampersands and angle brackets when outputting an XML processing
instruction (which looks like <?xml-stylesheet href="#style1"?>)
or comment (which looks like <!-- comment -->).
(Patch by Neil Muller; issue 2746.)
The readline() method of StringIO objects now does
nothing when a negative length is requested, as other file-like
objects do. (issue 7348).
The syslog module will now use the value of sys.argv[0] as the
identifier instead of the previous default value of 'python'.
(Changed by Sean Reifschneider; issue 8451.)
The tarfile module’s default error handling has changed, to
no longer suppress fatal errors. The default error level was previously 0,
which meant that errors would only result in a message being written to the
debug log, but because the debug log is not activated by default,
these errors went unnoticed. The default error level is now 1,
which raises an exception if there’s an error.
(Changed by Lars Gustäbel; issue 7357.)
The urlparse module’s urlsplit() now handles
unknown URL schemes in a fashion compliant with RFC 3986: if the
URL is of the form "<something>://...", the text before the
:// is treated as the scheme, even if it’s a made-up scheme that
the module doesn’t know about. This change may break code that
worked around the old behaviour. For example, Python 2.6.4 or 2.5
will return the following:
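>>> import urlparse
>>> urlparse.urlsplit('invented://host/filename?query')
('invented', '', '//host/filename?query', '', '')

Python 2.7 will instead return:

>>> urlparse.urlsplit('invented://host/filename?query')
('invented', 'host', '/filename?query', '', '')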
(Python 2.7 actually produces slightly different output, since it
returns a named tuple instead of a standard tuple.)
For C extensions:
C extensions that use integer format codes with the PyArg_Parse*
family of functions will now raise a TypeError exception
instead of triggering a DeprecationWarning (issue 5080).
Use the new PyOS_string_to_double() function instead of the old
PyOS_ascii_strtod() and PyOS_ascii_atof() functions,
which are now deprecated.
For applications that embed Python:
The PySys_SetArgvEx() function was added, letting
applications close a security hole when the existing
PySys_SetArgv() function was used. Check whether you’re
calling PySys_SetArgv() and carefully consider whether the
application should be using PySys_SetArgvEx() with
updatepath set to false.
The author would like to thank the following people for offering
suggestions, corrections and assistance with various drafts of this
article: Nick Coghlan, Philip Jenvey, Ryan Lovett, R. David Murray,
Hugh Secker-Walker.
This article explains the new features in Python 2.6, released on
October 1, 2008. The release schedule is described in PEP 361.
The major theme of Python 2.6 is preparing the migration path to
Python 3.0, a major redesign of the language. Whenever possible,
Python 2.6 incorporates new features and syntax from 3.0 while
remaining compatible with existing code by not removing older features
or syntax. When it’s not possible to do that, Python 2.6 tries to do
what it can, adding compatibility functions in a
future_builtins module and a -3 switch to warn about
usages that will become unsupported in 3.0.
Some significant new packages have been added to the standard library,
such as the multiprocessing and json modules, but
there aren’t many new features that aren’t related to Python 3.0 in
some way.
Python 2.6 also sees a number of improvements and bugfixes throughout
the source. A search through the change logs finds there were 259
patches applied and 612 bugs fixed between Python 2.5 and 2.6. Both
figures are likely to be underestimates.
This article doesn’t attempt to provide a complete specification of
the new features, but instead provides a convenient overview. For
full details, you should refer to the documentation for Python 2.6. If
you want to understand the rationale for the design and
implementation, refer to the PEP for a particular new feature.
Whenever possible, “What’s New in Python” links to the bug/patch item
for each change.
The development cycle for Python versions 2.6 and 3.0 was
synchronized, with the alpha and beta releases for both versions being
made on the same days. The development of 3.0 has influenced many
features in 2.6.
Python 3.0 is a far-ranging redesign of Python that breaks
compatibility with the 2.x series. This means that existing Python
code will need some conversion in order to run on
Python 3.0. However, not all the changes in 3.0 necessarily break
compatibility. In cases where new features won’t cause existing code
to break, they’ve been backported to 2.6 and are described in this
document in the appropriate place. Some of the 3.0-derived features
are:
A __complex__() method for converting objects to a complex number.
Alternate syntax for catching exceptions: except TypeError as exc.
The addition of functools.reduce() as a synonym for the built-in
reduce() function.
Python 3.0 adds several new built-in functions and changes the
semantics of some existing builtins. Functions that are new in 3.0
such as bin() have simply been added to Python 2.6, but existing
builtins haven’t been changed; instead, the future_builtins
module has versions with the new 3.0 semantics. Code written to be
compatible with 3.0 can do from future_builtins import hex, map as
necessary.
A new command-line switch, -3, enables warnings
about features that will be removed in Python 3.0. You can run code
with this switch to see how much work will be necessary to port
code to 3.0. The value of this switch is available
to Python code as the boolean variable sys.py3kwarning,
and to C extension code as Py_Py3kWarningFlag.
See also
The 3xxx series of PEPs, which contains proposals for Python 3.0.
PEP 3000 describes the development process for Python 3.0.
Start with PEP 3100 that describes the general goals for Python
3.0, and then explore the higher-numbered PEPs that propose
specific features.
While 2.6 was being developed, the Python development process
underwent two significant changes: we switched from SourceForge’s
issue tracker to a customized Roundup installation, and the
documentation was converted from LaTeX to reStructuredText.
For a long time, the Python developers had been growing increasingly
annoyed by SourceForge’s bug tracker. SourceForge’s hosted solution
doesn’t permit much customization; for example, it wasn’t possible to
customize the life cycle of issues.
The infrastructure committee of the Python Software Foundation
therefore posted a call for issue trackers, asking volunteers to set
up different products and import some of the bugs and patches from
SourceForge. Four different trackers were examined: Jira,
Launchpad,
Roundup, and
Trac.
The committee eventually settled on Jira
and Roundup as the two candidates. Jira is a commercial product that
offers no-cost hosted instances to free-software projects; Roundup
is an open-source project that requires volunteers
to administer it and a server to host it.
After posting a call for volunteers, a new Roundup installation was
set up at http://bugs.python.org. One installation of Roundup can
host multiple trackers, and this server now also hosts issue trackers
for Jython and for the Python web site. It will surely find
other uses in the future. Where possible,
this edition of “What’s New in Python” links to the bug/patch
item for each change.
Hosting of the Python bug tracker is kindly provided by
Upfront Systems
of Stellenbosch, South Africa. Martin von Loewis put a
lot of effort into importing existing bugs and patches from
SourceForge; his scripts for this import operation are at
http://svn.python.org/view/tracker/importer/ and may be useful to
other projects wishing to move from SourceForge to Roundup.
New Documentation Format: reStructuredText Using Sphinx
The Python documentation was written using LaTeX since the project
started around 1989. In the 1980s and early 1990s, most documentation
was printed out for later study, not viewed online. LaTeX was widely
used because it provided attractive printed output while remaining
straightforward to write once the basic rules of the markup were
learned.
Today LaTeX is still used for writing publications destined for
printing, but the landscape for programming tools has shifted. We no
longer print out reams of documentation; instead, we browse through it
online and HTML has become the most important format to support.
Unfortunately, converting LaTeX to HTML is fairly complicated and Fred
L. Drake Jr., the long-time Python documentation editor, spent a lot
of time maintaining the conversion process. Occasionally people would
suggest converting the documentation into SGML and later XML, but
performing a good conversion is a major task and no one ever committed
the time required to finish the job.
During the 2.6 development cycle, Georg Brandl put a lot of effort
into building a new toolchain for processing the documentation. The
resulting package is called Sphinx, and is available from
http://sphinx.pocoo.org/.
Sphinx concentrates on HTML output, producing attractively styled and
modern HTML; printed output is still supported through conversion to
LaTeX. The input format is reStructuredText, a markup syntax
supporting custom extensions and directives that is commonly used in
the Python community.
Sphinx is a standalone package that can be used for writing, and
almost two dozen other projects
(listed on the Sphinx web site)
have adopted Sphinx as their documentation tool.
The previous version, Python 2.5, added the ‘with‘
statement as an optional feature, to be enabled by a from __future__ import with_statement directive. In 2.6 the statement no longer needs to
be specially enabled; this means that with is now always a
keyword. The rest of this section is a copy of the corresponding
section from the “What’s New in Python 2.5” document; if you’re
familiar with the ‘with‘ statement
from Python 2.5, you can skip this section.
The ‘with‘ statement clarifies code that previously would use
try...finally blocks to ensure that clean-up code is executed. In this
section, I’ll discuss the statement as it will commonly be used. In the next
section, I’ll examine the implementation details and show how to write objects
for use with this statement.
The ‘with‘ statement is a control-flow structure whose basic
structure is:
with expression [as variable]:
    with-block
The expression is evaluated, and it should result in an object that supports the
context management protocol (that is, has __enter__() and __exit__()
methods).
The object’s __enter__() is called before with-block is executed and
therefore can run set-up code. It also may return a value that is bound to the
name variable, if given. (Note carefully that variable is not assigned
the result of expression.)
After execution of the with-block is finished, the object’s __exit__()
method is called, even if the block raised an exception, and can therefore run
clean-up code.
Some standard Python objects now support the context management protocol and can
be used with the ‘with‘ statement. File objects are one example:
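with open('/etc/passwd', 'r') as f:
    for line in f:
        print line
        # ... more processing code ...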
After this statement has executed, the file object in f will have been
automatically closed, even if the for loop raised an exception part-
way through the block.
Note
In this case, f is the same object created by open(), because
file.__enter__() returns self.
The threading module’s locks and condition variables also support the
‘with‘ statement:
lock = threading.Lock()
with lock:
    # Critical section of code
    ...
The lock is acquired before the block is executed and always released once the
block is complete.
The localcontext() function in the decimal module makes it easy
to save and restore the current decimal context, which encapsulates the desired
precision and rounding characteristics for computations:
from decimal import Decimal, Context, localcontext

# Displays with default precision of 28 digits
v = Decimal('578')
print v.sqrt()

with localcontext(Context(prec=16)):
    # All code in this block uses a precision of 16 digits.
    # The original context is restored on exiting the block.
    print v.sqrt()
Under the hood, the ‘with‘ statement is fairly complicated. Most
people will only use ‘with‘ in company with existing objects and
don’t need to know these details, so you can skip the rest of this section if
you like. Authors of new objects will need to understand the details of the
underlying implementation and should keep reading.
A high-level explanation of the context management protocol is:
The expression is evaluated and should result in an object called a “context
manager”. The context manager must have __enter__() and __exit__()
methods.
The context manager’s __enter__() method is called. The value returned
is assigned to VAR. If no as VAR clause is present, the value is simply
discarded.
The code in BLOCK is executed.
If BLOCK raises an exception, the context manager’s __exit__() method
is called with three arguments, the exception details (type, value, traceback,
the same values returned by sys.exc_info(), which can also be None
if no exception occurred). The method’s return value controls whether an exception
is re-raised: any false value re-raises the exception, and True will result
in suppressing it. You’ll only rarely want to suppress the exception, because
if you do the author of the code containing the ‘with‘ statement will
never realize anything went wrong.
If BLOCK didn’t raise an exception, the __exit__() method is still
called, but type, value, and traceback are all None.
Let’s think through an example. I won’t present detailed code but will only
sketch the methods necessary for a database that supports transactions.
(For people unfamiliar with database terminology: a set of changes to the
database are grouped into a transaction. Transactions can be either committed,
meaning that all the changes are written into the database, or rolled back,
meaning that the changes are all discarded and the database is unchanged. See
any database textbook for more information.)
Let’s assume there’s an object representing a database connection. Our goal will
be to let the user write code like this:
db_connection = DatabaseConnection()
with db_connection as cursor:
    cursor.execute('insert into ...')
    cursor.execute('delete from ...')
    # ... more operations ...
The transaction should be committed if the code in the block runs flawlessly or
rolled back if there’s an exception. Here’s the basic interface for
DatabaseConnection that I’ll assume:
class DatabaseConnection:
    # Database interface
    def cursor(self):
        "Returns a cursor object and starts a new transaction"
    def commit(self):
        "Commits current transaction"
    def rollback(self):
        "Rolls back current transaction"
The __enter__() method is pretty easy, having only to start a new
transaction. For this application the resulting cursor object would be a useful
result, so the method will return it. The user can then add as cursor to
their ‘with‘ statement to bind the cursor to a variable name.
class DatabaseConnection:
    ...
    def __enter__(self):
        # Code to start a new transaction
        cursor = self.cursor()
        return cursor
The __exit__() method is the most complicated because it’s where most of
the work has to be done. The method has to check if an exception occurred. If
there was no exception, the transaction is committed. The transaction is rolled
back if there was an exception.
In the code below, execution will just fall off the end of the function,
returning the default value of None. None is false, so the exception
will be re-raised automatically. If you wished, you could be more explicit and
add a return statement at the marked location.
class DatabaseConnection:
    ...
    def __exit__(self, type, value, tb):
        if tb is None:
            # No exception, so commit
            self.commit()
        else:
            # Exception occurred, so rollback.
            self.rollback()
            # return False
The contextlib module provides some functions and a decorator that
are useful when writing objects for use with the ‘with‘ statement.
The decorator is called contextmanager(), and lets you write a single
generator function instead of defining a new class. The generator should yield
exactly one value. The code up to the yield will be executed as the
__enter__() method, and the value yielded will be the method’s return
value that will get bound to the variable in the ‘with‘ statement’s
as clause, if any. The code after the yield will be
executed in the __exit__() method. Any exception raised in the block will
be raised by the yield statement.
Using this decorator, our database example from the previous section
could be written as:
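from contextlib import contextmanager

@contextmanager
def db_transaction(connection):
    cursor = connection.cursor()
    try:
        yield cursor
    except:
        # The block raised an exception: roll back and re-raise.
        connection.rollback()
        raise
    else:
        # No exception: commit the transaction.
        connection.commit()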
The contextlib module also has a nested(mgr1, mgr2, ...) function
that combines a number of context managers so you don’t need to write nested
‘with‘ statements. In this example, the single ‘with‘
statement both starts a database transaction and acquires a thread lock:
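import threading
from contextlib import nested

lock = threading.Lock()
with nested(db_transaction(db_connection), lock) as (cursor, locked):
    # Both context managers are active inside this block.
    cursor.execute('insert into ...')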
Finally, the closing() function returns its argument so that it can be
bound to a variable, and calls the argument’s .close() method at the end
of the block.
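For example:

import urllib, sys
from contextlib import closing

# The URL here is only illustrative.
with closing(urllib.urlopen('http://www.python.org')) as f:
    for line in f:
        sys.stdout.write(line)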
PEP written by Guido van Rossum and Nick Coghlan; implemented by Mike Bland,
Guido van Rossum, and Neal Norwitz. The PEP shows the code generated for a
‘with‘ statement, which can be helpful in learning how the statement
works.
PEP 366: Explicit Relative Imports From a Main Module
Python’s -m switch allows running a module as a script.
When you ran a module that was located inside a package, relative
imports didn’t work correctly.
The fix for Python 2.6 adds a __package__ attribute to
modules. When this attribute is present, relative imports will be
relative to the value of this attribute instead of the
__name__ attribute.
PEP 302-style importers can then set __package__ as necessary.
The runpy module that implements the -m switch now
does this, so relative imports will now work correctly in scripts
running from inside a package.
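For example, given a hypothetical layout (the package and module names are invented for illustration):

# pkg/__init__.py   (empty)
# pkg/helper.py
# pkg/main.py, which contains an explicit relative import:
from . import helper

# Running "python -m pkg.main" from the directory containing pkg/
# now works, because runpy sets __package__ to 'pkg' before the
# relative import executes.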
When you run Python, the module search path sys.path usually
includes a directory whose path ends in "site-packages". This
directory is intended to hold locally-installed packages available to
all users using a machine or a particular site installation.
Python 2.6 introduces a convention for user-specific site directories.
The directory varies depending on the platform:
Unix and Mac OS X: ~/.local/
Windows: %APPDATA%/Python
Within this directory, there will be version-specific subdirectories,
such as lib/python2.6/site-packages on Unix/Mac OS and
Python26/site-packages on Windows.
If you don’t like the default directory, it can be overridden by an
environment variable. PYTHONUSERBASE sets the root
directory used for all Python versions supporting this feature. On
Windows, the directory for application-specific data can be changed by
setting the APPDATA environment variable. You can also
modify the site.py file for your Python installation.
The feature can be disabled entirely by running Python with the
-s option or setting the PYTHONNOUSERSITE
environment variable.
The new multiprocessing package lets Python programs create new
processes that will perform a computation and return a result to the
parent. The parent and child processes can communicate using queues
and pipes, synchronize their operations using locks and semaphores,
and can share simple arrays of data.
The multiprocessing module started out as an exact emulation of
the threading module using processes instead of threads. That
goal was discarded along the path to Python 2.6, but the general
approach of the module is still similar. The fundamental class
is the Process, which is passed a callable object and
a collection of arguments. The start() method
sets the callable running in a subprocess, after which you can call
the is_alive() method to check whether the subprocess is still running
and the join() method to wait for the process to exit.
Here’s a simple example where the subprocess will calculate a
factorial. The function doing the calculation is written strangely so
that it takes significantly longer when the input argument is a
multiple of 4.
import time
from multiprocessing import Process, Queue

def factorial(queue, N):
    "Compute a factorial."
    # If N is a multiple of 4, this function will take much longer.
    if (N % 4) == 0:
        time.sleep(.05 * N/4)

    # Calculate the result
    fact = 1L
    for i in range(1, N+1):
        fact = fact * i

    # Put the result on the queue
    queue.put(fact)

if __name__ == '__main__':
    queue = Queue()

    N = 5
    p = Process(target=factorial, args=(queue, N))
    p.start()
    p.join()

    result = queue.get()
    print 'Factorial', N, '=', result
A Queue is used to communicate the input parameter N and
the result. The Queue object is stored in a global variable.
The child process inherits the value the variable had when the child
was created; because it's a Queue, parent and child can use
the object to communicate. (If the parent were to change the value of
the global variable, the child’s value would be unaffected, and vice
versa.)
Two other classes, Pool and Manager, provide
higher-level interfaces. Pool will create a fixed number of
worker processes, and requests can then be distributed to the workers
by calling apply() or apply_async() to add a single request,
and map() or map_async() to add a number of
requests. The following code uses a Pool to spread requests
across 5 worker processes and retrieve a list of results:
from multiprocessing import Pool

def factorial(N):
    "Compute a factorial."
    fact = 1L
    for i in range(1, N+1):
        fact = fact * i
    return fact

if __name__ == '__main__':
    p = Pool(5)
    result = p.map(factorial, range(1, 1000, 10))
    for v in result:
        print v
The other high-level interface, the Manager class, creates a
separate server process that can hold master copies of Python data
structures. Other processes can then access and modify these data
structures using proxy objects. The following example creates a
shared dictionary by calling the dict() method; the worker
processes then insert values into the dictionary. (Locking is not
done for you automatically, which doesn’t matter in this example.
Manager‘s methods also include Lock(), RLock(),
and Semaphore() to create shared locks.)
import time
from multiprocessing import Pool, Manager

def factorial(N, dictionary):
    "Compute a factorial."
    # Calculate the result
    fact = 1L
    for i in range(1, N+1):
        fact = fact * i

    # Store result in dictionary
    dictionary[N] = fact

if __name__ == '__main__':
    p = Pool(5)
    mgr = Manager()
    d = mgr.dict()          # Create shared dictionary

    # Run tasks using the pool
    for N in range(1, 1000, 10):
        p.apply_async(factorial, (N, d))

    # Mark pool as closed -- no more tasks can be added.
    p.close()

    # Wait for tasks to exit
    p.join()

    # Output results
    for k, v in sorted(d.items()):
        print k, v
In Python 3.0, the % operator is supplemented by a more powerful string
formatting method, format(). Support for the str.format() method
has been backported to Python 2.6.
In 2.6, both 8-bit and Unicode strings have a .format() method that
treats the string as a template and takes the arguments to be formatted.
The formatting template uses curly brackets ({, }) as special characters:
>>> # Substitute positional argument 0 into the string.
>>> "User ID: {0}".format("root")
'User ID: root'
>>> # Use the named keyword arguments
>>> "User ID: {uid} Last seen: {last_login}".format(
...     uid="root",
...     last_login="5 Mar 2008 07:20")
'User ID: root Last seen: 5 Mar 2008 07:20'
Curly brackets can be escaped by doubling them:
>>> "Empty dict: {{}}".format()"Empty dict: {}"
Field names can be integers indicating positional arguments, such as
{0}, {1}, etc. or names of keyword arguments. You can also
supply compound field names that read attributes or access dictionary keys:
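>>> import sys
>>> 'Platform: {0.platform}'.format(sys)   # attribute access; value varies by platform
'Platform: linux2'
>>> import mimetypes
>>> 'Content-type: {0[.mp4]}'.format(mimetypes.types_map)   # key access
'Content-type: video/mp4'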
Note that when using dictionary-style notation such as [.mp4], you
don’t need to put any quotation marks around the string; it will look
up the value using .mp4 as the key. Strings beginning with a
number will be converted to an integer. You can’t write more
complicated expressions inside a format string.
So far we’ve shown how to specify which field to substitute into the
resulting string. The precise formatting used is also controllable by
adding a colon followed by a format specifier. For example:
>>> # Field 0: left justify, pad to 15 characters
>>> # Field 1: right justify, pad to 6 characters
>>> fmt = '{0:15} ${1:>6}'
>>> fmt.format('Registration', 35)
'Registration    $    35'
>>> fmt.format('Tutorial', 50)
'Tutorial        $    50'
>>> fmt.format('Banquet', 125)
'Banquet         $   125'
Format specifiers can reference other fields through nesting:
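>>> fmt = '{0:{1}}'
>>> width = 15
>>> fmt.format('Invoice #1234', width)
'Invoice #1234  '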
The alignment of a field within the desired width can be specified:
Character      Effect
<  (default)   Left-align
>              Right-align
^              Center
=              (For numeric types only) Pad after the sign.
Format specifiers can also include a presentation type, which
controls how the value is formatted. For example, floating-point numbers
can be formatted as a general number or in exponential notation:
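>>> '{0:g}'.format(3.75)
'3.75'
>>> '{0:e}'.format(3.75)
'3.750000e+00'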
A variety of presentation types are available. Consult the 2.6
documentation for a complete list; here’s a sample:
'b'   Binary. Outputs the number in base 2.
'c'   Character. Converts the integer to the corresponding Unicode character before printing.
'd'   Decimal Integer. Outputs the number in base 10.
'o'   Octal format. Outputs the number in base 8.
'x'   Hex format. Outputs the number in base 16, using lower-case letters for the digits above 9.
'e'   Exponent notation. Prints the number in scientific notation using the letter ‘e’ to indicate the exponent.
'g'   General format. This prints the number as a fixed-point number, unless the number is too large, in which case it switches to ‘e’ exponent notation.
'n'   Number. This is the same as ‘g’ (for floats) or ‘d’ (for integers), except that it uses the current locale setting to insert the appropriate number separator characters.
'%'   Percentage. Multiplies the number by 100 and displays in fixed (‘f’) format, followed by a percent sign.
Classes and types can define a __format__() method to control how they’re
formatted. It receives a single argument, the format specifier:
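# A minimal sketch; the Currency class is invented for illustration.
class Currency(object):
    def __init__(self, amount):
        self.amount = amount
    def __format__(self, format_spec):
        # Format the amount with the given spec, then prepend a dollar sign.
        return '$' + format(self.amount, format_spec)

print '{0:.2f}'.format(Currency(5.5))   # prints $5.50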
The print statement becomes the print() function in Python 3.0.
Making print() a function makes it possible to replace the function
by doing def print(...) or importing a new function from somewhere else.
Python 2.6 has a __future__ import that removes print as language
syntax, letting you use the functional form instead. For example:
>>> from __future__ import print_function
>>> print('# of entries', len(dictionary), file=sys.stderr)
The signature of the new function is:
def print(*args, sep=' ', end='\n', file=None)
The parameters are:
args: positional arguments whose values will be printed out.
sep: the separator, which will be printed between arguments.
end: the ending text, which will be printed after all of the
arguments have been output.
file: the file object to which the output will be sent.
One error that Python programmers occasionally make
is writing the following code:
try:
    ...
except TypeError, ValueError:  # Wrong!
    ...
The author is probably trying to catch both TypeError and
ValueError exceptions, but this code actually does something
different: it will catch TypeError and bind the resulting
exception object to the local name "ValueError". The
ValueError exception will not be caught at all. The correct
code specifies a tuple of exceptions:
try:
    ...
except (TypeError, ValueError):
    ...
This error happens because the use of the comma here is ambiguous:
does it indicate two different nodes in the parse tree, or a single
node that’s a tuple?
Python 3.0 makes this unambiguous by replacing the comma with the word
“as”. To catch an exception and store the exception object in the
variable exc, you must write:
try:
    ...
except TypeError as exc:
    ...
Python 3.0 will only support the use of “as”, and therefore interprets
the first example as catching two different exceptions. Python 2.6
supports both the comma and “as”, so existing code will continue to
work. We therefore suggest using “as” when writing new Python code
that will only be executed with 2.6.
Python 3.0 adopts Unicode as the language’s fundamental string type and
denotes 8-bit literals differently, either as b'string'
or using a bytes constructor. For future compatibility,
Python 2.6 adds bytes as a synonym for the str type,
and it also supports the b'' notation.
The 2.6 str differs from 3.0’s bytes type in various
ways; most notably, the constructor is completely different. In 3.0,
bytes([65,66,67]) is 3 elements long, containing the bytes
representing ABC; in 2.6, bytes([65,66,67]) returns the
12-byte string representing the str() of the list.
The primary use of bytes in 2.6 will be to write tests of
object type such as isinstance(x,bytes). This will help the 2to3
converter, which can’t tell whether 2.x code intends strings to
contain either characters or 8-bit bytes; you can now
use either bytes or str to represent your intention
exactly, and the resulting code will also be correct in Python 3.0.
There’s also a __future__ import that causes all string literals
to become Unicode strings. This means that \u escape sequences
can be used to include Unicode characters:
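>>> from __future__ import unicode_literals
>>> s = '\u00e9tude'
>>> type(s), len(s)
(<type 'unicode'>, 5)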
At the C level, Python 3.0 will rename the existing 8-bit
string type, called PyStringObject in Python 2.x,
to PyBytesObject. Python 2.6 uses #define
to support using the names PyBytesObject(),
PyBytes_Check(), PyBytes_FromStringAndSize(),
and all the other functions and macros used with strings.
Instances of the bytes type are immutable just
as strings are. A new bytearray type stores a mutable
sequence of bytes:
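>>> b = bytearray(b'hello')
>>> b[0] = ord('H')
>>> b
bytearray(b'Hello')
>>> b.append(ord('!'))
>>> b
bytearray(b'Hello!')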
Byte arrays support most of the methods of string types, such as
startswith()/endswith(), find()/rfind(),
and some of the methods of lists, such as append(),
pop(), and reverse().
Python’s built-in file objects support a number of methods, but
file-like objects don’t necessarily support all of them. Objects that
imitate files usually support read() and write(), but they
may not support readline(), for example. Python 3.0 introduces
a layered I/O library in the io module that separates buffering
and text-handling features from the fundamental read and write
operations.
There are three levels of abstract base classes provided by
the io module:
RawIOBase defines raw I/O operations: read(),
readinto(),
write(), seek(), tell(), truncate(),
and close().
Most of the methods of this class will often map to a single system call.
There are also readable(), writable(), and seekable()
methods for determining what operations a given object will allow.
Python 3.0 has concrete implementations of this class for files and
sockets, but Python 2.6 hasn’t restructured its file and socket objects
in this way.
BufferedIOBase is an abstract base class that
buffers data in memory to reduce the number of
system calls used, making I/O processing more efficient.
It supports all of the methods of RawIOBase,
and adds a raw attribute holding the underlying raw object.
There are five concrete classes implementing this ABC.
BufferedWriter and BufferedReader are for objects
that support write-only or read-only usage that have a seek()
method for random access. BufferedRandom objects support
read and write access upon the same underlying stream, and
BufferedRWPair is for objects such as TTYs that have both
read and write operations acting upon unconnected streams of data.
The BytesIO class supports reading, writing, and seeking
over an in-memory buffer.
TextIOBase: Provides functions for reading and writing
strings (remember, strings will be Unicode in Python 3.0),
and supporting universal newlines. TextIOBase defines
the readline() method and supports iteration upon
objects.
There are two concrete implementations. TextIOWrapper
wraps a buffered I/O object, supporting all of the methods for
text I/O and adding a buffer attribute for access
to the underlying object. StringIO simply buffers
everything in memory without ever writing anything to disk.
(In Python 2.6, io.StringIO is implemented in
pure Python, so it’s pretty slow. You should therefore stick with the
existing StringIO module or cStringIO for now. At some
point Python 3.0’s io module will be rewritten into C for speed,
and perhaps the C implementation will be backported to the 2.x releases.)
In Python 2.6, the underlying implementations haven’t been
restructured to build on top of the io module’s classes. The
module is being provided to make it easier to write code that’s
forward-compatible with 3.0, and to save developers the effort of writing
their own implementations of buffering and text I/O.
PEP written by Daniel Stutzbach, Mike Verdone, and Guido van Rossum.
Code by Guido van Rossum, Georg Brandl, Walter Doerwald,
Jeremy Hylton, Martin von Loewis, Tony Lownds, and others.
The buffer protocol is a C-level API that lets Python types
exchange pointers into their internal representations. A
memory-mapped file can be viewed as a buffer of characters, for
example, and this lets another module such as re
treat memory-mapped files as a string of characters to be searched.
The primary users of the buffer protocol are numeric-processing
packages such as NumPy, which expose the internal representation
of arrays so that callers can write data directly into an array instead
of going through a slower API. This PEP updates the buffer protocol in light of experience
from NumPy development, adding a number of new features
such as indicating the shape of an array or locking a memory region.
The most important new C API function is
PyObject_GetBuffer(PyObject *obj, Py_buffer *view, int flags), which
takes an object and a set of flags, and fills in the
Py_buffer structure with information
about the object’s memory representation. Objects
can use this operation to lock memory in place
while an external caller could be modifying the contents,
so there’s a corresponding PyBuffer_Release(Py_buffer *view) to
indicate that the external caller is done.
The flags argument to PyObject_GetBuffer() specifies
constraints upon the memory returned. Some examples are:
PyBUF_WRITABLE indicates that the memory must be writable.
PyBUF_LOCK requests a read-only or exclusive lock on the memory.
PyBUF_C_CONTIGUOUS and PyBUF_F_CONTIGUOUS
request a C-contiguous (last dimension varies the fastest) or
Fortran-contiguous (first dimension varies the fastest) array layout.
Two new argument codes for PyArg_ParseTuple(),
s* and z*, return locked buffer objects for a parameter.
Some object-oriented languages such as Java support interfaces,
declaring that a class has a given set of methods or supports a given
access protocol. Abstract Base Classes (or ABCs) are an equivalent
feature for Python. The ABC support consists of an abc module
containing a metaclass called ABCMeta, special handling of
this metaclass by the isinstance() and issubclass()
builtins, and a collection of basic ABCs that the Python developers
think will be widely useful. Future versions of Python will probably
add more ABCs.
Let’s say you have a particular class and wish to know whether it supports
dictionary-style access. The phrase “dictionary-style” is vague, however.
It probably means that accessing items with obj[1] works.
Does it imply that setting items with obj[2]=value works?
Or that the object will have keys(), values(), and items()
methods? What about the iterative variants such as iterkeys()? copy()
and update()? Iterating over the object with iter()?
The Python 2.6 collections module includes a number of
different ABCs that represent these distinctions. Iterable
indicates that a class defines __iter__(), and
Container means the class defines a __contains__()
method and therefore supports x in y expressions. The basic
dictionary interface of getting items, setting items, and
keys(), values(), and items(), is defined by the
MutableMapping ABC.
You can derive your own classes from a particular ABC
to indicate they support that ABC’s interface:
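import collections

# An illustrative class; supplying the five abstract methods lets the
# MutableMapping ABC fill in the rest of the dictionary interface.
class Storage(collections.MutableMapping):
    def __init__(self):
        self._data = {}
    def __getitem__(self, key):
        return self._data[key]
    def __setitem__(self, key, value):
        self._data[key] = value
    def __delitem__(self, key):
        del self._data[key]
    def __iter__(self):
        return iter(self._data)
    def __len__(self):
        return len(self._data)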
For classes that you write, deriving from the ABC is probably clearer.
The register() method is useful when you’ve written a new
ABC that can describe an existing type or class, or if you want
to declare that some third-party class implements an ABC.
For example, if you defined a PrintableType ABC,
it’s legal to do:
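# Register Python's built-in types as supporting PrintableType.
PrintableType.register(int)
PrintableType.register(float)
PrintableType.register(str)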
Classes should obey the semantics specified by an ABC, but
Python can’t check this; it’s up to the class author to
understand the ABC’s requirements and to implement the code accordingly.
To check whether an object supports a particular interface, you can
now write:
def func(d):
    if not isinstance(d, collections.MutableMapping):
        raise ValueError("Mapping object expected, not %r" % d)
Don’t feel that you must now begin writing lots of checks as in the
above example. Python has a strong tradition of duck-typing, where
explicit type-checking is never done and code simply calls methods on
an object, trusting that those methods will be there and raising an
exception if they aren’t. Be judicious in checking for ABCs and only
do it where it’s absolutely necessary.
You can write your own ABCs by using abc.ABCMeta as the
metaclass in a class definition:
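from abc import ABCMeta, abstractmethod

class Drawable(object):
    __metaclass__ = ABCMeta

    @abstractmethod
    def draw(self, x, y, scale=1.0):
        pass

    def draw_doubled(self, x, y):
        self.draw(x, y, scale=2.0)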
In the Drawable ABC above, the draw_doubled() method
renders the object at twice its size and can be implemented in terms
of other methods described in Drawable. Classes implementing
this ABC therefore don’t need to provide their own implementation
of draw_doubled(), though they can do so. An implementation
of draw() is necessary, though; the ABC can’t provide
a useful generic implementation.
You can apply the @abstractmethod decorator to methods such as
draw() that must be implemented; Python will then raise an
exception for classes that don’t define the method.
Note that the exception is only raised when you actually
try to create an instance of a subclass lacking the method:
>>> class Circle(Drawable):
...     pass
...
>>> c = Circle()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Can't instantiate abstract class Circle with abstract methods draw
Abstract data attributes can be declared using the
@abstractproperty decorator:
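from abc import ABCMeta, abstractproperty

# An illustrative sketch; the Shape class and its 'area' property are invented.
class Shape(object):
    __metaclass__ = ABCMeta

    @abstractproperty
    def area(self):
        "Subclasses must provide an 'area' property."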
Python 3.0 changes the syntax for octal (base-8) integer literals,
prefixing them with “0o” or “0O” instead of a leading zero, and adds
support for binary (base-2) integer literals, signalled by a “0b” or
“0B” prefix.
Python 2.6 doesn’t drop support for a leading 0 signalling
an octal number, but it does add support for “0o” and “0b”:
>>> 0o21, 2*8 + 1
(17, 17)
>>> 0b101111
47
The oct() builtin still returns numbers
prefixed with a leading zero, and a new bin()
builtin returns the binary representation for a number:
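>>> oct(42)
'052'
>>> bin(173)
'0b10101101'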
The int() and long() builtins will now accept the “0o”
and “0b” prefixes when base-8 or base-2 are requested, or when the
base argument is zero (signalling that the base used should be
determined from the string):
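>>> int('0o52', 8)
42
>>> int('0b1101', 2)
13
>>> int('0o52', 0)
42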
Python 3.0 adds several abstract base classes for numeric types
inspired by Scheme’s numeric tower. These classes were backported to
2.6 as the numbers module.
The most general ABC is Number. It defines no operations at
all, and only exists to allow checking if an object is a number by
doing isinstance(obj,Number).
Complex is a subclass of Number. Complex numbers
can undergo the basic operations of addition, subtraction,
multiplication, division, and exponentiation, and you can retrieve the
real and imaginary parts and obtain a number’s conjugate. Python’s built-in
complex type is an implementation of Complex.
Real further derives from Complex, and adds
operations that only work on real numbers: floor(), trunc(),
rounding, taking the remainder mod N, floor division,
and comparisons.
Rational numbers derive from Real, have
numerator and denominator properties, and can be
converted to floats. Python 2.6 adds a simple rational-number class,
Fraction, in the fractions module. (It’s called
Fraction instead of Rational to avoid
a name clash with numbers.Rational.)
Integral numbers derive from Rational, and
can be shifted left and right with << and >>,
combined using bitwise operations such as & and |,
and can be used as array indexes and slice boundaries.
In Python 3.0, the PEP slightly redefines the existing builtins
round(), math.floor(), math.ceil(), and adds a new
one, math.trunc(), that’s been backported to Python 2.6.
math.trunc() rounds toward zero, returning the closest
Integral that’s between the function’s argument and zero.
To fill out the hierarchy of numeric types, the fractions
module provides a rational-number class. Rational numbers store their
values as a numerator and denominator forming a fraction, and can
exactly represent numbers such as 2/3 that floating-point numbers
can only approximate.
The Fraction constructor takes two Integral values
that will be the numerator and denominator of the resulting fraction.
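For example:

>>> from fractions import Fraction
>>> a = Fraction(2, 3)
>>> b = Fraction(2, 5)
>>> a + b
Fraction(16, 15)
>>> a / b
Fraction(5, 3)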
For converting floating-point numbers to rationals,
the float type now has an as_integer_ratio() method that returns
the numerator and denominator for a fraction that evaluates to the same
floating-point value:
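>>> (2.5).as_integer_ratio()
(5, 2)
>>> # (on 32-bit builds the large integers below print with a trailing L)
>>> (1./3).as_integer_ratio()
(6004799503160661, 18014398509481984)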
Note that values that can only be approximated by floating-point
numbers, such as 1./3, are not simplified to the number being
approximated; the fraction attempts to match the floating-point value
exactly.
The fractions module is based upon an implementation by Sjoerd
Mullender that was in Python’s Demo/classes/ directory for a
long time. This implementation was significantly updated by Jeffrey
Yasskin.
Some smaller changes made to the core Python language are:
Directories and zip archives containing a __main__.py file
can now be executed directly by passing their name to the
interpreter. The directory or zip archive is automatically inserted
as the first entry in sys.path. (Suggestion and initial patch by
Andy Chu, subsequently revised by Phillip J. Eby and Nick Coghlan;
issue 1739468.)
The hasattr() function was catching and ignoring all errors,
under the assumption that they meant a __getattr__() method
was failing somehow and the return value of hasattr() would
therefore be False. This logic shouldn’t be applied to
KeyboardInterrupt and SystemExit, however; Python 2.6
will no longer discard such exceptions when hasattr()
encounters them. (Fixed by Benjamin Peterson; issue 2196.)
When calling a function using the ** syntax to provide keyword
arguments, you are no longer required to use a Python dictionary;
any mapping will now work:
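>>> import UserDict
>>> def f(**kw):
...     print sorted(kw)
...
>>> ud = UserDict.UserDict()
>>> ud['a'] = 1
>>> ud['b'] = 'string'
>>> f(**ud)
['a', 'b']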
It’s also become legal to provide keyword arguments after a *args
argument to a function call; previously this would have been a syntax
error. (Contributed by Amaury Forgeot d’Arc; issue 3473.)
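For example:

>>> def f(*args, **kw):
...     print args, kw
...
>>> f(1, 2, 3, *(4, 5, 6), keyword=13)
(1, 2, 3, 4, 5, 6) {'keyword': 13}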
A new builtin, next(iterator,[default]) returns the next item
from the specified iterator. If the default argument is supplied,
it will be returned if iterator has been exhausted; otherwise,
the StopIteration exception will be raised. (Backported
in issue 2719.)
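For example:

>>> it = iter([1, 2])
>>> next(it)
1
>>> next(it)
2
>>> next(it, 'done')
'done'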
Tuples now have index() and count() methods matching the
list type’s index() and count() methods:
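>>> t = (0, 1, 2, 3, 4, 0, 1, 2)
>>> t.index(3)
3
>>> t.count(0)
2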
The built-in types now have improved support for extended slicing syntax,
accepting various combinations of (start,stop,step).
Previously, the support was partial and certain corner cases wouldn’t work.
(Implemented by Thomas Wouters.)
Properties now have three attributes, getter, setter
and deleter, that are decorators providing useful shortcuts
for adding a getter, setter or deleter function to an existing
property. You would use them like this:
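class C(object):
    @property
    def x(self):
        "The 'x' property."
        return self._x

    @x.setter
    def x(self, value):
        self._x = value

    @x.deleter
    def x(self):
        del self._x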
Several methods of the built-in set types now accept multiple iterables:
intersection(),
intersection_update(),
union(), update(),
difference() and difference_update().
>>> s = set('1234567890')
>>> s.intersection('abc123', 'cdf246')  # Intersection between all inputs
set(['2'])
>>> s.difference('246', '789')
set(['1', '0', '3', '5'])
(Contributed by Raymond Hettinger.)
Many floating-point features were added. The float() function
will now turn the string nan into an
IEEE 754 Not A Number value, and +inf and -inf into
positive or negative infinity. This works on any platform with
IEEE 754 semantics. (Contributed by Christian Heimes; issue 1635.)
Other functions in the math module, isinf() and
isnan(), return true if their floating-point argument is
infinite or Not A Number. (issue 1640)
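For example:

>>> float('nan')
nan
>>> float('+inf'), float('-inf')
(inf, -inf)
>>> import math
>>> math.isinf(float('inf')), math.isnan(float('nan'))
(True, True)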
Conversion functions were added to convert floating-point numbers
into hexadecimal strings (issue 3008). These functions
convert floats to and from a string representation without
introducing rounding errors from the conversion between decimal and
binary. Floats have a hex() method that returns a string
representation, and the float.fromhex() method converts a string
back into a number:
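>>> a = 3.75
>>> a.hex()
'0x1.e000000000000p+1'
>>> float.fromhex('0x1.e000000000000p+1')
3.75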
A numerical nicety: when creating a complex number from two floats
on systems that support signed zeros (-0 and +0), the
complex() constructor will now preserve the sign
of the zero. (Fixed by Mark T. Dickinson; issue 1507.)
Classes that inherit a __hash__() method from a parent class
can set __hash__=None to indicate that the class isn’t
hashable. This will make hash(obj) raise a TypeError
and the class will not be indicated as implementing the
Hashable ABC.
You should do this when you’ve defined a __cmp__() or
__eq__() method that compares objects by their value rather
than by identity. All objects have a default hash method that uses
id(obj) as the hash value. There’s no tidy way to remove the
__hash__() method inherited from a parent class, so
assigning None was implemented as an override. At the
C level, extensions can set tp_hash to
PyObject_HashNotImplemented().
(Fixed by Nick Coghlan and Amaury Forgeot d’Arc; issue 2235.)
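For example (a sketch; the Point class is invented for illustration):

class Point(object):
    def __init__(self, x, y):
        self.x, self.y = x, y
    def __eq__(self, other):
        return (self.x, self.y) == (other.x, other.y)
    __hash__ = None   # value-based equality, so opt out of hashing

# hash(Point(1, 2)) now raises TypeError, and
# isinstance(Point(1, 2), collections.Hashable) is False.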
Generator objects now have a gi_code attribute that refers to
the original code object backing the generator.
(Contributed by Collin Winter; issue 1473257.)
The compile() built-in function now accepts keyword arguments
as well as positional parameters. (Contributed by Thomas Wouters;
issue 1444529.)
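For example (a sketch assuming the documented parameter names source, filename, and mode):

>>> code = compile(source='print 6 * 7', filename='<string>', mode='exec')
>>> exec code
42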
The complex() constructor now accepts strings containing
parenthesized complex numbers, meaning that complex(repr(cplx))
will now round-trip values. For example, complex('(3+4j)')
now returns the value (3+4j). (issue 1491866)
The string translate() method now accepts None as the
translation table parameter, which is treated as the identity
transformation. This makes it easier to carry out operations
that only delete characters. (Contributed by Bengt Richter and
implemented by Raymond Hettinger; issue 1193128.)
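For example, deleting all the vowels from a string:

>>> 'read this short text'.translate(None, 'aeiou')
'rd ths shrt txt'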
The built-in dir() function now checks for a __dir__()
method on the objects it receives. This method must return a list
of strings containing the names of valid attributes for the object,
and lets the object control the value that dir() produces.
Objects that have __getattr__() or __getattribute__()
methods can use this to advertise pseudo-attributes they will honor.
(issue 1591665)
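For example (a sketch; the Plugin class and its 'version' pseudo-attribute are invented):

class Plugin(object):
    def __getattr__(self, name):
        # Dynamically provide a 'version' pseudo-attribute.
        if name == 'version':
            return '1.0'
        raise AttributeError(name)
    def __dir__(self):
        # Advertise the pseudo-attribute so dir() reports it.
        return dir(type(self)) + ['version']

print 'version' in dir(Plugin())   # True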
Instance method objects have new attributes for the object and function
comprising the method; the new synonym for im_self is
__self__, and im_func is also available as __func__.
The old names are still supported in Python 2.6, but are gone in 3.0.
An obscure change: when you use the locals() function inside a
class statement, the resulting dictionary no longer returns free
variables. (Free variables, in this case, are variables referenced in the
class statement that aren’t attributes of the class.)
The warnings module has been rewritten in C. This makes
it possible to invoke warnings from the parser, and may also
make the interpreter’s startup faster.
(Contributed by Neal Norwitz and Brett Cannon; issue 1631171.)
Type objects now have a cache of methods that can reduce
the work required to find the correct method implementation
for a particular class; once cached, the interpreter doesn’t need to
traverse base classes to figure out the right method to call.
The cache is cleared if a base class or the class itself is modified,
so the cache should remain correct even in the face of Python’s dynamic
nature.
(Original optimization implemented by Armin Rigo, updated for
Python 2.6 by Kevin Jacobs; issue 1700288.)
By default, this change is only applied to types that are included with
the Python core. Extension modules may not necessarily be compatible with
this cache,
so they must explicitly add Py_TPFLAGS_HAVE_VERSION_TAG
to the module’s tp_flags field to enable the method cache.
(To be compatible with the method cache, the extension module’s code
must not directly access and modify the tp_dict member of
any of the types it implements. Most modules don’t do this,
but it’s impossible for the Python interpreter to determine that.
See issue 1878 for some discussion.)
Function calls that use keyword arguments are significantly faster
by doing a quick pointer comparison, usually saving the time of a
full string comparison. (Contributed by Raymond Hettinger, after an
initial implementation by Antoine Pitrou; issue 1819.)
All of the functions in the struct module have been rewritten in
C, thanks to work at the Need For Speed sprint.
(Contributed by Raymond Hettinger.)
Some of the standard built-in types now set a bit in their type
objects. This speeds up checking whether an object is a subclass of
one of these types. (Contributed by Neal Norwitz.)
Unicode strings now use faster code for detecting
whitespace and line breaks; this speeds up the split() method
by about 25% and splitlines() by 35%.
(Contributed by Antoine Pitrou.) Memory usage is reduced
by using pymalloc for the Unicode string’s data.
The with statement now stores the __exit__() method on the stack,
producing a small speedup. (Implemented by Jeffrey Yasskin.)
To reduce memory usage, the garbage collector will now clear internal
free lists when garbage-collecting the highest generation of objects.
This may return memory to the operating system sooner.
Two command-line options have been reserved for use by other Python
implementations. The -J switch has been reserved for use by
Jython for Jython-specific options, such as switches that are passed to
the underlying JVM. -X has been reserved for options
specific to a particular implementation of Python such as CPython,
Jython, or IronPython. If either option is used with Python 2.6, the
interpreter will report that the option isn’t currently used.
Python can now be prevented from writing .pyc or .pyo
files by supplying the -B switch to the Python interpreter,
or by setting the PYTHONDONTWRITEBYTECODE environment
variable before running the interpreter. This setting is available to
Python programs as the sys.dont_write_bytecode variable, and
Python code can change the value to modify the interpreter’s
behaviour. (Contributed by Neal Norwitz and Georg Brandl.)
The encoding used for standard input, output, and standard error can
be specified by setting the PYTHONIOENCODING environment
variable before running the interpreter. The value should be a string
in the form <encoding> or <encoding>:<errorhandler>.
The encoding part specifies the encoding’s name, e.g. utf-8 or
latin-1; the optional errorhandler part specifies
what to do with characters that can’t be handled by the encoding,
and should be one of “error”, “ignore”, or “replace”. (Contributed
by Martin von Loewis.)
As in every release, Python’s standard library received a number of
enhancements and bug fixes. Here’s a partial list of the most notable
changes, sorted alphabetically by module name. Consult the
Misc/NEWS file in the source tree for a more complete list of
changes, or look through the Subversion logs for all the details.
The asyncore and asynchat modules are
being actively maintained again, and a number of patches and bugfixes
were applied. (Maintained by Josiah Carlson; see issue 1736190 for
one patch.)
The bsddb module also has a new maintainer, Jesús Cea Avion, and the package
is now available as a standalone package. The web page for the package is
www.jcea.es/programacion/pybsddb.htm.
The plan is to remove the package from the standard library
in Python 3.0, because its pace of releases is much more frequent than
Python’s.
The bsddb.dbshelve module now uses the highest pickling protocol
available, instead of restricting itself to protocol 1.
(Contributed by W. Barnes.)
The cgi module will now read variables from the query string
of an HTTP POST request. This makes it possible to use form actions
with URLs that include query strings such as
“/cgi-bin/add.py?category=1”. (Contributed by Alexandre Fiori and
Nubis; issue 1817.)
The parse_qs() and parse_qsl() functions have been
relocated from the cgi module to the urlparse module.
The versions still available in the cgi module will
trigger PendingDeprecationWarning messages in 2.6
(issue 600362).
The cmath module underwent extensive revision,
contributed by Mark Dickinson and Christian Heimes.
Five new functions were added:
polar() converts a complex number to polar form, returning
the modulus and argument of the complex number.
rect() does the opposite, turning a modulus, argument pair
back into the corresponding complex number.
phase() returns the argument (also called the angle) of a complex
number.
isnan() returns True if either
the real or imaginary part of its argument is a NaN.
isinf() returns True if either the real or imaginary part of
its argument is infinite.
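Here's a quick sketch of the new functions in action:
import cmath
r, phi = cmath.polar(1 + 1j)       # modulus sqrt(2), argument pi/4
print cmath.rect(r, phi)           # approximately (1+1j) again
print cmath.phase(-1.0)            # pi
print cmath.isnan(complex(float('nan'), 0.0))   # True
print cmath.isinf(complex(0.0, float('inf')))   # True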
The revisions also improved the numerical soundness of the
cmath module. For all functions, the real and imaginary
parts of the results are accurate to within a few units of least
precision (ulps) whenever possible. See issue 1381 for the
details. The branch cuts for asinh(), atanh(), and
atan() have also been corrected.
The tests for the module have been greatly expanded; nearly 2000 new
test cases exercise the algebraic functions.
On IEEE 754 platforms, the cmath module now handles IEEE 754
special values and floating-point exceptions in a manner consistent
with Annex ‘G’ of the C99 standard.
A new data type in the collections module: namedtuple(typename, fieldnames) is a factory function that creates subclasses of the standard tuple
whose fields are accessible by name as well as index. For example:
>>> var_type = collections.namedtuple('variable',
...                                   'id name type size')
>>> # Names are separated by spaces or commas.
>>> # 'id, name, type, size' would also work.
>>> var_type._fields
('id', 'name', 'type', 'size')
>>> var = var_type(1, 'frequency', 'int', 4)
>>> print var[0], var.id    # Equivalent
1 1
>>> print var[2], var.type  # Equivalent
int int
>>> var._asdict()
{'size': 4, 'type': 'int', 'id': 1, 'name': 'frequency'}
>>> v2 = var._replace(name='amplitude')
>>> v2
variable(id=1, name='amplitude', type='int', size=4)
Several places in the standard library that returned tuples have
been modified to return namedtuple instances. For example,
the Decimal.as_tuple() method now returns a named tuple with
sign, digits, and exponent fields.
(Contributed by Raymond Hettinger.)
Another change to the collections module is that the
deque type now supports an optional maxlen parameter;
if supplied, the deque’s size will be restricted to no more
than maxlen items. Adding more items to a full deque causes
old items to be discarded.
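For instance, a full deque silently discards items from the opposite end:
>>> from collections import deque
>>> dq = deque(maxlen=3)
>>> for i in range(5):
...     dq.append(i)
...
>>> dq
deque([2, 3, 4], maxlen=3)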
The Cookie module’s Morsel objects now support an
httponly attribute. In some browsers, cookies with this attribute
set cannot be accessed or manipulated by JavaScript code.
(Contributed by Arvin Schnell; issue 1638033.)
A new window method in the curses module,
chgat(), changes the display attributes for a certain number of
characters on a single line. (Contributed by Fabian Kreutz.)
# Boldface text starting at y=0,x=21
# and affecting the rest of the line.
stdscr.chgat(0, 21, curses.A_BOLD)
The Textbox class in the curses.textpad module
now supports editing in insert mode as well as overwrite mode.
Insert mode is enabled by supplying a true value for the insert_mode
parameter when creating the Textbox instance.
The datetime module’s strftime() methods now support a
%f format code that expands to the number of microseconds in the
object, zero-padded on
the left to six places. (Contributed by Skip Montanaro; issue 1158.)
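For example:
>>> from datetime import datetime
>>> datetime(2008, 10, 1, 12, 30, 0, 4500).strftime('%H:%M:%S.%f')
'12:30:00.004500'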
The decimal module was updated to version 1.66 of
the General Decimal Specification. New features
include methods for basic mathematical functions such as
exp() and log10():
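>>> from decimal import Decimal
>>> Decimal(1).exp()        # with the default context precision of 28 digits
Decimal('2.718281828459045235360287471')
>>> Decimal(100).log10()
Decimal('2')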
The as_tuple() method of Decimal objects now returns a
named tuple with sign, digits, and exponent fields.
(Implemented by Facundo Batista and Mark Dickinson. Named tuple
support added by Raymond Hettinger.)
The difflib module’s SequenceMatcher class
now returns named tuples representing matches,
with a, b, and size attributes.
(Contributed by Raymond Hettinger.)
An optional timeout parameter, specifying a timeout measured in
seconds, was added to the ftplib.FTP class constructor as
well as the connect() method. (Added by Facundo Batista.)
Also, the FTP class’s storbinary() and
storlines() now take an optional callback parameter that
will be called with each block of data after the data has been sent.
(Contributed by Phil Schwartz; issue 1221598.)
The reduce() built-in function is also available in the
functools module. In Python 3.0, the builtin has been
dropped and reduce() is only available from functools;
currently there are no plans to drop the builtin in the 2.x series.
(Patched by Christian Heimes; issue 1739906.)
When possible, the getpass module will now use
/dev/tty to print a prompt message and read the password,
falling back to standard error and standard input. If the
password may be echoed to the terminal, a warning is printed before
the prompt is displayed. (Contributed by Gregory P. Smith.)
The glob.glob() function can now return Unicode filenames if
a Unicode path was used and Unicode filenames are matched within the
directory. (issue 1001604)
A new function in the heapq module, merge(iter1, iter2, ...),
takes any number of iterables returning data in sorted
order, and returns a new generator that returns the contents of all
the iterators, also in sorted order. For example:
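>>> import heapq
>>> list(heapq.merge([1, 3, 5, 9], [2, 8, 16]))
[1, 2, 3, 5, 8, 9, 16]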
Another new function, heappushpop(heap, item),
pushes item onto heap, then pops off and returns the smallest item.
This is more efficient than making a call to heappush() and then
heappop().
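For example:
>>> import heapq
>>> heap = [2, 4, 6]
>>> heapq.heappushpop(heap, 5)
2
>>> heap
[4, 5, 6]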
heapq is now implemented to only use less-than comparison,
instead of the less-than-or-equal comparison it previously used.
This makes heapq‘s usage of a type match the
list.sort() method.
(Contributed by Raymond Hettinger.)
An optional timeout parameter, specifying a timeout measured in
seconds, was added to the httplib.HTTPConnection and
HTTPSConnection class constructors. (Added by Facundo
Batista.)
Most of the inspect module’s functions, such as
getmoduleinfo() and getargs(), now return named tuples.
In addition to behaving like tuples, the elements of the return value
can also be accessed as attributes.
(Contributed by Raymond Hettinger.)
Some new functions in the module include
isgenerator(), isgeneratorfunction(),
and isabstract().
The itertools module gained several new functions.
izip_longest(iter1, iter2, ... [, fillvalue]) makes tuples from
each of the elements; if some of the iterables are shorter than
others, the missing values are set to fillvalue. For example:
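>>> import itertools
>>> list(itertools.izip_longest([1, 2, 3], [9, 10], fillvalue=0))
[(1, 9), (2, 10), (3, 0)]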
product(iter1, iter2, ... [, repeat=N]) returns the Cartesian product
of the supplied iterables, a set of tuples containing
every possible combination of the elements returned from each iterable.
The optional repeat keyword argument is used for taking the
product of an iterable or a set of iterables with themselves,
repeated N times. With a single iterable argument, N-tuples
are returned:
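>>> import itertools
>>> list(itertools.product([1, 2], ['a', 'b']))
[(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')]
>>> list(itertools.product([1, 2], repeat=2))
[(1, 1), (1, 2), (2, 1), (2, 2)]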
permutations(iter[, r]) returns all the permutations of length r of
the iterable’s elements. If r is not specified, it will default to the
number of elements produced by the iterable.
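For example:
>>> import itertools
>>> list(itertools.permutations('abc', 2))
[('a', 'b'), ('a', 'c'), ('b', 'a'), ('b', 'c'), ('c', 'a'), ('c', 'b')]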
itertools.chain(*iterables) is an existing function in
itertools that gained a new constructor in Python 2.6.
itertools.chain.from_iterable(iterable) takes a single
iterable that should return other iterables. chain() will
then return all the elements of the first iterable, then
all the elements of the second, and so on.
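For example:
>>> import itertools
>>> list(itertools.chain.from_iterable([[1, 2], [3, 4], [5]]))
[1, 2, 3, 4, 5]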
The logging module’s FileHandler class
and its subclasses WatchedFileHandler, RotatingFileHandler,
and TimedRotatingFileHandler now
have an optional delay parameter to their constructors. If delay
is true, opening of the log file is deferred until the first
emit() call is made. (Contributed by Vinay Sajip.)
TimedRotatingFileHandler also has a utc constructor
parameter. If the argument is true, UTC time will be used
in determining when midnight occurs and in generating filenames;
otherwise local time will be used.
Several new functions were added to the math module:
isinf() and isnan() determine whether a given float
is a (positive or negative) infinity or a NaN (Not a Number), respectively.
copysign() copies the sign bit of an IEEE 754 number,
returning the absolute value of x combined with the sign bit of
y. For example, math.copysign(1,-0.0) returns -1.0.
(Contributed by Christian Heimes.)
factorial() computes the factorial of a number.
(Contributed by Raymond Hettinger; issue 2138.)
fsum() adds up the stream of numbers from an iterable,
and is careful to avoid loss of precision through using partial sums.
(Contributed by Jean Brouwers, Raymond Hettinger, and Mark Dickinson;
issue 2819.)
log1p() returns the natural logarithm of 1+x
(base e).
trunc() rounds a number toward zero, returning the closest
Integral that’s between the function’s argument and zero.
Added as part of the backport of
PEP 3141’s type hierarchy for numbers.
The math module has been improved to give more consistent
behaviour across platforms, especially with respect to handling of
floating-point exceptions and IEEE 754 special values.
Whenever possible, the module follows the recommendations of the C99
standard about 754’s special values. For example, sqrt(-1.)
should now give a ValueError across almost all platforms,
while sqrt(float('NaN')) should return a NaN on all IEEE 754
platforms. Where Annex ‘F’ of the C99 standard recommends signaling
‘divide-by-zero’ or ‘invalid’, Python will raise ValueError.
Where Annex ‘F’ of the C99 standard recommends signaling ‘overflow’,
Python will raise OverflowError. (See issue 711019 and
issue 1640.)
(Contributed by Christian Heimes and Mark Dickinson.)
mmap objects now have a rfind() method that searches for a
substring beginning at the end of the string and searching
backwards. The find() method also gained an end parameter
giving an index at which to stop searching.
(Contributed by John Lenton.)
The operator module gained a
methodcaller() function that takes a name and an optional
set of arguments, returning a callable that will call
the named function on any arguments passed to it. For example:
>>> # Equivalent to lambda s: s.replace('old', 'new')
>>> replacer = operator.methodcaller('replace', 'old', 'new')
>>> replacer('old wine in old bottles')
'new wine in new bottles'
(Contributed by Georg Brandl, after a suggestion by Gregory Petrosyan.)
The attrgetter() function now accepts dotted names and performs
the corresponding attribute lookups:
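>>> import operator
>>> inst_name = operator.attrgetter('__class__.__name__')
>>> inst_name('')
'str'
>>> inst_name(5)
'int'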
(Contributed by Georg Brandl, after a suggestion by Barry Warsaw.)
The os module now wraps several new system calls.
fchmod(fd, mode) and fchown(fd, uid, gid) change the mode
and ownership of an opened file, and lchmod(path, mode) changes
the mode of a symlink. (Contributed by Georg Brandl and Christian
Heimes.)
chflags() and lchflags() are wrappers for the
corresponding system calls (where they’re available), changing the
flags set on a file. Constants for the flag values are defined in
the stat module; some possible values include
UF_IMMUTABLE to signal the file may not be changed and
UF_APPEND to indicate that data can only be appended to the
file. (Contributed by M. Levinson.)
os.closerange(low, high) efficiently closes all file descriptors
from low to high, ignoring any errors and not including high itself.
This function is now used by the subprocess module to make starting
processes faster. (Contributed by Georg Brandl; issue 1663329.)
The os.environ object’s clear() method will now unset the
environment variables using os.unsetenv() in addition to clearing
the object’s keys. (Contributed by Martin Horcicka; issue 1181.)
The os.walk() function now has a followlinks parameter. If
set to True, it will follow symlinks pointing to directories and
visit the directory’s contents. For backward compatibility, the
parameter’s default value is false. Note that the function can fall
into an infinite recursion if there’s a symlink that points to a
parent directory. (issue 1273829)
In the os.path module, the splitext() function
has been changed to not split on leading period characters.
This produces better results when operating on Unix’s dot-files.
For example, os.path.splitext('.ipython')
now returns ('.ipython', '') instead of ('', '.ipython').
(issue 1115886)
A new function, os.path.relpath(path, start='.'), returns a relative path
from the start path, if it’s supplied, or from the current
working directory to the destination path. (Contributed by
Richard Barran; issue 1339796.)
On Windows, os.path.expandvars() will now expand environment variables
given in the form “%var%”, and “~user” will be expanded into the
user’s home directory path. (Contributed by Josiah Carlson;
issue 957650.)
The Python debugger provided by the pdb module
gained a new command: “run” restarts the Python program being debugged
and can optionally take new command-line arguments for the program.
(Contributed by Rocky Bernstein; issue 1393667.)
The pdb.post_mortem() function, used to begin debugging a
traceback, will now use the traceback returned by sys.exc_info()
if no traceback is supplied. (Contributed by Facundo Batista;
issue 1106316.)
The pickletools module now has an optimize() function
that takes a string containing a pickle and removes some unused
opcodes, returning a shorter pickle that contains the same data structure.
(Contributed by Raymond Hettinger.)
A get_data() function was added to the pkgutil
module that returns the contents of resource files included
with an installed Python package. For example:
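import pkgutil

# 'mypackage' and 'data/config.txt' are hypothetical names; get_data()
# returns the resource's contents as a string.
config_text = pkgutil.get_data('mypackage', 'data/config.txt')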
The pyexpat module’s Parser objects now allow setting
their buffer_size attribute to change the size of the buffer
used to hold character data.
(Contributed by Achim Gaedke; issue 1137.)
The Queue module now provides queue variants that retrieve entries
in different orders. The PriorityQueue class stores
queued items in a heap and retrieves them in priority order,
and LifoQueue retrieves the most recently added entries first,
meaning that it behaves like a stack.
(Contributed by Raymond Hettinger.)
The random module’s Random objects can
now be pickled on a 32-bit system and unpickled on a 64-bit
system, and vice versa. Unfortunately, this change also means
that Python 2.6’s Random objects can’t be unpickled correctly
on earlier versions of Python.
(Contributed by Shawn Ligocki; issue 1727780.)
The new triangular(low, high, mode) function returns random
numbers following a triangular distribution. The returned values
are between low and high, not including high itself, and
with mode as the most frequently occurring value
in the distribution. (Contributed by Wladmir van der Laan and
Raymond Hettinger; issue 1681432.)
Long regular expression searches carried out by the re
module will check for signals being delivered, so
time-consuming searches can now be interrupted.
(Contributed by Josh Hoyt and Ralf Schmitt; issue 846388.)
The regular expression module is implemented by compiling bytecodes
for a tiny regex-specific virtual machine. Untrusted code
could create malicious strings of bytecode directly and cause crashes,
so Python 2.6 includes a verifier for the regex bytecode.
(Contributed by Guido van Rossum from work for Google App Engine;
issue 3487.)
The rlcompleter module’s Completer.complete() method
will now ignore exceptions triggered while evaluating a name.
(Fixed by Lorenz Quack; issue 2250.)
The sched module’s scheduler instances now
have a read-only queue attribute that returns the
contents of the scheduler’s queue, represented as a list of
named tuples with the fields (time, priority, action, argument).
(Contributed by Raymond Hettinger; issue 1861.)
The select module now has wrapper functions
for the Linux epoll() and BSD kqueue() system calls.
A modify() method was added to the existing poll
objects; pollobj.modify(fd, eventmask) takes a file descriptor
or file object and an event mask, modifying the recorded event mask
for that file.
(Contributed by Christian Heimes; issue 1657.)
The shutil.copytree() function now has an optional ignore argument
that takes a callable object. This callable will receive each directory path
and a list of the directory’s contents, and returns a list of names that
will be ignored, not copied.
The shutil module also provides an ignore_patterns()
function for use with this new parameter. ignore_patterns()
takes an arbitrary number of glob-style patterns and returns a
callable that will ignore any files and directories that match any
of these patterns. The following example copies a directory tree,
but skips both .svn directories and Emacs backup files,
which have names ending with ‘~’:
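import shutil

# The source and destination paths are illustrative.
shutil.copytree('Doc/library', '/tmp/library',
                ignore=shutil.ignore_patterns('*~', '.svn'))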
Integrating signal handling with GUI handling event loops
like those used by Tkinter or GTk+ has long been a problem; most
software ends up polling, waking up every fraction of a second to check
if any GUI events have occurred.
The signal module can now make this more efficient.
Calling signal.set_wakeup_fd(fd) sets a file descriptor
to be used; when a signal is received, a byte is written to that
file descriptor. There’s also a C-level function,
PySignal_SetWakeupFd(), for setting the descriptor.
Event loops will use this by opening a pipe to create two descriptors,
one for reading and one for writing. The writable descriptor
will be passed to set_wakeup_fd(), and the readable descriptor
will be added to the list of descriptors monitored by the event loop via
select() or poll().
On receiving a signal, a byte will be written and the main event loop
will be woken up, avoiding the need to poll.
The siginterrupt() function is now available from Python code,
and allows changing whether signals can interrupt system calls or not.
(Contributed by Ralf Schmitt.)
The setitimer() and getitimer() functions have also been
added (where they’re available). setitimer()
allows setting interval timers that will cause a signal to be
delivered to the process after a specified time, measured in
wall-clock time, consumed process time, or combined process+system
time. (Contributed by Guilherme Polo; issue 2240.)
The smtplib module now supports SMTP over SSL thanks to the
addition of the SMTP_SSL class. This class supports an
interface identical to the existing SMTP class.
(Contributed by Monty Taylor.) Both class constructors also have an
optional timeout parameter that specifies a timeout for the
initial connection attempt, measured in seconds. (Contributed by
Facundo Batista.)
An implementation of the LMTP protocol (RFC 2033) was also added
to the module. LMTP is used in place of SMTP when transferring
e-mail between agents that don’t manage a mail queue. (LMTP
implemented by Leif Hedstrom; issue 957003.)
SMTP.starttls() now complies with RFC 3207 and forgets any
knowledge obtained from the server not obtained from the TLS
negotiation itself. (Patch contributed by Bill Fenner;
issue 829951.)
The socket module now supports TIPC (http://tipc.sf.net),
a high-performance non-IP-based protocol designed for use in clustered
environments. TIPC addresses are 4- or 5-tuples.
(Contributed by Alberto Bertogli; issue 1646.)
A new function, create_connection(), takes an address and
connects to it using an optional timeout value, returning the
connected socket object. This function also looks up the address’s
type and connects to it using IPv4 or IPv6 as appropriate. Changing
your code to use create_connection() instead of
socket(socket.AF_INET, ...) may be all that's required to make
your code work with IPv6.
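A minimal sketch (the host name is illustrative):
import socket

conn = socket.create_connection(('www.python.org', 80), 5)  # 5-second timeout
conn.close()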
The base classes in the SocketServer module now support
calling a handle_timeout() method after a span of inactivity
specified by the server’s timeout attribute. (Contributed
by Michael Pomraning.) The serve_forever() method
now takes an optional poll interval measured in seconds,
controlling how often the server will check for a shutdown request.
(Contributed by Pedro Werneck and Jeffrey Yasskin;
issue 742598, issue 1193577.)
The sqlite3 module, maintained by Gerhard Haering,
has been updated from version 2.3.2 in Python 2.5 to
version 2.4.1.
The struct module now supports the C99 _Bool type,
using the format character '?'.
(Contributed by David Remahl.)
The Popen objects provided by the subprocess module
now have terminate(), kill(), and send_signal() methods.
On Windows, send_signal() only supports the SIGTERM
signal, and all these methods are aliases for the Win32 API function
TerminateProcess().
(Contributed by Christian Heimes.)
A new variable in the sys module, float_info, is an
object containing information derived from the float.h file
about the platform’s floating-point support. Attributes of this
object include mant_dig (number of digits in the mantissa),
epsilon (smallest difference between 1.0 and the next
largest value representable), and several others. (Contributed by
Christian Heimes; issue 1534.)
Another new variable, dont_write_bytecode, controls whether Python
writes any .pyc or .pyo files on importing a module.
If this variable is true, the compiled files are not written. The
variable is initially set on start-up by supplying the -B
switch to the Python interpreter, or by setting the
PYTHONDONTWRITEBYTECODE environment variable before
running the interpreter. Python code can subsequently
change the value of this variable to control whether bytecode files
are written or not.
(Contributed by Neal Norwitz and Georg Brandl.)
Information about the command-line arguments supplied to the Python
interpreter is available by reading attributes of a named
tuple available as sys.flags. For example, the verbose
attribute is true if Python
was executed in verbose mode, debug is true in debugging mode, etc.
These attributes are all read-only.
(Contributed by Christian Heimes.)
A new function, getsizeof(), takes a Python object and returns
the amount of memory used by the object, measured in bytes. Built-in
objects return correct results; third-party extensions may not,
but can define a __sizeof__() method to return the
object’s size.
(Contributed by Robert Schuppenies; issue 2898.)
It’s now possible to determine the current profiler and tracer functions
by calling sys.getprofile() and sys.gettrace().
(Contributed by Georg Brandl; issue 1648.)
The tarfile module now supports POSIX.1-2001 (pax) tarfiles in
addition to the POSIX.1-1988 (ustar) and GNU tar formats that were
already supported. The default format is GNU tar; specify the
format parameter to open a file using a different format:
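import tarfile

# The file name is illustrative.
tar = tarfile.open("output.tar", "w", format=tarfile.PAX_FORMAT)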
The new encoding and errors parameters specify an encoding and
an error handling scheme for character conversions. 'strict',
'ignore', and 'replace' are the three standard ways Python can
handle errors;
'utf-8' is a special value that replaces bad characters with
their UTF-8 representation. (Character conversions occur because the
PAX format supports Unicode filenames, defaulting to UTF-8 encoding.)
The TarFile.add() method now accepts an exclude argument that’s
a function that can be used to exclude certain filenames from
an archive.
The function must take a filename and return true if the file
should be excluded or false if it should be archived.
The function is applied to both the name initially passed to add()
and to the names of files in recursively-added directories.
(All changes contributed by Lars Gustäbel).
An optional timeout parameter was added to the
telnetlib.Telnet class constructor, specifying a timeout
measured in seconds. (Added by Facundo Batista.)
The tempfile.NamedTemporaryFile class usually deletes
the temporary file it created when the file is closed. This
behaviour can now be changed by passing delete=False to the
constructor. (Contributed by Damien Miller; issue 1537850.)
A new class, SpooledTemporaryFile, behaves like
a temporary file but stores its data in memory until a maximum size is
exceeded. On reaching that limit, the contents will be written to
an on-disk temporary file. (Contributed by Dustin J. Mitchell.)
The NamedTemporaryFile and SpooledTemporaryFile classes
both work as context managers, so you can write
with tempfile.NamedTemporaryFile() as tmp: ....
(Contributed by Alexander Belopolsky; issue 2021.)
The test.test_support module gained a number
of context managers useful for writing tests.
EnvironmentVarGuard() is a
context manager that temporarily changes environment variables and
automatically restores them to their old values.
Another context manager, TransientResource, can surround calls
to resources that may or may not be available; it will catch and
ignore a specified list of exceptions. For example,
a network test may ignore certain failures when connecting to an
external web site:
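import errno
import urllib
from test import test_support

# The URL is illustrative; an IOError whose errno is ETIMEDOUT raised
# inside the block is caught instead of being reported as a test failure.
with test_support.TransientResource(IOError, errno=errno.ETIMEDOUT):
    f = urllib.urlopen('http://www.example.com/')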
Finally, check_warnings() resets the warning module’s
warning filters and returns an object that will record all warning
messages triggered (issue 3781):
with test_support.check_warnings() as wrec:
    warnings.simplefilter("always")
    # ... code that triggers a warning ...
    assert str(wrec.message) == "function is outdated"
    assert len(wrec.warnings) == 1, "Multiple warnings raised"
(Contributed by Brett Cannon.)
The textwrap module can now preserve existing whitespace
at the beginnings and ends of the newly-created lines
by specifying drop_whitespace=False
as an argument:
>>> S="""This sentence has a bunch of... extra whitespace.""">>> printtextwrap.fill(S,width=15)This sentencehas a bunchof extrawhitespace.>>> printtextwrap.fill(S,drop_whitespace=False,width=15)This sentence has a bunch of extra whitespace.>>>
The threading module API is being changed to use properties
such as daemon instead of setDaemon() and
isDaemon() methods, and some methods have been renamed to use
underscores instead of camel-case; for example, the
activeCount() method is renamed to active_count(). Both
the 2.6 and 3.0 versions of the module support the same properties
and renamed methods, but don’t remove the old methods. No date has been set
for the deprecation of the old APIs in Python 3.x; the old APIs won’t
be removed in any 2.x version.
(Carried out by several people, most notably Benjamin Peterson.)
The threading module’s Thread objects
gained an ident property that returns the thread’s
identifier, a nonzero integer. (Contributed by Gregory P. Smith;
issue 2871.)
The timeit module now accepts callables as well as strings
for the statement being timed and for the setup code.
Two convenience functions were added for creating
Timer instances:
repeat(stmt, setup, timer, repeat, number) and
timeit(stmt, setup, timer, number) create an instance and call
the corresponding method. (Contributed by Erik Demaine;
issue 1533909.)
The Tkinter module now accepts lists and tuples for options,
separating the elements by spaces before passing the resulting value to
Tcl/Tk.
(Contributed by Guilherme Polo; issue 2906.)
The turtle module for turtle graphics was greatly enhanced by
Gregor Lingl. New features in the module include:
Better animation of turtle movement and rotation.
Control over turtle movement using the new delay(),
tracer(), and speed() methods.
The ability to set new shapes for the turtle, and to
define a new coordinate system.
Turtles now have an undo() method that can roll back actions.
Simple support for reacting to input events such as mouse and keyboard
activity, making it possible to write simple games.
A turtle.cfg file can be used to customize the starting appearance
of the turtle’s screen.
The module’s docstrings can be replaced by new docstrings that have been
translated into another language.
An optional timeout parameter was added to the
urllib.urlopen() function and the
urllib.ftpwrapper class constructor, as well as the
urllib2.urlopen() function. The parameter specifies a timeout
measured in seconds. For example:
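>>> import urllib2
>>> # The URL is illustrative; the timeout is measured in seconds.
>>> response = urllib2.urlopen("http://www.python.org", timeout=10)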
The Unicode database provided by the unicodedata module
has been updated to version 5.1.0. (Updated by
Martin von Loewis; issue 3811.)
The warnings module’s formatwarning() and showwarning()
gained an optional line argument that can be used to supply the
line of source code. (Added as part of issue 1631171, which re-implemented
part of the warnings module in C code.)
A new function, catch_warnings(), is a context manager
intended for testing purposes that lets you temporarily modify the
warning filters and then restore their original values (issue 3781).
The XML-RPC SimpleXMLRPCServer and DocXMLRPCServer
classes can now be prevented from immediately opening and binding to
their socket by passing False as the bind_and_activate
constructor parameter. This can be used to modify the instance’s
allow_reuse_address attribute before calling the
server_bind() and server_activate() methods to
open the socket and begin listening for connections.
(Contributed by Peter Parente; issue 1599845.)
SimpleXMLRPCServer also has a _send_traceback_header
attribute; if true, the exception and formatted traceback are returned
as HTTP headers “X-Exception” and “X-Traceback”. This feature is
for debugging purposes only and should not be used on production servers
because the tracebacks might reveal passwords or other sensitive
information. (Contributed by Alan McIntyre as part of his
project for Google’s Summer of Code 2007.)
The xmlrpclib module no longer automatically converts
datetime.date and datetime.time to the
xmlrpclib.DateTime type; the conversion semantics were
not necessarily correct for all applications. Code using
xmlrpclib should convert date and time
instances. (issue 1330538) The code can also handle
dates before 1900 (contributed by Ralf Schmitt; issue 2014)
and 64-bit integers represented by using <i8> in XML-RPC responses
(contributed by Riku Lindblad; issue 2985).
The zipfile module’s ZipFile class now has
extract() and extractall() methods that will unpack
a single file or all the files in the archive to the current directory, or
to a specified directory:
z = zipfile.ZipFile('python-251.zip')

# Unpack a single file, writing it relative
# to the /tmp directory.
z.extract('Python/sysmodule.c', '/tmp')

# Unpack all the files in the archive.
z.extractall()
The open(), read() and extract() methods can now
take either a filename or a ZipInfo object. This is useful when an
archive accidentally contains a duplicated filename.
(Contributed by Graham Horler; issue 1775025.)
Finally, zipfile now supports using Unicode filenames
for archived files. (Contributed by Alexey Borzenkov; issue 1734346.)
The ast module provides an Abstract Syntax Tree
representation of Python code, and Armin Ronacher
contributed a set of helper functions that perform a variety of
common tasks. These will be useful for HTML templating
packages, code analyzers, and similar tools that process
Python code.
The parse() function takes an expression and returns an AST.
The dump() function outputs a representation of a tree, suitable
for debugging:
import ast

t = ast.parse("""
d = {}
for i in 'abcdefghijklm':
    d[i + i] = ord(i) - ord('a') + 1
print d
""")
print ast.dump(t)
The literal_eval() method takes a string or an AST
representing a literal expression, parses and evaluates it, and
returns the resulting value. A literal expression is a Python
expression containing only strings, numbers, dictionaries,
etc. but no statements or function calls. If you need to
evaluate an expression but cannot accept the security risk of using an
eval() call, literal_eval() will handle it safely:
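>>> import ast
>>> literal = '("a", "b", {2:4, 3:8, 1:2})'
>>> print ast.literal_eval(literal)
('a', 'b', {1: 2, 2: 4, 3: 8})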
The module also includes NodeVisitor and
NodeTransformer classes for traversing and modifying an AST,
and functions for common transformations such as changing line
numbers.
Python 3.0 makes many changes to the repertoire of built-in
functions, and most of the changes can’t be introduced in the Python
2.x series because they would break compatibility.
The future_builtins module provides versions
of these built-in functions that can be imported when writing
3.0-compatible code.
The functions in this module currently include:
ascii(obj): equivalent to repr(). In Python 3.0,
repr() will return a Unicode string, while ascii() will
return a pure ASCII bytestring.
filter(predicate,iterable),
map(func,iterable1,...): the 3.0 versions
return iterators, unlike the 2.x builtins which return lists.
hex(value), oct(value): instead of calling the
__hex__() or __oct__() methods, these versions will
call the __index__() method and convert the result to hexadecimal
or octal. oct() will use the new 0o notation for its
result.
The new json module supports the encoding and decoding of Python types in
JSON (Javascript Object Notation). JSON is a lightweight interchange format
often used in web applications. For more information about JSON, see
http://www.json.org.
json comes with support for decoding and encoding most built-in Python
types. The following example encodes and decodes a dictionary:
>>> import json
>>> data = {"spam": "foo", "parrot": 42}
>>> in_json = json.dumps(data)   # Encode the data
>>> in_json
'{"parrot": 42, "spam": "foo"}'
>>> json.loads(in_json)          # Decode into a Python object
{u'parrot': 42, u'spam': u'foo'}
It’s also possible to write your own decoders and encoders to support
more types. Pretty-printing of the JSON strings is also supported.
json (originally called simplejson) was written by Bob
Ippolito.
The .plist format is commonly used on Mac OS X to
store basic data types (numbers, strings, lists,
and dictionaries) by serializing them into an XML-based format.
It resembles the XML-RPC serialization of data types.
Despite being primarily used on Mac OS X, the format
has nothing Mac-specific about it and the Python implementation works
on any platform that Python supports, so the plistlib module
has been promoted to the standard library.
Using the module is simple:
import sys
import plistlib
import datetime

# Create data structure
data_struct = dict(lastAccessed=datetime.datetime.now(),
                   version=1,
                   categories=('Personal', 'Shared', 'Private'))

# Create string containing XML.
plist_str = plistlib.writePlistToString(data_struct)
new_struct = plistlib.readPlistFromString(plist_str)
print data_struct
print new_struct

# Write data structure to a file and read it back.
plistlib.writePlist(data_struct, '/tmp/customizations.plist')
new_struct = plistlib.readPlist('/tmp/customizations.plist')

# read/writePlist accepts file-like objects as well as paths.
plistlib.writePlist(data_struct, sys.stdout)
Thomas Heller continued to maintain and enhance the
ctypes module.
ctypes now supports a c_bool datatype
that represents the C99 bool type. (Contributed by David Remahl;
issue 1649190.)
The ctypes string, buffer and array types have improved
support for extended slicing syntax,
where various combinations of (start,stop,step) are supplied.
(Implemented by Thomas Wouters.)
All ctypes data types now support
from_buffer() and from_buffer_copy()
methods that create a ctypes instance based on a
provided buffer object. from_buffer_copy() copies
the contents of the object,
while from_buffer() will share the same memory area.
A new calling convention tells ctypes to clear the errno or
Win32 LastError variables at the outset of each wrapped call.
(Implemented by Thomas Heller; issue 1798.)
You can now retrieve the Unix errno variable after a function
call. When creating a wrapped function, you can supply
use_errno=True as a keyword parameter to the DLL() function
and then call the module-level methods set_errno() and
get_errno() to set and retrieve the error value.
The Win32 LastError variable is similarly supported by
the DLL(), OleDLL(), and WinDLL() functions.
You supply use_last_error=True as a keyword parameter
and then call the module-level methods set_last_error()
and get_last_error().
The byref() function, used to retrieve a pointer to a ctypes
instance, now has an optional offset parameter that is a byte
count that will be added to the returned pointer.
Bill Janssen made extensive improvements to Python 2.6’s support for
the Secure Sockets Layer by adding a new module, ssl, that’s
built atop the OpenSSL library.
This new module provides more control over the protocol negotiated,
the X.509 certificates used, and has better support for writing SSL
servers (as opposed to clients) in Python. The existing SSL support
in the socket module hasn’t been removed and continues to work,
though it will be removed in Python 3.0.
To use the new module, you must first create a TCP connection in the
usual way and then pass it to the ssl.wrap_socket() function.
It’s possible to specify whether a certificate is required, and to
obtain certificate info by calling the getpeercert() method.
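Here's a minimal client-side sketch; the host name is illustrative, and
no certificate validation is requested:
import socket
import ssl

sock = socket.create_connection(('www.python.org', 443))
ssl_sock = ssl.wrap_socket(sock)   # cert_reqs defaults to ssl.CERT_NONE
print ssl_sock.cipher()            # (cipher name, protocol version, secret bits)
ssl_sock.close()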
String exceptions have been removed. Attempting to use them raises a
TypeError.
Changes to the Exception interface
as dictated by PEP 352 continue to be made. For 2.6,
the message attribute is being deprecated in favor of the
args attribute.
(3.0-warning mode) Python 3.0 will feature a reorganized standard
library that will drop many outdated modules and rename others.
Python 2.6 running in 3.0-warning mode will warn about these modules
when they are imported.
The list of deprecated modules is:
audiodev,
bgenlocations,
buildtools,
bundlebuilder,
Canvas,
compiler,
dircache,
dl,
fpformat,
gensuitemodule,
ihooks,
imageop,
imgfile,
linuxaudiodev,
mhlib,
mimetools,
multifile,
new,
pure,
statvfs,
sunaudiodev,
test.testall, and
toaiff.
The gopherlib module has been removed.
The MimeWriter module and mimify module
have been deprecated; use the email
package instead.
The md5 module has been deprecated; use the hashlib module
instead.
The posixfile module has been deprecated; fcntl.lockf()
provides better locking.
The popen2 module has been deprecated; use the subprocess
module.
The rgbimg module has been removed.
The sets module has been deprecated; it’s better to
use the built-in set and frozenset types.
The sha module has been deprecated; use the hashlib module
instead.
Changes to Python’s build process and to the C API include:
Python now must be compiled with C89 compilers (after 19
years!). This means that the Python source tree has dropped its
own implementations of memmove() and strerror(), which
are in the C89 standard library.
Python 2.6 can be built with Microsoft Visual Studio 2008 (version
9.0), and this is the new default compiler. See the
PCbuild directory for the build files. (Implemented by
Christian Heimes.)
On Mac OS X, Python 2.6 can be compiled as a 4-way universal build.
The configure script
can take a --with-universal-archs=[32-bit|64-bit|all]
switch, controlling whether the binaries are built for 32-bit
architectures (x86, PowerPC), 64-bit (x86-64 and PPC-64), or both.
(Contributed by Ronald Oussoren.)
The BerkeleyDB module now has a C API object, available as
bsddb.db.api. This object can be used by other C extensions
that wish to use the bsddb module for their own purposes.
(Contributed by Duncan Grisby.)
Python’s use of the C stdio library is now thread-safe, or at least
as thread-safe as the underlying library is. A long-standing potential
bug occurred if one thread closed a file object while another thread
was reading from or writing to the object. In 2.6 file objects
have a reference count, manipulated by the
PyFile_IncUseCount() and PyFile_DecUseCount()
functions. File objects can’t be closed unless the reference count
is zero. PyFile_IncUseCount() should be called while the GIL
is still held, before carrying out an I/O operation using the
FILE* pointer, and PyFile_DecUseCount() should be called
immediately after the GIL is re-acquired.
(Contributed by Antoine Pitrou and Gregory P. Smith.)
Importing modules simultaneously in two different threads no longer
deadlocks; it will now raise an ImportError. A new API
function, PyImport_ImportModuleNoBlock(), will look for a
module in sys.modules first, then try to import it after
acquiring an import lock. If the import lock is held by another
thread, an ImportError is raised.
(Contributed by Christian Heimes.)
Several functions return information about the platform’s
floating-point support. PyFloat_GetMax() returns
the maximum representable floating point value,
and PyFloat_GetMin() returns the minimum
positive value. PyFloat_GetInfo() returns an object
containing more information from the float.h file, such as
"mant_dig" (number of digits in the mantissa), "epsilon"
(smallest difference between 1.0 and the next largest value
representable), and several others.
(Contributed by Christian Heimes; issue 1534.)
C functions and methods that use
PyComplex_AsCComplex() will now accept arguments that
have a __complex__() method. In particular, the functions in the
cmath module will now accept objects with this method.
This is a backport of a Python 3.0 change.
(Contributed by Mark Dickinson; issue 1675423.)
Python’s C API now includes two functions for case-insensitive string
comparisons, PyOS_stricmp(char*, char*)
and PyOS_strnicmp(char*, char*, Py_ssize_t).
(Contributed by Christian Heimes; issue 1635.)
Many C extensions define their own little macro for adding
integers and strings to the module’s dictionary in the
init* function. Python 2.6 finally defines standard macros
for adding values to a module, PyModule_AddStringMacro()
and PyModule_AddIntMacro(). (Contributed by
Christian Heimes.)
Some macros were renamed in both 3.0 and 2.6 to make it clearer that
they are macros,
not functions. Py_Size() became Py_SIZE(),
Py_Type() became Py_TYPE(), and
Py_Refcnt() became Py_REFCNT().
The mixed-case macros are still available
in Python 2.6 for backward compatibility.
(issue 1629)
Distutils now places C extensions it builds in a
different directory when running on a debug version of Python.
(Contributed by Collin Winter; issue 1530959.)
Several basic data types, such as integers and strings, maintain
internal free lists of objects that can be re-used. The data
structures for these free lists now follow a naming convention: the
variable is always named free_list, the counter is always named
numfree, and a macro Py<typename>_MAXFREELIST is
always defined.
A new Makefile target, “make patchcheck”, prepares the Python source tree
for making a patch: it fixes trailing whitespace in all modified
.py files, checks whether the documentation has been changed,
and reports whether the Misc/ACKS and Misc/NEWS files
have been updated.
(Contributed by Brett Cannon.)
Another new target, “make profile-opt”, compiles a Python binary
using GCC’s profile-guided optimization. It compiles Python with
profiling enabled, runs the test suite to obtain a set of profiling
results, and then compiles using these results for optimization.
(Contributed by Gregory P. Smith.)
The support for Windows 95, 98, ME and NT4 has been dropped.
Python 2.6 requires at least Windows 2000 SP4.
The new default compiler on Windows is Visual Studio 2008 (version
9.0). The build directories for Visual Studio 2003 (version 7.1) and
2005 (version 8.0) were moved into the PC/ directory. The new
PCbuild directory supports cross compilation for X64, debug
builds and Profile Guided Optimization (PGO). PGO builds are roughly
10% faster than normal builds. (Contributed by Christian Heimes
with help from Amaury Forgeot d’Arc and Martin von Loewis.)
The msvcrt module now supports
both the normal and wide char variants of the console I/O
API. The getwch() function reads a keypress and returns a Unicode
value, as does the getwche() function. The putwch() function
takes a Unicode character and writes it to the console.
(Contributed by Christian Heimes.)
os.path.expandvars() will now expand environment variables in
the form “%var%”, and “~user” will be expanded into the user’s home
directory path. (Contributed by Josiah Carlson; issue 957650.)
The socket module’s socket objects now have an
ioctl() method that provides a limited interface to the
WSAIoctl() system interface.
The _winreg module now has a function,
ExpandEnvironmentStrings(),
that expands environment variable references such as %NAME%
in an input string. The handle objects provided by this
module now support the context protocol, so they can be used
in with statements. (Contributed by Christian Heimes.)
_winreg also has better support for x64 systems,
exposing the DisableReflectionKey(), EnableReflectionKey(),
and QueryReflectionKey() functions, which enable and disable
registry reflection for 32-bit processes running on 64-bit systems.
(issue 1753245)
The msilib module’s Record object
gained GetInteger() and GetString() methods that
return field values as an integer or a string.
(Contributed by Floris Bruynooghe; issue 2125.)
When compiling a framework build of Python, you can now specify the
framework name to be used by providing the
--with-framework-name= option to the
configure script.
The macfs module has been removed. This in turn required the
macostools.touched() function to be removed because it depended on the
macfs module. (issue 1490190)
Many other Mac OS modules have been deprecated and will be removed in
Python 3.0:
_builtinSuites,
aepack,
aetools,
aetypes,
applesingle,
appletrawmain,
appletrunner,
argvemulator,
Audio_mac,
autoGIL,
Carbon,
cfmfile,
CodeWarrior,
ColorPicker,
EasyDialogs,
Explorer,
Finder,
FrameWork,
findertools,
ic,
icglue,
icopen,
macerrors,
MacOS,
macfs,
macostools,
macresource,
MiniAEFrame,
Nav,
Netscape,
OSATerminology,
pimp,
PixMapWrapper,
StdSuites,
SystemEvents,
Terminal, and
terminalcommand.
A number of old IRIX-specific modules were deprecated and will
be removed in Python 3.0:
al and AL,
cd,
cddb,
cdplayer,
CL and cl,
DEVICE,
ERRNO,
FILE,
FL and fl,
flp,
fm,
GET,
GLWS,
GL and gl,
IN,
IOCTL,
jpeg,
panelparser,
readcd,
SV and sv,
torgb,
videoreader, and
WAIT.
This section lists previously described changes and other bugfixes
that may require changes to your code:
Classes that aren’t supposed to be hashable should
set __hash__=None in their definitions to indicate
the fact.
String exceptions have been removed. Attempting to use them raises a
TypeError.
The __init__() method of collections.deque
now clears any existing contents of the deque
before adding elements from the iterable. This change makes the
behavior match list.__init__().
object.__init__() previously accepted arbitrary arguments and
keyword arguments, ignoring them. In Python 2.6, this is no longer
allowed and will result in a TypeError. This will affect
__init__() methods that end up calling the corresponding
method on object (perhaps through using super()).
See issue 1683368 for discussion.
The Decimal constructor now accepts leading and trailing
whitespace when passed a string. Previously it would raise an
InvalidOperation exception. On the other hand, the
create_decimal() method of Context objects now
explicitly disallows extra whitespace, raising a
ConversionSyntax exception.
Due to an implementation accident, if you passed a file path to
the built-in __import__() function, it would actually import
the specified file. This was never intended to work, however, and
the implementation now explicitly checks for this case and raises
an ImportError.
C API: the PyImport_Import() and PyImport_ImportModule()
functions now default to absolute imports, not relative imports.
This will affect C extensions that import other modules.
C API: extension data types that shouldn’t be hashable
should define their tp_hash slot to
PyObject_HashNotImplemented().
The socket module exception socket.error now inherits
from IOError. Previously it wasn’t a subclass of
StandardError but now it is, through IOError.
(Implemented by Gregory P. Smith; issue 1706815.)
The xmlrpclib module no longer automatically converts
datetime.date and datetime.time to the
xmlrpclib.DateTime type; the conversion semantics were
not necessarily correct for all applications. Code using
xmlrpclib should convert date and time
instances. (issue 1330538)
(3.0-warning mode) The Exception class now warns
when accessed using slicing or index access; having
Exception behave like a tuple is being phased out.
(3.0-warning mode) inequality comparisons between two dictionaries
or two objects that don’t implement comparison methods are reported
as warnings. dict1==dict2 still works, but dict1<dict2
is being phased out.
Comparisons between cells, which are an implementation detail of Python’s
scoping rules, also cause warnings because such comparisons are forbidden
entirely in 3.0.
The author would like to thank the following people for offering
suggestions, corrections and assistance with various drafts of this
article: Georg Brandl, Steve Brown, Nick Coghlan, Ralph Corderoy,
Jim Jewett, Kent Johnson, Chris Lambacher, Martin Michlmayr,
Antoine Pitrou, Brian Warner.
This article explains the new features in Python 2.5. The final release of
Python 2.5 is scheduled for August 2006; PEP 356 describes the planned
release schedule.
The changes in Python 2.5 are an interesting mix of language and library
improvements. The library enhancements will be more important to Python’s user
community, I think, because several widely-useful packages were added. New
modules include ElementTree for XML processing (xml.etree),
the SQLite database module (sqlite), and the ctypes
module for calling C functions.
The language changes are of middling significance. Some pleasant new features
were added, but most of them aren’t features that you’ll use every day.
Conditional expressions were finally added to the language using a novel syntax;
see section PEP 308: Conditional Expressions. The new ‘with‘ statement will make
writing cleanup code easier (section PEP 343: The ‘with’ statement). Values can now be passed
into generators (section PEP 342: New Generator Features). Imports are now visible as either
absolute or relative (section PEP 328: Absolute and Relative Imports). Some corner cases of exception
handling are handled better (section PEP 341: Unified try/except/finally). All these improvements
are worthwhile, but they’re improvements to one specific language feature or
another; none of them are broad modifications to Python’s semantics.
As well as the language and library additions, other improvements and bugfixes
were made throughout the source tree. A search through the SVN change logs
finds there were 353 patches applied and 458 bugs fixed between Python 2.4 and
2.5. (Both figures are likely to be underestimates.)
This article doesn’t try to be a complete specification of the new features;
instead changes are briefly introduced using helpful examples. For full
details, you should always refer to the documentation for Python 2.5 at
http://docs.python.org. If you want to understand the complete implementation
and design rationale, refer to the PEP for a particular new feature.
Comments, suggestions, and error reports for this document are welcome; please
e-mail them to the author or open a bug in the Python bug tracker.
For a long time, people have been requesting a way to write conditional
expressions, which are expressions that return value A or value B depending on
whether a Boolean value is true or false. A conditional expression lets you
write a single assignment statement that has the same effect as the following:
if condition:
    x = true_value
else:
    x = false_value
There have been endless tedious discussions of syntax on both python-dev and
comp.lang.python. A vote was even held that found the majority of voters wanted
conditional expressions in some form, but there was no syntax that was preferred
by a clear majority. Candidates included C's cond ? true_v : false_v, if cond then true_v else false_v, and 16 other variations.
Guido van Rossum eventually chose a surprising syntax:
x = true_value if condition else false_value
Evaluation is still lazy as in existing Boolean expressions, so the order of
evaluation jumps around a bit. The condition expression in the middle is
evaluated first, and the true_value expression is evaluated only if the
condition was true. Similarly, the false_value expression is only evaluated
when the condition is false.
This syntax may seem strange and backwards; why does the condition go in the
middle of the expression, and not in the front as in C’s c?x:y? The
decision was checked by applying the new syntax to the modules in the standard
library and seeing how the resulting code read. In many cases where a
conditional expression is used, one value seems to be the ‘common case’ and one
value is an ‘exceptional case’, used only on rarer occasions when the condition
isn’t met. The conditional syntax makes this pattern a bit more obvious:
contents = ((doc + '\n') if doc else '')
I read the above statement as meaning “here contents is usually assigned a
value of doc+'\n'; sometimes doc is empty, in which special case an empty
string is returned.” I doubt I will use conditional expressions very often
where there isn’t a clear common and uncommon case.
There was some discussion of whether the language should require surrounding
conditional expressions with parentheses. The decision was made to not
require parentheses in the Python language’s grammar, but as a matter of style I
think you should always use them. Consider these two statements:
# First version -- no parens
level = 1 if logging else 0

# Second version -- with parens
level = (1 if logging else 0)
In the first version, I think a reader’s eye might group the statement into
‘level = 1’, ‘if logging’, ‘else 0’, and think that the condition decides
whether the assignment to level is performed. The second version reads
better, in my opinion, because it makes it clear that the assignment is always
performed and the choice is being made between two values.
Another reason for including the brackets: a few odd combinations of list
comprehensions and lambdas could look like incorrect conditional expressions.
See PEP 308 for some examples. If you put parentheses around your
conditional expressions, you won’t run into this case.
The functools module is intended to contain tools for functional-style
programming.
One useful tool in this module is the partial() function. For programs
written in a functional style, you’ll sometimes want to construct variants of
existing functions that have some of the parameters filled in. Consider a
Python function f(a,b,c); you could create a new function g(b,c) that
was equivalent to f(1,b,c). This is called “partial function
application”.
partial() takes the arguments (function, arg1, arg2, ..., kwarg1=value1, kwarg2=value2). The resulting object is callable, so you can just call it to
invoke function with the filled-in arguments.
Here’s a small but realistic example:
import functools

def log(message, subsystem):
    "Write the contents of 'message' to the specified subsystem."
    print '%s: %s' % (subsystem, message)
    ...

server_log = functools.partial(log, subsystem='server')
server_log('Unable to open socket')
Here’s another example, from a program that uses PyGTK. Here a context-
sensitive pop-up menu is being constructed dynamically. The callback provided
for the menu option is a partially applied version of the open_item()
method, where the first argument has been provided.
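A condensed sketch of the pattern; item_path and popup_menu are
placeholders standing in for objects supplied by the surrounding GUI code:
import functools

item_path = '/home/user/file.txt'   # hypothetical menu context
popup_menu = []                     # stands in for a PyGTK menu

class Application:
    def open_item(self, path):
        pass   # open the item identified by 'path'
    def init(self):
        # Fill in open_item()'s first argument now; the menu callback
        # can then be invoked without any arguments.
        open_func = functools.partial(self.open_item, item_path)
        popup_menu.append(("Open", open_func, 1))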
Another function in the functools module is the
update_wrapper(wrapper, wrapped) function, which helps you write
well-behaved decorators. update_wrapper() copies the name, module, and
docstring attributes to a wrapper function so that tracebacks inside the
wrapped function are easier to understand. For example, you might write:
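import functools

def my_decorator(f):
    def wrapper(*args, **kwds):
        print 'Calling decorated function'
        return f(*args, **kwds)
    functools.update_wrapper(wrapper, f)
    return wrapper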
wraps() is a decorator that can be used inside your own decorators to copy
the wrapped function’s information. An alternate version of the previous
example would be:
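import functools

def my_decorator(f):
    @functools.wraps(f)
    def wrapper(*args, **kwds):
        print 'Calling decorated function'
        return f(*args, **kwds)
    return wrapper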
PEP proposed and written by Peter Harris; implemented by Hye-Shik Chang and Nick
Coghlan, with adaptations by Raymond Hettinger.
PEP 314: Metadata for Python Software Packages v1.1
Some simple dependency support was added to Distutils. The setup()
function now has requires, provides, and obsoletes keyword
parameters. When you build a source distribution using the sdist command,
the dependency information will be recorded in the PKG-INFO file.
Another new keyword parameter is download_url, which should be set to a URL
for the package’s source code. This means it’s now possible to look up an entry
in the package index, determine the dependencies for a package, and download the
required packages.
Another new enhancement to the Python package index at
http://cheeseshop.python.org is storing source and binary archives for a
package. The new upload Distutils command will upload a package to
the repository.
Before a package can be uploaded, you must be able to build a distribution using
the sdist Distutils command. Once that works, you can run
python setup.py upload to add your package to the PyPI archive.
Optionally you can
GPG-sign the package by supplying the --sign and --identity
options.
Package uploading was implemented by Martin von Löwis and Richard Jones.
See also
PEP 314 - Metadata for Python Software Packages v1.1
PEP proposed and written by A.M. Kuchling, Richard Jones, and Fred Drake;
implemented by Richard Jones and Fred Drake.
The simpler part of PEP 328 was implemented in Python 2.4: parentheses could now
be used to enclose the names imported from a module using the from ... import ... statement, making it easier to import many different names.
The more complicated part has been implemented in Python 2.5: importing a module
can be specified to use absolute or package-relative imports. The plan is to
move toward making absolute imports the default in future versions of Python.
Let’s say you have a package directory like this:
pkg/
pkg/__init__.py
pkg/main.py
pkg/string.py
This defines a package named pkg containing the pkg.main and
pkg.string submodules.
Consider the code in the main.py module. What happens if it executes
the statement import string? In Python 2.4 and earlier, Python first looks
in the package’s directory to perform a relative import, finds
pkg/string.py, imports the contents of that file as the
pkg.string module, and binds that module to the name string in the
pkg.main module’s namespace.
That’s fine if pkg.string was what you wanted. But what if you wanted
Python’s standard string module? There’s no clean way to ignore
pkg.string and look for the standard module; generally you had to look at
the contents of sys.modules, which is slightly unclean. Holger Krekel’s
py.std package provides a tidier way to perform imports from the standard
library, import py; py.std.string.join(), but that package isn’t available
on all Python installations.
Reading code which relies on relative imports is also less clear, because a
reader may be confused about which module, string or pkg.string,
is intended to be used. Python users soon learned not to duplicate the names of
standard library modules in the names of their packages’ submodules, but you
can’t protect against having your submodule’s name being used for a new module
added in a future version of Python.
In Python 2.5, you can switch import’s behaviour to absolute imports
using a from __future__ import absolute_import directive. This absolute-
import behaviour will become the default in a future version (probably Python
2.7). Once absolute imports are the default, import string will always
find the standard library’s version. It’s suggested that users should begin
using absolute imports as much as possible, so it’s preferable to begin writing
from pkg import string in your code.
Relative imports are still possible by adding a leading period to the module
name when using the from...import form:
# Import names from pkg.string
from .string import name1, name2

# Import pkg.string
from . import string
This imports the string module relative to the current package, so in
pkg.main this will import name1 and name2 from pkg.string.
Additional leading periods perform the relative import starting from the parent
of the current package. For example, code in the A.B.C module can do:
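# Import A.B.D
from . import D

# Import A.E
from .. import E

# Import A.F.G
from ..F import G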
The -m switch added in Python 2.4 to execute a module as a script
gained a few more abilities. Instead of being implemented in C code inside the
Python interpreter, the switch now uses an implementation in a new module,
runpy.
The runpy module implements a more sophisticated import mechanism so that
it’s now possible to run modules in a package such as pychecker.checker.
The module also supports alternative import mechanisms such as the
zipimport module. This means you can add a .zip archive’s path to
sys.path and then use the -m switch to execute code from the
archive.
Until Python 2.5, the try statement came in two flavours. You could
use a finally block to ensure that code is always executed, or one or
more except blocks to catch specific exceptions. You couldn’t
combine both except blocks and a finally block, because
generating the right bytecode for the combined version was complicated and it
wasn’t clear what the semantics of the combined statement should be.
Guido van Rossum spent some time working with Java, which does support the
equivalent of combining except blocks and a finally block,
and this clarified what the statement should mean. In Python 2.5, you can now
write:
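try:
    block-1 ...
except Exception1:
    handler-1 ...
except Exception2:
    handler-2 ...
else:
    else-block
finally:
    final-block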
The code in block-1 is executed. If the code raises an exception, the various
except blocks are tested: if the exception is of class
Exception1, handler-1 is executed; otherwise if it’s of class
Exception2, handler-2 is executed, and so forth. If no exception is
raised, the else-block is executed.
No matter what happened previously, the final-block is executed once the code
block is complete and any raised exceptions handled. Even if there’s an error in
an exception handler or the else-block and a new exception is raised, the code
in the final-block is still run.
Python 2.5 adds a simple way to pass values into a generator. As introduced in
Python 2.3, generators only produce output; once a generator’s code was invoked
to create an iterator, there was no way to pass any new information into the
function when its execution is resumed. Sometimes the ability to pass in some
information would be useful. Hackish solutions to this include making the
generator’s code look at a global variable and then changing the global
variable’s value, or passing in some mutable object that callers then modify.
To refresh your memory of basic generators, here’s a simple example:
def counter(maximum):
    i = 0
    while i < maximum:
        yield i
        i += 1
When you call counter(10), the result is an iterator that returns the values
from 0 up to 9. On encountering the yield statement, the iterator
returns the provided value and suspends the function’s execution, preserving the
local variables. Execution resumes on the following call to the iterator’s
next() method, picking up after the yield statement.
In Python 2.3, yield was a statement; it didn’t return any value. In
2.5, yield is now an expression, returning a value that can be
assigned to a variable or otherwise operated on:
val = (yield i)
I recommend that you always put parentheses around a yield expression
when you’re doing something with the returned value, as in the above example.
The parentheses aren’t always necessary, but it’s easier to always add them
instead of having to remember when they’re needed.
(PEP 342 explains the exact rules, which are that a yield-expression must
always be parenthesized except when it occurs as the top-level
expression on the right-hand side of an assignment. This means you can write
val = yield i but have to use parentheses when there’s an operation, as in
val = (yield i) + 12.)
Values are sent into a generator by calling its send(value) method. The
generator’s code is then resumed and the yield expression returns the
specified value. If the regular next() method is called, the
yield returns None.
Here’s the previous example, modified to allow changing the value of the
internal counter.
def counter(maximum):
    i = 0
    while i < maximum:
        val = (yield i)
        # If value provided, change counter
        if val is not None:
            i = val
        else:
            i += 1
And here’s an example of changing the counter:
>>> it = counter(10)
>>> print it.next()
0
>>> print it.next()
1
>>> print it.send(8)
8
>>> print it.next()
9
>>> print it.next()
Traceback (most recent call last):
  File "t.py", line 15, in ?
    print it.next()
StopIteration
yield will usually return None, so you should always check
for this case. Don’t just use its value in expressions unless you’re sure that
the send() method will be the only method used to resume your generator
function.
In addition to send(), there are two other new methods on generators:
throw(type, value=None, traceback=None) is used to raise an exception
inside the generator; the exception is raised by the yield expression
where the generator’s execution is paused.
close() raises a new GeneratorExit exception inside the generator
to terminate the iteration. On receiving this exception, the generator’s code
must either raise GeneratorExit or StopIteration. Catching the
GeneratorExit exception and returning a value is illegal and will trigger
a RuntimeError; if the function raises some other exception, that
exception is propagated to the caller. close() will also be called by
Python’s garbage collector when the generator is garbage-collected.
If you need to run cleanup code when a GeneratorExit occurs, I suggest
using a try:...finally: suite instead of catching GeneratorExit.
The cumulative effect of these changes is to turn generators from one-way
producers of information into both producers and consumers.
Generators also become coroutines, a more generalized form of subroutines.
Subroutines are entered at one point and exited at another point (the top of the
function, and a return statement), but coroutines can be entered,
exited, and resumed at many different points (the yield statements).
We’ll have to figure out patterns for using coroutines effectively in Python.
The addition of the close() method has one side effect that isn’t obvious.
close() is called when a generator is garbage-collected, so this means the
generator’s code gets one last chance to run before the generator is destroyed.
This last chance means that try...finally statements in generators can now
be guaranteed to work; the finally clause will now always get a
chance to run. The syntactic restriction that you couldn’t mix yield
statements with a try...finally suite has therefore been removed. This
seems like a minor bit of language trivia, but using generators and
try...finally is actually necessary in order to implement the
with statement described by PEP 343. I’ll look at this new statement
in the following section.
Another even more esoteric effect of this change: previously, the
gi_frame attribute of a generator was always a frame object. It’s now
possible for gi_frame to be None once the generator has been
exhausted.
The ‘with‘ statement clarifies code that previously would use
try...finally blocks to ensure that clean-up code is executed. In this
section, I’ll discuss the statement as it will commonly be used. In the next
section, I’ll examine the implementation details and show how to write objects
for use with this statement.
The ‘with‘ statement is a new control-flow structure whose basic
structure is:
with expression [as variable]:
    with-block
The expression is evaluated, and it should result in an object that supports the
context management protocol (that is, has __enter__() and __exit__()
methods).
The object’s __enter__() is called before with-block is executed and
therefore can run set-up code. It also may return a value that is bound to the
name variable, if given. (Note carefully that variable is not assigned
the result of expression.)
After execution of the with-block is finished, the object’s __exit__()
method is called, even if the block raised an exception, and can therefore run
clean-up code.
To enable the statement in Python 2.5, you need to add the following directive
to your module:
from __future__ import with_statement
The statement will always be enabled in Python 2.6.
Some standard Python objects now support the context management protocol and can
be used with the ‘with‘ statement. File objects are one example:
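with open('/etc/passwd', 'r') as f:
    for line in f:
        print line
        # ... more processing code ...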
After this statement has executed, the file object in f will have been
automatically closed, even if the for loop raised an exception part-
way through the block.
Note
In this case, f is the same object created by open(), because
file.__enter__() returns self.
The threading module’s locks and condition variables also support the
‘with‘ statement:
import threading

lock = threading.Lock()
with lock:
    # Critical section of code
    ...
The lock is acquired before the block is executed and always released once the
block is complete.
The new localcontext() function in the decimal module makes it easy
to save and restore the current decimal context, which encapsulates the desired
precision and rounding characteristics for computations:
from decimal import Decimal, Context, localcontext

# Displays with default precision of 28 digits
v = Decimal('578')
print v.sqrt()

with localcontext(Context(prec=16)):
    # All code in this block uses a precision of 16 digits.
    # The original context is restored on exiting the block.
    print v.sqrt()
Under the hood, the ‘with‘ statement is fairly complicated. Most
people will only use ‘with‘ in company with existing objects and
don’t need to know these details, so you can skip the rest of this section if
you like. Authors of new objects will need to understand the details of the
underlying implementation and should keep reading.
A high-level explanation of the context management protocol is:

1. The expression is evaluated and should result in an object called a “context
   manager”. The context manager must have __enter__() and __exit__()
   methods.
2. The context manager’s __enter__() method is called. The value returned
   is assigned to VAR. If no 'as VAR' clause is present, the value is simply
   discarded.
3. The code in BLOCK is executed.
4. If BLOCK raises an exception, the __exit__(type, value, traceback)
   method is called with the exception details, the same values returned by
   sys.exc_info(). The method’s return value controls whether the exception
   is re-raised: any false value re-raises the exception, and True will result
   in suppressing it. You’ll only rarely want to suppress the exception, because
   if you do the author of the code containing the ‘with‘ statement will
   never realize anything went wrong.
5. If BLOCK didn’t raise an exception, the __exit__() method is still
   called, but type, value, and traceback are all None.
Let’s think through an example. I won’t present detailed code but will only
sketch the methods necessary for a database that supports transactions.
(For people unfamiliar with database terminology: a set of changes to the
database are grouped into a transaction. Transactions can be either committed,
meaning that all the changes are written into the database, or rolled back,
meaning that the changes are all discarded and the database is unchanged. See
any database textbook for more information.)
Let’s assume there’s an object representing a database connection. Our goal will
be to let the user write code like this:
db_connection = DatabaseConnection()
with db_connection as cursor:
    cursor.execute('insert into ...')
    cursor.execute('delete from ...')
    # ... more operations ...
The transaction should be committed if the code in the block runs flawlessly or
rolled back if there’s an exception. Here’s the basic interface for
DatabaseConnection that I’ll assume:
class DatabaseConnection:
    # Database interface
    def cursor(self):
        "Returns a cursor object and starts a new transaction"
    def commit(self):
        "Commits current transaction"
    def rollback(self):
        "Rolls back current transaction"
The __enter__() method is pretty easy, having only to start a new
transaction. For this application the resulting cursor object would be a useful
result, so the method will return it. The user can then add as cursor to
their ‘with‘ statement to bind the cursor to a variable name.
class DatabaseConnection:
    ...
    def __enter__(self):
        # Code to start a new transaction
        cursor = self.cursor()
        return cursor
The __exit__() method is the most complicated because it’s where most of
the work has to be done. The method has to check if an exception occurred. If
there was no exception, the transaction is committed. The transaction is rolled
back if there was an exception.
In the code below, execution will just fall off the end of the function,
returning the default value of None. None is false, so the exception
will be re-raised automatically. If you wished, you could be more explicit and
add a return statement at the marked location.
class DatabaseConnection:
    ...
    def __exit__(self, type, value, tb):
        if tb is None:
            # No exception, so commit
            self.commit()
        else:
            # Exception occurred, so rollback.
            self.rollback()
            # return False
The new contextlib module provides some functions and a decorator that
are useful for writing objects for use with the ‘with‘ statement.
The decorator is called contextmanager(), and lets you write a single
generator function instead of defining a new class. The generator should yield
exactly one value. The code up to the yield will be executed as the
__enter__() method, and the value yielded will be the method’s return
value that will get bound to the variable in the ‘with‘ statement’s
as clause, if any. The code after the yield will be
executed in the __exit__() method. Any exception raised in the block will
be raised by the yield statement.
Our database example from the previous section could be written using this
decorator as:
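from contextlib import contextmanager

@contextmanager
def db_transaction(connection):
    # Runs as the __enter__ half: start a transaction, hand back the cursor.
    cursor = connection.cursor()
    try:
        yield cursor
    except:
        # An exception in the block is re-raised at the yield;
        # roll back and propagate it.
        connection.rollback()
        raise
    else:
        # The block finished without an exception, so commit.
        connection.commit()

db = DatabaseConnection()
with db_transaction(db) as cursor:
    # ... database operations ...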
The contextlib module also has a nested(mgr1, mgr2, ...) function
that combines a number of context managers so you don’t need to write nested
‘with‘ statements. In this example, the single ‘with‘
statement both starts a database transaction and acquires a thread lock:
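import threading
from contextlib import nested

# Reuses the db_transaction() context manager sketched above.
lock = threading.Lock()
with nested(db_transaction(db), lock) as (cursor, locked):
    # ... database operations guarded by the lock ...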
PEP written by Guido van Rossum and Nick Coghlan; implemented by Mike Bland,
Guido van Rossum, and Neal Norwitz. The PEP shows the code generated for a
‘with‘ statement, which can be helpful in learning how the statement
works.
Exception classes can now be new-style classes, not just classic classes, and
the built-in Exception class and all the standard built-in exceptions
(NameError, ValueError, etc.) are now new-style classes.
The inheritance hierarchy for exceptions has been rearranged a bit. In 2.5, the
inheritance relationships are:
BaseException       # New in Python 2.5
|- KeyboardInterrupt
|- SystemExit
|- Exception
   |- (all other current built-in exceptions)
This rearrangement was done because people often want to catch all exceptions
that indicate program errors. KeyboardInterrupt and SystemExit
aren’t errors, though, and usually represent an explicit action such as the user
hitting Control-C or code calling sys.exit(). A bare except: will
catch all exceptions, so you commonly need to list KeyboardInterrupt and
SystemExit in order to re-raise them. The usual pattern is:
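try:
    ...
except (KeyboardInterrupt, SystemExit):
    raise
except:
    # Log error...
    # Continue running program...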
In Python 2.5, you can now write except Exception to achieve the same
result, catching all the exceptions that usually indicate errors but leaving
KeyboardInterrupt and SystemExit alone. As in previous versions,
a bare except: still catches all exceptions.
The goal for Python 3.0 is to require any class raised as an exception to derive
from BaseException or some descendant of BaseException, and future
releases in the Python 2.x series may begin to enforce this constraint.
Therefore, I suggest you begin making all your exception classes derive from
Exception now. It’s been suggested that the bare except: form should
be removed in Python 3.0, but Guido van Rossum hasn’t decided whether to do this
or not.
Raising of strings as exceptions, as in the statement raise "Error occurred", is deprecated in Python 2.5 and will trigger a warning. The aim is
to be able to remove the string-exception feature in a few releases.
A wide-ranging change to Python’s C API, using a new Py_ssize_t type
definition instead of int, will permit the interpreter to handle more
data on 64-bit platforms. This change doesn’t affect Python’s capacity on 32-bit
platforms.
Various pieces of the Python interpreter used C’s int type to store
sizes or counts; for example, the number of items in a list or tuple were stored
in an int. The C compilers for most 64-bit platforms still define
int as a 32-bit type, so that meant that lists could only hold up to
2**31-1 = 2147483647 items. (There are actually a few different
programming models that 64-bit C compilers can use – see
http://www.unix.org/version2/whatsnew/lp64_wp.html for a discussion – but the
most commonly available model leaves int as 32 bits.)
A limit of 2147483647 items doesn’t really matter on a 32-bit platform because
you’ll run out of memory before hitting the length limit. Each list item
requires space for a pointer, which is 4 bytes, plus space for a
PyObject representing the item. 2147483647*4 is already more bytes
than a 32-bit address space can contain.
It’s possible to address that much memory on a 64-bit platform, however. The
pointers for a list that size would only require 16 GiB of space, so it’s not
unreasonable that Python programmers might construct lists that large.
Therefore, the Python interpreter had to be changed to use some type other than
int, and this will be a 64-bit type on 64-bit platforms. The change
will cause incompatibilities on 64-bit machines, so it was deemed worth making
the transition now, while the number of 64-bit users is still relatively small.
(In 5 or 10 years, we may all be on 64-bit machines, and the transition would
be more painful then.)
This change most strongly affects authors of C extension modules. Python
strings and container types such as lists and tuples now use
Py_ssize_t to store their size. Functions such as
PyList_Size() now return Py_ssize_t. Code in extension modules
may therefore need to have some variables changed to Py_ssize_t.
The PyArg_ParseTuple() and Py_BuildValue() functions have a new
conversion code, n, for Py_ssize_t. PyArg_ParseTuple()’s
s# and t# still output int by default, but you can define the
macro PY_SSIZE_T_CLEAN before including Python.h to make
them return Py_ssize_t.
PEP 353 has a section on conversion guidelines that extension authors should
read to learn about supporting 64-bit platforms.
The NumPy developers had a problem that could only be solved by adding a new
special method, __index__(). When using slice notation, as in
[start:stop:step], the values of the start, stop, and step indexes
must all be either integers or long integers. NumPy defines a variety of
specialized integer types corresponding to unsigned and signed integers of 8,
16, 32, and 64 bits, but there was no way to signal that these types could be
used as slice indexes.
Slicing can’t just use the existing __int__() method because that method
is also used to implement coercion to integers. If slicing used
__int__(), floating-point numbers would also become legal slice indexes
and that’s clearly an undesirable behaviour.
Instead, a new special method called __index__() was added. It takes no
arguments and returns an integer giving the slice index to use. For example:
class C:
    def __index__(self):
        return self.value
The return value must be either a Python integer or long integer. The
interpreter will check that the type returned is correct, and raises a
TypeError if this requirement isn’t met.
A corresponding nb_index slot was added to the C-level
PyNumberMethods structure to let C extensions implement this protocol.
PyNumber_Index(obj) can be used in extension code to call the
__index__() function and retrieve its result.
See also
PEP 357 - Allowing Any Object to be Used for Slicing
Here are all of the changes that Python 2.5 makes to the core Python language.
The dict type has a new hook for letting subclasses provide a default
value when a key isn’t contained in the dictionary. When a key isn’t found, the
dictionary’s __missing__(key) method will be called. This hook is used
to implement the new defaultdict class in the collections
module. The following example defines a dictionary that returns zero for any
missing key:
class zerodict(dict):
    def __missing__(self, key):
        return 0

d = zerodict({1:1, 2:2})
print d[1], d[2]   # Prints 1, 2
print d[3], d[4]   # Prints 0, 0
Both 8-bit and Unicode strings have new partition(sep) and
rpartition(sep) methods that simplify a common use case.
The find(S) method is often used to get an index which is then used to
slice the string and obtain the pieces that are before and after the separator.
partition(sep) condenses this pattern into a single method call that
returns a 3-tuple containing the substring before the separator, the separator
itself, and the substring after the separator. If the separator isn’t found,
the first element of the tuple is the entire string and the other two elements
are empty. rpartition(sep) also returns a 3-tuple but starts searching
from the end of the string; the r stands for ‘reverse’.
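For example:

>>> 'http://www.python.org'.partition('://')
('http', '://', 'www.python.org')
>>> 'www.python.org'.rpartition('.')
('www.python', '.', 'org')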
(Implemented by Georg Brandl following a suggestion by Tom Lynn.)
The min() and max() built-in functions gained a key keyword
parameter analogous to the key argument for sort(). This parameter
supplies a function that takes a single argument and is called for every value
in the list; min()/max() will return the element with the
smallest/largest return value from this function. For example, to find the
longest string in a list, you can do:
L = ['medium', 'longest', 'short']

# Prints 'longest'
print max(L, key=len)

# Prints 'short', because lexicographically 'short' has the largest value
print max(L)
(Contributed by Steven Bethard and Raymond Hettinger.)
Two new built-in functions, any() and all(), evaluate whether an
iterator contains any true or false values. any() returns True
if any value returned by the iterator is true; otherwise it will return
False. all() returns True only if all of the values
returned by the iterator evaluate as true. (Suggested by Guido van Rossum, and
implemented by Raymond Hettinger.)
The result of a class’s __hash__() method can now be either a long
integer or a regular integer. If a long integer is returned, the hash of that
value is taken. In earlier versions the hash value was required to be a
regular integer, but in 2.5 the id() built-in was changed to always
return non-negative numbers, and users often seem to use id(self) in
__hash__() methods (though this is discouraged).
ASCII is now the default encoding for modules. It’s now a syntax error if a
module contains string literals with 8-bit characters but doesn’t have an
encoding declaration. In Python 2.4 this triggered a warning, not a syntax
error. See PEP 263 for how to declare a module’s encoding; for example, you
might add a line like this near the top of the source file:
# -*- coding: latin1 -*-
A new warning, UnicodeWarning, is triggered when you attempt to
compare a Unicode string and an 8-bit string that can’t be converted to Unicode
using the default ASCII encoding. The result of the comparison is false:
>>> chr(128) == unichr(128)   # Can't convert chr(128) to Unicode
__main__:1: UnicodeWarning: Unicode equal comparison failed
  to convert both arguments to Unicode - interpreting them
  as being unequal
False
>>> chr(127) == unichr(127)   # chr(127) can be converted
True
Previously this would raise a UnicodeDecodeError exception, but in 2.5
this could result in puzzling problems when accessing a dictionary. If you
looked up unichr(128) and chr(128) was being used as a key, you’d get a
UnicodeDecodeError exception. Other changes in 2.5 resulted in this
exception being raised instead of suppressed by the code in dictobject.c
that implements dictionaries.
Raising an exception for such a comparison is strictly correct, but the change
might have broken code, so instead UnicodeWarning was introduced.
(Implemented by Marc-André Lemburg.)
One error that Python programmers sometimes make is forgetting to include an
__init__.py module in a package directory. Debugging this mistake can be
confusing, and usually requires running Python with the -v switch to
log all the paths searched. In Python 2.5, a new ImportWarning warning is
triggered when an import would have picked up a directory as a package but no
__init__.py was found. This warning is silently ignored by default;
provide the -Wd option when running the Python executable to display
the warning message. (Implemented by Thomas Wouters.)
The list of base classes in a class definition can now be empty. As an
example, this is now legal:
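# Empty parentheses: no base classes.
class C():
    pass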
In the interactive interpreter, quit and exit have long been strings so
that new users get a somewhat helpful message when they try to quit:
>>> quit
'Use Ctrl-D (i.e. EOF) to exit.'
In Python 2.5, quit and exit are now objects that still produce string
representations of themselves, but are also callable. Newbies who try quit()
or exit() will now exit the interpreter as they expect. (Implemented by
Georg Brandl.)
The Python executable now accepts the standard long options --help
and --version; on Windows, it also accepts the /? option
for displaying a help message. (Implemented by Georg Brandl.)
Several of the optimizations were developed at the NeedForSpeed sprint, an event
held in Reykjavik, Iceland, from May 21–28 2006. The sprint focused on speed
enhancements to the CPython implementation and was funded by EWT LLC with local
support from CCP Games. Those optimizations added at this sprint are specially
marked in the following list.
When they were introduced in Python 2.4, the built-in set and
frozenset types were built on top of Python’s dictionary type. In 2.5
the internal data structure has been customized for implementing sets, and as a
result sets will use a third less memory and are somewhat faster. (Implemented
by Raymond Hettinger.)
The speed of some Unicode operations, such as finding substrings, string
splitting, and character map encoding and decoding, has been improved.
(Substring search and splitting improvements were added by Fredrik Lundh and
Andrew Dalke at the NeedForSpeed sprint. Character maps were improved by Walter
Dörwald and Martin von Löwis.)
The long(str, base) function is now faster on long digit strings
because fewer intermediate results are calculated. The peak is for strings of
around 800–1000 digits where the function is 6 times faster. (Contributed by
Alan McIntyre and committed at the NeedForSpeed sprint.)
It’s now illegal to mix iterating over a file with for line in file and
calling the file object’s read()/readline()/readlines()
methods. Iteration uses an internal buffer and the read*() methods
don’t use that buffer. Instead they would return the data following the
buffer, causing the data to appear out of order. Mixing iteration and these
methods will now trigger a ValueError from the read*() method.
(Implemented by Thomas Wouters.)
The struct module now compiles structure format strings into an
internal representation and caches this representation, yielding a 20% speedup.
(Contributed by Bob Ippolito at the NeedForSpeed sprint.)
The re module got a 1 or 2% speedup by switching to Python’s allocator
functions instead of the system’s malloc() and free().
(Contributed by Jack Diederich at the NeedForSpeed sprint.)
The code generator’s peephole optimizer now performs simple constant folding
in expressions. If you write something like a=2+3, the code generator
will do the arithmetic and produce code corresponding to a=5. (Proposed
and implemented by Raymond Hettinger.)
Function calls are now faster because code objects now keep the most recently
finished frame (a “zombie frame”) in an internal field of the code object,
reusing it the next time the code object is invoked. (Original patch by Michael
Hudson, modified by Armin Rigo and Richard Jones; committed at the NeedForSpeed
sprint.) Frame objects are also slightly smaller, which may improve cache
locality and reduce memory usage a bit. (Contributed by Neal Norwitz.)
Python’s built-in exceptions are now new-style classes, a change that speeds
up instantiation considerably. Exception handling in Python 2.5 is therefore
about 30% faster than in 2.4. (Contributed by Richard Jones, Georg Brandl and
Sean Reifschneider at the NeedForSpeed sprint.)
Importing now caches the paths tried, recording whether they exist or not so
that the interpreter makes fewer open() and stat() calls on
startup. (Contributed by Martin von Löwis and Georg Brandl.)
The standard library received many enhancements and bug fixes in Python 2.5.
Here’s a partial list of the most notable changes, sorted alphabetically by
module name. Consult the Misc/NEWS file in the source tree for a more
complete list of changes, or look through the SVN logs for all the details.
The audioop module now supports the a-LAW encoding, and the code for
u-LAW encoding has been improved. (Contributed by Lars Immisch.)
The codecs module gained support for incremental codecs. The
codecs.lookup() function now returns a CodecInfo instance instead
of a tuple. CodecInfo instances behave like a 4-tuple to preserve
backward compatibility but also have the attributes encode,
decode, incrementalencoder, incrementaldecoder,
streamwriter, and streamreader. Incremental codecs can receive
input and produce output in multiple chunks; the output is the same as if the
entire input was fed to the non-incremental codec. See the codecs module
documentation for details. (Designed and implemented by Walter Dörwald.)
The collections module gained a new type, defaultdict, that
subclasses the standard dict type. The new type mostly behaves like a
dictionary but constructs a default value when a key isn’t present,
automatically adding it to the dictionary for the requested key value.
The first argument to defaultdict‘s constructor is a factory function
that gets called whenever a key is requested but not found. This factory
function receives no arguments, so you can use built-in type constructors such
as list() or int(). For example, you can make an index of words
based on their initial letter like this:
words="""Nel mezzo del cammin di nostra vitami ritrovai per una selva oscurache la diritta via era smarrita""".lower().split()index=defaultdict(list)forwinwords:init_letter=w[0]index[init_letter].append(w)
The deque double-ended queue type supplied by the collections
module now has a remove(value) method that removes the first occurrence
of value in the queue, raising ValueError if the value isn’t found.
(Contributed by Raymond Hettinger.)
New module: The contextlib module contains helper functions for use
with the new ‘with‘ statement. See section The contextlib module
for more about this module.
New module: The cProfile module is a C implementation of the existing
profile module that has much lower overhead. The module’s interface is
the same as profile: you run cProfile.run('main()') to profile a
function, can save profile data to a file, etc. It’s not yet known if the
Hotshot profiler, which is also written in C but doesn’t match the
profile module’s interface, will continue to be maintained in future
versions of Python. (Contributed by Armin Rigo.)
Also, the pstats module for analyzing the data measured by the profiler
now supports directing the output to any file object by supplying a stream
argument to the Stats constructor. (Contributed by Skip Montanaro.)
The csv module, which parses files in comma-separated value format,
received several enhancements and a number of bugfixes. You can now set the
maximum size in bytes of a field by calling the
csv.field_size_limit(new_limit) function; omitting the new_limit
argument will return the currently-set limit. The reader class now has
a line_num attribute that counts the number of physical lines read from
the source; records can span multiple physical lines, so line_num is not
the same as the number of records read.
The CSV parser is now stricter about multi-line quoted fields. Previously, if a
line ended within a quoted field without a terminating newline character, a
newline would be inserted into the returned field. This behavior caused problems
when reading files that contained carriage return characters within fields, so
the code was changed to return the field without inserting newlines. As a
consequence, if newlines embedded within fields are important, the input should
be split into lines in a manner that preserves the newline characters.
(Contributed by Skip Montanaro and Andrew McNamara.)
The datetime class in the datetime module now has a
strptime(string, format) method for parsing date strings, contributed
by Josh Spoerri. It uses the same format characters as time.strptime() and
time.strftime():
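from datetime import datetime

ts = datetime.strptime('10:13:15 2006-03-07',
                       '%H:%M:%S %Y-%m-%d')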
The SequenceMatcher.get_matching_blocks() method in the difflib
module now guarantees to return a minimal list of blocks describing matching
subsequences. Previously, the algorithm would occasionally break a block of
matching elements into two list entries. (Enhancement by Tim Peters.)
The doctest module gained a SKIP option that keeps an example from
being executed at all. This is intended for code snippets that are usage
examples intended for the reader and aren’t actually test cases.
An encoding parameter was added to the testfile() function and the
DocFileSuite class to specify the file’s encoding. This makes it
easier to use non-ASCII characters in tests contained within a docstring.
(Contributed by Bjorn Tillenius.)
The email package has been updated to version 4.0. (Contributed by
Barry Warsaw.)
The fileinput module was made more flexible. Unicode filenames are now
supported, and a mode parameter that defaults to "r" was added to the
input() function to allow opening files in binary or universal-newline
mode. Another new parameter, openhook, lets you use a function other than
open() to open the input files. Once you’re iterating over the set of
files, the FileInput object’s new fileno() returns the file
descriptor for the currently opened file. (Contributed by Georg Brandl.)
In the gc module, the new get_count() function returns a 3-tuple
containing the current collection counts for the three GC generations. This is
accounting information for the garbage collector; when these counts reach a
specified threshold, a garbage collection sweep will be made. The existing
gc.collect() function now takes an optional generation argument of 0, 1,
or 2 to specify which generation to collect. (Contributed by Barry Warsaw.)
The nsmallest() and nlargest() functions in the heapq
module now support a key keyword parameter similar to the one provided by
the min()/max() functions and the sort() methods. For
example:
>>> import heapq
>>> L = ["short", 'medium', 'longest', 'longer still']
>>> heapq.nsmallest(2, L)          # Return two lowest elements, lexicographically
['longer still', 'longest']
>>> heapq.nsmallest(2, L, key=len) # Return two shortest elements
['short', 'medium']
(Contributed by Raymond Hettinger.)
The itertools.islice() function now accepts None for the start and
step arguments. This makes it more compatible with the attributes of slice
objects, so that you can now write the following:
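import itertools

s = slice(5)                   # Create a slice object
# 'iterable' here stands for any iterable object you want to slice.
itertools.islice(iterable, s.start, s.stop, s.step)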
The format() function in the locale module has been modified and
two new functions were added, format_string() and currency().
The format() function’s val parameter could previously be a string as
long as no more than one %char specifier appeared; now the parameter must be
exactly one %char specifier with no surrounding text. An optional monetary
parameter was also added which, if True, will use the locale’s rules for
formatting currency in placing a separator between groups of three digits.
To format strings with multiple %char specifiers, use the new
format_string() function that works like format() but also supports
mixing %char specifiers with arbitrary text.
A new currency() function was also added that formats a number according
to the current locale’s settings.
(Contributed by Georg Brandl.)
The mailbox module underwent a massive rewrite to add the capability to
modify mailboxes in addition to reading them. A new set of classes that include
mbox, MH, and Maildir are used to read mailboxes, and
have an add(message) method to add messages, remove(key) to
remove messages, and lock()/unlock() to lock/unlock the mailbox.
The following example converts a maildir-format mailbox into an mbox-format
one:
import mailbox

# 'factory=None' uses email.Message.Message as the class representing
# individual messages.
src = mailbox.Maildir('maildir', factory=None)
dest = mailbox.mbox('/tmp/mbox')

for msg in src:
    dest.add(msg)
(Contributed by Gregory K. Johnson. Funding was provided by Google’s 2005
Summer of Code.)
New module: the msilib module allows creating Microsoft Installer
.msi files and CAB files. Some support for reading the .msi
database is also included. (Contributed by Martin von Löwis.)
The nis module now supports accessing domains other than the system
default domain by supplying a domain argument to the nis.match() and
nis.maps() functions. (Contributed by Ben Bell.)
The operator module’s itemgetter() and attrgetter()
functions now support multiple fields. A call such as
operator.attrgetter('a', 'b') will return a function that retrieves the
a and b attributes. Combining this new feature with the
sort() method’s key parameter lets you easily sort lists using
multiple fields. (Contributed by Raymond Hettinger.)
The optparse module was updated to version 1.5.1 of the Optik library.
The OptionParser class gained an epilog attribute, a string
that will be printed after the help message, and a destroy() method to
break reference cycles created by the object. (Contributed by Greg Ward.)
The os module underwent several changes. The stat_float_times
variable now defaults to true, meaning that os.stat() will now return time
values as floats. (This doesn’t necessarily mean that os.stat() will
return times that are precise to fractions of a second; not all systems support
such precision.)
Two new functions, wait3() and wait4(), were added. They’re similar
to the waitpid() function, which waits for a child process to exit and returns
a tuple of the process ID and its exit status, but wait3() and
wait4() return additional information. wait3() doesn’t take a
process ID as input, so it waits for any child process to exit and returns a
3-tuple of process-id, exit-status, resource-usage as returned from the
resource.getrusage() function. wait4(pid) does take a process ID.
(Contributed by Chad J. Schroeder.)
On FreeBSD, the os.stat() function now returns times with nanosecond
resolution, and the returned object now has st_gen and
st_birthtime. The st_flags attribute is also available, if the
platform supports it. (Contributed by Antti Louko and Diego Pettenò.)
The Python debugger provided by the pdb module can now store lists of
commands to execute when a breakpoint is reached and execution stops. Once
breakpoint #1 has been created, enter commands 1 and enter a series of
commands to be executed, finishing the list with end. The command list can
include commands that resume execution, such as continue or next.
(Contributed by Grégoire Dooms.)
The pickle and cPickle modules no longer accept a return value
of None from the __reduce__() method; the method must return a tuple
of arguments instead. The ability to return None was deprecated in Python
2.4, so this completes the removal of the feature.
The pkgutil module, containing various utility functions for finding
packages, was enhanced to support PEP 302’s import hooks and now also works for
packages stored in ZIP-format archives. (Contributed by Phillip J. Eby.)
The pybench benchmark suite by Marc-André Lemburg is now included in the
Tools/pybench directory. The pybench suite is an improvement on the
commonly used pystone.py program because pybench provides a more
detailed measurement of the interpreter’s speed. It times particular operations
such as function calls, tuple slicing, method lookups, and numeric operations,
instead of performing many different operations and reducing the result to a
single number as pystone.py does.
The pyexpat module now uses version 2.0 of the Expat parser.
(Contributed by Trent Mick.)
The Queue class provided by the Queue module gained two new
methods. join() blocks until all items in the queue have been retrieved
and all processing work on the items has been completed. Worker threads call
the other new method, task_done(), to signal that processing for an item
has been completed. (Contributed by Raymond Hettinger.)
The old regex and regsub modules, which have been deprecated
ever since Python 2.0, have finally been deleted. Other deleted modules:
statcache, tzparse, whrandom.
The lib-old directory, which includes ancient modules
such as dircmp and ni, was also removed. lib-old wasn’t on the
default sys.path, so unless your programs explicitly added the directory to
sys.path, this removal shouldn’t affect your code.
The rlcompleter module is no longer dependent on importing the
readline module and therefore now works on non-Unix platforms. (Patch
from Robert Kiendl.)
The SimpleXMLRPCServer and DocXMLRPCServer classes now have a
rpc_paths attribute that constrains XML-RPC operations to a limited set
of URL paths; the default is to allow only '/' and '/RPC2'. Setting
rpc_paths to None or an empty tuple disables this path checking.
The socket module now supports AF_NETLINK sockets on Linux,
thanks to a patch from Philippe Biondi. Netlink sockets are a Linux-specific
mechanism for communications between a user-space process and kernel code; an
introductory article about them is at http://www.linuxjournal.com/article/7356.
In Python code, netlink addresses are represented as a tuple of 2 integers,
(pid, group_mask).
Two new methods on socket objects, recv_into(buffer) and
recvfrom_into(buffer), store the received data in an object that
supports the buffer protocol instead of returning the data as a string. This
means you can put the data directly into an array or a memory-mapped file.
Socket objects also gained getfamily(), gettype(), and
getproto() accessor methods to retrieve the family, type, and protocol
values for the socket.
New module: the spwd module provides functions for accessing the shadow
password database on systems that support shadow passwords.
The struct module is now faster because it compiles format strings into
Struct objects with pack() and unpack() methods. This is
similar to how the re module lets you create compiled regular expression
objects. You can still use the module-level pack() and unpack()
functions; they’ll create Struct objects and cache them. Or you can
use Struct instances directly:
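import struct

# Compile the format string once...
s = struct.Struct('ih3s')

# ...then reuse the compiled object.
data = s.pack(1972, 187, 'abc')
year, number, name = s.unpack(data)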
You can also pack and unpack data to and from buffer objects directly using the
pack_into(buffer, offset, v1, v2, ...) and unpack_from(buffer, offset) methods. This lets you store data directly into an array or a memory-
mapped file.
(Struct objects were implemented by Bob Ippolito at the NeedForSpeed
sprint. Support for buffer objects was added by Martin Blais, also at the
NeedForSpeed sprint.)
The Python developers switched from CVS to Subversion during the 2.5
development process. Information about the exact build version is available as
the sys.subversion variable, a 3-tuple of (interpreter-name, branch-name,
revision-range). For example, at the time of writing my copy of 2.5 was
reporting ('CPython', 'trunk', '45313:45315').
This information is also available to C extensions via the
Py_GetBuildInfo() function that returns a string of build information
like this: "trunk:45355:45356M, Apr 13 2006, 07:42:19". (Contributed by
Barry Warsaw.)
Another new function, sys._current_frames(), returns the current stack
frames for all running threads as a dictionary mapping thread identifiers to the
topmost stack frame currently active in that thread at the time the function is
called. (Contributed by Tim Peters.)
The TarFile class in the tarfile module now has an
extractall() method that extracts all members from the archive into the
current working directory. It’s also possible to set a different directory as
the extraction target, and to unpack only a subset of the archive’s members.
The compression used for a tarfile opened in stream mode can now be autodetected
using the mode 'r|*'. (Contributed by Lars Gustäbel.)
The threading module now lets you set the stack size used when new
threads are created. The stack_size([size]) function returns the
currently configured stack size, and supplying the optional size parameter
sets a new value. Not all platforms support changing the stack size, but
Windows, POSIX threading, and OS/2 all do. (Contributed by Andrew MacIntyre.)
The unicodedata module has been updated to use version 4.1.0 of the
Unicode character database. Version 3.2.0 is required by some specifications,
so it’s still available as unicodedata.ucd_3_2_0.
New module: the uuid module generates universally unique identifiers
(UUIDs) according to RFC 4122. The RFC defines several different UUID
versions that are generated from a starting string, from system properties, or
purely randomly. This module contains a UUID class and functions
named uuid1(), uuid3(), uuid4(), and uuid5() to
generate different versions of UUID. (Version 2 UUIDs are not specified in
RFC 4122 and are not supported by this module.)
>>> import uuid

>>> # make a UUID based on the host ID and current time
>>> uuid.uuid1()
UUID('a8098c1a-f86e-11da-bd1a-00112444be1e')

>>> # make a UUID using an MD5 hash of a namespace UUID and a name
>>> uuid.uuid3(uuid.NAMESPACE_DNS, 'python.org')
UUID('6fa459ea-ee8a-3ca4-894e-db77e160355e')

>>> # make a random UUID
>>> uuid.uuid4()
UUID('16fd2706-8baf-433b-82eb-8c7fada847da')

>>> # make a UUID using a SHA-1 hash of a namespace UUID and a name
>>> uuid.uuid5(uuid.NAMESPACE_DNS, 'python.org')
UUID('886313e1-3b8a-5372-9b90-0c9aee199e5d')
(Contributed by Ka-Ping Yee.)
The weakref module’s WeakKeyDictionary and
WeakValueDictionary types gained new methods for iterating over the
weak references contained in the dictionary. iterkeyrefs() and
keyrefs() methods were added to WeakKeyDictionary, and
itervaluerefs() and valuerefs() were added to
WeakValueDictionary. (Contributed by Fred L. Drake, Jr.)
The webbrowser module received a number of enhancements. It’s now
usable as a script with python -m webbrowser, taking a URL as the argument;
there are a number of switches to control the behaviour (-n for a new
browser window, -t for a new tab). New module-level functions,
open_new() and open_new_tab(), were added to support this. The
module’s open() function supports an additional feature, an autoraise
parameter that signals whether to raise the open window when possible. A number
of additional browsers were added to the supported list such as Firefox, Opera,
Konqueror, and elinks. (Contributed by Oleg Broytmann and Georg Brandl.)
The xmlrpclib module now supports returning datetime objects
for the XML-RPC date type. Supply use_datetime=True to the loads()
function or the Unmarshaller class to enable this feature. (Contributed
by Skip Montanaro.)
The zipfile module now supports the ZIP64 version of the format,
meaning that a .zip archive can now be larger than 4 GiB and can contain
individual files larger than 4 GiB. (Contributed by Ronald Oussoren.)
The zlib module’s Compress and Decompress objects now
support a copy() method that makes a copy of the object’s internal state
and returns a new Compress or Decompress object.
(Contributed by Chris AtLee.)
The ctypes package, written by Thomas Heller, has been added to the
standard library. ctypes lets you call arbitrary functions in shared
libraries or DLLs. Long-time users may remember the dl module, which
provides functions for loading shared libraries and calling functions in them.
The ctypes package is much fancier.
To load a shared library or DLL, you must create an instance of the
CDLL class and provide the name or path of the shared library or DLL.
Once that’s done, you can call arbitrary functions by accessing them as
attributes of the CDLL object.
import ctypes

libc = ctypes.CDLL('libc.so.6')
result = libc.printf("Line of output\n")
Type constructors for the various C types are provided: c_int(),
c_float(), c_double(), c_char_p() (equivalent to char *),
and so forth. Unlike Python’s types, the C versions are all mutable; you
can assign to their value attribute to change the wrapped value. Python
integers and strings will be automatically converted to the corresponding C
types, but for other types you must call the correct type constructor. (And I
mean must; getting it wrong will often result in the interpreter crashing
with a segmentation fault.)
You shouldn’t use c_char_p() with a Python string when the C function will
be modifying the memory area, because Python strings are supposed to be
immutable; breaking this rule will cause puzzling bugs. When you need a
modifiable memory area, use create_string_buffer():
s="this is a string"buf=ctypes.create_string_buffer(s)libc.strfry(buf)
C functions are assumed to return integers, but you can set the restype
attribute of the function object to change this:
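libc.atof.restype = ctypes.c_double
print libc.atof('2.71828')     # prints 2.71828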
ctypes also provides a wrapper for Python’s C API as the
ctypes.pythonapi object. This object does not release the global
interpreter lock before calling a function, because the lock must be held when
calling into the interpreter’s code. There’s a py_object() type
constructor that will create a PyObject* pointer. A simple usage:
import ctypes

d = {}
ctypes.pythonapi.PyObject_SetItem(ctypes.py_object(d),
                                  ctypes.py_object("abc"),
                                  ctypes.py_object(1))
# d is now {'abc': 1}.
Don’t forget to use py_object(); if it’s omitted you end up with a
segmentation fault.
ctypes has been around for a while, but people still write and
distribute hand-coded extension modules because you can’t rely on
ctypes being present. Perhaps developers will begin to write Python
wrappers atop a library accessed through ctypes instead of extension
modules, now that ctypes is included with core Python.
A subset of Fredrik Lundh’s ElementTree library for processing XML has been
added to the standard library as xml.etree. The available modules are
ElementTree, ElementPath, and ElementInclude from
ElementTree 1.2.6. The cElementTree accelerator module is also
included.
The rest of this section will provide a brief overview of using ElementTree.
Full documentation for ElementTree is available at
http://effbot.org/zone/element-index.htm.
ElementTree represents an XML document as a tree of element nodes. The text
content of the document is stored as the text and tail
attributes of the element nodes. (This is one of the major differences between
ElementTree and the Document Object Model; in the DOM there are many different
types of node, including TextNode.)
The most commonly used parsing function is parse(), which takes either a
string (assumed to contain a filename) or a file-like object and returns an
ElementTree instance:
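from xml.etree import ElementTree as ET
import urllib

tree = ET.parse('ex-1.xml')    # parse from a filename (an example path)

# parse from a file-like object; the URL is just an example feed
feed = urllib.urlopen('http://planet.python.org/rss10.xml')
tree = ET.parse(feed)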
Once you have an ElementTree instance, you can call its getroot()
method to get the root Element node.
There’s also an XML() function that takes a string literal and returns an
Element node (not an ElementTree). This function provides a
tidy way to incorporate XML fragments, approaching the convenience of an XML
literal:
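svg = ET.XML("""<svg width="10px" version="1.0">
             </svg>""")
svg.set('height', '320px')
svg.append(elem1)              # elem1 is some previously created Element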
Each XML element supports some dictionary-like and some list-like access
methods. Dictionary-like operations are used to access attribute values, and
list-like operations are used to access child nodes.
Operation                     Result
elem[n]                       Returns n’th child element.
elem[m:n]                     Returns list of m’th through n’th child elements.
len(elem)                     Returns number of child elements.
list(elem)                    Returns list of child elements.
elem.append(elem2)            Adds elem2 as a child.
elem.insert(index, elem2)     Inserts elem2 at the specified location.
del elem[n]                   Deletes n’th child element.
elem.keys()                   Returns list of attribute names.
elem.get(name)                Returns value of attribute name.
elem.set(name, value)         Sets new value for attribute name.
elem.attrib                   Retrieves the dictionary containing attributes.
del elem.attrib[name]         Deletes attribute name.
Comments and processing instructions are also represented as Element
nodes. To check if a node is a comment or a processing instruction:
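if elem.tag is ET.Comment:
    ...
elif elem.tag is ET.ProcessingInstruction:
    ...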
To generate XML output, you should call the ElementTree.write() method.
Like parse(), it can take either a string or a file-like object:
# Encoding is US-ASCII
tree.write('output.xml')

# Encoding is UTF-8
f = open('output.xml', 'w')
tree.write(f, encoding='utf-8')
(Caution: the default encoding used for output is ASCII. For general XML work,
where an element’s name may contain arbitrary Unicode characters, ASCII isn’t a
very useful encoding because it will raise an exception if an element’s name
contains any characters with values greater than 127. Therefore, it’s best to
specify a different encoding such as UTF-8 that can handle any Unicode
character.)
This section is only a partial description of the ElementTree interfaces. Please
read the package’s official documentation for more details.
A new hashlib module, written by Gregory P. Smith, has been added to
replace the md5 and sha modules. hashlib adds support for
additional secure hashes (SHA-224, SHA-256, SHA-384, and SHA-512). When
available, the module uses OpenSSL for fast platform optimized implementations
of algorithms.
The old md5 and sha modules still exist as wrappers around hashlib
to preserve backwards compatibility. The new module’s interface is very close
to that of the old modules, but not identical. The most significant difference
is that the constructor functions for creating new hashing objects are named
differently.
# Old versions
h = md5.md5()
h = md5.new()

# New version
h = hashlib.md5()

# Old versions
h = sha.sha()
h = sha.new()

# New version
h = hashlib.sha1()

# Hashes that weren't previously available
h = hashlib.sha224()
h = hashlib.sha256()
h = hashlib.sha384()
h = hashlib.sha512()

# Alternative form
h = hashlib.new('md5')      # Provide algorithm as a string
Once a hash object has been created, its methods are the same as before:
update(string) hashes the specified string into the current digest
state, digest() and hexdigest() return the digest value as a binary
string or a string of hex digits, and copy() returns a new hashing object
with the same digest state.
The pysqlite module (http://www.pysqlite.org), a wrapper for the SQLite embedded
database, has been added to the standard library under the package name
sqlite3.
SQLite is a C library that provides a lightweight disk-based database that
doesn’t require a separate server process and allows accessing the database
using a nonstandard variant of the SQL query language. Some applications can use
SQLite for internal data storage. It’s also possible to prototype an
application using SQLite and then port the code to a larger database such as
PostgreSQL or Oracle.
pysqlite was written by Gerhard Häring and provides a SQL interface compliant
with the DB-API 2.0 specification described by PEP 249.
If you’re compiling the Python source yourself, note that the source tree
doesn’t include the SQLite code, only the wrapper module. You’ll need to have
the SQLite libraries and headers installed before compiling Python, and the
build process will compile the module when the necessary headers are available.
To use the module, you must first create a Connection object that
represents the database. Here the data will be stored in the
/tmp/example file:
import sqlite3

conn = sqlite3.connect('/tmp/example')
You can also supply the special name :memory: to create a database in RAM.
Once you have a Connection, you can create a Cursor object
and call its execute() method to perform SQL commands:
c = conn.cursor()

# Create table
c.execute('''create table stocks
(date text, trans text, symbol text,
 qty real, price real)''')

# Insert a row of data
c.execute("""insert into stocks
          values ('2006-01-05','BUY','RHAT',100,35.14)""")
Usually your SQL operations will need to use values from Python variables. You
shouldn’t assemble your query using Python’s string operations because doing so
is insecure; it makes your program vulnerable to an SQL injection attack.
Instead, use the DB-API’s parameter substitution. Put ? as a placeholder
wherever you want to use a value, and then provide a tuple of values as the
second argument to the cursor’s execute() method. (Other database modules
may use a different placeholder, such as %s or :1.) For example:
# Never do this -- insecure!
symbol = 'IBM'
c.execute("... where symbol = '%s'" % symbol)

# Do this instead
t = (symbol,)
c.execute('select * from stocks where symbol=?', t)

# Larger example
for t in (('2006-03-28', 'BUY', 'IBM', 1000, 45.00),
          ('2006-04-05', 'BUY', 'MSOFT', 1000, 72.00),
          ('2006-04-06', 'SELL', 'IBM', 500, 53.00),
         ):
    c.execute('insert into stocks values (?,?,?,?,?)', t)
To retrieve data after executing a SELECT statement, you can either treat the
cursor as an iterator, call the cursor’s fetchone() method to retrieve a
single matching row, or call fetchall() to get a list of the matching
rows.
This example uses the iterator form:
>>> c = conn.cursor()
>>> c.execute('select * from stocks order by price')
>>> for row in c:
...     print row
...
(u'2006-01-05', u'BUY', u'RHAT', 100, 35.140000000000001)
(u'2006-03-28', u'BUY', u'IBM', 1000, 45.0)
(u'2006-04-06', u'SELL', u'IBM', 500, 53.0)
(u'2006-04-05', u'BUY', u'MSOFT', 1000, 72.0)
>>>
For more information about the SQL dialect supported by SQLite, see
http://www.sqlite.org.
The Web Server Gateway Interface (WSGI) v1.0 defines a standard interface
between web servers and Python web applications and is described in PEP 333.
The wsgiref package is a reference implementation of the WSGI
specification.
The package includes a basic HTTP server that will run a WSGI application; this
server is useful for debugging but isn’t intended for production use. Setting
up a server takes only a few lines of code:
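A minimal sketch using the demo application bundled with the package:

from wsgiref import simple_server

# Serve the sample demo_app on port 8000 until interrupted
httpd = simple_server.make_server('', 8000, simple_server.demo_app)
httpd.serve_forever()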
Changes to Python’s build process and to the C API include:
The Python source tree was converted from CVS to Subversion, in a complex
migration procedure that was supervised and flawlessly carried out by Martin von
Löwis. The procedure was developed as PEP 347.
Coverity, a company that markets a source code analysis tool called Prevent,
provided the results of their examination of the Python source code. The
analysis found about 60 bugs that were quickly fixed. Many of the bugs were
refcounting problems, often occurring in error-handling code. See
http://scan.coverity.com for the statistics.
The largest change to the C API came from PEP 353, which modifies the
interpreter to use a Py_ssize_t type definition instead of
int. See the earlier section PEP 353: Using ssize_t as the index type for a discussion of this
change.
The design of the bytecode compiler has changed a great deal, no longer
generating bytecode by traversing the parse tree. Instead the parse tree is
converted to an abstract syntax tree (or AST), and it is the abstract syntax
tree that’s traversed to produce the bytecode.
It’s possible for Python code to obtain AST objects by using the
compile() built-in and specifying _ast.PyCF_ONLY_AST as the value of
the flags parameter:
from _ast import PyCF_ONLY_AST
ast = compile("""a=0
for i in range(10):
    a += i
""", "<string>", 'exec', PyCF_ONLY_AST)

assignment = ast.body[0]
for_loop = ast.body[1]
No official documentation has been written for the AST code yet, but PEP 339
discusses the design. To start learning about the code, read the definition of
the various AST nodes in Parser/Python.asdl. A Python script reads this
file and generates a set of C structure definitions in
Include/Python-ast.h. The PyParser_ASTFromString() and
PyParser_ASTFromFile(), defined in Include/pythonrun.h, take
Python source as input and return the root of an AST representing the contents.
This AST can then be turned into a code object by PyAST_Compile(). For
more information, read the source code, and then ask questions on python-dev.
The AST code was developed under Jeremy Hylton’s management, and implemented by
(in alphabetical order) Brett Cannon, Nick Coghlan, Grant Edwards, John
Ehresman, Kurt Kaiser, Neal Norwitz, Tim Peters, Armin Rigo, and Neil
Schemenauer, plus the participants in a number of AST sprints at conferences
such as PyCon.
Evan Jones’s patch to obmalloc, first described in a talk at PyCon DC 2005,
was applied. Python 2.4 allocated small objects in 256K-sized arenas, but never
freed arenas. With this patch, Python will free arenas when they’re empty. The
net effect is that on some platforms, when you allocate many objects, Python’s
memory usage may actually drop when you delete them and the memory may be
returned to the operating system. (Implemented by Evan Jones, and reworked by
Tim Peters.)
Note that this change means extension modules must be more careful when
allocating memory. Python’s API has many different functions for allocating
memory that are grouped into families. For example, PyMem_Malloc(),
PyMem_Realloc(), and PyMem_Free() are one family that allocates
raw memory, while PyObject_Malloc(), PyObject_Realloc(), and
PyObject_Free() are another family that’s supposed to be used for
creating Python objects.
Previously these different families all reduced to the platform’s
malloc() and free() functions. This meant it didn’t matter if
you got things wrong and allocated memory with the PyMem() function but
freed it with the PyObject() function. With 2.5’s changes to obmalloc,
these families now do different things and mismatches will probably result in a
segfault. You should carefully test your C extension modules with Python 2.5.
C code can now obtain information about the exact revision of the Python
interpreter by calling the Py_GetBuildInfo() function that returns a
string of build information like this: "trunk:45355:45356M, Apr 13 2006, 07:42:19". (Contributed by Barry Warsaw.)
Two new macros can be used to indicate C functions that are local to the
current file so that a faster calling convention can be used.
Py_LOCAL(type) declares the function as returning a value of the
specified type and uses a fast-calling qualifier.
Py_LOCAL_INLINE(type) does the same thing and also requests the
function be inlined. If PY_LOCAL_AGGRESSIVE is defined before
Python.h is included, a set of more aggressive optimizations are enabled
for the module; you should benchmark the results to find out if these
optimizations actually make the code faster. (Contributed by Fredrik Lundh at
the NeedForSpeed sprint.)
PyErr_NewException(name, base, dict) can now accept a tuple of base
classes as its base argument. (Contributed by Georg Brandl.)
The PyErr_Warn() function for issuing warnings is now deprecated in
favour of PyErr_WarnEx(category, message, stacklevel) which lets you
specify the number of stack frames separating this function and the caller. A
stacklevel of 1 is the function calling PyErr_WarnEx(), 2 is the
function above that, and so forth. (Added by Neal Norwitz.)
The CPython interpreter is still written in C, but the code can now be
compiled with a C++ compiler without errors. (Implemented by Anthony Baxter,
Martin von Löwis, Skip Montanaro.)
The PyRange_New() function was removed. It was never documented, never
used in the core code, and had dangerously lax error checking. In the unlikely
case that your extensions were using it, you can replace it by something like
the following:
MacOS X (10.3 and higher): dynamic loading of modules now uses the
dlopen() function instead of MacOS-specific functions.
MacOS X: an --enable-universalsdk switch was added to the
configure script that compiles the interpreter as a universal binary
able to run on both PowerPC and Intel processors. (Contributed by Ronald
Oussoren; issue 2573.)
Windows: .dll is no longer supported as a filename extension for
extension modules. .pyd is now the only filename extension that will be
searched for.
This section lists previously described changes that may require changes to your
code:
ASCII is now the default encoding for modules. It’s now a syntax error if a
module contains string literals with 8-bit characters but doesn’t have an
encoding declaration. In Python 2.4 this triggered a warning, not a syntax
error.
Previously, the gi_frame attribute of a generator was always a frame
object. Because of the PEP 342 changes described in section PEP 342: New Generator Features,
it’s now possible for gi_frame to be None.
A new warning, UnicodeWarning, is triggered when you attempt to
compare a Unicode string and an 8-bit string that can’t be converted to Unicode
using the default ASCII encoding. Previously such comparisons would raise a
UnicodeDecodeError exception.
Library: the csv module is now stricter about multi-line quoted fields.
If your files contain newlines embedded within fields, the input should be split
into lines in a manner which preserves the newline characters.
Library: the locale module’s format() function would
previously accept any string as long as no more than one %char specifier
appeared. In Python 2.5, the argument must be exactly one %char specifier with
no surrounding text.
Library: The pickle and cPickle modules no longer accept a
return value of None from the __reduce__() method; the method must
return a tuple of arguments instead. The modules also no longer accept the
deprecated bin keyword parameter.
Library: The SimpleXMLRPCServer and DocXMLRPCServer classes now
have a rpc_paths attribute that constrains XML-RPC operations to a
limited set of URL paths; the default is to allow only '/' and '/RPC2'.
Setting rpc_paths to None or an empty tuple disables this path
checking.
C API: Many functions now use Py_ssize_t instead of int to
allow processing more data on 64-bit machines. Extension code may need to make
the same change to avoid warnings and to support 64-bit machines. See the
earlier section PEP 353: Using ssize_t as the index type for a discussion of this change.
C API: The obmalloc changes mean that you must be careful to not mix usage
of the PyMem_*() and PyObject_*() families of functions. Memory
allocated with one family’s *_Malloc() must be freed with the
corresponding family’s *_Free() function.
The author would like to thank the following people for offering suggestions,
corrections and assistance with various drafts of this article: Georg Brandl,
Nick Coghlan, Phillip J. Eby, Lars Gustäbel, Raymond Hettinger, Ralf W. Grosse-
Kunstleve, Kent Johnson, Iain Lowe, Martin von Löwis, Fredrik Lundh, Andrew
McNamara, Skip Montanaro, Gustavo Niemeyer, Paul Prescod, James Pryor, Mike
Rovner, Scott Weikart, Barry Warsaw, Thomas Wouters.
This article explains the new features in Python 2.4.1, released on March 30,
2005.
Python 2.4 is a medium-sized release. It doesn’t introduce as many changes as
the radical Python 2.2, but introduces more features than the conservative 2.3
release. The most significant new language features are function decorators and
generator expressions; most other changes are to the standard library.
According to the CVS change logs, there were 481 patches applied and 502 bugs
fixed between Python 2.3 and 2.4. Both figures are likely to be underestimates.
This article doesn’t attempt to provide a complete specification of every single
new feature, but instead provides a brief introduction to each feature. For
full details, you should refer to the documentation for Python 2.4, such as the
Python Library Reference and the Python Reference Manual. Often you will be
referred to the PEP for a particular new feature for explanations of the
implementation and design rationale.
Python 2.3 introduced the sets module. C implementations of set data
types have now been added to the Python core as two new built-in types,
set(iterable) and frozenset(iterable). They provide high speed
operations for membership testing, for eliminating duplicates from sequences,
and for mathematical operations like unions, intersections, differences, and
symmetric differences.
>>> a = set('abracadabra')              # form a set from a string
>>> 'z' in a                            # fast membership testing
False
>>> a                                   # unique letters in a
set(['a', 'r', 'b', 'c', 'd'])
>>> ''.join(a)                          # convert back into a string
'arbcd'
>>> b = set('alacazam')                 # form a second set
>>> a - b                               # letters in a but not in b
set(['r', 'd', 'b'])
>>> a | b                               # letters in either a or b
set(['a', 'c', 'r', 'd', 'b', 'm', 'z', 'l'])
>>> a & b                               # letters in both a and b
set(['a', 'c'])
>>> a ^ b                               # letters in a or b but not both
set(['r', 'd', 'b', 'm', 'z', 'l'])
>>> a.add('z')                          # add a new element
>>> a.update('wxy')                     # add multiple new elements
>>> a
set(['a', 'c', 'b', 'd', 'r', 'w', 'y', 'x', 'z'])
>>> a.remove('x')                       # take one element out
>>> a
set(['a', 'c', 'b', 'd', 'r', 'w', 'y', 'z'])
The frozenset() type is an immutable version of set(). Since it is
immutable and hashable, it may be used as a dictionary key or as a member of
another set.
The sets module remains in the standard library, and may be useful if you
wish to subclass the Set or ImmutableSet classes. There are
currently no plans to deprecate the module.
The lengthy transition process for this PEP, begun in Python 2.2, takes another
step forward in Python 2.4. In 2.3, certain integer operations that would
behave differently after int/long unification triggered FutureWarning
warnings and returned values limited to 32 or 64 bits (depending on your
platform). In 2.4, these expressions no longer produce a warning and instead
produce a different result that’s usually a long integer.
The problematic expressions are primarily left shifts and lengthy hexadecimal
and octal constants. For example, 2<<32 results in a warning in 2.3,
evaluating to 0 on 32-bit platforms. In Python 2.4, this expression now returns
the correct answer, 8589934592.
The iterator feature introduced in Python 2.2 and the itertools module
make it easier to write programs that loop through large data sets without
having the entire data set in memory at one time. List comprehensions don’t fit
into this picture very well because they produce a Python list object containing
all of the items. This unavoidably pulls all of the objects into memory, which
can be a problem if your data set is very large. When trying to write a
functionally-styled program, it would be natural to write something like:
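For instance, the two forms might look like this (get_all_links() is a
hypothetical helper):

# First form: a list comprehension builds the whole list at once
links = [link for link in get_all_links() if not link.followed]

# Second form: an explicit loop appends one element at a time
links = []
for link in get_all_links():
    if not link.followed:
        links.append(link)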
The first form is more concise and perhaps more readable, but if you’re dealing
with a large number of link objects you’d have to write the second form to avoid
having all link objects in memory at the same time.
Generator expressions work similarly to list comprehensions but don’t
materialize the entire list; instead they create a generator that will return
elements one by one. The above example could be written as:
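Using the same hypothetical get_all_links():

links = (link for link in get_all_links() if not link.followed)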
Generator expressions always have to be written inside parentheses, as in the
above example. The parentheses signalling a function call also count, so if you
want to create an iterator that will be immediately passed to a function you
could write:
print sum(obj.count for obj in list_all_objects())
Generator expressions differ from list comprehensions in various small ways.
Most notably, the loop variable (obj in the above example) is not accessible
outside of the generator expression. List comprehensions leave the variable
assigned to its last value; future versions of Python will change this, making
list comprehensions match generator expressions in this respect.
Some new classes in the standard library provide an alternative mechanism for
substituting variables into strings; this style of substitution may be better
for applications where untrained users need to edit templates.
The usual way of substituting variables by name is the % operator:
>>> '%(page)i: %(title)s' % {'page': 2, 'title': 'The Best of Times'}
'2: The Best of Times'
When writing the template string, it can be easy to forget the i or s
after the closing parenthesis. This isn’t a big problem if the template is in a
Python module, because you run the code, get an “Unsupported format character”
ValueError, and fix the problem. However, consider an application such
as Mailman where template strings or translations are being edited by users who
aren’t aware of the Python language. The format string’s syntax is complicated
to explain to such users, and if they make a mistake, it’s difficult to provide
helpful feedback to them.
PEP 292 adds a Template class to the string module that uses
$ to indicate a substitution:
>>> import string
>>> t = string.Template('$page: $title')
>>> t.substitute({'page': 2, 'title': 'The Best of Times'})
'2: The Best of Times'
If a key is missing from the dictionary, the substitute() method will
raise a KeyError. There’s also a safe_substitute() method that
ignores missing keys:
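For example:

>>> t = string.Template('$page: $title')
>>> t.safe_substitute({'page': 3})
'3: $title'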
Python 2.2 extended Python’s object model by adding static methods and class
methods, but it didn’t extend Python’s syntax to provide any new way of defining
static or class methods. Instead, you had to write a def statement
in the usual way, and pass the resulting method to a staticmethod() or
classmethod() function that would wrap up the function as a method of the
new type. Your code would look like this:
class C:
    def meth(cls):
        ...
    meth = classmethod(meth)   # Rebind name to wrapped-up class method
If the method was very long, it would be easy to miss or forget the
classmethod() invocation after the function body.
The intention was always to add some syntax to make such definitions more
readable, but at the time of 2.2’s release a good syntax was not obvious. Today
a good syntax still isn’t obvious but users are asking for easier access to
the feature; a new syntactic feature has been added to meet this need.
The new feature is called “function decorators”. The name comes from the idea
that classmethod(), staticmethod(), and friends are storing
additional information on a function object; they’re decorating functions with
more details.
The notation borrows from Java and uses the '@' character as an indicator.
Using the new syntax, the example above would be written:
class C:
    @classmethod
    def meth(cls):
        ...
The @classmethod is shorthand for the meth=classmethod(meth) assignment.
More generally, if you have the following:
@A
@B
@C
def f():
    ...
It’s equivalent to the following pre-decorator code:
def f():
    ...
f = A(B(C(f)))
Decorators must come on the line before a function definition, one decorator per
line, and can’t be on the same line as the def statement, meaning that @A def f(): ... is illegal. You can only decorate function definitions, either at
the module level or inside a class; you can’t decorate class definitions.
A decorator is just a function that takes the function to be decorated as an
argument and returns either the same function or some new object. The return
value of the decorator need not be callable (though it typically is), unless
further decorators will be applied to the result. It’s easy to write your own
decorators. The following simple example just sets an attribute on the function
object:
>>> def deco(func):
...     func.attr = 'decorated'
...     return func
...
>>> @deco
... def f(): pass
...
>>> f
<function f at 0x402ef0d4>
>>> f.attr
'decorated'
>>>
As a slightly more realistic example, the following decorator checks that the
supplied argument is an integer:
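One possible sketch (the names are made up):

def require_int(func):
    def wrapper(arg):
        # Reject anything that isn't an integer
        assert isinstance(arg, int)
        return func(arg)
    return wrapper

@require_int
def double(arg):
    print arg * 2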
An example in PEP 318 contains a fancier version of this idea that lets you
both specify the required type and check the returned type.
Decorator functions can take arguments. If arguments are supplied, your
decorator function is called with only those arguments and must return a new
decorator function; this function must take a single function and return a
function, as previously described. In other words, @A @B @C(args) becomes:
def f():
    ...
_deco = C(args)
f = A(B(_deco(f)))
Getting this right can be slightly brain-bending, but it’s not too difficult.
A small related change makes the func_name attribute of functions
writable. This attribute is used to display function names in tracebacks, so
decorators should change the name of any new function that’s constructed and
returned.
See also
PEP 318 - Decorators for Functions, Methods and Classes
Written by Kevin D. Smith, Jim Jewett, and Skip Montanaro. Several people
wrote patches implementing function decorators, but the one that was actually
checked in was patch #979728, written by Mark Russell.
The standard library provides a number of ways to execute a subprocess, offering
different features and different levels of complexity.
os.system(command) is easy to use, but slow (it runs a shell process
which executes the command) and dangerous (you have to be careful about escaping
the shell’s metacharacters). The popen2 module offers classes that can
capture standard output and standard error from the subprocess, but the naming
is confusing. The subprocess module cleans this up, providing a unified
interface that offers all the features you might need.
Instead of popen2‘s collection of classes, subprocess contains a
single class called Popen whose constructor supports a number of
different keyword arguments.
args is commonly a sequence of strings that will be the arguments to the
program executed as the subprocess. (If the shell argument is true, args
can be a string which will then be passed on to the shell for interpretation,
just as os.system() does.)
stdin, stdout, and stderr specify what the subprocess’s input, output, and
error streams will be. You can provide a file object or a file descriptor, or
you can use the constant subprocess.PIPE to create a pipe between the
subprocess and the parent.
The constructor has a number of handy options:
close_fds requests that all file descriptors be closed before running the
subprocess.
cwd specifies the working directory in which the subprocess will be executed
(defaulting to whatever the parent’s working directory is).
env is a dictionary specifying environment variables.
preexec_fn is a function that gets called before the child is started.
universal_newlines opens the child’s input and output using Python’s
universal newline feature.
Once you’ve created the Popen instance, you can call its wait()
method to pause until the subprocess has exited, poll() to check if it’s
exited without pausing, or communicate(data) to send the string data
to the subprocess’s standard input. communicate(data) then reads any
data that the subprocess has sent to its standard output or standard error,
returning a tuple (stdout_data, stderr_data).
call() is a shortcut that passes its arguments along to the Popen
constructor, waits for the command to complete, and returns the status code of
the subprocess. It can serve as a safer analog to os.system():
sts = subprocess.call(['dpkg', '-i', '/tmp/new-package.deb'])
if sts == 0:
    # Success
    ...
else:
    # dpkg returned an error
    ...
The command is invoked without use of the shell. If you really do want to use
the shell, you can add shell=True as a keyword argument and provide a string
instead of a sequence:
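For example:

sts = subprocess.call('dpkg -i /tmp/new-package.deb', shell=True)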
The PEP takes various examples of shell and Python code and shows how they’d be
translated into Python code that uses subprocess. Reading this section
of the PEP is highly recommended.
Python has always supported floating-point (FP) numbers, based on the underlying
C double type, as a data type. However, while most programming
languages provide a floating-point type, many people (even programmers) are
unaware that floating-point numbers don’t represent certain decimal fractions
accurately. The new Decimal type can represent these fractions
accurately, up to a user-specified precision limit.
The limitations arise from the representation used for floating-point numbers.
FP numbers are made up of three components:
The sign, which is positive or negative.
The mantissa, which is a single-digit binary number followed by a fractional
part. For example, 1.01 in base-2 notation is 1+0/2+1/4, or 1.25 in
decimal notation.
The exponent, which tells where the decimal point is located in the number
represented.
For example, the number 1.25 has positive sign, a mantissa value of 1.01 (in
binary), and an exponent of 0 (the decimal point doesn’t need to be shifted).
The number 5 has the same sign and mantissa, but the exponent is 2 because the
mantissa is multiplied by 4 (2 to the power of the exponent 2); 1.25 * 4 equals
5.
Modern systems usually provide floating-point support that conforms to a
standard called IEEE 754. C’s double type is usually implemented as a
64-bit IEEE 754 number, which uses 52 bits of space for the mantissa. This
means that numbers can only be specified to 52 bits of precision. If you’re
trying to represent numbers whose expansion repeats endlessly, the expansion is
cut off after 52 bits. Unfortunately, most software needs to produce output in
base 10, and common fractions in base 10 are often repeating decimals in binary.
For example, 1.1 decimal is binary 1.0001100110011...; .1 = 1/16 + 1/32 +
1/256 plus an infinite number of additional terms. IEEE 754 has to chop off
that infinitely repeated decimal after 52 digits, so the representation is
slightly inaccurate.
Sometimes you can see this inaccuracy when the number is printed:
>>> 1.1
1.1000000000000001
The inaccuracy isn’t always visible when you print the number because the FP-to-
decimal-string conversion is provided by the C library, and most C libraries try
to produce sensible output. Even if it’s not displayed, however, the inaccuracy
is still there and subsequent operations can magnify the error.
For many applications this doesn’t matter. If I’m plotting points and
displaying them on my monitor, the difference between 1.1 and 1.1000000000000001
is too small to be visible. Reports often limit output to a certain number of
decimal places, and if you round the number to two or three or even eight
decimal places, the error is never apparent. However, for applications where it
does matter, it’s a lot of work to implement your own custom arithmetic
routines.
A new module, decimal, was added to Python’s standard library. It
contains two classes, Decimal and Context. Decimal
instances represent numbers, and Context instances are used to wrap up
various settings such as the precision and default rounding mode.
Decimal instances are immutable, like regular Python integers and FP
numbers; once it’s been created, you can’t change the value an instance
represents. Decimal instances can be created from integers or
strings:
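A sketch of the possibilities; a tuple of (sign, digits, exponent) also
works, which is what the cautionary note below refers to:

>>> import decimal
>>> decimal.Decimal(1972)
Decimal("1972")
>>> decimal.Decimal('1.1')
Decimal("1.1")
>>> decimal.Decimal((1, (1, 4, 7, 5), -2))   # (sign, digits, exponent)
Decimal("-14.75")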
Cautionary note: the sign bit is a Boolean value, so 0 is positive and 1 is
negative.
Converting from floating-point numbers poses a bit of a problem: should the FP
number representing 1.1 turn into the decimal number for exactly 1.1, or for 1.1
plus whatever inaccuracies are introduced? The decision was to dodge the issue
and leave such a conversion out of the API. Instead, you should convert the
floating-point number into a string using the desired precision and pass the
string to the Decimal constructor:
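For example:

>>> f = 1.1
>>> decimal.Decimal(str(f))
Decimal("1.1")
>>> decimal.Decimal('%.12f' % f)
Decimal("1.100000000000")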
Once you have Decimal instances, you can perform the usual mathematical
operations on them. One limitation: exponentiation requires an integer
exponent:
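A sketch (the exact wording of the exception may vary):

>>> a = decimal.Decimal('35.72')
>>> b = decimal.Decimal('1.73')
>>> a + b
Decimal("37.45")
>>> a * b
Decimal("61.7956")
>>> a ** 2
Decimal("1275.9184")
>>> a ** b
Traceback (most recent call last):
  ...
InvalidOperation: x ** (non-integer)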
You can combine Decimal instances with integers, but not with floating-
point numbers:
>>> a + 4
Decimal("39.72")
>>> a + 4.5
Traceback (most recent call last):
  ...
TypeError: You can interact Decimal only with int, long or Decimal data types.
>>>
Decimal numbers can be used with the math and cmath
modules, but note that they’ll be immediately converted to floating-point
numbers before the operation is performed, resulting in a possible loss of
precision and accuracy. You’ll also get back a regular floating-point number
and not a Decimal.
Decimal instances have a sqrt() method that returns a
Decimal, but if you need other things such as trigonometric functions
you’ll have to implement them.
Instances of the Context class encapsulate several settings for
decimal operations:
prec is the precision, the number of decimal places.
rounding specifies the rounding mode. The decimal module has
constants for the various possibilities: ROUND_DOWN,
ROUND_CEILING, ROUND_HALF_EVEN, and various others.
traps is a dictionary specifying what happens on encountering certain
error conditions: either an exception is raised or a value is returned. Some
examples of error conditions are division by zero, loss of precision, and
overflow.
There’s a thread-local default context available by calling getcontext();
you can change the properties of this context to alter the default precision,
rounding, or trap handling. The following example shows the effect of changing
the precision of the default context:
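For example (28 digits is the default precision):

>>> import decimal
>>> decimal.Decimal(1) / decimal.Decimal(7)
Decimal("0.1428571428571428571428571429")
>>> decimal.getcontext().prec = 9
>>> decimal.Decimal(1) / decimal.Decimal(7)
Decimal("0.142857143")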
The default action for error conditions is selectable; the module can either
return a special value such as infinity or not-a-number, or exceptions can be
raised:
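A sketch of both behaviours, toggling the DivisionByZero trap:

>>> decimal.Decimal(1) / decimal.Decimal(0)
Traceback (most recent call last):
  ...
DivisionByZero: x / 0
>>> decimal.getcontext().traps[decimal.DivisionByZero] = False
>>> decimal.Decimal(1) / decimal.Decimal(0)
Decimal("Infinity")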
See also
A description of a decimal-based representation. This representation is being
proposed as a standard, and underlies the new Python decimal type. Much of this
material was written by Mike Cowlishaw, designer of the Rexx language.
One language change is a small syntactic tweak aimed at making it easier to
import many names from a module. In a from module import names statement,
names is a sequence of names separated by commas. If the sequence is very
long, you can either write multiple imports from the same module, or you can use
backslashes to escape the line endings like this:
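For instance (the module and names are purely illustrative):

from SimpleXMLRPCServer import SimpleXMLRPCServer, \
     SimpleXMLRPCRequestHandler, \
     CGIXMLRPCRequestHandler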
The syntactic change in Python 2.4 simply allows putting the names within
parentheses. Python ignores newlines within a parenthesized expression, so the
backslashes are no longer needed:
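The same illustrative import, parenthesized:

from SimpleXMLRPCServer import (SimpleXMLRPCServer,
                                SimpleXMLRPCRequestHandler,
                                CGIXMLRPCRequestHandler)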
The PEP also proposes that all import statements be absolute imports,
with a leading . character to indicate a relative import. This part of the
PEP was not implemented for Python 2.4, but was completed for Python 2.5.
See also
PEP 328 - Imports: Multi-Line and Absolute/Relative
Written by Aahz. Multi-line imports were implemented by Dima Dorfman.
PEP 331: Locale-Independent Float/String Conversions
The locale module lets Python software select various conversions and
display conventions that are localized to a particular country or language.
However, the module was careful to not change the numeric locale because various
functions in Python’s implementation required that the numeric locale remain set
to the 'C' locale. Often this was because the code was using the C
library’s atof() function.
Not setting the numeric locale caused trouble for extensions that used third-
party C libraries, however, because they wouldn’t have the correct locale set.
The motivating example was GTK+, whose user interface widgets weren’t displaying
numbers in the current locale.
The solution described in the PEP is to add three new functions to the Python
API that perform ASCII-only conversions, ignoring the locale setting:
PyOS_ascii_strtod(str, ptr) and PyOS_ascii_atof(str, ptr)
both convert a string to a C double.
PyOS_ascii_formatd(buffer, buf_len, format, d) converts a
double to an ASCII string.
The code for these functions came from the GLib library
(http://library.gnome.org/devel/glib/stable/), whose developers kindly
relicensed the relevant functions and donated them to the Python Software
Foundation. The locale module can now change the numeric locale,
letting extensions such as GTK+ produce the correct results.
See also
PEP 331 - Locale-Independent Float/String Conversions
Written by Christian R. Reis, and implemented by Gustavo Carneiro.
Certain numeric expressions no longer return values restricted to 32 or 64
bits (PEP 237).
You can now put parentheses around the list of names in a from module import names statement (PEP 328).
The dict.update() method now accepts the same argument forms as the
dict constructor. This includes any mapping, any iterable of key/value
pairs, and keyword arguments. (Contributed by Raymond Hettinger.)
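For example:

>>> d = {}
>>> d.update([('a', 1), ('b', 2)], c=3)   # iterable of pairs plus keywords
>>> sorted(d.items())
[('a', 1), ('b', 2), ('c', 3)]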
The string methods ljust(), rjust(), and center() now take
an optional argument for specifying a fill character other than a space.
(Contributed by Raymond Hettinger.)
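For example:

>>> 'abc'.center(9, '*')
'***abc***'
>>> '3.14'.rjust(8, '0')
'00003.14'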
Strings also gained an rsplit() method that works like the split()
method but splits from the end of the string. (Contributed by Sean
Reifschneider.)
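For example:

>>> 'www.python.org'.rsplit('.', 1)
['www.python', 'org']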
Three keyword parameters, cmp, key, and reverse, were added to the
sort() method of lists. These parameters make some common usages of
sort() simpler. All of these parameters are optional.
For the cmp parameter, the value should be a comparison function that takes
two parameters and returns -1, 0, or +1 depending on how the parameters compare.
This function will then be used to sort the list. Previously this was the only
parameter that could be provided to sort().
key should be a single-parameter function that takes a list element and
returns a comparison key for the element. The list is then sorted using the
comparison keys. The following example sorts a list case-insensitively:
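A sketch of both the key and cmp approaches:

>>> L = ['D', 'a', 'C', 'b']
>>> L.sort(key=lambda x: x.lower())                      # using key
>>> L
['a', 'b', 'C', 'D']
>>> L.sort(cmp=lambda x, y: cmp(x.lower(), y.lower()))   # using cmp
>>> L
['a', 'b', 'C', 'D']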
The last example, which uses the cmp parameter, is the old way to perform a
case-insensitive sort. It works but is slower than using a key parameter.
Using key calls the lower() method once for each element in the list, while
using cmp will call it twice for each comparison, so using key saves on
invocations of the lower() method.
For simple key functions and comparison functions, it is often possible to avoid
a lambda expression by using an unbound method instead. For example,
the above case-insensitive sort is best written as:
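For instance, assuming the elements are plain (non-Unicode) strings:

L.sort(key=str.lower)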
Finally, the reverse parameter takes a Boolean value. If the value is true,
the list will be sorted into reverse order. Instead of L.sort(); L.reverse(), you can now write L.sort(reverse=True).
The results of sorting are now guaranteed to be stable. This means that two
entries with equal keys will be returned in the same order as they were input.
For example, you can sort a list of people by name, and then sort the list by
age, resulting in a list sorted by age where people with the same age are in
name-sorted order.
(All changes to sort() contributed by Raymond Hettinger.)
There is a new built-in function sorted(iterable) that works like the
in-place list.sort() method but can be used in expressions. The
differences are:
the input may be any iterable;
a newly formed copy is sorted, leaving the original intact; and
the expression returns the new sorted copy.
>>> L = [9, 7, 8, 3, 2, 4, 1, 6, 5]
>>> [10+i for i in sorted(L)]       # usable in a list comprehension
[11, 12, 13, 14, 15, 16, 17, 18, 19]
>>> L                               # original is left unchanged
[9, 7, 8, 3, 2, 4, 1, 6, 5]
>>> sorted('Monty Python')          # any iterable may be an input
[' ', 'M', 'P', 'h', 'n', 'n', 'o', 'o', 't', 't', 'y', 'y']

>>> # List the contents of a dict sorted by key values
>>> colormap = dict(red=1, blue=2, green=3, black=4, yellow=5)
>>> for k, v in sorted(colormap.iteritems()):
...     print k, v
...
black 4
blue 2
green 3
red 1
yellow 5
(Contributed by Raymond Hettinger.)
Integer operations will no longer trigger an OverflowWarning. The
OverflowWarning warning will disappear in Python 2.5.
The interpreter gained a new switch, -m, that takes a name, searches
for the corresponding module on sys.path, and runs the module as a script.
For example, you can now run the Python profiler with python -m profile.
(Contributed by Nick Coghlan.)
The eval(expr, globals, locals) and execfile(filename, globals, locals) functions and the exec statement now accept any mapping type
for the locals parameter. Previously this had to be a regular Python
dictionary. (Contributed by Raymond Hettinger.)
The zip() built-in function and itertools.izip() now return an
empty list if called with no arguments. Previously they raised a
TypeError exception. This makes them more suitable for use with variable
length argument lists:
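A sketch of the variable-length case:

>>> def transpose(array):
...     return zip(*array)
...
>>> transpose([(1, 2, 3), (4, 5, 6)])
[(1, 4), (2, 5), (3, 6)]
>>> transpose([])
[]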
Encountering a failure while importing a module no longer leaves a partially-
initialized module object in sys.modules. The incomplete module object left
behind would fool further imports of the same module into succeeding, leading to
confusing errors. (Fixed by Tim Peters.)
None is now a constant; code that binds a new value to the name
None is now a syntax error. (Contributed by Raymond Hettinger.)
The inner loops for list and tuple slicing were optimized and now run about
one-third faster. The inner loops for dictionaries were also optimized,
resulting in performance boosts for keys(), values(), items(),
iterkeys(), itervalues(), and iteritems(). (Contributed by
Raymond Hettinger.)
The machinery for growing and shrinking lists was optimized for speed and for
space efficiency. Appending and popping from lists now runs faster due to more
efficient code paths and less frequent use of the underlying system
realloc(). List comprehensions also benefit. list.extend() was
also optimized and no longer converts its argument into a temporary list before
extending the base list. (Contributed by Raymond Hettinger.)
list(), tuple(), map(), filter(), and zip() now
run several times faster with non-sequence arguments that supply a
__len__() method. (Contributed by Raymond Hettinger.)
The methods list.__getitem__(), dict.__getitem__(), and
dict.__contains__() are now implemented as method_descriptor
objects rather than wrapper_descriptor objects. This form of access
doubles their performance and makes them more suitable for use as arguments to
functionals: map(mydict.__getitem__,keylist). (Contributed by Raymond
Hettinger.)
Added a new opcode, LIST_APPEND, that simplifies the generated bytecode
for list comprehensions and speeds them up by about a third. (Contributed by
Raymond Hettinger.)
The peephole bytecode optimizer has been improved to produce shorter, faster
bytecode; remarkably, the resulting bytecode is more readable. (Enhanced by
Raymond Hettinger.)
String concatenations in statements of the form s = s + "abc" and s += "abc" are now performed more efficiently in certain circumstances. This
optimization won’t be present in other Python implementations such as Jython, so
you shouldn’t rely on it; using the join() method of strings is still
recommended when you want to efficiently glue a large number of strings
together. (Contributed by Armin Rigo.)
The net result of the 2.4 optimizations is that Python 2.4 runs the pystone
benchmark around 5% faster than Python 2.3 and 35% faster than Python 2.2.
(pystone is not a particularly good benchmark, but it’s the most commonly used
measurement of Python’s performance. Your own applications may show greater or
smaller benefits from Python 2.4.)
As usual, Python’s standard library received a number of enhancements and bug
fixes. Here’s a partial list of the most notable changes, sorted alphabetically
by module name. Consult the Misc/NEWS file in the source tree for a more
complete list of changes, or look through the CVS logs for all the details.
The asyncore module’s loop() function now has a count parameter
that lets you perform a limited number of passes through the polling loop. The
default is still to loop forever.
The base64 module now has more complete RFC 3548 support for Base64,
Base32, and Base16 encoding and decoding, including optional case folding and
optional alternative alphabets. (Contributed by Barry Warsaw.)
The bisect module now has an underlying C implementation for improved
performance. (Contributed by Dmitry Vasiliev.)
The CJKCodecs collections of East Asian codecs, maintained by Hye-Shik Chang,
was integrated into 2.4. The new encodings are:
Chinese (PRC): gb2312, gbk, gb18030, big5hkscs, hz
Some other new encodings were added: HP Roman8, ISO_8859-11, ISO_8859-16,
PTCP-154, and TIS-620.
The UTF-8 and UTF-16 codecs now cope better with receiving partial input.
Previously the StreamReader class would try to read more data, making
it impossible to resume decoding from the stream. The read() method will
now return as much data as it can and future calls will resume decoding where
previous ones left off. (Implemented by Walter Dörwald.)
There is a new collections module for various specialized collection
datatypes. Currently it contains just one type, deque, a double-
ended queue that supports efficiently adding and removing elements from either
end:
>>> from collections import deque
>>> d = deque('ghi')        # make a new deque with three items
>>> d.append('j')           # add a new entry to the right side
>>> d.appendleft('f')       # add a new entry to the left side
>>> d                       # show the representation of the deque
deque(['f', 'g', 'h', 'i', 'j'])
>>> d.pop()                 # return and remove the rightmost item
'j'
>>> d.popleft()             # return and remove the leftmost item
'f'
>>> list(d)                 # list the contents of the deque
['g', 'h', 'i']
>>> 'h' in d                # search the deque
True
Several modules, such as the Queue and threading modules, now take
advantage of collections.deque for improved performance. (Contributed
by Raymond Hettinger.)
The ConfigParser classes have been enhanced slightly. The read()
method now returns a list of the files that were successfully parsed, and the
set() method raises TypeError if passed a value argument that
isn’t a string. (Contributed by John Belmonte and David Goodger.)
The curses module now supports the ncurses extension
use_default_colors(). On platforms where the terminal supports
transparency, this makes it possible to use a transparent background.
(Contributed by Jörg Lehmann.)
The difflib module now includes an HtmlDiff class that creates
an HTML table showing a side by side comparison of two versions of a text.
(Contributed by Dan Gass.)
The email package was updated to version 3.0, which dropped various
deprecated APIs and removed support for Python versions earlier than 2.3. The
3.0 version of the package uses a new incremental parser for MIME messages,
available in the email.FeedParser module. The new parser doesn’t require
reading the entire message into memory, and doesn’t raise exceptions if a
message is malformed; instead it records any problems in the defect
attribute of the message. (Developed by Anthony Baxter, Barry Warsaw, Thomas
Wouters, and others.)
The heapq module has been converted to C. The resulting tenfold
improvement in speed makes the module suitable for handling high volumes of
data. In addition, the module has two new functions nlargest() and
nsmallest() that use heaps to find the N largest or smallest values in a
dataset without the expense of a full sort. (Contributed by Raymond Hettinger.)
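For example:

>>> import heapq
>>> data = [97, 3, 54, 41, 88, 12]
>>> heapq.nlargest(2, data)
[97, 88]
>>> heapq.nsmallest(3, data)
[3, 12, 41]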
The httplib module now contains constants for HTTP status codes defined
in various HTTP-related RFC documents. Constants have names such as
OK, CREATED, CONTINUE, and
MOVED_PERMANENTLY; use pydoc to get a full list. (Contributed by
Andrew Eland.)
The imaplib module now supports IMAP’s THREAD command (contributed by
Yves Dionne) and new deleteacl() and myrights() methods (contributed
by Arnaud Mazin).
The itertools module gained a groupby(iterable[, func])
function. iterable is something that can be iterated over to return a stream
of elements, and the optional func parameter is a function that takes an
element and returns a key value; if omitted, the key is simply the element
itself. groupby() then groups the elements into subsequences which have
matching values of the key, and returns a series of 2-tuples containing the key
value and an iterator over the subsequence.
Here’s an example to make this clearer. The key function simply returns
whether a number is even or odd, so the result of groupby() is to return
consecutive runs of odd or even numbers.
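A sketch:

>>> import itertools
>>> L = [2, 4, 6, 7, 9, 8, 10, 11]
>>> for key, group in itertools.groupby(L, lambda x: x % 2):
...     print key, list(group)
...
0 [2, 4, 6]
1 [7, 9]
0 [8, 10]
1 [11]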
groupby() is typically used with sorted input. The logic for
groupby() is similar to the Unix uniq filter which makes it handy for
eliminating, counting, or identifying duplicate elements:
>>> import itertools
>>> word = 'abracadabra'
>>> letters = sorted(word)   # Turn string into a sorted list of letters
>>> letters
['a', 'a', 'a', 'a', 'a', 'b', 'b', 'c', 'd', 'r', 'r']
>>> for k, g in itertools.groupby(letters):
...     print k, list(g)
...
a ['a', 'a', 'a', 'a', 'a']
b ['b', 'b']
c ['c']
d ['d']
r ['r', 'r']
>>> # List unique letters
>>> [k for k, g in itertools.groupby(letters)]
['a', 'b', 'c', 'd', 'r']
>>> # Count letter occurrences
>>> [(k, len(list(g))) for k, g in itertools.groupby(letters)]
[('a', 5), ('b', 2), ('c', 1), ('d', 1), ('r', 2)]
(Contributed by Hye-Shik Chang.)
itertools also gained a function named tee(iterator, N) that
returns N independent iterators that replicate iterator. If N is omitted,
the default is 2.
>>> L = [1, 2, 3]
>>> i1, i2 = itertools.tee(L)
>>> i1, i2
(<itertools.tee object at 0x402c2080>, <itertools.tee object at 0x402c2090>)
>>> list(i1)        # Run the first iterator to exhaustion
[1, 2, 3]
>>> list(i2)        # Run the second iterator to exhaustion
[1, 2, 3]
Note that tee() has to keep copies of the values returned by the
iterator; in the worst case, it may need to keep all of them. This should
therefore be used carefully if the leading iterator can run far ahead of the
trailing iterator in a long stream of inputs. If the separation is large, then
you might as well use list() instead. When the iterators track closely
with one another, tee() is ideal. Possible applications include
bookmarking, windowing, or lookahead iterators. (Contributed by Raymond
Hettinger.)
A number of functions were added to the locale module, such as
bind_textdomain_codeset() to specify a particular encoding and a family of
l*gettext() functions that return messages in the chosen encoding.
(Contributed by Gustavo Niemeyer.)
Some keyword arguments were added to the logging package’s
basicConfig() function to simplify log configuration. The default
behavior is to log messages to standard error, but various keyword arguments can
be specified to log to a particular file, change the logging format, or set the
logging level. For example:
import logging

logging.basicConfig(filename='/var/log/application.log',
                    level=0,    # Log all messages
                    format='%(levelname)s:%(process)d:%(thread)d:%(message)s')
Other additions to the logging package include a log(level, msg)
convenience method, as well as a TimedRotatingFileHandler class that
rotates its log files at a timed interval. The module already had
RotatingFileHandler, which rotated logs once the file exceeded a
certain size. Both classes derive from a new BaseRotatingHandler class
that can be used to implement other rotating handlers.
(Changes implemented by Vinay Sajip.)
The marshal module now shares interned strings on unpacking a data
structure. This may shrink the size of certain pickle strings, but the primary
effect is to make .pyc files significantly smaller. (Contributed by
Martin von Löwis.)
The nntplib module’s NNTP class gained description() and
descriptions() methods to retrieve newsgroup descriptions for a single
group or for a range of groups. (Contributed by Jürgen A. Erhard.)
Two new functions were added to the operator module,
attrgetter(attr) and itemgetter(index). Both functions return
callables that take a single argument and return the corresponding attribute or
item; these callables make excellent data extractors when used with map()
or sorted(). For example:
>>> import operator
>>> L = [('c', 2), ('d', 1), ('a', 4), ('b', 3)]
>>> map(operator.itemgetter(0), L)
['c', 'd', 'a', 'b']
>>> map(operator.itemgetter(1), L)
[2, 1, 4, 3]
>>> sorted(L, key=operator.itemgetter(1))   # Sort list by second tuple item
[('d', 1), ('c', 2), ('b', 3), ('a', 4)]
(Contributed by Raymond Hettinger.)
The optparse module was updated in various ways. The module now passes
its messages through gettext.gettext(), making it possible to
internationalize Optik’s help and error messages. Help messages for options can
now include the string '%default', which will be replaced by the option’s
default value. (Contributed by Greg Ward.)
The long-term plan is to deprecate the rfc822 module in some future
Python release in favor of the email package. To this end, the
email.Utils.formatdate() function has been changed to make it usable as a
replacement for rfc822.formatdate(). You may want to write new e-mail
processing code with this in mind. (Change implemented by Anthony Baxter.)
A new urandom(n) function was added to the os module, returning
a string containing n bytes of random data. This function provides access to
platform-specific sources of randomness such as /dev/urandom on Linux or
the Windows CryptoAPI. (Contributed by Trevor Perrin.)
Another new function: os.path.lexists(path) returns true if the file
specified by path exists, whether or not it’s a symbolic link. This differs
from the existing os.path.exists(path) function, which returns false if
path is a symlink that points to a destination that doesn’t exist.
(Contributed by Beni Cherniavsky.)
A new getsid() function was added to the posix module that
underlies the os module. (Contributed by J. Raynor.)
The poplib module now supports POP over SSL. (Contributed by Hector
Urtubia.)
The profile module can now profile C extension functions. (Contributed
by Nick Bastin.)
The random module has a new method called getrandbits(N) that
returns a long integer N bits in length. The existing randrange()
method now uses getrandbits() where appropriate, making generation of
arbitrarily large random numbers more efficient. (Contributed by Raymond
Hettinger.)
The regular expression language accepted by the re module was extended
with simple conditional expressions, written as (?(group)A|B). group is
either a numeric group ID or a group name defined with (?P<group>...)
earlier in the expression. If the specified group matched, the regular
expression pattern A will be tested against the string; if the group didn’t
match, the pattern B will be used instead. (Contributed by Gustavo Niemeyer.)
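For instance, a sketch that matches a number with optional, balanced
parentheses:

>>> import re
>>> pat = re.compile(r'^(\()?\d+(?(1)\))$')
>>> [bool(pat.match(s)) for s in ['(123)', '123', '(123', '123)']]
[True, True, False, False]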
The re module is also no longer recursive, thanks to a massive amount
of work by Gustavo Niemeyer. In a recursive regular expression engine, certain
patterns result in a large amount of C stack space being consumed, and it was
possible to overflow the stack. For example, if you matched a 30000-byte string
of a characters against the expression (a|b)+, one stack frame was
consumed per character. Python 2.3 tried to check for stack overflow and raise
a RuntimeError exception, but certain patterns could sidestep the
checking and if you were unlucky Python could segfault. Python 2.4’s regular
expression engine can match this pattern without problems.
The signal module now performs tighter error-checking on the parameters
to the signal.signal() function. For example, you can’t set a handler on
the SIGKILL signal; previous versions of Python would quietly accept
this, but 2.4 will raise a RuntimeError exception.
Two new functions were added to the socket module. socketpair()
returns a pair of connected sockets and getservbyport(port)() looks up the
service name for a given port number. (Contributed by Dave Cole and Barry
Warsaw.)
The sys.exitfunc() function has been deprecated. Code should be using
the existing atexit module, which correctly handles calling multiple exit
functions. Eventually sys.exitfunc() will become a purely internal
interface, accessed only by atexit.
The tarfile module now generates GNU-format tar files by default.
(Contributed by Lars Gustaebel.)
The threading module now has an elegantly simple way to support
thread-local data. The module contains a local class whose attribute
values are local to different threads.
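A sketch (the number and url attributes mentioned below are just examples):

import threading

data = threading.local()            # each thread sees its own attributes
data.number = 42
data.url = ('www.python.org', 80)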
Other threads can assign and retrieve their own values for the number
and url attributes. You can subclass local to initialize
attributes or to add methods. (Contributed by Jim Fulton.)
The timeit module now automatically disables periodic garbage
collection during the timing loop. This change makes consecutive timings more
comparable. (Contributed by Raymond Hettinger.)
The weakref module now supports a wider variety of objects including
Python functions, class instances, sets, frozensets, deques, arrays, files,
sockets, and regular expression pattern objects. (Contributed by Raymond
Hettinger.)
The xmlrpclib module now supports a multi-call extension for
transmitting multiple XML-RPC calls in a single HTTP operation. (Contributed by
Brian Quinlan.)
The mpz, rotor, and xreadlines modules have been
removed.
The cookielib library supports client-side handling for HTTP cookies,
mirroring the Cookie module’s server-side cookie support. Cookies are
stored in cookie jars; the library transparently stores cookies offered by the
web server in the cookie jar, and fetches the cookie from the jar when
connecting to the server. As in web browsers, policy objects control whether
cookies are accepted or not.
In order to store cookies across sessions, two implementations of cookie jars
are provided: one that stores cookies in the Netscape format so applications can
use the Mozilla or Lynx cookie files, and one that stores cookies in the same
format as the Perl libwww library.
urllib2 has been changed to interact with cookielib:
HTTPCookieProcessor manages a cookie jar that is used when accessing
URLs.
The doctest module underwent considerable refactoring thanks to Edward
Loper and Tim Peters. Testing can still be as simple as running
doctest.testmod(), but the refactorings allow customizing the module’s
operation in various ways
The new DocTestFinder class extracts the tests from a given object’s
docstrings:
import doctest

def f(x, y):
    """
    >>> f(2, 2)
    4
    >>> f(3, 2)
    6
    """
    return x * y

finder = doctest.DocTestFinder()

# Get list of DocTest instances
tests = finder.find(f)
The new DocTestRunner class then runs individual tests and can produce
a summary of the results:
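A sketch, reusing the tests found above:

runner = doctest.DocTestRunner()
for t in tests:
    runner.run(t)            # run each DocTest individually

runner.summarize(verbose=1)  # print a summary of all the runs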
DocTestRunner uses an instance of the OutputChecker class to
compare the expected output with the actual output. This class takes a number
of different flags that customize its behaviour; ambitious users can also write
a completely new subclass of OutputChecker.
The default output checker provides a number of handy features. For example,
with the doctest.ELLIPSIS option flag, an ellipsis (...) in the
expected output matches any substring, making it easier to accommodate outputs
that vary in minor ways:
def o(n):
    """
    >>> o(1)
    <__main__.C instance at 0x...>
    >>>
    """
Another special string, <BLANKLINE>, matches a blank line:
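For instance, a sketch with a made-up p() whose output is one blank line:

def p(n):
    """
    >>> p(1)
    <BLANKLINE>
    >>>
    """
    print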
def g(n):
    """
    >>> g(4)
    here
    is
    a
    lengthy
    >>>
    """
    L = 'here is a rather lengthy list of words'.split()
    for word in L[:n]:
        print word
Running the above function’s tests with doctest.REPORT_UDIFF specified,
you get the following output:
**********************************************************************
File "t.py", line 15, in g
Failed example:
g(4)
Differences (unified diff with -expected +actual):
@@ -2,3 +2,3 @@
is
a
-lengthy
+rather
**********************************************************************
Some of the changes to Python’s build process and to the C API are:
Three new convenience macros were added for common return values from
extension functions: Py_RETURN_NONE, Py_RETURN_TRUE, and
Py_RETURN_FALSE. (Contributed by Brett Cannon.)
Another new macro, Py_CLEAR(obj), decreases the reference count of
obj and sets obj to the null pointer. (Contributed by Jim Fulton.)
A new function, PyTuple_Pack(N, obj1, obj2, ..., objN), constructs
tuples from a variable length argument list of Python objects. (Contributed by
Raymond Hettinger.)
A new function, PyDict_Contains(d, k), implements fast dictionary
lookups without masking exceptions raised during the look-up process.
(Contributed by Raymond Hettinger.)
The Py_IS_NAN(X) macro returns 1 if its float or double argument
X is a NaN. (Contributed by Tim Peters.)
C code can avoid unnecessary locking by using the new
PyEval_ThreadsInitialized() function to tell if any thread operations
have been performed. If this function returns false, no lock operations are
needed. (Contributed by Nick Coghlan.)
A new method flag, METH_COEXISTS, allows a function defined in slots
to co-exist with a PyCFunction having the same name. This can halve
the access time for a method such as set.__contains__(). (Contributed by
Raymond Hettinger.)
Python can now be built with additional profiling for the interpreter itself,
intended as an aid to people developing the Python core. Providing
--enable-profiling to the configure script will let you
profile the interpreter with gprof, and providing the
--with-tsc switch enables profiling using the Pentium's
Time-Stamp-Counter register. Note that the --with-tsc switch is slightly
misnamed, because the profiling feature also works on the PowerPC platform,
though that processor architecture doesn’t call that register “the TSC
register”. (Contributed by Jeremy Hylton.)
The tracebackobject type has been renamed to
PyTracebackObject.
This section lists previously described changes that may require changes to your
code:
Left shifts and hexadecimal/octal constants that are too large no longer
trigger a FutureWarning and return a value limited to 32 or 64 bits;
instead they return a long integer.
Integer operations will no longer trigger an OverflowWarning. The
OverflowWarning warning will disappear in Python 2.5.
The zip() built-in function and itertools.izip() now return an
empty list instead of raising a TypeError exception if called with no
arguments.
You can no longer compare the date and datetime instances
provided by the datetime module. Two instances of different classes
will now always be unequal, and relative comparisons (<, >) will raise
a TypeError.
dircache.listdir() now passes exceptions to the caller instead of
returning empty lists.
LexicalHandler.startDTD() used to receive the public and system IDs in
the wrong order. This has been corrected; applications relying on the wrong
order need to be fixed.
fcntl.ioctl() now warns if the mutate argument is omitted and
relevant.
The tarfile module now generates GNU-format tar files by default.
Encountering a failure while importing a module no longer leaves a partially-
initialized module object in sys.modules.
None is now a constant; code that binds a new value to the name
None is now a syntax error.
The signal.signal() function now raises a RuntimeError exception
for certain illegal values; previously these errors would pass silently. For
example, you can no longer set a handler on the SIGKILL signal.
The author would like to thank the following people for offering suggestions,
corrections and assistance with various drafts of this article: Koray Can, Hye-
Shik Chang, Michael Dyck, Raymond Hettinger, Brian Hurt, Hamish Lawson, Fredrik
Lundh, Sean Reifschneider, Sadruddin Rejeb.
This article explains the new features in Python 2.3. Python 2.3 was released
on July 29, 2003.
The main themes for Python 2.3 are polishing some of the features added in 2.2,
adding various small but useful enhancements to the core language, and expanding
the standard library. The new object model introduced in the previous version
has benefited from 18 months of bugfixes and from optimization efforts that have
improved the performance of new-style classes. A few new built-in functions
have been added such as sum() and enumerate(). The in
operator can now be used for substring searches (e.g. "ab" in "abc" returns
True).
Some of the many new library features include Boolean, set, heap, and date/time
data types, the ability to import modules from ZIP-format archives, metadata
support for the long-awaited Python catalog, an updated version of IDLE, and
modules for logging messages, wrapping text, parsing CSV files, processing
command-line options, using BerkeleyDB databases... the list of new and
enhanced modules is lengthy.
This article doesn’t attempt to provide a complete specification of the new
features, but instead provides a convenient overview. For full details, you
should refer to the documentation for Python 2.3, such as the Python Library
Reference and the Python Reference Manual. If you want to understand the
complete implementation and design rationale, refer to the PEP for a particular
new feature.
The new sets module contains an implementation of a set datatype. The
Set class is for mutable sets, sets that can have members added and
removed. The ImmutableSet class is for sets that can’t be modified,
and instances of ImmutableSet can therefore be used as dictionary keys.
Sets are built on top of dictionaries, so the elements within a set must be
hashable.
The union and intersection of sets can be computed with the union() and
intersection() methods; an alternative notation uses the bitwise operators
& and |. Mutable sets also have in-place versions of these methods,
union_update() and intersection_update().
It’s also possible to take the symmetric difference of two sets. This is the
set of all elements in the union that aren’t in the intersection. Another way
of putting it is that the symmetric difference contains all elements that are in
exactly one set. Again, there’s an alternative notation (^), and an in-
place version with the ungainly name symmetric_difference_update().
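For instance, here's a quick interactive sketch of these operations (element ordering within the reprs may vary, since sets are built on dictionaries):

>>> from sets import Set
>>> S1 = Set([1, 2, 3])
>>> S2 = Set([2, 3, 4])
>>> S1.union(S2)
Set([1, 2, 3, 4])
>>> S1 & S2                # intersection
Set([2, 3])
>>> S1 ^ S2                # symmetric difference
Set([1, 4])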
In Python 2.2, generators were added as an optional feature, to be enabled by a
from __future__ import generators directive. In 2.3 generators no longer
need to be specially enabled, and are now always present; this means that
yield is now always a keyword. The rest of this section is a copy of
the description of generators from the “What’s New in Python 2.2” document; if
you read it back when Python 2.2 came out, you can skip the rest of this
section.
You’re doubtless familiar with how function calls work in Python or C. When you
call a function, it gets a private namespace where its local variables are
created. When the function reaches a return statement, the local
variables are destroyed and the resulting value is returned to the caller. A
later call to the same function will get a fresh new set of local variables.
But, what if the local variables weren’t thrown away on exiting a function?
What if you could later resume the function where it left off? This is what
generators provide; they can be thought of as resumable functions.
Here’s the simplest example of a generator function:
def generate_ints(N):
    for i in range(N):
        yield i
A new keyword, yield, was introduced for generators. Any function
containing a yield statement is a generator function; this is
detected by Python’s bytecode compiler which compiles the function specially as
a result.
When you call a generator function, it doesn’t return a single value; instead it
returns a generator object that supports the iterator protocol. On executing
the yield statement, the generator outputs the value of i,
similar to a return statement. The big difference between
yield and a return statement is that on reaching a
yield the generator’s state of execution is suspended and local
variables are preserved. On the next call to the generator’s .next()
method, the function will resume executing immediately after the
yield statement. (For complicated reasons, the yield
statement isn’t allowed inside the try block of a try...finally statement; read PEP 255 for a full explanation of the
interaction between yield and exceptions.)
Here’s a sample usage of the generate_ints() generator:
>>> gen = generate_ints(3)
>>> gen
<generator object at 0x8117f90>
>>> gen.next()
0
>>> gen.next()
1
>>> gen.next()
2
>>> gen.next()
Traceback (most recent call last):
  File "stdin", line 1, in ?
  File "stdin", line 2, in generate_ints
StopIteration
You could equally write for i in generate_ints(5), or a, b, c = generate_ints(3).
Inside a generator function, the return statement can only be used
without a value, and signals the end of the procession of values; afterwards the
generator cannot return any further values. return with a value, such
as return 5, is a syntax error inside a generator function. The end of the
generator’s results can also be indicated by raising StopIteration
manually, or by just letting the flow of execution fall off the bottom of the
function.
You could achieve the effect of generators manually by writing your own class
and storing all the local variables of the generator as instance variables. For
example, returning a list of integers could be done by setting self.count to
0, and having the next() method increment self.count and return it.
However, for a moderately complicated generator, writing a corresponding class
would be much messier. Lib/test/test_generators.py contains a number of
more interesting examples. The simplest one implements an in-order traversal of
a tree using generators recursively.
# A recursive generator that generates Tree leaves in in-order.
def inorder(t):
    if t:
        for x in inorder(t.left):
            yield x
        yield t.label
        for x in inorder(t.right):
            yield x
Two other examples in Lib/test/test_generators.py produce solutions for
the N-Queens problem (placing N queens on an NxN chess board so that no
queen threatens another) and the Knight's Tour (a route that takes a knight to
every square of an NxN chessboard without visiting any square twice).
The idea of generators comes from other programming languages, especially Icon
(http://www.cs.arizona.edu/icon/), where the idea of generators is central. In
Icon, every expression and function call behaves like a generator. One example
from “An Overview of the Icon Programming Language” at
http://www.cs.arizona.edu/icon/docs/ipd266.htm gives an idea of what this looks
like:
sentence := "Store it in the neighboring harbor"
if (i := find("or", sentence)) > 5 then write(i)
In Icon the find() function returns the indexes at which the substring
“or” is found: 3, 23, 33. In the if statement, i is first
assigned a value of 3, but 3 is less than 5, so the comparison fails, and Icon
retries it with the second value of 23. 23 is greater than 5, so the comparison
now succeeds, and the code prints the value 23 to the screen.
Python doesn’t go nearly as far as Icon in adopting generators as a central
concept. Generators are considered part of the core Python language, but
learning or using them isn’t compulsory; if they don’t solve any problems that
you have, feel free to ignore them. One novel feature of Python’s interface as
compared to Icon’s is that a generator’s state is represented as a concrete
object (the iterator) that can be passed around to other functions or stored in
a data structure.
Written by Neil Schemenauer, Tim Peters, Magnus Lie Hetland. Implemented mostly
by Neil Schemenauer and Tim Peters, with other fixes from the Python Labs crew.
Python source files can now be declared as being in different character set
encodings. Encodings are declared by including a specially formatted comment in
the first or second line of the source file. For example, a UTF-8 file can be
declared with:
#!/usr/bin/env python
# -*- coding: UTF-8 -*-
Without such an encoding declaration, the default encoding used is 7-bit ASCII.
Executing or importing modules that contain string literals with 8-bit
characters and have no encoding declaration will result in a
DeprecationWarning being signalled by Python 2.3; in 2.4 this will be a
syntax error.
The encoding declaration only affects Unicode string literals, which will be
converted to Unicode using the specified encoding. Note that Python identifiers
are still restricted to ASCII characters, so you can’t have variable names that
use characters outside of the usual alphanumerics.
The new zipimport module adds support for importing modules from a ZIP-
format archive. You don’t need to import the module explicitly; it will be
automatically imported if a ZIP archive’s filename is added to sys.path.
For example:
amk@nyman:~/src/python$ unzip -l /tmp/example.zip
Archive: /tmp/example.zip
Length Date Time Name
-------- ---- ---- ----
8467 11-26-02 22:30 jwzthreading.py
-------- -------
8467 1 file
amk@nyman:~/src/python$ ./python
Python 2.3 (#1, Aug 1 2003, 19:54:32)
>>> import sys
>>> sys.path.insert(0, '/tmp/example.zip') # Add .zip file to front of path
>>> import jwzthreading
>>> jwzthreading.__file__
'/tmp/example.zip/jwzthreading.py'
>>>
An entry in sys.path can now be the filename of a ZIP archive. The ZIP
archive can contain any kind of files, but only files named *.py,
*.pyc, or *.pyo can be imported. If an archive only contains
*.py files, Python will not attempt to modify the archive by adding the
corresponding *.pyc file, meaning that if a ZIP archive doesn’t contain
*.pyc files, importing may be rather slow.
A path within the archive can also be specified to only import from a
subdirectory; for example, the path /tmp/example.zip/lib/ would only
import from the lib/ subdirectory within the archive.
Written by James C. Ahlstrom, who also provided an implementation. Python 2.3
follows the specification in PEP 273, but uses an implementation written by
Just van Rossum that uses the import hooks described in PEP 302. See section
PEP 302: New Import Hooks for a description of the new import hooks.
PEP 277: Unicode file name support for Windows NT
On Windows NT, 2000, and XP, the system stores file names as Unicode strings.
Traditionally, Python has represented file names as byte strings, which is
inadequate because it renders some file names inaccessible.
Python now allows using arbitrary Unicode strings (within the limitations of the
file system) for all functions that expect file names, most notably the
open() built-in function. If a Unicode string is passed to
os.listdir(), Python now returns a list of Unicode strings. A new
function, os.getcwdu(), returns the current directory as a Unicode string.
Byte strings still work as file names, and on Windows Python will transparently
convert them to Unicode using the mbcs encoding.
Other systems also allow Unicode strings as file names but convert them to byte
strings before passing them to the system, which can cause a UnicodeError
to be raised. Applications can test whether arbitrary Unicode strings are
supported as file names by checking os.path.supports_unicode_filenames,
a Boolean value.
Under MacOS, os.listdir() may now return Unicode filenames.
See also
PEP 277 - Unicode file name support for Windows NT
Written by Neil Hodgson; implemented by Neil Hodgson, Martin von Löwis, and Mark
Hammond.
The three major operating systems used today are Microsoft Windows, Apple’s
Macintosh OS, and the various Unix derivatives. A minor irritation of cross-
platform work is that these three platforms all use different characters to
mark the ends of lines in text files. Unix uses the linefeed (ASCII character
10), MacOS uses the carriage return (ASCII character 13), and Windows uses a
two-character sequence of a carriage return plus a newline.
Python’s file objects can now support end of line conventions other than the one
followed by the platform on which Python is running. Opening a file with the
mode 'U' or 'rU' will open a file for reading in universal newline mode.
All three line ending conventions will be translated to a '\n' in the
strings returned by the various file methods such as read() and
readline().
Universal newline support is also used when importing modules and when executing
a file with the execfile() function. This means that Python modules can
be shared between all three operating systems without needing to convert the
line-endings.
This feature can be disabled when compiling Python by specifying the
--without-universal-newlines switch when running Python’s
configure script.
A new built-in function, enumerate(), will make certain loops a bit
clearer. enumerate(thing), where thing is either an iterator or a
sequence, returns an iterator that will return (0, thing[0]), (1, thing[1]), (2, thing[2]), and so forth.
A common idiom to change every element of a list looks like this:
for i in range(len(L)):
    item = L[i]
    # ... compute some result based on item ...
    L[i] = result
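Using enumerate(), the same loop needs no separate index lookup:

for i, item in enumerate(L):
    # ... compute some result based on item ...
    L[i] = result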
A standard package for writing logs, logging, has been added to Python
2.3. It provides a powerful and flexible mechanism for generating logging
output which can then be filtered and processed in various ways. A
configuration file written in a standard format can be used to control the
logging behavior of a program. Python includes handlers that will write log
records to standard error or to a file or socket, send them to the system log,
or even e-mail them to a particular address; of course, it’s also possible to
write your own handler classes.
The Logger class is the primary class. Most application code will deal
with one or more Logger objects, each one used by a particular
subsystem of the application. Each Logger is identified by a name, and
names are organized into a hierarchy using . as the component separator.
For example, you might have Logger instances named server,
server.auth and server.network. The latter two instances are below
server in the hierarchy. This means that if you turn up the verbosity for
server or direct server messages to a different handler, the changes
will also apply to records logged to server.auth and server.network.
There’s also a root Logger that’s the parent of all other loggers.
For simple uses, the logging package contains some convenience functions
that always use the root log:
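Here's a sketch of those convenience functions; each logs through the root logger at the corresponding severity level (the message texts are just illustrative):

import logging

logging.debug('Debugging information')
logging.info('Informational message')
logging.warning('Warning: config file %s not found', 'server.conf')
logging.error('Error occurred')
logging.critical('Critical error -- shutting down')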
In the default configuration, informational and debugging messages are
suppressed and the output is sent to standard error. You can enable the display
of informational and debugging messages by calling the setLevel() method
on the root logger.
Notice the warning() call’s use of string formatting operators; all of the
functions for logging messages take the arguments (msg, arg1, arg2, ...) and
log the string resulting from msg % (arg1, arg2, ...).
There’s also an exception() function that records the most recent
traceback. Any of the other functions will also record the traceback if you
specify a true value for the keyword argument exc_info.
Slightly more advanced programs will use a logger other than the root logger.
The getLogger(name) function is used to get a particular log, creating
it if it doesn't exist yet. getLogger(None) returns the root logger.
log = logging.getLogger('server')
...
log.info('Listening on port %i', port)
...
log.critical('Disk full')
...
Log records are usually propagated up the hierarchy, so a message logged to
server.auth is also seen by server and root, but a Logger
can prevent this by setting its propagate attribute to False.
There are more classes provided by the logging package that can be
customized. When a Logger instance is told to log a message, it
creates a LogRecord instance that is sent to any number of different
Handler instances. Loggers and handlers can also have an attached list
of filters, and each filter can cause the LogRecord to be ignored or
can modify the record before passing it along. When they’re finally output,
LogRecord instances are converted to text by a Formatter
class. All of these classes can be replaced by your own specially-written
classes.
With all of these features the logging package should provide enough
flexibility for even the most complicated applications. This is only an
incomplete overview of its features, so please see the package’s reference
documentation for all of the details. Reading PEP 282 will also be helpful.
A Boolean type was added to Python 2.3. Two new constants were added to the
__builtin__ module, True and False. (True and
False constants were added to the built-ins in Python 2.2.1, but the
2.2.1 versions are simply set to integer values of 1 and 0 and aren’t a
different type.)
The type object for this new type is named bool; the constructor for it
takes any Python value and converts it to True or False.
Python’s Booleans were added with the primary goal of making code clearer. For
example, if you’re reading a function and encounter the statement return1,
you might wonder whether the 1 represents a Boolean truth value, an index,
or a coefficient that multiplies some other quantity. If the statement is
returnTrue, however, the meaning of the return value is quite clear.
Python’s Booleans were not added for the sake of strict type-checking. A very
strict language such as Pascal would also prevent you performing arithmetic with
Booleans, and would require that the expression in an if statement
always evaluate to a Boolean result. Python is not this strict and never will
be, as PEP 285 explicitly says. This means you can still use any expression
in an if statement, even ones that evaluate to a list or tuple or
some random object. The Boolean type is a subclass of the int class so
that arithmetic using a Boolean still works.
>>> True + 1
2
>>> False + 1
1
>>> False * 75
0
>>> True * 75
75
To sum up True and False in a sentence: they’re alternative
ways to spell the integer values 1 and 0, with the single difference that
str() and repr() return the strings 'True' and 'False'
instead of '1' and '0'.
When encoding a Unicode string into a byte string, unencodable characters may be
encountered. So far, Python has allowed specifying the error processing as
either “strict” (raising UnicodeError), “ignore” (skipping the
character), or “replace” (using a question mark in the output string), with
“strict” being the default behavior. It may be desirable to specify alternative
processing of such errors, such as inserting an XML character reference or HTML
entity reference into the converted string.
Python now has a flexible framework to add different processing strategies. New
error handlers can be added with codecs.register_error(), and codecs then
can access the error handler with codecs.lookup_error(). An equivalent C
API has been added for codecs written in C. The error handler gets the necessary
state information such as the string being converted, the position in the string
where the error was detected, and the target encoding. The handler can then
either raise an exception or return a replacement string.
Two additional error handlers have been implemented using this framework:
“backslashreplace” uses Python backslash quoting to represent unencodable
characters and “xmlcharrefreplace” emits XML character references.
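For example, a quick sketch of the two new handlers (U+3042 is a character with no ASCII equivalent; the exact byte-string reprs are from memory):

>>> u'\u3042'.encode('ascii', 'xmlcharrefreplace')
'&#12354;'
>>> u'\u3042'.encode('ascii', 'backslashreplace')
'\\u3042'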
PEP 301: Package Index and Metadata for Distutils
Support for the long-requested Python catalog makes its first appearance in 2.3.
The heart of the catalog is the new Distutils register command.
Running python setup.py register will collect the metadata describing a
package, such as its name, version, maintainer, description, etc., and send it to
a central catalog server. The resulting catalog is available from
http://www.python.org/pypi.
To make the catalog a bit more useful, a new optional classifiers keyword
argument has been added to the Distutils setup() function. A list of
Trove-style strings can be supplied to help
classify the software.
Here’s an example setup.py with classifiers, written to be compatible
with older versions of the Distutils:
from distutils import core

kw = {'name': "Quixote",
      'version': "0.5.1",
      'description': "A highly Pythonic Web application framework",
      # ...
      }

if (hasattr(core, 'setup_keywords') and
        'classifiers' in core.setup_keywords):
    kw['classifiers'] = [
        'Topic :: Internet :: WWW/HTTP :: Dynamic Content',
        'Environment :: No Input/Output (Daemon)',
        'Intended Audience :: Developers']

core.setup(**kw)
The full list of classifiers can be obtained by running python setup.py register --list-classifiers.
See also
PEP 301 - Package Index and Metadata for Distutils
While it’s been possible to write custom import hooks ever since the
ihooks module was introduced in Python 1.3, no one has ever been really
happy with it because writing new import hooks is difficult and messy. There
have been various proposed alternatives such as the imputil and iu
modules, but none of them has ever gained much acceptance, and none of them were
easily usable from C code.
PEP 302 borrows ideas from its predecessors, especially from Gordon
McMillan’s iu module. Three new items are added to the sys
module:
sys.path_hooks is a list of callable objects; most often they’ll be
classes. Each callable takes a string containing a path and either returns an
importer object that will handle imports from this path or raises an
ImportError exception if it can’t handle this path.
sys.path_importer_cache caches importer objects for each path, so
sys.path_hooks will only need to be traversed once for each path.
sys.meta_path is a list of importer objects that will be traversed before
sys.path is checked. This list is initially empty, but user code can add
objects to it. Additional built-in and frozen modules can be imported by an
object added to this list.
Importer objects must have a single method, find_module(fullname, path=None).
fullname will be a module or package name, e.g. string or
distutils.core. find_module() must return a loader object that has a
single method, load_module(fullname), that creates and returns the
corresponding module object.
Pseudo-code for Python’s new import logic, therefore, looks something like this
(simplified a bit; see PEP 302 for the full details):
for mp in sys.meta_path:
    loader = mp(fullname)
    if loader is not None:
        <module> = loader.load_module(fullname)

for path in sys.path:
    for hook in sys.path_hooks:
        try:
            importer = hook(path)
        except ImportError:
            # ImportError, so try the other path hooks
            pass
        else:
            loader = importer.find_module(fullname)
            <module> = loader.load_module(fullname)

# Not found!
raise ImportError
Comma-separated files are a format frequently used for exporting data from
databases and spreadsheets. Python 2.3 adds a parser for comma-separated files.
Comma-separated format is deceptively simple at first glance:
Costs,150,200,3.95
Read a line and call line.split(','): what could be simpler? But toss in
string data that can contain commas, and things get more complicated:
"Costs",150,200,3.95,"Includes taxes, shipping, and sundry items"
A big ugly regular expression can parse this, but using the new csv
package is much simpler:
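A minimal sketch of reading such a file (the 'datafile' name is just a placeholder):

import csv

f = open('datafile', 'rb')
reader = csv.reader(f)
for row in reader:
    print row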
The reader() function takes a number of different options. The field
separator isn’t limited to the comma and can be changed to any character, and so
can the quoting and line-ending characters.
Different dialects of comma-separated files can be defined and registered;
currently there are two dialects, both used by Microsoft Excel. A separate
csv.writer class will generate comma-separated files from a succession
of tuples or lists, quoting strings that contain the delimiter.
The pickle and cPickle modules received some attention during the
2.3 development cycle. In 2.2, new-style classes could be pickled without
difficulty, but they weren’t pickled very compactly; PEP 307 quotes a trivial
example where a new-style class results in a pickled string three times longer
than that for a classic class.
The solution was to invent a new pickle protocol. The pickle.dumps()
function has supported a text-or-binary flag for a long time. In 2.3, this
flag is redefined from a Boolean to an integer: 0 is the old text-mode pickle
format, 1 is the old binary format, and now 2 is a new 2.3-specific format. A
new constant, pickle.HIGHEST_PROTOCOL, can be used to select the
fanciest protocol available.
Unpickling is no longer considered a safe operation. 2.2’s pickle
provided hooks for trying to prevent unsafe classes from being unpickled
(specifically, a __safe_for_unpickling__ attribute), but none of this
code was ever audited and therefore it’s all been ripped out in 2.3. You should
not unpickle untrusted data in any version of Python.
To reduce the pickling overhead for new-style classes, a new interface for
customizing pickling was added using three special methods:
__getstate__(), __setstate__(), and __getnewargs__(). Consult
PEP 307 for the full semantics of these methods.
As a way to compress pickles yet further, it’s now possible to use integer codes
instead of long strings to identify pickled classes. The Python Software
Foundation will maintain a list of standardized codes; there’s also a range of
codes for private use. Currently no codes have been specified.
Ever since Python 1.4, the slicing syntax has supported an optional third “step”
or “stride” argument. For example, these are all legal Python syntax:
L[1:10:2], L[:-1:1], L[::-1]. This was added to Python at the
request of the developers of Numerical Python, which uses the third argument
extensively. However, Python’s built-in list, tuple, and string sequence types
have never supported this feature, raising a TypeError if you tried it.
Michael Hudson contributed a patch to fix this shortcoming.
For example, you can now easily extract the elements of a list that have even
indexes:
>>> L = range(10)
>>> L[::2]
[0, 2, 4, 6, 8]
Negative values also work to make a copy of the same list in reverse order:
>>> L[::-1]
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
This also works for tuples, arrays, and strings:
>>> s = 'abcd'
>>> s[::2]
'ac'
>>> s[::-1]
'dcba'
If you have a mutable sequence such as a list or an array you can assign to or
delete an extended slice, but there are some differences between assignment to
extended and regular slices. Assignment to a regular slice can be used to
change the length of the sequence:
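For example, here's a quick sketch where the assignment changes the list's length:

>>> a = range(3)
>>> a
[0, 1, 2]
>>> a[1:3] = [4, 5, 6]
>>> a
[0, 4, 5, 6]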
Extended slices aren’t this flexible. When assigning to an extended slice, the
list on the right hand side of the statement must contain the same number of
items as the slice it is replacing:
>>> a = range(4)
>>> a
[0, 1, 2, 3]
>>> a[::2]
[0, 2]
>>> a[::2] = [0, -1]
>>> a
[0, 1, -1, 3]
>>> a[::2] = [0, 1, 2]
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: attempt to assign sequence of size 3 to extended slice of size 2
One can also now pass slice objects to the __getitem__() methods of the
built-in sequences:
>>> range(10).__getitem__(slice(0, 5, 2))
[0, 2, 4]
Or use slice objects directly in subscripts:
>>> range(10)[slice(0, 5, 2)]
[0, 2, 4]
To simplify implementing sequences that support extended slicing, slice objects
now have a method indices(length) which, given the length of a sequence,
returns a (start,stop,step) tuple that can be passed directly to
range(). indices() handles omitted and out-of-bounds indices in a
manner consistent with regular slices (and this innocuous phrase hides a welter
of confusing details!). The method is intended to be used like this:
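Here's a sketch of that pattern; the FakeSeq class and its list-based storage are purely illustrative:

class FakeSeq:
    # A hypothetical sequence that computes its items on demand.
    def __init__(self, items):
        self._items = list(items)
    def __len__(self):
        return len(self._items)
    def calc_item(self, i):
        return self._items[i]
    def __getitem__(self, item):
        if isinstance(item, slice):
            start, stop, step = item.indices(len(self))
            return FakeSeq([self.calc_item(i)
                            for i in range(start, stop, step)])
        else:
            return self.calc_item(item)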
From this example you can also see that the built-in slice object is
now the type object for the slice type, and is no longer a function. This is
consistent with Python 2.2, where int, str, etc., underwent
the same change.
Two new constants, True and False were added along with the
built-in bool type, as described in section PEP 285: A Boolean Type of this
document.
The int() type constructor will now return a long integer instead of
raising an OverflowError when a string or floating-point number is too
large to fit into an integer. This can lead to the paradoxical result that
isinstance(int(expression), int) is false, but that seems unlikely to cause
problems in practice.
Built-in types now support the extended slicing syntax, as described in
section Extended Slices of this document.
A new built-in function, sum(iterable, start=0), adds up the numeric
items in the iterable object and returns their sum. sum() only accepts
numbers, meaning that you can’t use it to concatenate a bunch of strings.
(Contributed by Alex Martelli.)
list.insert(pos,value) used to insert value at the front of the list
when pos was negative. The behaviour has now been changed to be consistent
with slice indexing, so when pos is -1 the value will be inserted before the
last element, and so forth.
list.index(value), which searches for value within the list and returns
its index, now takes optional start and stop arguments to limit the search
to only part of the list.
Dictionaries have a new method, pop(key[, default]), that returns
the value corresponding to key and removes that key/value pair from the
dictionary. If the requested key isn’t present in the dictionary, default is
returned if it’s specified and KeyError raised if it isn’t.
>>> d = {1: 2}
>>> d
{1: 2}
>>> d.pop(4)
Traceback (most recent call last):
  File "stdin", line 1, in ?
KeyError: 4
>>> d.pop(1)
2
>>> d.pop(1)
Traceback (most recent call last):
  File "stdin", line 1, in ?
KeyError: 'pop(): dictionary is empty'
>>> d
{}
>>>
There’s also a new class method, dict.fromkeys(iterable, value), that
creates a dictionary with keys taken from the supplied iterator iterable and
all values set to value, defaulting to None.
(Patches contributed by Raymond Hettinger.)
Also, the dict() constructor now accepts keyword arguments to simplify
creating small dictionaries:
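For example (the key order in the resulting repr is arbitrary):

>>> dict(red=1, blue=2, green=3, black=4)
{'blue': 2, 'black': 4, 'green': 3, 'red': 1}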
The assert statement no longer checks the __debug__ flag, so
you can no longer disable assertions by assigning to __debug__. Running
Python with the -O switch will still generate code that doesn’t
execute any assertions.
Most type objects are now callable, so you can use them to create new objects
such as functions, classes, and modules. (This means that the new module
can be deprecated in a future Python version, because you can now use the type
objects available in the types module.) For example, you can create a new
module object with the following code:
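For instance, a short interactive sketch (the exact repr can vary):

>>> import types
>>> m = types.ModuleType('abc', 'docstring')
>>> m
<module 'abc' (built-in)>
>>> m.__doc__
'docstring'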
A new warning, PendingDeprecationWarning was added to indicate features
which are in the process of being deprecated. The warning will not be printed
by default. To check for use of features that will be deprecated in the future,
supply -Walways::PendingDeprecationWarning:: on the command line or
use warnings.filterwarnings().
The process of deprecating string-based exceptions, as in
raise "Error occurred", has begun. Raising a string will now trigger
PendingDeprecationWarning.
Using None as a variable name will now result in a SyntaxWarning
warning. In a future version of Python, None may finally become a keyword.
The xreadlines() method of file objects, introduced in Python 2.1, is no
longer necessary because files now behave as their own iterator.
xreadlines() was originally introduced as a faster way to loop over all
the lines in a file, but now you can simply write for line in file_obj.
File objects also have a new read-only encoding attribute that gives the
encoding used by the file; Unicode strings written to the file will be
automatically converted to bytes using the given encoding.
The method resolution order used by new-style classes has changed, though
you’ll only notice the difference if you have a really complicated inheritance
hierarchy. Classic classes are unaffected by this change. Python 2.2
originally used a topological sort of a class’s ancestors, but 2.3 now uses the
C3 algorithm as described in the paper “A Monotonic Superclass Linearization
for Dylan”. To
understand the motivation for this change, read Michele Simionato’s article
“Python 2.3 Method Resolution Order”, or
read the thread on python-dev starting with the message at
http://mail.python.org/pipermail/python-dev/2002-October/029035.html. Samuele
Pedroni first pointed out the problem and also implemented the fix by coding the
C3 algorithm.
Python runs multithreaded programs by switching between threads after
executing N bytecodes. The default value for N has been increased from 10 to
100 bytecodes, speeding up single-threaded applications by reducing the
switching overhead. Some multithreaded applications may suffer slower response
time, but that’s easily fixed by setting the limit back to a lower number using
sys.setcheckinterval(N). The limit can be retrieved with the new
sys.getcheckinterval() function.
One minor but far-reaching change is that the names of extension types defined
by the modules included with Python now contain the module and a '.' in
front of the type name. For example, in Python 2.2, if you created a socket and
printed its __class__, you’d get this output:
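A sketch of the 2.2 behaviour:

>>> import socket
>>> s = socket.socket()
>>> s.__class__
<type 'socket'>

In 2.3 the same code instead prints <type '_socket.socket'>, with the module prefix included.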
One of the noted incompatibilities between old- and new-style classes has been
removed: you can now assign to the __name__ and __bases__
attributes of new-style classes. There are some restrictions on what can be
assigned to __bases__ along the lines of those relating to assigning to
an instance’s __class__ attribute.
The in operator now works differently for strings. Previously, when
evaluating X in Y where X and Y are strings, X could only be a single
character. That’s now changed; X can be a string of any length, and X in Y
will return True if X is a substring of Y. If X is the empty
string, the result is always True.
Note that this doesn’t tell you where the substring starts; if you need that
information, use the find() string method.
The strip(), lstrip(), and rstrip() string methods now have
an optional argument for specifying the characters to strip. The default is
still to remove all whitespace characters:
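For example, a quick sketch:

>>> '   abc '.strip()
'abc'
>>> '><><abc<><><>'.strip('<>')
'abc'
>>> '><><abc<><><>\n'.strip('<>')
'abc<><><>\n'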
(Suggested by Simon Brunning and implemented by Walter Dörwald.)
The startswith() and endswith() string methods now accept negative
numbers for the start and end parameters.
Another new string method is zfill(), originally a function in the
string module. zfill() pads a numeric string with zeros on the
left until it’s the specified width. Note that the % operator is still more
flexible and powerful than zfill().
A new type object, basestring, has been added. Both 8-bit strings and
Unicode strings inherit from this type, so isinstance(obj,basestring) will
return True for either kind of string. It’s a completely abstract
type, so you can’t create basestring instances.
Interned strings are no longer immortal and will now be garbage-collected in
the usual way when the only reference to them is from the internal dictionary of
interned strings. (Implemented by Oren Tirosh.)
The creation of new-style class instances has been made much faster; they’re
now faster than classic classes!
The sort() method of list objects has been extensively rewritten by Tim
Peters, and the implementation is significantly faster.
Multiplication of large long integers is now much faster thanks to an
implementation of Karatsuba multiplication, an algorithm that scales better than
the O(n*n) required for the grade-school multiplication algorithm. (Original
patch by Christopher A. Craig, and significantly reworked by Tim Peters.)
The SET_LINENO opcode is now gone. This may provide a small speed
increase, depending on your compiler’s idiosyncrasies. See section
Other Changes and Fixes for a longer explanation. (Removed by Michael Hudson.)
xrange() objects now have their own iterator, making for i in xrange(n) slightly faster than for i in range(n). (Patch by Raymond
Hettinger.)
A number of small rearrangements have been made in various hotspots to improve
performance, such as inlining a function or removing some code. (Implemented
mostly by GvR, but lots of people have contributed single changes.)
The net result of the 2.3 optimizations is that Python 2.3 runs the pystone
benchmark around 25% faster than Python 2.2.
As usual, Python’s standard library received a number of enhancements and bug
fixes. Here’s a partial list of the most notable changes, sorted alphabetically
by module name. Consult the Misc/NEWS file in the source tree for a more
complete list of changes, or look through the CVS logs for all the details.
The array module now supports arrays of Unicode characters using the
'u' format character. Arrays also now support using the += assignment
operator to add another array’s contents, and the *= assignment operator to
repeat an array. (Contributed by Jason Orendorff.)
The bsddb module has been replaced by version 4.1.6 of the PyBSDDB package, providing a more complete interface
to the transactional features of the BerkeleyDB library.
The old version of the module has been renamed to bsddb185 and is no
longer built automatically; you’ll have to edit Modules/Setup to enable
it. Note that the new bsddb package is intended to be compatible with
the old module, so be sure to file bugs if you discover any incompatibilities.
When upgrading to Python 2.3, if the new interpreter is compiled with a new
version of the underlying BerkeleyDB library, you will almost certainly have to
convert your database files to the new version. You can do this fairly easily
with the new scripts db2pickle.py and pickle2db.py which you
will find in the distribution’s Tools/scripts directory. If you’ve
already been using the PyBSDDB package and importing it as bsddb3, you
will have to change your import statements to import it as bsddb.
The new bz2 module is an interface to the bz2 data compression library.
bz2-compressed data is usually smaller than corresponding zlib-compressed data. (Contributed by Gustavo Niemeyer.)
A set of standard date/time types has been added in the new datetime
module. See the following section for more details.
The Distutils Extension class now supports an extra constructor
argument named depends for listing additional source files that an extension
depends on. This lets Distutils recompile the module if any of the dependency
files are modified. For example, if sampmodule.c includes the header
file sample.h, you would create the Extension object like
this:
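A sketch of the constructor call for that case:

from distutils.core import Extension

ext = Extension('samp',
                sources=['sampmodule.c'],
                depends=['sample.h'])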
Modifying sample.h would then cause the module to be recompiled.
(Contributed by Jeremy Hylton.)
Other minor changes to Distutils: it now checks for the CC,
CFLAGS, CPP, LDFLAGS, and CPPFLAGS
environment variables, using them to override the settings in Python’s
configuration (contributed by Robert Weber).
Previously the doctest module would only search the docstrings of
public methods and functions for test cases, but it now also examines private
ones. The DocTestSuite() function creates a
unittest.TestSuite object from a set of doctest tests.
The new gc.get_referents(object) function returns a list of all the
objects referenced by object.
The getopt module gained a new function, gnu_getopt(), that
supports the same arguments as the existing getopt() function but uses
GNU-style scanning mode. The existing getopt() stops processing options as
soon as a non-option argument is encountered, but in GNU-style mode processing
continues, meaning that options and arguments can be mixed. For example:
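A brief sketch of the contrast between the two scanning modes:

>>> import getopt
>>> getopt.getopt(['-f', 'filename', 'output', '-v'], 'f:v')
([('-f', 'filename')], ['output', '-v'])
>>> getopt.gnu_getopt(['-f', 'filename', 'output', '-v'], 'f:v')
([('-f', 'filename'), ('-v', '')], ['output'])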
The gzip module can now handle files exceeding 2 GiB.
The new heapq module contains an implementation of a heap queue
algorithm. A heap is an array-like data structure that keeps items in a
partially sorted order such that, for every index k, heap[k] <= heap[2*k+1] and heap[k] <= heap[2*k+2]. This makes it quick to remove the
smallest item, and inserting a new item while maintaining the heap property is
O(lg n). (See http://www.nist.gov/dads/HTML/priorityque.html for more
information about the priority queue data structure.)
The heapq module provides heappush() and heappop() functions
for adding and removing items while maintaining the heap property on top of some
other mutable Python sequence type. Here’s an example that uses a Python list:
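For instance, a short interactive sketch:

>>> import heapq
>>> heap = []
>>> for item in [3, 7, 5, 11, 1]:
...     heapq.heappush(heap, item)
...
>>> heap
[1, 3, 5, 11, 7]
>>> heapq.heappop(heap)
1
>>> heapq.heappop(heap)
3
>>> heap
[5, 7, 11]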
The IDLE integrated development environment has been updated using the code
from the IDLEfork project (http://idlefork.sf.net). The most notable feature is
that the code being developed is now executed in a subprocess, meaning that
there’s no longer any need for manual reload() operations. IDLE’s core code
has been incorporated into the standard library as the idlelib package.
The imaplib module now supports IMAP over SSL. (Contributed by Piers
Lauder and Tino Lange.)
The itertools module contains a number of useful functions for use with
iterators, inspired by various functions provided by the ML and Haskell
languages. For example, itertools.ifilter(predicate, iterator) returns all
elements in the iterator for which the function predicate() returns
True, and itertools.repeat(obj, N) returns obj N times.
There are a number of other functions in the module; see the package’s reference
documentation for details.
(Contributed by Raymond Hettinger.)
Two new functions in the math module, degrees(rads) and
radians(degs), convert between radians and degrees. Other functions in
the math module such as math.sin() and math.cos() have always
required input values measured in radians. Also, an optional base argument
was added to math.log() to make it easier to compute logarithms for bases
other than e and 10. (Contributed by Raymond Hettinger.)
Several new POSIX functions (getpgid(), killpg(), lchown(),
loadavg(), major(), makedev(), minor(), and
mknod()) were added to the posix module that underlies the
os module. (Contributed by Gustavo Niemeyer, Geert Jansen, and Denis S.
Otkidach.)
In the os module, the *stat() family of functions can now report
fractions of a second in a timestamp. Such time stamps are represented as
floats, similar to the value returned by time.time().
During testing, it was found that some applications will break if time stamps
are floats. For compatibility, when using the tuple interface of the
stat_result, time stamps will be represented as integers. When using
named fields (a feature first introduced in Python 2.2), time stamps are still
represented as integers, unless os.stat_float_times() is invoked to enable
float return values:
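A sketch of the switch (the timestamp values are illustrative):

>>> os.stat('/tmp').st_mtime
1034791200
>>> os.stat_float_times(True)
>>> os.stat('/tmp').st_mtime
1034791200.6335014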
In Python 2.4, the default will change to always returning floats.
Application developers should enable this feature only if all their libraries
work properly when confronted with floating point time stamps, or if they use
the tuple API. If used, the feature should be activated on an application level
instead of trying to enable it on a per-use basis.
The optparse module contains a new parser for command-line arguments
that can convert option values to a particular Python type and will
automatically generate a usage message. See the following section for more
details.
The old and never-documented linuxaudiodev module has been deprecated,
and a new version named ossaudiodev has been added. The module was
renamed because the OSS sound drivers can be used on platforms other than Linux,
and the interface has also been tidied and brought up to date in various ways.
(Contributed by Greg Ward and Nicholas FitzRoy-Dale.)
The new platform module contains a number of functions that try to
determine various properties of the platform you’re running on. There are
functions for getting the architecture, CPU type, the Windows OS version, and
even the Linux distribution version. (Contributed by Marc-André Lemburg.)
The parser objects provided by the pyexpat module can now optionally
buffer character data, resulting in fewer calls to your character data handler
and therefore faster performance. Setting the parser object’s
buffer_text attribute to True will enable buffering.
The sample(population, k) function was added to the random
module. population is a sequence or xrange object containing the
elements of a population, and sample() chooses k elements from the
population without replacing chosen elements. k can be any value up to
len(population). For example:
>>> days = ['Mo', 'Tu', 'We', 'Th', 'Fr', 'St', 'Sn']
>>> random.sample(days, 3)      # Choose 3 elements
['St', 'Sn', 'Th']
>>> random.sample(days, 7)      # Choose 7 elements
['Tu', 'Th', 'Mo', 'We', 'St', 'Fr', 'Sn']
>>> random.sample(days, 7)      # Choose 7 again
['We', 'Mo', 'Sn', 'Fr', 'Tu', 'St', 'Th']
>>> random.sample(days, 8)      # Can't choose eight
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "random.py", line 414, in sample
    raise ValueError, "sample larger than population"
ValueError: sample larger than population
>>> random.sample(xrange(1, 10000, 2), 10)  # Choose ten odd nos. under 10000
[3407, 3805, 1505, 7023, 2401, 2267, 9733, 3151, 8083, 9195]
The random module now uses a new algorithm, the Mersenne Twister,
implemented in C. It’s faster and more extensively studied than the previous
algorithm.
(All changes contributed by Raymond Hettinger.)
The readline module also gained a number of new functions:
get_history_item(), get_current_history_length(), and
redisplay().
The rexec and Bastion modules have been declared dead, and
attempts to import them will fail with a RuntimeError. New-style classes
provide new ways to break out of the restricted execution environment provided
by rexec, and no one has interest in fixing them or time to do so. If
you have applications using rexec, rewrite them to use something else.
(Sticking with Python 2.2 or 2.1 will not make your applications any safer
because there are known bugs in the rexec module in those versions. To
repeat: if you’re using rexec, stop using it immediately.)
The rotor module has been deprecated because the algorithm it uses for
encryption is not believed to be secure. If you need encryption, use one of the
several AES Python modules that are available separately.
The shutil module gained a move(src, dest) function that
recursively moves a file or directory to a new location.
Support for more advanced POSIX signal handling was added to the signal
module but then removed again as it proved impossible to make it work reliably across
platforms.
The socket module now supports timeouts. You can call the
settimeout(t) method on a socket object to set a timeout of t seconds.
Subsequent socket operations that take longer than t seconds to complete will
abort and raise a socket.timeout exception.
The original timeout implementation was by Tim O’Malley. Michael Gilfix
integrated it into the Python socket module and shepherded it through a
lengthy review. After the code was checked in, Guido van Rossum rewrote parts
of it. (This is a good example of a collaborative development process in
action.)
On Windows, the socket module now ships with Secure Sockets Layer
(SSL) support.
The value of the C PYTHON_API_VERSION macro is now exposed at the
Python level as sys.api_version. The current exception can be cleared by
calling the new sys.exc_clear() function.
The new tarfile module allows reading from and writing to
tar-format archive files. (Contributed by Lars Gustäbel.)
The new textwrap module contains functions for wrapping strings
containing paragraphs of text. The wrap(text, width) function takes a
string and returns a list containing the text split into lines of no more than
the chosen width. The fill(text, width) function returns a single
string, reformatted to fit into lines no longer than the chosen width. (As you
can guess, fill() is built on top of wrap().) For example:
>>> import textwrap
>>> paragraph = "Not a whit, we defy augury: ... more text ..."
>>> textwrap.wrap(paragraph, 60)
["Not a whit, we defy augury: there's a special providence in",
 "the fall of a sparrow. If it be now, 'tis not to come; if it",
 ...]
>>> print textwrap.fill(paragraph, 35)
Not a whit, we defy augury: there's
a special providence in the fall of
a sparrow. If it be now, 'tis not
to come; if it be not to come, it
will be now; if it be not now, yet
it will come: the readiness is all.
>>>
The module also contains a TextWrapper class that actually implements
the text wrapping strategy. Both the TextWrapper class and the
wrap() and fill() functions support a number of additional keyword
arguments for fine-tuning the formatting; consult the module’s documentation
for details. (Contributed by Greg Ward.)
The thread and threading modules now have companion modules,
dummy_thread and dummy_threading, that provide a do-nothing
implementation of the thread module’s interface for platforms where
threads are not supported. The intention is to simplify thread-aware modules
(ones that don’t rely on threads to run) by putting the following code at the
top:
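The idiom is a try/except around the import:

try:
    import threading as _threading
except ImportError:
    import dummy_threading as _threading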
In this example, _threading is used as the module name to make it clear
that the module being used is not necessarily the actual threading
module. Code can call functions and use classes in _threading whether or
not threads are supported, avoiding an if statement and making the
code slightly clearer. This module will not magically make multithreaded code
run without threads; code that waits for another thread to return or to do
something will simply hang forever.
The time module’s strptime() function has long been an annoyance
because it uses the platform C library’s strptime() implementation, and
different platforms sometimes have odd bugs. Brett Cannon contributed a
portable implementation that’s written in pure Python and should behave
identically on all platforms.
The new timeit module helps measure how long snippets of Python code
take to execute. The timeit.py file can be run directly from the
command line, or the module’s Timer class can be imported and used
directly. Here’s a short example that figures out whether it’s faster to
convert an 8-bit string to Unicode by appending an empty Unicode string to it or
by using the unicode() function:
import timeit

timer1 = timeit.Timer('unicode("abc")')
timer2 = timeit.Timer('"abc" + u""')

# Run three trials
print timer1.repeat(repeat=3, number=100000)
print timer2.repeat(repeat=3, number=100000)

# On my laptop this outputs:
# [0.36831796169281006, 0.37441694736480713, 0.35304892063140869]
# [0.17574405670166016, 0.18193507194519043, 0.17565798759460449]
The Tix module has received various bug fixes and updates for the
current version of the Tix package.
The Tkinter module now works with a thread-enabled version of Tcl.
Tcl’s threading model requires that widgets only be accessed from the thread in
which they’re created; accesses from another thread can cause Tcl to panic. For
certain Tcl interfaces, Tkinter will now automatically avoid this when a
widget is accessed from a different thread by marshalling a command, passing it
to the correct thread, and waiting for the results. Other interfaces can’t be
handled automatically but Tkinter will now raise an exception on such an
access so that you can at least find out about the problem. See
http://mail.python.org/pipermail/python-dev/2002-December/031107.html for a more
detailed explanation of this change. (Implemented by Martin von Löwis.)
Calling Tcl methods through _tkinter no longer returns only strings.
Instead, if Tcl returns other objects those objects are converted to their
Python equivalent, if one exists, or wrapped with a _tkinter.Tcl_Obj
object if no Python equivalent exists. This behavior can be controlled through
the wantobjects() method of tkapp objects.
When using _tkinter through the Tkinter module (as most Tkinter
applications will), this feature is always activated. It should not cause
compatibility problems, since Tkinter would always convert string results to
Python types where possible.
If any incompatibilities are found, the old behavior can be restored by setting
the wantobjects variable in the Tkinter module to false before
creating the first tkapp object.
import Tkinter
Tkinter.wantobjects = 0
Any breakage caused by this change should be reported as a bug.
The UserDict module has a new DictMixin class which defines
all dictionary methods for classes that already have a minimum mapping
interface. This greatly simplifies writing classes that need to be
substitutable for dictionaries, such as the classes in the shelve
module.
Adding the mix-in as a superclass provides the full dictionary interface
whenever the class defines __getitem__(), __setitem__(),
__delitem__(), and keys(). For example:
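Here's a sketch of a minimal mapping class; the SeqDict name and its list-based storage are purely illustrative:

from UserDict import DictMixin

class SeqDict(DictMixin):
    # Dictionary lookalike implemented with parallel lists.
    def __init__(self):
        self.keylist = []
        self.valuelist = []
    def __getitem__(self, key):
        try:
            i = self.keylist.index(key)
        except ValueError:
            raise KeyError(key)
        return self.valuelist[i]
    def __setitem__(self, key, value):
        try:
            i = self.keylist.index(key)
            self.valuelist[i] = value
        except ValueError:
            self.keylist.append(key)
            self.valuelist.append(value)
    def __delitem__(self, key):
        try:
            i = self.keylist.index(key)
        except ValueError:
            raise KeyError(key)
        self.keylist.pop(i)
        self.valuelist.pop(i)
    def keys(self):
        return list(self.keylist)

With just these four methods defined, DictMixin supplies items(), values(), get(), setdefault(), and the rest of the mapping interface.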
The DOM implementation in xml.dom.minidom can now generate XML output
in a particular encoding by providing an optional encoding argument to the
toxml() and toprettyxml() methods of DOM nodes.
The xmlrpclib module now supports an XML-RPC extension for handling nil
data values such as Python’s None. Nil values are always supported on
unmarshalling an XML-RPC response. To generate requests containing None,
you must supply a true value for the allow_none parameter when creating a
Marshaller instance.
The new DocXMLRPCServer module allows writing self-documenting XML-RPC
servers. Run it in demo mode (as a program) to see it in action. Pointing the
Web browser to the RPC server produces pydoc-style documentation; pointing
xmlrpclib to the server allows invoking the actual methods. (Contributed by
Brian Quinlan.)
Support for internationalized domain names (RFCs 3454, 3490, 3491, and 3492)
has been added. The “idna” encoding can be used to convert between a Unicode
domain name and the ASCII-compatible encoding (ACE) of that name.
The socket module has also been extended to transparently convert
Unicode hostnames to the ACE version before passing them to the C library.
Modules that deal with hostnames (such as httplib and ftplib)
also support Unicode host names; httplib also sends HTTP Host
headers using the ACE version of the domain name. urllib supports
Unicode URLs with non-ASCII host names as long as the path part of the URL
is ASCII only.
To implement this change, the stringprep module, the mkstringprep
tool and the punycode encoding have been added.
Date and time types suitable for expressing timestamps were added as the
datetime module. The types don’t support different calendars or many
fancy features, and just stick to the basics of representing time.
The three primary types are: date, representing a day, month, and year;
time, consisting of hour, minute, and second; and datetime,
which contains all the attributes of both date and time.
There’s also a timedelta class representing differences between two
points in time, and time zone logic is implemented by classes inheriting from
the abstract tzinfo class.
You can create instances of date and time by either supplying
keyword arguments to the appropriate constructor, e.g.
datetime.date(year=1972,month=10,day=15), or by using one of a number of
class methods. For example, the date.today() class method returns the
current local date.
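For example (the today() result is obviously date-dependent):

>>> import datetime
>>> datetime.date(year=1972, month=10, day=15)
datetime.date(1972, 10, 15)
>>> datetime.date.today()
datetime.date(2002, 12, 30)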
Once created, instances of the date/time classes are all immutable. There are a
number of methods for producing formatted strings from objects:
>>> import datetime
>>> now = datetime.datetime.now()
>>> now.isoformat()
'2002-12-30T21:27:03.994956'
>>> now.ctime()  # Only available on date, datetime
'Mon Dec 30 21:27:03 2002'
>>> now.strftime('%Y %d %b')
'2002 30 Dec'
The replace() method allows modifying one or more fields of a
date or datetime instance, returning a new instance:
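For instance (a sketch; the timestamp values shown are illustrative):

>>> import datetime
>>> d = datetime.datetime.now()
>>> d
datetime.datetime(2002, 12, 30, 22, 15, 38, 827738)
>>> d.replace(year=2001, hour=12)
datetime.datetime(2001, 12, 30, 12, 15, 38, 827738)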
Instances can be compared, hashed, and converted to strings (the result is the
same as that of isoformat()). date and datetime
instances can be subtracted from each other, and added to timedelta
instances. The largest missing feature is that there’s no standard library
support for parsing strings and getting back a date or
datetime.
For more information, refer to the module’s reference documentation.
(Contributed by Tim Peters.)
The getopt module provides simple parsing of command-line arguments. The
new optparse module (originally named Optik) provides more elaborate
command-line parsing that follows the Unix conventions, automatically creates
the output for --help, and can perform different actions for different
options.
You start by creating an instance of OptionParser and telling it what
your program’s options are.
import sys
from optparse import OptionParser

op = OptionParser()
op.add_option('-i', '--input',
              action='store', type='string', dest='input',
              help='set input filename')
op.add_option('-l', '--length',
              action='store', type='int', dest='length',
              help='set maximum length of output')
Parsing a command line is then done by calling the parse_args() method.
This returns an object containing all of the option values, and a list of
strings containing the remaining arguments.
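A minimal sketch of the call (continuing the example above):

(options, args) = op.parse_args(sys.argv[1:])
print options
print args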
Invoking the script with the various arguments now works as you’d expect it to.
Note that the length argument is automatically converted to an integer.
$ ./python opt.py -i data arg1
<Values at 0x400cad4c: {'input': 'data', 'length': None}>
['arg1']
$ ./python opt.py --input=data --length=4
<Values at 0x400cad2c: {'input': 'data', 'length': 4}>
[]
$
The help message is automatically generated for you:
$ ./python opt.py --help
usage: opt.py [options]
options:
-h, --help show this help message and exit
-iINPUT, --input=INPUT
set input filename
-lLENGTH, --length=LENGTH
set maximum length of output
$
See the module’s documentation for more details.
Optik was written by Greg Ward, with suggestions from the readers of the Getopt
SIG.
Pymalloc, a specialized object allocator written by Vladimir Marangozov, was a
feature added to Python 2.1. Pymalloc is intended to be faster than the system
malloc() and to have less memory overhead for allocation patterns typical
of Python programs. The allocator uses C’s malloc() function to get large
pools of memory and then fulfills smaller memory requests from these pools.
In 2.1 and 2.2, pymalloc was an experimental feature and wasn’t enabled by
default; you had to explicitly enable it when compiling Python by providing the
--with-pymalloc option to the configure script. In 2.3,
pymalloc has had further enhancements and is now enabled by default; you’ll have
to supply --without-pymalloc to disable it.
This change is transparent to code written in Python; however, pymalloc may
expose bugs in C extensions. Authors of C extension modules should test their
code with pymalloc enabled, because some incorrect code may cause core dumps at
runtime.
There’s one particularly common error that causes problems. There are a number
of memory allocation functions in Python’s C API that have previously just been
aliases for the C library’s malloc() and free(), meaning that if
you accidentally called mismatched functions the error wouldn’t be noticeable.
When the object allocator is enabled, these functions aren’t aliases of
malloc() and free() any more, and calling the wrong function to
free memory may get you a core dump. For example, if memory was allocated using
PyObject_Malloc(), it has to be freed using PyObject_Free(), not
free(). A few modules included with Python fell afoul of this and had to
be fixed; doubtless there are more third-party modules that will have the same
problem.
As part of this change, the confusing multiple interfaces for allocating memory
have been consolidated down into two API families. Memory allocated with one
family must not be manipulated with functions from the other family. There is
one family for allocating chunks of memory and another family of functions
specifically for allocating Python objects.
The “object memory” family is the interface to the pymalloc facility described
above and is biased towards a large number of “small” allocations:
PyObject_Malloc(), PyObject_Realloc(), and PyObject_Free().
Thanks to lots of work by Tim Peters, pymalloc in 2.3 also provides debugging
features to catch memory overwrites and doubled frees in both extension modules
and in the interpreter itself. To enable this support, compile a debugging
version of the Python interpreter by running configure with
--with-pydebug.
To aid extension writers, a header file Misc/pymemcompat.h is
distributed with the source to Python 2.3 that allows Python extensions to use
the 2.3 interfaces to memory allocation while compiling against any version of
Python since 1.5.2. You would copy the file from Python’s source distribution
and bundle it with the source of your extension.
For the full details of the pymalloc implementation, see the comments at
the top of the file Objects/obmalloc.c in the Python source code.
The above link points to the file within the python.org SVN browser.
Changes to Python’s build process and to the C API include:
The cycle detection implementation used by the garbage collection has proven
to be stable, so it’s now been made mandatory. You can no longer compile Python
without it, and the --with-cycle-gc switch to configure has
been removed.
Python can now optionally be built as a shared library
(libpython2.3.so) by supplying --enable-shared when running
Python’s configure script. (Contributed by Ondrej Palkovsky.)
The DL_EXPORT and DL_IMPORT macros are now deprecated.
Initialization functions for Python extension modules should now be declared
using the new macro PyMODINIT_FUNC, while the Python core will
generally use the PyAPI_FUNC and PyAPI_DATA macros.
The interpreter can be compiled without any docstrings for the built-in
functions and modules by supplying --without-doc-strings to the
configure script. This makes the Python executable about 10% smaller,
but will also mean that you can’t get help for Python’s built-ins. (Contributed
by Gustavo Niemeyer.)
The PyArg_NoArgs() macro is now deprecated, and code that uses it
should be changed. For Python 2.2 and later, the method definition table can
specify the METH_NOARGS flag, signalling that there are no arguments,
and the argument checking can then be removed. If compatibility with pre-2.2
versions of Python is important, the code could use PyArg_ParseTuple(args, "") instead, but this will be slower than using METH_NOARGS.
PyArg_ParseTuple() accepts new format characters for various sizes of
unsigned integers: B for unsigned char, H for unsigned short int, I for
unsigned int, and K for unsigned long long.
A new function, PyObject_DelItemString(mapping, char *key) was added
as shorthand for PyObject_DelItem(mapping, PyString_New(key)).
File objects now manage their internal string buffer differently, increasing
it exponentially when needed. This results in the benchmark tests in
Lib/test/test_bufio.py speeding up considerably (from 57 seconds to 1.7
seconds, according to one measurement).
It’s now possible to define class and static methods for a C extension type by
setting either the METH_CLASS or METH_STATIC flags in a
method’s PyMethodDef structure.
Python now includes a copy of the Expat XML parser’s source code, removing any
dependence on a system version or local installation of Expat.
If you dynamically allocate type objects in your extension, you should be
aware of a change in the rules relating to the __module__ and
__name__ attributes. In summary, you will want to ensure the type’s
dictionary contains a '__module__' key; making the module name the part of
the type name leading up to the final period will no longer have the desired
effect. For more detail, read the API reference documentation or the source.
Support for a port to IBM’s OS/2 using the EMX runtime environment was merged
into the main Python source tree. EMX is a POSIX emulation layer over the OS/2
system APIs. The Python port for EMX tries to support all the POSIX-like
capability exposed by the EMX runtime, and mostly succeeds; fork() and
fcntl() are restricted by the limitations of the underlying emulation
layer. The standard OS/2 port, which uses IBM’s Visual Age compiler, also
gained support for case-sensitive import semantics as part of the integration of
the EMX port into CVS. (Contributed by Andrew MacIntyre.)
On MacOS, most toolbox modules have been weaklinked to improve backward
compatibility. This means that modules will no longer fail to load if a single
routine is missing on the current OS version. Instead calling the missing
routine will raise an exception. (Contributed by Jack Jansen.)
The RPM spec files, found in the Misc/RPM/ directory in the Python
source distribution, were updated for 2.3. (Contributed by Sean Reifschneider.)
Other new platforms now supported by Python include AtheOS
(http://www.atheos.cx/), GNU/Hurd, and OpenVMS.
As usual, there were a bunch of other improvements and bugfixes scattered
throughout the source tree. A search through the CVS change logs finds there
were 523 patches applied and 514 bugs fixed between Python 2.2 and 2.3. Both
figures are likely to be underestimates.
Some of the more notable changes are:
If the PYTHONINSPECT environment variable is set, the Python
interpreter will enter the interactive prompt after running a Python program, as
if Python had been invoked with the -i option. The environment
variable can be set before running the Python interpreter, or it can be set by
the Python program as part of its execution.
The regrtest.py script now provides a way to allow “all resources
except foo.” A resource name passed to the -u option can now be
prefixed with a hyphen ('-') to mean “remove this resource.” For example,
the option -uall,-bsddb could be used to enable the use of all resources
except bsddb.
The tools used to build the documentation now work under Cygwin as well as
Unix.
The SET_LINENO opcode has been removed. Back in the mists of time, this
opcode was needed to produce line numbers in tracebacks and support trace
functions (for, e.g., pdb). Since Python 1.5, the line numbers in
tracebacks have been computed using a different mechanism that works with
“python -O”. For Python 2.3 Michael Hudson implemented a similar scheme to
determine when to call the trace function, removing the need for SET_LINENO
entirely.
It would be difficult to detect any resulting difference from Python code, apart
from a slight speed up when Python is run without -O.
C extensions that access the f_lineno field of frame objects should
instead call PyCode_Addr2Line(f->f_code, f->f_lasti). This will have the
added effect of making the code work as desired under “python -O” in earlier
versions of Python.
A nifty new feature is that trace functions can now assign to the
f_lineno attribute of frame objects, changing the line that will be
executed next. A jump command has been added to the pdb debugger
taking advantage of this new feature. (Implemented by Richie Hindle.)
This section lists previously described changes that may require changes to your
code:
yield is now always a keyword; if it’s used as a variable name in
your code, a different name must be chosen.
For strings X and Y, X in Y now works if X is more than one
character long.
The int() type constructor will now return a long integer instead of
raising an OverflowError when a string or floating-point number is too
large to fit into an integer.
If you have Unicode strings that contain 8-bit characters, you must declare
the file’s encoding (UTF-8, Latin-1, or whatever) by adding a comment to the top
of the file. See section PEP 263: Source Code Encodings for more information.
Calling Tcl methods through _tkinter no longer returns only strings.
Instead, if Tcl returns other objects those objects are converted to their
Python equivalent, if one exists, or wrapped with a _tkinter.Tcl_Obj
object if no Python equivalent exists.
Large octal and hex literals such as 0xffffffff now trigger a
FutureWarning. Currently they’re stored as 32-bit numbers and result in a
negative value, but in Python 2.4 they’ll become positive long integers.
There are a few ways to fix this warning. If you really need a positive number,
just add an L to the end of the literal. If you’re trying to get a 32-bit
integer with low bits set and have previously used an expression such as
~(1 << 31), it’s probably clearest to start with all bits set and clear the
desired upper bits. For example, to clear just the top bit (bit 31), you could
write 0xffffffffL & ~(1L << 31).
You can no longer disable assertions by assigning to __debug__.
The Distutils setup() function has gained various new keyword arguments
such as depends. Old versions of the Distutils will abort if passed unknown
keywords. A solution is to check for the presence of the new
get_distutil_options() function in your setup.py and only use the
new keywords with a version of the Distutils that supports them:
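A sketch of the idea (assuming, as the text above suggests, that
get_distutil_options() lives in distutils.core; the package metadata and
the foo.h dependency are placeholders):

from distutils import core

kw = {'name': 'mypackage', 'version': '1.0'}  # hypothetical arguments
if hasattr(core, 'get_distutil_options'):
    # Only a Distutils that knows the new keywords provides this hook.
    kw['depends'] = ['foo.h']
core.setup(**kw)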
The author would like to thank the following people for offering suggestions,
corrections and assistance with various drafts of this article: Jeff Bauer,
Simon Brunning, Brett Cannon, Michael Chermside, Andrew Dalke, Scott David
Daniels, Fred L. Drake, Jr., David Fraser, Kelly Gerber, Raymond Hettinger,
Michael Hudson, Chris Lambert, Detlef Lannert, Martin von Löwis, Andrew
MacIntyre, Lalo Martins, Chad Netzer, Gustavo Niemeyer, Neal Norwitz, Hans
Nowak, Chris Reedy, Francesco Ricciardi, Vinay Sajip, Neil Schemenauer, Roman
Suzi, Jason Tishler, Just van Rossum.
This article explains the new features in Python 2.2.2, released on October 14,
2002. Python 2.2.2 is a bugfix release of Python 2.2, originally released on
December 21, 2001.
Python 2.2 can be thought of as the “cleanup release”. There are some features
such as generators and iterators that are completely new, but most of the
changes, significant and far-reaching though they may be, are aimed at cleaning
up irregularities and dark corners of the language design.
This article doesn’t attempt to provide a complete specification of the new
features, but instead provides a convenient overview. For full details, you
should refer to the documentation for Python 2.2, such as the Python Library
Reference and the Python
Reference Manual. If you want to
understand the complete implementation and design rationale for a change, refer
to the PEP for a particular new feature.
The largest and most far-reaching changes in Python 2.2 are to Python’s model of
objects and classes. The changes should be backward compatible, so it’s likely
that your code will continue to run unchanged, but the changes provide some
amazing new capabilities. Before beginning this, the longest and most
complicated section of this article, I’ll provide an overview of the changes and
offer some comments.
A long time ago I wrote a Web page listing flaws in Python’s design. One of the
most significant flaws was that it’s impossible to subclass Python types
implemented in C. In particular, it’s not possible to subclass built-in types,
so you can’t just subclass, say, lists in order to add a single useful method to
them. The UserList module provides a class that supports all of the
methods of lists and that can be subclassed further, but there’s lots of C code
that expects a regular Python list and won’t accept a UserList
instance.
Python 2.2 fixes this, and in the process adds some exciting new capabilities.
A brief summary:
You can subclass built-in types such as lists and even integers, and your
subclasses should work in every place that requires the original type.
It’s now possible to define static and class methods, in addition to the
instance methods available in previous versions of Python.
It’s also possible to automatically call methods on accessing or setting an
instance attribute by using a new mechanism called properties. Many uses
of __getattr__() can be rewritten to use properties instead, making the
resulting code simpler and faster. As a small side benefit, attributes can now
have docstrings, too.
The list of legal attributes for an instance can be limited to a particular
set using slots, making it possible to safeguard against typos and
perhaps make more optimizations possible in future versions of Python.
Some users have voiced concern about all these changes. Sure, they say, the new
features are neat and lend themselves to all sorts of tricks that weren’t
possible in previous versions of Python, but they also make the language more
complicated. Some people have said that they’ve always recommended Python for
its simplicity, and feel that its simplicity is being lost.
Personally, I think there’s no need to worry. Many of the new features are
quite esoteric, and you can write a lot of Python code without ever needing to be
aware of them. Writing a simple class is no more difficult than it ever was, so
you don’t need to bother learning or teaching them unless they’re actually
needed. Some very complicated tasks that were previously only possible from C
will now be possible in pure Python, and to my mind that’s all for the better.
I’m not going to attempt to cover every single corner case and small change that
were required to make the new features work. Instead this section will paint
only the broad strokes. See section Related Links, “Related Links”, for
further sources of information about Python 2.2’s new object model.
First, you should know that Python 2.2 really has two kinds of classes: classic
or old-style classes, and new-style classes. The old-style class model is
exactly the same as the class model in earlier versions of Python. All the new
features described in this section apply only to new-style classes. This
divergence isn’t intended to last forever; eventually old-style classes will be
dropped, possibly in Python 3.0.
So how do you define a new-style class? You do it by subclassing an existing
new-style class. Most of Python’s built-in types, such as integers, lists,
dictionaries, and even files, are new-style classes now. A new-style class
named object, the base class for all built-in types, has also been
added so if no built-in type is suitable, you can just subclass
object:
class C(object):
    def __init__(self):
        ...
    ...
This means that class statements that don’t have any base classes are
always classic classes in Python 2.2. (Actually you can also change this by
setting a module-level variable named __metaclass__ — see PEP 253
for the details — but it’s easier to just subclass object.)
The type objects for the built-in types are available as built-ins, named using
a clever trick. Python has always had built-in functions named int(),
float(), and str(). In 2.2, they aren’t functions any more, but
type objects that behave as factories when called.
>>> int
<type 'int'>
>>> int('123')
123
To make the set of types complete, new type objects such as dict() and
file() have been added. Here’s a more interesting example, adding a
lock() method to file objects:
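A sketch of such a subclass, using the fcntl module to do the actual
locking:

import fcntl

class LockableFile(file):
    def lock(self, operation, length=0, start=0, whence=0):
        # Apply a POSIX lock to the underlying file descriptor.
        return fcntl.lockf(self.fileno(), operation,
                           length, start, whence)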
The now-obsolete posixfile module contained a class that emulated all of
a file object’s methods and also added a lock() method, but this class
couldn’t be passed to internal functions that expected a built-in file,
something which is possible with our new LockableFile.
In previous versions of Python, there was no consistent way to discover what
attributes and methods were supported by an object. There were some informal
conventions, such as defining __members__ and __methods__
attributes that were lists of names, but often the author of an extension type
or a class wouldn’t bother to define them. You could fall back on inspecting
the __dict__ of an object, but when class inheritance or an arbitrary
__getattr__() hook were in use this could still be inaccurate.
The one big idea underlying the new class model is that an API for describing
the attributes of an object using descriptors has been formalized.
Descriptors specify the value of an attribute, stating whether it’s a method or
a field. With the descriptor API, static methods and class methods become
possible, as well as more exotic constructs.
Attribute descriptors are objects that live inside class objects, and have a few
attributes of their own:
__name__ is the attribute’s name.
__doc__ is the attribute’s docstring.
__get__(object) is a method that retrieves the attribute value from
object.
__set__(object, value) sets the attribute on object to value.
__delete__(object, value) deletes the value attribute of object.
For example, when you write obj.x, the steps that Python actually performs
are:
descriptor = obj.__class__.x
descriptor.__get__(obj)
For methods, descriptor.__get__() returns a temporary object that’s
callable, and wraps up the instance and the method to be called on it. This is
also why static methods and class methods are now possible; they have
descriptors that wrap up just the method, or the method and the class. As a
brief explanation of these new kinds of methods, static methods aren’t passed
the instance, and therefore resemble regular functions. Class methods are
passed the class of the object, but not the object itself. Static and class
methods are defined like this:
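In 2.2 there’s no special syntax; you define an ordinary function and wrap
it (the method bodies here are placeholders):

class C(object):
    def f(arg1, arg2):
        # No 'self': static methods aren't passed the instance.
        return arg1 + arg2
    f = staticmethod(f)

    def g(cls, arg1, arg2):
        # Class methods receive the class as the first argument.
        return cls, arg1, arg2
    g = classmethod(g)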
The staticmethod() function takes the function f(), and returns it
wrapped up in a descriptor so it can be stored in the class object. You might
expect there to be special syntax for creating such methods (def static f,
defstatic f(), or something like that) but no such syntax has been defined
yet; that’s been left for future versions of Python.
More new features, such as slots and properties, are also implemented as new
kinds of descriptors, and it’s not difficult to write a descriptor class that
does something novel. For example, it would be possible to write a descriptor
class that made it possible to write Eiffel-style preconditions and
postconditions for a method. A class that used this feature might be defined
like this:
from eiffel import eiffelmethod

class C(object):
    def f(self, arg1, arg2):
        # The actual function
        ...
    def pre_f(self):
        # Check preconditions
        ...
    def post_f(self):
        # Check postconditions
        ...
    f = eiffelmethod(f, pre_f, post_f)
Note that a person using the new eiffelmethod() doesn’t have to understand
anything about descriptors. This is why I think the new features don’t increase
the basic complexity of the language. There will be a few wizards who need to
know about it in order to write eiffelmethod() or the ZODB or whatever,
but most users will just write code on top of the resulting libraries and ignore
the implementation details.
Multiple inheritance has also been made more useful through changing the rules
under which names are resolved. Consider this set of classes (diagram taken
from PEP 253 by Guido van Rossum):
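The diagram itself doesn’t survive in this text, but the hierarchy is the
classic diamond; sketched in code, it would look roughly like this:

class A:
    def save(self):
        pass    # save A's internal state

class B(A):
    pass

class C(A):
    def save(self):
        pass    # save C's internal state as well

class D(B, C):
    pass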
The lookup rule for classic classes is simple but not very smart; the base
classes are searched depth-first, going from left to right. A reference to
D.save() will search the classes D, B, and then
A, where save() would be found and returned. C.save()
would never be found at all. This is bad, because if C‘s save()
method is saving some internal state specific to C, not calling it will
result in that state never getting saved.
New-style classes follow a different algorithm that’s a bit more complicated to
explain, but does the right thing in this situation. (Note that Python 2.3
changes this algorithm to one that produces the same results in most cases, but
produces more useful results for really complicated inheritance graphs.)
List all the base classes, following the classic lookup rule and include a
class multiple times if it’s visited repeatedly. In the above example, the list
of visited classes is [D, B, A, C,
A].
Scan the list for duplicated classes. If any are found, remove all but one
occurrence, leaving the last one in the list. In the above example, the list
becomes [D, B, C, A] after dropping
duplicates.
Following this rule, referring to D.save() will return C.save(),
which is the behaviour we’re after. This lookup rule is the same as the one
followed by Common Lisp. A new built-in function, super(), provides a way
to get at a class’s superclasses without having to reimplement Python’s
algorithm. The most commonly used form will be super(class, obj), which
returns a bound superclass object (not the actual class object). This form
will be used in methods to call a method in the superclass; for example,
D‘s save() method would look like this:
class D(B, C):
    def save(self):
        # Call superclass .save()
        super(D, self).save()
        # Save D's private information here
        ...
super() can also return unbound superclass objects when called as
super(class) or super(class1, class2), but this probably won’t
often be useful.
A fair number of sophisticated Python classes define hooks for attribute access
using __getattr__(); most commonly this is done for convenience, to make
code more readable by automatically mapping an attribute access such as
obj.parent into a method call such as obj.get_parent. Python 2.2 adds
some new ways of controlling attribute access.
First, __getattr__(attr_name) is still supported by new-style classes,
and nothing about it has changed. As before, it will be called when an attempt
is made to access obj.foo and no attribute named foo is found in the
instance’s dictionary.
New-style classes also support a new method,
__getattribute__(attr_name). The difference between the two methods is
that __getattribute__() is always called whenever any attribute is
accessed, while the old __getattr__() is only called if foo isn’t
found in the instance’s dictionary.
However, Python 2.2’s support for properties will often be a simpler way
to trap attribute references. Writing a __getattr__() method is
complicated because to avoid recursion you can’t use regular attribute accesses
inside them, and instead have to mess around with the contents of
__dict__. __getattr__() methods also end up being called by Python
when it checks for other methods such as __repr__() or __coerce__(),
and so have to be written with this in mind. Finally, calling a function on
every attribute access results in a sizable performance loss.
property is a new built-in type that packages up three functions that
get, set, or delete an attribute, and a docstring. For example, if you want to
define a size attribute that’s computed, but also settable, you could
write:
class C(object):
    def get_size(self):
        result = ... computation ...
        return result
    def set_size(self, size):
        ... compute something based on the size
        and set internal state appropriately ...
    # Define a property. The 'delete this attribute'
    # method is defined as None, so the attribute
    # can't be deleted.
    size = property(get_size, set_size,
                    None,
                    "Storage size of this instance")
That is certainly clearer and easier to write than a pair of
__getattr__()/__setattr__() methods that check for the size
attribute and handle it specially while retrieving all other attributes from the
instance’s __dict__. Accesses to size are also the only ones
which have to perform the work of calling a function, so references to other
attributes run at their usual speed.
Finally, it’s possible to constrain the list of attributes that can be
referenced on an object using the new __slots__ class attribute. Python
objects are usually very dynamic; at any time it’s possible to define a new
attribute on an instance by just doing obj.new_attr = 1. A new-style class
can define a class attribute named __slots__ to limit the legal
attributes to a particular set of names. An example will make this clear:
>>> class C(object):
...     __slots__ = ('template', 'name')
...
>>> obj = C()
>>> print obj.template
None
>>> obj.template = 'Test'
>>> print obj.template
Test
>>> obj.newattr = None
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: 'C' object has no attribute 'newattr'
Note how you get an AttributeError on the attempt to assign to an
attribute not listed in __slots__.
This section has just been a quick overview of the new features, giving enough
of an explanation to start you programming, but many details have been
simplified or ignored. Where should you go to get a more complete picture?
http://www.python.org/2.2/descrintro.html is a lengthy tutorial introduction to
the descriptor features, written by Guido van Rossum. If my description has
whetted your appetite, go read this tutorial next, because it goes into much
more detail about the new features while still remaining quite easy to read.
Next, there are two relevant PEPs, PEP 252 and PEP 253. PEP 252 is
titled “Making Types Look More Like Classes”, and covers the descriptor API.
PEP 253 is titled “Subtyping Built-in Types”, and describes the changes to
type objects that make it possible to subtype built-in objects. PEP 253 is
the more complicated PEP of the two, and at a few points the necessary
explanations of types and meta-types may cause your head to explode. Both PEPs
were written and implemented by Guido van Rossum, with substantial assistance
from the rest of the Zope Corp. team.
Finally, there’s the ultimate authority: the source code. Most of the machinery
for the type handling is in Objects/typeobject.c, but you should only
resort to it after all other avenues have been exhausted, including posting a
question to python-list or python-dev.
Another significant addition to 2.2 is an iteration interface at both the C and
Python levels. Objects can define how they can be looped over by callers.
In Python versions up to 2.1, the usual way to make for item in obj work is
to define a __getitem__() method that looks something like this:
def __getitem__(self, index):
    return <next item>
__getitem__() is more properly used to define an indexing operation on an
object so that you can write obj[5] to retrieve the sixth element. It’s a
bit misleading when you’re using this only to support for loops.
Consider some file-like object that wants to be looped over; the index
parameter is essentially meaningless, as the class probably assumes that a
series of __getitem__() calls will be made with index incrementing by
one each time. In other words, the presence of the __getitem__() method
doesn’t mean that using file[5] to randomly access the sixth element will
work, though it really should.
In Python 2.2, iteration can be implemented separately, and __getitem__()
methods can be limited to classes that really do support random access. The
basic idea of iterators is simple. A new built-in function, iter(obj)
or iter(C, sentinel), is used to get an iterator. iter(obj) returns
an iterator for the object obj, while iter(C, sentinel) returns an
iterator that will invoke the callable object C until it returns sentinel to
signal that the iterator is done.
Python classes can define an __iter__() method, which should create and
return a new iterator for the object; if the object is its own iterator, this
method can just return self. In particular, iterators will usually be their
own iterators. Extension types implemented in C can implement a tp_iter
function in order to return an iterator, and extension types that want to behave
as iterators can define a tp_iternext function.
So, after all this, what do iterators actually do? They have one required
method, next(), which takes no arguments and returns the next value. When
there are no more values to be returned, calling next() should raise the
StopIteration exception.
>>> L = [1, 2, 3]
>>> i = iter(L)
>>> print i
<iterator object at 0x8116870>
>>> i.next()
1
>>> i.next()
2
>>> i.next()
3
>>> i.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
StopIteration
>>>
In 2.2, Python’s for statement no longer expects a sequence; it
expects something for which iter() will return an iterator. For backward
compatibility and convenience, an iterator is automatically constructed for
sequences that don’t implement __iter__() or a tp_iter slot, so
for i in [1, 2, 3] will still work. Wherever the Python interpreter loops
over a sequence, it’s been changed to use the iterator protocol. This means you
can do things like this:
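For example, iterating over a dictionary loops over its keys (a sketch; the
key order you’ll see is arbitrary):

>>> m = {'Jan': 1, 'Feb': 2, 'Mar': 3}
>>> for key in m:
...     print key, m[key]
...
Jan 1
Mar 3
Feb 2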
That’s just the default behaviour. If you want to iterate over keys, values, or
key/value pairs, you can explicitly call the iterkeys(),
itervalues(), or iteritems() methods to get an appropriate iterator.
In a minor related change, the in operator now works on dictionaries,
so key in dict is now equivalent to dict.has_key(key).
Files also provide an iterator, which calls the readline() method until
there are no more lines in the file. This means you can now read each line of a
file using code like this:
for line in file:
    # do something for each line
    ...
Note that you can only go forward in an iterator; there’s no way to get the
previous element, reset the iterator, or make a copy of it. An iterator object
could provide such additional capabilities, but the iterator protocol only
requires a next() method.
Generators are another new feature, one that interacts with the introduction of
iterators.
You’re doubtless familiar with how function calls work in Python or C. When you
call a function, it gets a private namespace where its local variables are
created. When the function reaches a return statement, the local
variables are destroyed and the resulting value is returned to the caller. A
later call to the same function will get a fresh new set of local variables.
But, what if the local variables weren’t thrown away on exiting a function?
What if you could later resume the function where it left off? This is what
generators provide; they can be thought of as resumable functions.
Here’s the simplest example of a generator function:
def generate_ints(N):
    for i in range(N):
        yield i
A new keyword, yield, was introduced for generators. Any function
containing a yield statement is a generator function; this is
detected by Python’s bytecode compiler which compiles the function specially as
a result. Because a new keyword was introduced, generators must be explicitly
enabled in a module by including a from __future__ import generators
statement near the top of the module’s source code. In Python 2.3 this
statement will become unnecessary.
When you call a generator function, it doesn’t return a single value; instead it
returns a generator object that supports the iterator protocol. On executing
the yield statement, the generator outputs the value of i,
similar to a return statement. The big difference between
yield and a return statement is that on reaching a
yield the generator’s state of execution is suspended and local
variables are preserved. On the next call to the generator’s next() method,
the function will resume executing immediately after the yield
statement. (For complicated reasons, the yield statement isn’t
allowed inside the try block of a try...finally statement; read PEP 255 for a full explanation of the
interaction between yield and exceptions.)
Here’s a sample usage of the generate_ints() generator:
>>> gen = generate_ints(3)
>>> gen
<generator object at 0x8117f90>
>>> gen.next()
0
>>> gen.next()
1
>>> gen.next()
2
>>> gen.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 2, in generate_ints
StopIteration
You could equally write for i in generate_ints(5), or a, b, c = generate_ints(3).
Inside a generator function, the return statement can only be used
without a value, and signals the end of the procession of values; afterwards the
generator cannot return any further values. return with a value, such
as return 5, is a syntax error inside a generator function. The end of the
generator’s results can also be indicated by raising StopIteration
manually, or by just letting the flow of execution fall off the bottom of the
function.
You could achieve the effect of generators manually by writing your own class
and storing all the local variables of the generator as instance variables. For
example, returning a list of integers could be done by setting self.count to
0, and having the next() method increment self.count and return it.
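A sketch of that hand-written equivalent (the class name is made up for
illustration):

class GeneratedInts:
    # Equivalent of generate_ints(N), written out as an iterator class.
    def __init__(self, N):
        self.count = 0
        self.N = N
    def __iter__(self):
        return self
    def next(self):
        if self.count >= self.N:
            raise StopIteration
        result = self.count
        self.count = self.count + 1
        return result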
However, for a moderately complicated generator, writing a corresponding class
would be much messier. Lib/test/test_generators.py contains a number of
more interesting examples. The simplest one implements an in-order traversal of
a tree using generators recursively.
# A recursive generator that generates Tree leaves in in-order.
def inorder(t):
    if t:
        for x in inorder(t.left):
            yield x
        yield t.label
        for x in inorder(t.right):
            yield x
Two other examples in Lib/test/test_generators.py produce solutions for
the N-Queens problem (placing N queens on an NxN chess board so that no
queen threatens another) and the Knight’s Tour (a route that takes a knight to
every square of an NxN chessboard without visiting any square twice).
The idea of generators comes from other programming languages, especially Icon
(http://www.cs.arizona.edu/icon/), where the idea of generators is central. In
Icon, every expression and function call behaves like a generator. One example
from “An Overview of the Icon Programming Language” at
http://www.cs.arizona.edu/icon/docs/ipd266.htm gives an idea of what this looks
like:
sentence := "Store it in the neighboring harbor"
if (i := find("or", sentence)) > 5 then write(i)
In Icon the find() function returns the indexes at which the substring
“or” is found: 3, 23, 33. In the if statement, i is first
assigned a value of 3, but 3 is less than 5, so the comparison fails, and Icon
retries it with the second value of 23. 23 is greater than 5, so the comparison
now succeeds, and the code prints the value 23 to the screen.
Python doesn’t go nearly as far as Icon in adopting generators as a central
concept. Generators are considered a new part of the core Python language, but
learning or using them isn’t compulsory; if they don’t solve any problems that
you have, feel free to ignore them. One novel feature of Python’s interface as
compared to Icon’s is that a generator’s state is represented as a concrete
object (the iterator) that can be passed around to other functions or stored in
a data structure.
Written by Neil Schemenauer, Tim Peters, Magnus Lie Hetland. Implemented mostly
by Neil Schemenauer and Tim Peters, with other fixes from the Python Labs crew.
In recent versions, the distinction between regular integers, which are 32-bit
values on most machines, and long integers, which can be of arbitrary size, was
becoming an annoyance. For example, on platforms that support files larger than
2**32 bytes, the tell() method of file objects has to return a long
integer. However, there were various bits of Python that expected plain integers
and would raise an error if a long integer was provided instead. For example,
in Python 1.5, only regular integers could be used as a slice index, and
'abc'[1L:] would raise a TypeError exception with the message ‘slice
index must be int’.
Python 2.2 will shift values from short to long integers as required. The ‘L’
suffix is no longer needed to indicate a long integer literal, as now the
compiler will choose the appropriate type. (Using the ‘L’ suffix will be
discouraged in future 2.x versions of Python, triggering a warning in Python
2.4, and probably dropped in Python 3.0.) Many operations that used to raise an
OverflowError will now return a long integer as their result. For
example:
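A sketch at the interactive prompt (output shown as 2.2 prints it):

>>> 1234567890123
1234567890123L
>>> 2 ** 64
18446744073709551616L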
In most cases, integers and long integers will now be treated identically. You
can still distinguish them with the type() built-in function, but that’s
rarely needed.
The most controversial change in Python 2.2 heralds the start of an effort to
fix an old design flaw that’s been in Python from the beginning. Currently
Python’s division operator, /, behaves like C’s division operator when
presented with two integer arguments: it returns an integer result that’s
truncated down when there would be a fractional part. For example, 3/2 is
1, not 1.5, and (-1)/2 is -1, not -0.5. This means that the results of
division can vary unexpectedly depending on the type of the two operands and
because Python is dynamically typed, it can be difficult to determine the
possible types of the operands.
(The controversy is over whether this is really a design flaw, and whether
it’s worth breaking existing code to fix this. It’s caused endless discussions
on python-dev, and in July 2001 erupted into a storm of acidly sarcastic
postings on comp.lang.python. I won’t argue for either side here
and will stick to describing what’s implemented in 2.2. Read PEP 238 for a
summary of arguments and counter-arguments.)
Because this change might break code, it’s being introduced very gradually.
Python 2.2 begins the transition, but the switch won’t be complete until Python
3.0.
First, I’ll borrow some terminology from PEP 238. “True division” is the
division that most non-programmers are familiar with: 3/2 is 1.5, 1/4 is 0.25,
and so forth. “Floor division” is what Python’s / operator currently does
when given integer operands; the result is the floor of the value returned by
true division. “Classic division” is the current mixed behaviour of /; it
returns the result of floor division when the operands are integers, and returns
the result of true division when one of the operands is a floating-point number.
Here are the changes 2.2 introduces:
A new operator, //, is the floor division operator. (Yes, we know it looks
like C++’s comment symbol.) // always performs floor division no matter
what the types of its operands are, so 1//2 is 0 and 1.0//2.0 is
also 0.0.
// is always available in Python 2.2; you don’t need to enable it using a
__future__ statement.
By including a from __future__ import division in a module, the /
operator will be changed to return the result of true division, so 1/2 is
0.5 (see the sketch below). Without the __future__ statement, / still means
classic division. The default meaning of / will not change until Python 3.0.
Classes can define methods called __truediv__() and __floordiv__()
to overload the two division operators. At the C level, there are also slots in
the PyNumberMethods structure so extension types can define the two
operators.
Python 2.2 supports some command-line arguments for testing whether code will
work with the changed division semantics. Running python with -Q warn will
cause a warning to be issued whenever division is applied to two integers.
You can use this to find code that’s affected by the change and fix it. By
default, Python 2.2 will simply perform classic division without a
warning; the warning will be turned on by default in Python 2.3.
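Putting the pieces together, here’s a short interactive sketch of the three
behaviours:

>>> 1 / 2              # classic division of two ints
0
>>> 1.0 / 2.0          # classic division of floats gives the true result
0.5
>>> 1 // 2             # floor division, always available
0
>>> 1.0 // 2.0
0.0
>>> from __future__ import division
>>> 1 / 2              # true division is now in effect
0.5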
Python’s Unicode support has been enhanced a bit in 2.2. Unicode strings are
usually stored as UCS-2, as 16-bit unsigned integers. Python 2.2 can also be
compiled to use UCS-4, 32-bit unsigned integers, as its internal encoding by
supplying --enable-unicode=ucs4 to the configure script. (It’s also
possible to specify --disable-unicode to completely disable Unicode
support.)
When built to use UCS-4 (a “wide Python”), the interpreter can natively handle
Unicode characters from U+000000 to U+110000, so the range of legal values for
the unichr() function is expanded accordingly. Using an interpreter
compiled to use UCS-2 (a “narrow Python”), values greater than 65535 will still
cause unichr() to raise a ValueError exception. This is all
described in PEP 261, “Support for ‘wide’ Unicode characters”; consult it for
further details.
Another change is simpler to explain. Since their introduction, Unicode strings
have supported an encode() method to convert the string to a selected
encoding such as UTF-8 or Latin-1. A symmetric decode([encoding])
method has been added to 8-bit strings (though not to Unicode strings) in 2.2.
decode() assumes that the string is in the specified encoding and decodes
it, returning whatever is returned by the codec.
Using this new feature, codecs have been added for tasks not directly related to
Unicode. For example, codecs have been added for uu-encoding, MIME’s base64
encoding, and compression with the zlib module:
>>> s = """Here is a lengthy piece of redundant, overly verbose,
... and repetitive text.
... """
>>> data = s.encode('zlib')
>>> data
'x\x9c\r\xc9\xc1\r\x80 \x10\x04\xc0?Ul...'
>>> data.decode('zlib')
'Here is a lengthy piece of redundant, overly verbose,\nand repetitive text.\n'
>>> print s.encode('uu')
begin 666 <data>
M2&5R92!I<R!A(&QE;F=T:'D@<&EE8V4@;V8@<F5D=6YD86YT+"!O=F5R;'D@
>=F5R8F]S92P*86YD(')E<&5T:71I=F4@=&5X="X*

end
>>> "sheesh".encode('rot-13')
'furrfu'
To convert a class instance to Unicode, a __unicode__() method can be
defined by a class, analogous to __str__().
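A short sketch (the class is hypothetical):

class Place:
    def __str__(self):
        return 'Munich'
    def __unicode__(self):
        return u'M\xfcnchen'

print repr(unicode(Place()))   # prints u'M\xfcnchen'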
encode(), decode(), and __unicode__() were implemented by
Marc-André Lemburg. The changes to support using UCS-4 internally were
implemented by Fredrik Lundh and Martin von Löwis.
In Python 2.1, statically nested scopes were added as an optional feature, to be
enabled by a from __future__ import nested_scopes directive. In 2.2 nested
scopes no longer need to be specially enabled, and are now always present. The
rest of this section is a copy of the description of nested scopes from my
“What’s New in Python 2.1” document; if you read it when 2.1 came out, you can
skip the rest of this section.
The largest change introduced in Python 2.1, and made complete in 2.2, is to
Python’s scoping rules. In Python 2.0, at any given time there are at most
three namespaces used to look up variable names: local, module-level, and the
built-in namespace. This often surprised people because it didn’t match their
intuitive expectations. For example, a nested recursive function definition
doesn’t work:
def f():
    ...
    def g(value):
        ...
        return g(value-1) + 1
    ...
The function g() will always raise a NameError exception, because
the binding of the name g isn’t in either its local namespace or in the
module-level namespace. This isn’t much of a problem in practice (how often do
you recursively define interior functions like this?), but this also made using
the lambda statement clumsier, and this was a problem in practice.
In code which uses lambda you can often find local variables being
copied by passing them as the default values of arguments.
def find(self, name):
    "Return list of any entries equal to 'name'"
    L = filter(lambda x, name=name: x == name,
               self.list_attribute)
    return L
The readability of Python code written in a strongly functional style suffers
greatly as a result.
The most significant change to Python 2.2 is that static scoping has been added
to the language to fix this problem. As a first effect, the name=name
default argument is now unnecessary in the above example. Put simply, when a
given variable name is not assigned a value within a function (by an assignment,
or the def, class, or import statements),
references to the variable will be looked up in the local namespace of the
enclosing scope. A more detailed explanation of the rules, and a dissection of
the implementation, can be found in the PEP.
This change may cause some compatibility problems for code where the same
variable name is used both at the module level and as a local variable within a
function that contains further function definitions. This seems rather unlikely
though, since such code would have been pretty confusing to read in the first
place.
One side effect of the change is that the from module import * and
exec statements have been made illegal inside a function scope under
certain conditions. The Python reference manual has said all along that
from module import * is only legal at the top level of a module, but the
CPython interpreter has never enforced this before. As part of the
implementation of nested scopes, the compiler which turns Python source into
bytecodes has to generate different code to access variables in a containing
scope. from module import * and exec make it impossible for the compiler to
figure this out, because they add names to the local namespace that are
unknowable at compile time. Therefore, if a function contains function
definitions or lambda expressions with free variables, the compiler
will flag this by raising a SyntaxError exception.
To make the preceding explanation a bit clearer, here’s an example:
x = 1
def f():
    # The next line is a syntax error
    exec 'x=2'
    def g():
        return x
Line 4 containing the exec statement is a syntax error, since
exec would define a new local variable named x whose value should
be accessed by g().
This shouldn’t be much of a limitation, since exec is rarely used in
most Python code (and when it is used, it’s often a sign of a poor design
anyway).
The xmlrpclib module was contributed to the standard library by Fredrik
Lundh, providing support for writing XML-RPC clients. XML-RPC is a simple
remote procedure call protocol built on top of HTTP and XML. For example, the
following snippet retrieves a list of RSS channels from the O’Reilly Network,
and then lists the recent headlines for one channel:
import xmlrpclib
s = xmlrpclib.Server(
    'http://www.oreillynet.com/meerkat/xml-rpc/server.php')
channels = s.meerkat.getChannels()
# channels is a list of dictionaries, like this:
# [{'id': 4, 'title': 'Freshmeat Daily News'}
#  {'id': 190, 'title': '32Bits Online'},
#  {'id': 4549, 'title': '3DGamers'}, ... ]

# Get the items for one channel
items = s.meerkat.getItems({'channel': 4})

# 'items' is another list of dictionaries, like this:
# [{'link': 'http://freshmeat.net/releases/52719/',
#   'description': 'A utility which converts HTML to XSL FO.',
#   'title': 'html2fo 0.3 (Default)'}, ... ]
The SimpleXMLRPCServer module makes it easy to create straightforward
XML-RPC servers. See http://www.xmlrpc.com/ for more information about XML-RPC.
The new hmac module implements the HMAC algorithm described by
RFC 2104. (Contributed by Gerhard Häring.)
Several functions that originally returned lengthy tuples now return pseudo-
sequences that still behave like tuples but also have mnemonic attributes such
as st_mtime or tm_year. The enhanced functions include
stat(), fstat(), statvfs(), and fstatvfs() in the
os module, and localtime(), gmtime(), and strptime() in
the time module.
For example, to obtain a file’s size using the old tuples, you’d end up writing
something like file_size = os.stat(filename)[stat.ST_SIZE], but now this can
be written more clearly as file_size = os.stat(filename).st_size.
The original patch for this feature was contributed by Nick Mathewson.
The Python profiler has been extensively reworked and various errors in its
output have been corrected. (Contributed by Fred L. Drake, Jr. and Tim Peters.)
The socket module can be compiled to support IPv6; specify the
--enable-ipv6 option to Python’s configure script. (Contributed by
Jun-ichiro “itojun” Hagino.)
Two new format characters were added to the struct module for 64-bit
integers on platforms that support the C long long type. q is for
a signed 64-bit integer, and Q is for an unsigned one. The value is
returned in Python’s long integer type. (Contributed by Tim Peters.)
In the interpreter’s interactive mode, there’s a new built-in function
help() that uses the pydoc module introduced in Python 2.1 to
provide interactive help. help(object) displays any available help text
about object. help() with no argument puts you in an online help
utility, where you can enter the names of functions, classes, or modules to read
their help text. (Contributed by Guido van Rossum, using Ka-Ping Yee’s
pydoc module.)
Various bugfixes and performance improvements have been made to the SRE engine
underlying the re module. For example, the re.sub() and
re.split() functions have been rewritten in C. Another contributed patch
speeds up certain Unicode character ranges by a factor of two, and a new
finditer() method returns an iterator over all the non-overlapping
matches in a given string. (SRE is maintained by Fredrik Lundh. The
BIGCHARSET patch was contributed by Martin von Löwis.)
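A sketch of finditer() in use:

>>> import re
>>> for m in re.finditer('or', 'the northern shore'):
...     print m.start(), m.group()
...
5 or
15 or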
The smtplib module now supports RFC 2487, “Secure SMTP over TLS”, so
it’s now possible to encrypt the SMTP traffic between a Python program and the
mail transport agent being handed a message. smtplib also supports SMTP
authentication. (Contributed by Gerhard Häring.)
The imaplib module, maintained by Piers Lauder, has support for several
new extensions: the NAMESPACE extension defined in RFC 2342, SORT, GETACL and
SETACL. (Contributed by Anthony Baxter and Michel Pelletier.)
The rfc822 module’s parsing of email addresses is now compliant with
RFC 2822, an update to RFC 822. (The module’s name is not going to be
changed to rfc2822.) A new package, email, has also been added for
parsing and generating e-mail messages. (Contributed by Barry Warsaw, and
arising out of his work on Mailman.)
The difflib module now contains a new Differ class for
producing human-readable lists of changes (a “delta”) between two sequences of
lines of text. There are also two generator functions, ndiff() and
restore(), which respectively return a delta from two sequences, or one of
the original sequences from a delta. (Grunt work contributed by David Goodger,
from ndiff.py code by Tim Peters who then did the generatorization.)
New constants ascii_letters, ascii_lowercase, and
ascii_uppercase were added to the string module. There were
several modules in the standard library that used string.letters to
mean the ranges A-Za-z, but that assumption is incorrect when locales are in
use, because string.letters varies depending on the set of legal
characters defined by the current locale. The buggy modules have all been fixed
to use ascii_letters instead. (Reported by an unknown person; fixed by
Fred L. Drake, Jr.)
The mimetypes module now makes it easier to use alternative MIME-type
databases by the addition of a MimeTypes class, which takes a list of
filenames to be parsed. (Contributed by Fred L. Drake, Jr.)
A Timer class was added to the threading module that allows
scheduling an activity to happen at some future time. (Contributed by Itamar
Shtull-Trauring.)
Some of the changes only affect people who deal with the Python interpreter at
the C level because they’re writing Python extension modules, embedding the
interpreter, or just hacking on the interpreter itself. If you only write Python
code, none of the changes described here will affect you very much.
Profiling and tracing functions can now be implemented in C, which can operate
at much higher speeds than Python-based functions and should reduce the overhead
of profiling and tracing. This will be of interest to authors of development
environments for Python. Two new C functions were added to Python’s API,
PyEval_SetProfile() and PyEval_SetTrace(). The existing
sys.setprofile() and sys.settrace() functions still exist, and have
simply been changed to use the new C-level interface. (Contributed by Fred L.
Drake, Jr.)
The C-level interface to the garbage collector has been changed to make it
easier to write extension types that support garbage collection and to debug
misuses of the functions. Various functions have slightly different semantics,
so a bunch of functions had to be renamed. Extensions that use the old API will
still compile but will not participate in garbage collection, so updating them
for 2.2 should be considered fairly high priority.
To upgrade an extension module to the new API, perform the following steps:
Remove PyGC_HEAD_SIZE from object size calculations.
Remove calls to PyObject_AS_GC() and PyObject_FROM_GC().
A new et format sequence was added to PyArg_ParseTuple(); et
takes both a parameter and an encoding name, and converts the parameter to the
given encoding if the parameter turns out to be a Unicode string, or leaves it
alone if it’s an 8-bit string, assuming it to already be in the desired
encoding. This differs from the es format character, which assumes that
8-bit strings are in Python’s default ASCII encoding and converts them to the
specified new encoding. (Contributed by M.-A. Lemburg, and used for the MBCS
support on Windows described in the following section.)
A different argument parsing function, PyArg_UnpackTuple(), has been
added that’s simpler and presumably faster. Instead of specifying a format
string, the caller simply gives the minimum and maximum number of arguments
expected, and a set of pointers to PyObject* variables that will be
filled in with argument values.
Two new flags METH_NOARGS and METH_O are available in method
definition tables to simplify implementation of methods with no arguments or a
single untyped argument. Calling such methods is more efficient than calling a
corresponding method that uses METH_VARARGS. Also, the old
METH_OLDARGS style of writing C methods is now officially deprecated.
Two new wrapper functions, PyOS_snprintf() and PyOS_vsnprintf()
were added to provide cross-platform implementations for the relatively new
snprintf() and vsnprintf() C lib APIs. In contrast to the standard
sprintf() and vsprintf() functions, the Python versions check the
bounds of the buffer used to protect against buffer overruns. (Contributed by
M.-A. Lemburg.)
The _PyTuple_Resize() function has lost an unused parameter, so now it
takes 2 parameters instead of 3. The third argument was never used, and can
simply be discarded when porting code from earlier versions to Python 2.2.
As usual there were a bunch of other improvements and bugfixes scattered
throughout the source tree. A search through the CVS change logs finds there
were 527 patches applied and 683 bugs fixed between Python 2.1 and 2.2; 2.2.1
applied 139 patches and fixed 143 bugs; 2.2.2 applied 106 patches and fixed 82
bugs. These figures are likely to be underestimates.
Some of the more notable changes are:
The code for the MacOS port for Python, maintained by Jack Jansen, is now kept
in the main Python CVS tree, and many changes have been made to support MacOS X.
The most significant change is the ability to build Python as a framework,
enabled by supplying the --enable-framework option to the configure
script when compiling Python. According to Jack Jansen, “This installs a self-
contained Python installation plus the OS X framework “glue” into
/Library/Frameworks/Python.framework (or another location of choice).
For now there is little immediate added benefit to this (actually, there is the
disadvantage that you have to change your PATH to be able to find Python), but
it is the basis for creating a full-blown Python application, porting the
MacPython IDE, possibly using Python as a standard OSA scripting language and
much more.”
Most of the MacPython toolbox modules, which interface to MacOS APIs such as
windowing, QuickTime, scripting, etc. have been ported to OS X, but they’ve been
left commented out in setup.py. People who want to experiment with
these modules can uncomment them manually.
Keyword arguments passed to built-in functions that don’t take them now cause a
TypeError exception to be raised, with the message “function takes no
keyword arguments”.
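For example (the exact wording of the message varies from one built-in to
another):

>>> len(seq=[1, 2, 3])
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: len() takes no keyword arguments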
Weak references, added in Python 2.1 as an extension module, are now part of
the core because they’re used in the implementation of new-style classes. The
ReferenceError exception has therefore moved from the weakref
module to become a built-in exception.
A new script, Tools/scripts/cleanfuture.py by Tim Peters,
automatically removes obsolete __future__ statements from Python source
code.
An additional flags argument has been added to the built-in function
compile(), so the behaviour of __future__ statements can now be
correctly observed in simulated shells, such as those presented by IDLE and
other development environments. This is described in PEP 264. (Contributed
by Michael Hudson.)
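A sketch of how a simulated shell might use this, passing the compiler_flag
attribute of a feature from the __future__ module (the division feature is
used here purely as an illustration):

import __future__

code = compile("print 7 / 2", "<shell>", "exec",
               __future__.division.compiler_flag)
exec code      # prints 3.5, as if 'from __future__ import division'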
The new license introduced with Python 1.6 wasn’t GPL-compatible. This is
fixed by some minor textual changes to the 2.2 license, so it’s now legal to
embed Python inside a GPLed program again. Note that Python itself is not
GPLed, but instead is under a license that’s essentially equivalent to the BSD
license, same as it always was. The license changes were also applied to the
Python 2.0.1 and 2.1.1 releases.
When presented with a Unicode filename on Windows, Python will now convert it
to an MBCS encoded string, as used by the Microsoft file APIs. As MBCS is
explicitly used by the file APIs, Python’s choice of ASCII as the default
encoding turns out to be an annoyance. On Unix, the locale’s character set is
used if locale.nl_langinfo(CODESET) is available. (Windows support was
contributed by Mark Hammond with assistance from Marc-André Lemburg. Unix
support was added by Martin von Löwis.)
Large file support is now enabled on Windows. (Contributed by Tim Peters.)
The Tools/scripts/ftpmirror.py script now parses a .netrc
file, if you have one. (Contributed by Mike Romberg.)
Some features of the object returned by the xrange() function are now
deprecated, and trigger warnings when they’re accessed; they’ll disappear in
Python 2.3. xrange objects tried to pretend they were full sequence
types by supporting slicing, sequence multiplication, and the in
operator, but these features were rarely used and therefore buggy. The
tolist() method and the start, stop, and step
attributes are also being deprecated. At the C level, the fourth argument to
the PyRange_New() function, repeat, has also been deprecated.
There were a bunch of patches to the dictionary implementation, mostly to fix
potential core dumps if a dictionary contains objects that sneakily changed
their hash value, or mutated the dictionary they were contained in. For a while
python-dev fell into a gentle rhythm of Michael Hudson finding a case that
dumped core, Tim Peters fixing the bug, Michael finding another case, and round
and round it went.
On Windows, Python can now be compiled with Borland C thanks to a number of
patches contributed by Stephen Hansen, though the result isn’t fully functional
yet. (But this is progress...)
Another Windows enhancement: Wise Solutions generously offered PythonLabs use
of their InstallerMaster 8.1 system. Earlier PythonLabs Windows installers used
Wise 5.0a, which was beginning to show its age. (Packaged up by Tim Peters.)
Files ending in .pyw can now be imported on Windows. .pyw is a
Windows-only thing, used to indicate that a script needs to be run using
PYTHONW.EXE instead of PYTHON.EXE in order to prevent a DOS console from popping
up to display the output. This patch makes it possible to import such scripts,
in case they’re also usable as modules. (Implemented by David Bolen.)
On platforms where Python uses the C dlopen() function to load
extension modules, it’s now possible to set the flags used by dlopen()
using the sys.getdlopenflags() and sys.setdlopenflags() functions.
(Contributed by Bram Stolk.)
The pow() built-in function no longer supports 3 arguments when
floating-point numbers are supplied. pow(x,y,z) returns (x**y)%z,
but this is never useful for floating point numbers, and the final result varies
unpredictably depending on the platform. A call such as pow(2.0,8.0,7.0)
will now raise a TypeError exception.
The author would like to thank the following people for offering suggestions,
corrections and assistance with various drafts of this article: Fred Bremmer,
Keith Briggs, Andrew Dalke, Fred L. Drake, Jr., Carel Fellinger, David Goodger,
Mark Hammond, Stephen Hansen, Michael Hudson, Jack Jansen, Marc-André Lemburg,
Martin von Löwis, Fredrik Lundh, Michael McLay, Nick Mathewson, Paul Moore,
Gustavo Niemeyer, Don O’Donnell, Joonas Paalasma, Tim Peters, Jens Quade, Tom
Reinhardt, Neil Schemenauer, Guido van Rossum, Greg Ward, Edward Welbourne.
This article explains the new features in Python 2.1. While there aren’t as
many changes in 2.1 as there were in Python 2.0, there are still some pleasant
surprises in store. 2.1 is the first release to be steered through the use of
Python Enhancement Proposals, or PEPs, so most of the sizable changes have
accompanying PEPs that provide more complete documentation and a design
rationale for the change. This article doesn’t attempt to document the new
features completely, but simply provides an overview of the new features for
Python programmers. Refer to the Python 2.1 documentation, or to the specific
PEP, for more details about any new feature that particularly interests you.
One recent goal of the Python development team has been to accelerate the pace
of new releases, with a new release coming every 6 to 9 months. 2.1 is the first
release to come out at this faster pace, with the first alpha appearing in
January, 3 months after the final version of 2.0 was released.
The final release of Python 2.1 was made on April 17, 2001.
The largest change in Python 2.1 is to Python’s scoping rules. In Python 2.0,
at any given time there are at most three namespaces used to look up variable
names: local, module-level, and the built-in namespace. This often surprised
people because it didn’t match their intuitive expectations. For example, a
nested recursive function definition doesn’t work:
def f():
    ...
    def g(value):
        ...
        return g(value-1) + 1
    ...
The function g() will always raise a NameError exception, because
the binding of the name g isn’t in either its local namespace or in the
module-level namespace. This isn’t much of a problem in practice (how often do
you recursively define interior functions like this?), but this also made using
the lambda expression clumsier, and this was a problem in practice.
In code which uses lambda you can often find local variables being
copied by passing them as the default values of arguments.
def find(self, name):
    "Return list of any entries equal to 'name'"
    L = filter(lambda x, name=name: x == name,
               self.list_attribute)
    return L
The readability of Python code written in a strongly functional style suffers
greatly as a result.
The most significant change to Python 2.1 is that static scoping has been added
to the language to fix this problem. As a first effect, the name=name
default argument is now unnecessary in the above example. Put simply, when a
given variable name is not assigned a value within a function (by an assignment,
or the def, class, or import statements),
references to the variable will be looked up in the local namespace of the
enclosing scope. A more detailed explanation of the rules, and a dissection of
the implementation, can be found in the PEP.
This change may cause some compatibility problems for code where the same
variable name is used both at the module level and as a local variable within a
function that contains further function definitions. This seems rather unlikely
though, since such code would have been pretty confusing to read in the first
place.
One side effect of the change is that the from module import * and
exec statements have been made illegal inside a function scope under
certain conditions. The Python reference manual has said all along that
from module import * is only legal at the top level of a module, but the
CPython interpreter has never enforced this before. As part of the
implementation of nested scopes, the compiler which turns Python source into
bytecodes has to generate different code to access variables in a containing
scope. from module import * and exec make it impossible for the compiler
to figure this out, because they add names to the local namespace that are
unknowable at compile time. Therefore, if a function contains function
definitions or lambda expressions with free variables, the compiler
will flag this by raising a SyntaxError exception.
To make the preceding explanation a bit clearer, here’s an example:
x = 1
def f():
    # The next line is a syntax error
    exec 'x=2'
    def g():
        return x
Line 4 containing the exec statement is a syntax error, since
exec would define a new local variable named x whose value should
be accessed by g().
This shouldn’t be much of a limitation, since exec is rarely used in
most Python code (and when it is used, it’s often a sign of a poor design
anyway).
Compatibility concerns have led to nested scopes being introduced gradually; in
Python 2.1, they aren’t enabled by default, but can be turned on within a module
by using a future statement as described in PEP 236. (See the following section
for further discussion of PEP 236.) In Python 2.2, nested scopes will become
the default and there will be no way to turn them off, but users will have had
all of 2.1’s lifetime to fix any breakage resulting from their introduction.
The reaction to nested scopes was widespread concern about the dangers of
breaking code with the 2.1 release, and it was strong enough to make the
Pythoneers take a more conservative approach. This approach consists of
introducing a convention for enabling optional functionality in release N that
will become compulsory in release N+1.
The syntax uses a from...import statement using the reserved module name
__future__. Nested scopes can be enabled by the following statement:
from __future__ import nested_scopes
While it looks like a normal import statement, it’s not; there are
strict rules on where such a future statement can be put. They can only be at
the top of a module, and must precede any Python code or regular
import statements. This is because such statements can affect how
the Python bytecode compiler parses code and generates bytecode, so they must
precede any statement that will result in bytecodes being produced.
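Putting the pieces together, a small sketch in which the recursive inner
function from the earlier example now works:

from __future__ import nested_scopes

def f():
    def g(value):
        if value <= 0:
            return 0
        return g(value - 1) + 1
    return g(5)

print f()      # prints 5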
In earlier versions, Python’s support for implementing comparisons on user-
defined classes and extension types was quite simple. Classes could implement a
__cmp__() method that was given two instances of a class, and could only
return 0 if they were equal or +1 or -1 if they weren’t; the method couldn’t
raise an exception or return anything other than a Boolean value. Users of
Numeric Python often found this model too weak and restrictive, because in the
number-crunching programs that numeric Python is used for, it would be more
useful to be able to perform elementwise comparisons of two matrices, returning
a matrix containing the results of a given comparison for each element. If the
two matrices are of different sizes, then the compare has to be able to raise an
exception to signal the error.
In Python 2.1, rich comparisons were added in order to support this need.
Python classes can now individually overload each of the <, <=, >,
>=, ==, and != operations. The new magic method names are:

    <    __lt__()        >    __gt__()
    <=   __le__()        >=   __ge__()
    ==   __eq__()        !=   __ne__()

(The magic methods are named after the corresponding Fortran operators .LT.,
.LE., &c. Numeric programmers are almost certainly quite familiar with
these names and will find them easy to remember.)
Each of these magic methods is of the form method(self,other), where
self will be the object on the left-hand side of the operator, while
other will be the object on the right-hand side. For example, the
expression A<B will cause A.__lt__(B) to be called.
Each of these magic methods can return anything at all: a Boolean, a matrix, a
list, or any other Python object. Alternatively they can raise an exception if
the comparison is impossible, inconsistent, or otherwise meaningless.
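For instance, here is a sketch in the Numeric spirit: an elementwise
comparison that returns a list rather than a Boolean, and raises an exception
on a size mismatch:

class Vector:
    def __init__(self, data):
        self.data = data
    def __lt__(self, other):
        if len(self.data) != len(other.data):
            raise ValueError, "vectors have different sizes"
        # Elementwise comparison; the result is a list of 0/1 flags
        return [x < y for (x, y) in zip(self.data, other.data)]

print Vector([1, 5, 3]) < Vector([2, 4, 6])    # prints [1, 0, 1]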
The built-in cmp(A, B) function can use the rich comparison machinery,
and now accepts an optional argument specifying which comparison operation to
use; this is given as one of the strings "<", "<=", ">", ">=",
"==", or "!=". If called without the optional third argument,
cmp() will only return -1, 0, or +1 as in previous versions of Python;
otherwise it will call the appropriate method and can return any Python object.
There are also corresponding changes of interest to C programmers; there’s a new
slot tp_richcompare in type objects and an API for performing a given rich
comparison. I won’t cover the C API here, but will refer you to PEP 207, or to
2.1’s C API documentation, for the full list of related functions.
Over its 10 years of existence, Python has accumulated a certain number of
obsolete modules and features along the way. It’s difficult to know when a
feature is safe to remove, since there’s no way of knowing how much code uses it
— perhaps no programs depend on the feature, or perhaps many do. To enable
removing old features in a more structured way, a warning framework was added.
When the Python developers want to get rid of a feature, it will first trigger a
warning in the next version of Python. The following Python version can then
drop the feature, and users will have had a full release cycle to remove uses of
the old feature.
Python 2.1 adds the warning framework to be used in this scheme. It adds a
warnings module that provides functions to issue warnings, and to filter
out warnings that you don’t want to be displayed. Third-party modules can also
use this framework to deprecate old features that they no longer wish to
support.
For example, in Python 2.1 the regex module is deprecated, so importing
it causes a warning to be printed:
>>> import regex
__main__:1: DeprecationWarning: the regex module is deprecated; please use the re module
>>>
Warnings can be issued by calling the warnings.warn() function:
warnings.warn("feature X no longer supported")
The first parameter is the warning message; an optional second parameter
can be used to specify a particular warning category.
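For example, to issue a DeprecationWarning instead of the default
UserWarning:

warnings.warn("feature X no longer supported", DeprecationWarning)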
Filters can be added to disable certain warnings; a regular expression pattern
can be applied to the message or to the module name in order to suppress a
warning. For example, you may have a program that uses the regex module
and not want to spare the time to convert it to use the re module right
now. The warning can be suppressed by calling
import warnings
warnings.filterwarnings(action='ignore',
                        message='.*regex module is deprecated',
                        category=DeprecationWarning,
                        module='__main__')
This adds a filter that will apply only to warnings of the class
DeprecationWarning triggered in the __main__ module, and applies
a regular expression to only match the message about the regex module
being deprecated, and will cause such warnings to be ignored. Warnings can also
be printed only once, printed every time the offending code is executed, or
turned into exceptions that will cause the program to stop (unless the
exceptions are caught in the usual way, of course).
Functions were also added to Python’s C API for issuing warnings; refer to PEP
230 or to Python’s API documentation for the details.
PEP 5, written by Paul Prescod, specifies procedures to be followed when removing old
features from Python. The policy described in this PEP hasn’t been officially
adopted, but the eventual policy probably won’t be too different from Prescod’s
proposal.
When compiling Python, the user had to go in and edit the Modules/Setup
file in order to enable various additional modules; the default set is
relatively small and limited to modules that compile on most Unix platforms.
This means that on Unix platforms with many more features, most notably Linux,
Python installations often don’t contain all the useful modules they could.
Python 2.0 added the Distutils, a set of modules for distributing and installing
extensions. In Python 2.1, the Distutils are used to compile much of the
standard library of extension modules, autodetecting which ones are supported on
the current machine. It’s hoped that this will make Python installations easier
and more featureful.
Instead of having to edit the Modules/Setup file in order to enable
modules, a setup.py script in the top directory of the Python source
distribution is run at build time, and attempts to discover which modules can be
enabled by examining the modules and header files on the system. If a module is
configured in Modules/Setup, the setup.py script won’t attempt
to compile that module and will defer to the Modules/Setup file’s
contents. This provides a way to specify any strange command-line flags or
libraries that are required for a specific platform.
In another far-reaching change to the build mechanism, Neil Schemenauer
restructured things so Python now uses a single makefile that isn’t recursive,
instead of makefiles in the top directory and in each of the Python/,
Parser/, Objects/, and Modules/ subdirectories. This
makes building Python faster and also makes hacking the Makefiles clearer and
simpler.
Weak references, available through the weakref module, are a minor but
useful new data type in the Python programmer’s toolbox.
Storing a reference to an object (say, in a dictionary or a list) has the side
effect of keeping that object alive forever. There are a few specific cases
where this behaviour is undesirable, object caches being the most common one,
and another being circular references in data structures such as trees.
For example, consider a memoizing function that caches the results of another
function f(x) by storing the function’s argument and its result in a
dictionary:
_cache = {}
def memoize(x):
    if _cache.has_key(x):
        return _cache[x]

    retval = f(x)

    # Cache the returned object
    _cache[x] = retval

    return retval
This version works for simple things such as integers, but it has a side effect;
the _cache dictionary holds a reference to the return values, so they’ll
never be deallocated until the Python process exits and cleans up. This isn’t
very noticeable for integers, but if f() returns an object, or a data
structure that takes up a lot of memory, this can be a problem.
Weak references provide a way to implement a cache that won’t keep objects alive
beyond their time. If an object is only accessible through weak references, the
object will be deallocated, and the weak references will then report that the
object they referred to no longer exists. A weak reference to an object obj is
created by calling wr = weakref.ref(obj). The object being referred to is
returned by calling the weak reference as if it were a function: wr(). It
will return the referenced object, or None if the object no longer exists.
This makes it possible to write a memoize() function whose cache doesn’t
keep objects alive, by storing weak references in the cache.
_cache = {}
def memoize(x):
    if _cache.has_key(x):
        obj = _cache[x]()
        # If weak reference object still exists,
        # return it
        if obj is not None:
            return obj

    retval = f(x)

    # Cache a weak reference
    _cache[x] = weakref.ref(retval)

    return retval
The weakref module also allows creating proxy objects which behave like
weak references — an object referenced only by proxy objects is deallocated –
but instead of requiring an explicit call to retrieve the object, the proxy
transparently forwards all operations to the object as long as the object still
exists. If the object is deallocated, attempting to use a proxy will cause a
weakref.ReferenceError exception to be raised.
proxy = weakref.proxy(obj)
proxy.attr      # Equivalent to obj.attr
proxy.meth()    # Equivalent to obj.meth()
del obj
proxy.attr      # raises weakref.ReferenceError
In Python 2.1, functions can now have arbitrary information attached to them.
People were often using docstrings to hold information about functions and
methods, because the __doc__ attribute was the only way of attaching any
information to a function. For example, in the Zope Web application server,
functions are marked as safe for public access by having a docstring, and in
John Aycock’s SPARK parsing framework, docstrings hold parts of the BNF grammar
to be parsed. This overloading is unfortunate, since docstrings are really
intended to hold a function’s documentation; for example, it means you can’t
properly document functions intended for private use in Zope.
Arbitrary attributes can now be set and retrieved on functions using the regular
Python syntax:
def f():
    pass

f.publish = 1
f.secure = 1
f.grammar = "A ::= B (C D)*"
The dictionary containing attributes can be accessed as the function’s
__dict__. Unlike the __dict__ attribute of class instances, in
functions you can actually assign a new dictionary to __dict__, though
the new value is restricted to a regular Python dictionary; you can’t be
tricky and set it to a UserDict instance, or any other random object
that behaves like a mapping.
PEP 235: Importing Modules on Case-Insensitive Platforms
Some operating systems have filesystems that are case-insensitive, MacOS and
Windows being the primary examples; on these systems, it’s impossible to
distinguish the filenames FILE.PY and file.py, even though they do store
the file’s name in its original case (they’re case-preserving, too).
In Python 2.1, the import statement will work to simulate case-
sensitivity on case-insensitive platforms. Python will now search for the first
case-sensitive match by default, raising an ImportError if no such file
is found, so import file will not import a module named FILE.PY. Case-
insensitive matching can be requested by setting the PYTHONCASEOK
environment variable before starting the Python interpreter.
When using the Python interpreter interactively, the output of commands is
displayed using the built-in repr() function. In Python 2.1, the variable
sys.displayhook can be set to a callable object which will be called
instead of repr(). For example, you can set it to a special pretty-
printing function:
>>> # Create a recursive data structure
... L = [1, 2, 3]
>>> L.append(L)
>>> L                      # Show Python's default output
[1, 2, 3, [...]]
>>> # Use pprint.pprint() as the display function
... import sys, pprint
>>> sys.displayhook = pprint.pprint
>>> L
[1, 2, 3, <Recursion on list with id=135143996>]
>>>
How numeric coercion is done at the C level was significantly modified. This
will only affect the authors of C extensions to Python, allowing them more
flexibility in writing extension types that support numeric operations.
Extension types can now set the type flag Py_TPFLAGS_CHECKTYPES in their
PyTypeObject structure to indicate that they support the new coercion model.
In such extension types, the numeric slot functions can no longer assume that
they’ll be passed two arguments of the same type; instead they may be passed two
arguments of differing types, and can then perform their own internal coercion.
If the slot function is passed a type it can’t handle, it can indicate the
failure by returning a reference to the Py_NotImplemented singleton value.
The numeric functions of the other type will then be tried, and perhaps they can
handle the operation; if the other type also returns Py_NotImplemented, then
a TypeError will be raised. Numeric methods written in Python can also
return Py_NotImplemented, causing the interpreter to act as if the method
did not exist (perhaps raising a TypeError, perhaps trying another
object’s numeric methods).
PEP 208, written and implemented by Neil Schemenauer, is heavily based upon earlier work by
Marc-André Lemburg. Read this to understand the fine points of how numeric
operations will now be processed at the C level.
A common complaint from Python users is that there’s no single catalog of all
the Python modules in existence. T. Middleton’s Vaults of Parnassus at
http://www.vex.net/parnassus/ are the largest catalog of Python modules, but
registering software at the Vaults is optional, and many people don’t bother.
As a first small step toward fixing the problem, Python software packaged using
the Distutils sdist command will include a file named
PKG-INFO containing information about the package such as its name,
version, and author (metadata, in cataloguing terminology). PEP 241 contains
the full list of fields that can be present in the PKG-INFO file. As
people begin to package their software using Python 2.1, more and more packages
will include metadata, making it possible to build automated cataloguing systems
and experiment with them. With the resulting experience, perhaps it’ll be possible
to design a really good catalog and then build support for it into Python 2.2.
For example, the Distutils sdist and bdist_* commands
could support an upload option that would automatically upload your
package to a catalog server.
You can start creating packages containing PKG-INFO even if you’re not
using Python 2.1, since a new release of the Distutils will be made for users of
earlier Python versions. Version 1.0.2 of the Distutils includes the changes
described in PEP 241, as well as various bugfixes and enhancements. It will be
available from the Distutils SIG at http://www.python.org/sigs/distutils-sig/.
Ka-Ping Yee contributed two new modules: inspect.py, a module for
getting information about live Python code, and pydoc.py, a module for
interactively converting docstrings to HTML or text. As a bonus,
Tools/scripts/pydoc, which is now automatically installed, uses
pydoc.py to display documentation given a Python module, package, or
class name. For example, pydoc xml.dom displays the documentation for the
xml.dom package.
pydoc also includes a Tk-based interactive help browser. pydoc
quickly becomes addictive; try it out!
Two different modules for unit testing were added to the standard library.
The doctest module, contributed by Tim Peters, provides a testing
framework based on running embedded examples in docstrings and comparing the
results against the expected output. PyUnit, contributed by Steve Purcell, is a
unit testing framework inspired by JUnit, which was in turn an adaptation of
Kent Beck’s Smalltalk testing framework. See http://pyunit.sourceforge.net/ for
more information about PyUnit.
The difflib module contains a class, SequenceMatcher, which
compares two sequences and computes the changes required to transform one
sequence into the other. For example, this module can be used to write a tool
similar to the Unix diff program, and in fact the sample program
Tools/scripts/ndiff.py demonstrates how to write such a script.
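A small sketch of the class in action:

import difflib

sm = difflib.SequenceMatcher(None, 'abcd', 'bcde')
print sm.ratio()                  # similarity ratio; 0.75 here
print sm.get_matching_blocks()    # where the two sequences agree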
curses.panel, a wrapper for the panel library, part of ncurses and of
SYSV curses, was contributed by Thomas Gellekum. The panel library provides
windows with the additional feature of depth. Windows can be moved higher or
lower in the depth ordering, and the panel library figures out where panels
overlap and which sections are visible.
The PyXML package has gone through a few releases since Python 2.0, and Python
2.1 includes an updated version of the xml package. Some of the
noteworthy changes include support for Expat 1.2 and later versions, the ability
for Expat parsers to handle files in any encoding supported by Python, and
various bugfixes for SAX, DOM, and the minidom module.
Ping also contributed another hook for handling uncaught exceptions.
sys.excepthook can be set to a callable object. When an exception isn’t
caught by any try...except blocks, the exception will be
passed to sys.excepthook(), which can then do whatever it likes. At the
Ninth Python Conference, Ping demonstrated an application for this hook:
printing an extended traceback that not only lists the stack frames, but also
lists the function arguments and the local variables for each frame.
Various functions in the time module, such as asctime() and
localtime(), require a floating point argument containing the time in
seconds since the epoch. The most common use of these functions is to work with
the current time, so the floating point argument has been made optional; when a
value isn’t provided, the current time will be used. For example, log file
entries usually need a string containing the current time; in Python 2.1,
time.asctime() can be used, instead of the lengthier
time.asctime(time.localtime(time.time())) that was previously required.
This change was proposed and implemented by Thomas Wouters.
The ftplib module now defaults to retrieving files in passive mode,
because passive mode is more likely to work from behind a firewall. This
request came from the Debian bug tracking system, since other Debian packages
use ftplib to retrieve files and then don’t work from behind a firewall.
It’s deemed unlikely that this will cause problems for anyone, because Netscape
defaults to passive mode and few people complain, but if passive mode is
unsuitable for your application or network setup, call set_pasv(0) on
FTP objects to disable passive mode.
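For example (the hostname here is illustrative):

import ftplib

ftp = ftplib.FTP('ftp.example.com')
ftp.login()
ftp.set_pasv(0)      # switch back to active mode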
Support for raw socket access has been added to the socket module,
contributed by Grant Edwards.
The pstats module now contains a simple interactive statistics browser
for displaying timing profiles for Python programs, invoked when the module is
run as a script. Contributed by Eric S. Raymond.
A new implementation-dependent function, sys._getframe([depth]), has
been added to return a given frame object from the current call stack.
sys._getframe() returns the frame at the top of the call stack; if the
optional integer argument depth is supplied, the function returns the frame
that is depth calls below the top of the stack. For example,
sys._getframe(1) returns the caller’s frame object.
This function is only present in CPython, not in Jython or the .NET
implementation. Use it for debugging, and resist the temptation to put it into
production code.
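A tiny sketch, suitable only for debugging sessions:

import sys

def whos_calling():
    # Name of the function one frame up the stack (CPython only)
    return sys._getframe(1).f_code.co_name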
There were relatively few smaller changes made in Python 2.1 due to the shorter
release cycle. A search through the CVS change logs turns up 117 patches
applied, and 136 bugs fixed; both figures are likely to be underestimates. Some
of the more notable changes are:
A specialized object allocator is now optionally available, that should be
faster than the system malloc() and have less memory overhead. The
allocator uses C’s malloc() function to get large pools of memory, and
then fulfills smaller memory requests from these pools. It can be enabled by
providing the --with-pymalloc option to the configure
script; see Objects/obmalloc.c for the implementation details.
Authors of C extension modules should test their code with the object allocator
enabled, because some incorrect code may break, causing core dumps at runtime.
There are a bunch of memory allocation functions in Python’s C API that have
previously been just aliases for the C library’s malloc() and
free(), meaning that if you accidentally called mismatched functions, the
error wouldn’t be noticeable. When the object allocator is enabled, these
functions aren’t aliases of malloc() and free() any more, and
calling the wrong function to free memory will get you a core dump. For
example, if memory was allocated using PyMem_New(), it has to be freed
using PyMem_Del(), not free(). A few modules included with Python
fell afoul of this and had to be fixed; doubtless there are more third-party
modules that will have the same problem.
The object allocator was contributed by Vladimir Marangozov.
The speed of line-oriented file I/O has been improved because people often
complain about its lack of speed, and because it’s often been used as a naïve
benchmark. The readline() method of file objects has therefore been
rewritten to be much faster. The exact amount of the speedup will vary from
platform to platform depending on how slow the C library’s getc() was, but
is around 66%, and potentially much faster on some particular operating systems.
Tim Peters did much of the benchmarking and coding for this change, motivated by
a discussion in comp.lang.python.
A new module and method for file objects was also added, contributed by Jeff
Epler. The new method, xreadlines(), is similar to the existing
xrange() built-in. xreadlines() returns an opaque sequence object
that only supports being iterated over, reading a line on every iteration but
not reading the entire file into memory as the existing readlines() method
does. You’d use it like this:
for line in sys.stdin.xreadlines():
    # ... do something for each line ...
    ...
A new method, popitem(), was added to dictionaries to enable
destructively iterating through the contents of a dictionary; this can be faster
for large dictionaries because there’s no need to construct a list containing
all the keys or values. D.popitem() removes a random (key,value) pair
from the dictionary D and returns it as a 2-tuple. This was implemented
mostly by Tim Peters and Guido van Rossum, after a suggestion and preliminary
patch by Moshe Zadka.
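For example, destructively draining a dictionary:

D = {1: 'a', 2: 'b', 3: 'c'}
while D:
    key, value = D.popitem()
    print key, value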
Modules can now control which names are imported when from module import *
is used, by defining an __all__ attribute containing a list of names that
will be imported. One common complaint is that if the module imports other
modules such as sys or string, from module import * will add
them to the importing module’s namespace. To fix this, simply list the public
names in __all__:
# List public names
__all__ = ['Database', 'open']
A stricter version of this patch was first suggested and implemented by Ben
Wolfson, but after some python-dev discussion, a weaker final version was
checked in.
Applying repr() to strings previously used octal escapes for
non-printable characters; for example, a newline was '\012'. This was a
vestigial trace of Python’s C ancestry, but today octal is of very little
practical use. Ka-Ping Yee suggested using hex escapes instead of octal ones,
and using the \n, \t, \r escapes for the appropriate characters,
and implemented this new formatting.
Syntax errors detected at compile-time can now raise exceptions containing the
filename and line number of the error, a pleasant side effect of the compiler
reorganization done by Jeremy Hylton.
C extensions which import other modules have been changed to use
PyImport_ImportModule(), which means that they will use any import hooks
that have been installed. This is also encouraged for third-party extensions
that need to import some other module from C code.
The size of the Unicode character database was shrunk by another 340K thanks
to Fredrik Lundh.
Some new ports were contributed: MacOS X (by Steven Majewski), Cygwin (by
Jason Tishler); RISCOS (by Dietmar Schwertberger); Unixware 7 (by Billy G.
Allie).
And there’s the usual list of minor bugfixes, minor memory leaks, docstring
edits, and other tweaks, too lengthy to be worth itemizing; see the CVS logs for
the full details if you want them.
The author would like to thank the following people for offering suggestions on
various drafts of this article: Graeme Cross, David Goodger, Jay Graves, Michael
Hudson, Marc-André Lemburg, Fredrik Lundh, Neil Schemenauer, Thomas Wouters.
A new release of Python, version 2.0, was released on October 16, 2000. This
article covers the exciting new features in 2.0, highlights some other useful
changes, and points out a few incompatible changes that may require rewriting
code.
Python’s development never completely stops between releases, and a steady flow
of bug fixes and improvements are always being submitted. A host of minor fixes,
a few optimizations, additional docstrings, and better error messages went into
2.0; to list them all would be impossible, but they’re certainly significant.
Consult the publicly-available CVS logs if you want to see the full list. This
progress is due to the fact that the five developers working for PythonLabs are now getting
paid to spend their days fixing bugs, and also due to the improved communication
resulting from moving to SourceForge.
Python 1.6 can be thought of as the Contractual Obligations Python release.
After the core development team left CNRI in May 2000, CNRI requested that a 1.6
release be created, containing all the work on Python that had been performed at
CNRI. Python 1.6 therefore represents the state of the CVS tree as of May 2000,
with the most significant new feature being Unicode support. Development
continued after May, of course, so the 1.6 tree received a few fixes to ensure
that it’s forward-compatible with Python 2.0. 1.6 is therefore part of Python’s
evolution, and not a side branch.
So, should you take much interest in Python 1.6? Probably not. The 1.6final
and 2.0beta1 releases were made on the same day (September 5, 2000), the plan
being to finalize Python 2.0 within a month or so. If you have applications to
maintain, there seems little point in breaking things by moving to 1.6, fixing
them, and then having another round of breakage within a month by moving to 2.0;
you’re better off just going straight to 2.0. Most of the really interesting
features described in this document are only in 2.0, because a lot of work was
done between May and September.
The most important change in Python 2.0 may not be to the code at all, but to
how Python is developed: in May 2000 the Python developers began using the tools
made available by SourceForge for storing source code, tracking bug reports,
and managing the queue of patch submissions. To report bugs or submit patches
for Python 2.0, use the bug tracking and patch manager tools available from
Python’s project page, located at http://sourceforge.net/projects/python/.
The most important of the services now hosted at SourceForge is the Python CVS
tree, the version-controlled repository containing the source code for Python.
Previously, there were roughly 7 people who had write access to the CVS
tree, and all patches had to be inspected and checked in by one of the people on
this short list. Obviously, this wasn’t very scalable. By moving the CVS tree
to SourceForge, it became possible to grant write access to more people; as of
September 2000 there were 27 people able to check in changes, a fourfold
increase. This makes possible large-scale changes that wouldn’t be attempted if
they’d have to be filtered through the small group of core developers. For
example, one day Peter Schneider-Kamp took it into his head to drop K&R C
compatibility and convert the C source for Python to ANSI C. After getting
approval on the python-dev mailing list, he launched into a flurry of checkins
that lasted about a week, other developers joined in to help, and the job was
done. If there were only 5 people with write access, probably that task would
have been viewed as “nice, but not worth the time and effort needed” and it
would never have gotten done.
The shift to using SourceForge’s services has resulted in a remarkable increase
in the speed of development. Patches now get submitted, commented on, revised
by people other than the original submitter, and bounced back and forth between
people until the patch is deemed worth checking in. Bugs are tracked in one
central location and can be assigned to a specific person for fixing, and we can
count the number of open bugs to measure progress. This didn’t come without a
cost: developers now have more e-mail to deal with, more mailing lists to
follow, and special tools had to be written for the new environment. For
example, SourceForge sends default patch and bug notification e-mail messages
that are completely unhelpful, so Ka-Ping Yee wrote an HTML screen-scraper that
sends more useful messages.
The ease of adding code caused a few initial growing pains, such as code being
checked in before it was ready or without getting clear agreement from the
developer group. The approval process that has emerged is somewhat similar to
that used by the Apache group. Developers can vote +1, +0, -0, or -1 on a patch;
+1 and -1 denote acceptance or rejection, while +0 and -0 mean the developer is
mostly indifferent to the change, though with a slight positive or negative
slant. The most significant change from the Apache model is that the voting is
essentially advisory, letting Guido van Rossum, who has Benevolent Dictator For
Life status, know what the general opinion is. He can still ignore the result of
a vote, and approve or reject a change even if the community disagrees with him.
Producing an actual patch is the last step in adding a new feature, and is
usually easy compared to the earlier task of coming up with a good design.
Discussions of new features can often explode into lengthy mailing list threads,
making the discussion hard to follow, and no one can read every posting to
python-dev. Therefore, a relatively formal process has been set up to write
Python Enhancement Proposals (PEPs), modelled on the Internet RFC process. PEPs
are draft documents that describe a proposed new feature, and are continually
revised until the community reaches a consensus, either accepting or rejecting
the proposal. Quoting from the introduction to PEP 1, “PEP Purpose and
Guidelines”:
PEP stands for Python Enhancement Proposal. A PEP is a design document
providing information to the Python community, or describing a new feature for
Python. The PEP should provide a concise technical specification of the feature
and a rationale for the feature.
We intend PEPs to be the primary mechanisms for proposing new features, for
collecting community input on an issue, and for documenting the design decisions
that have gone into Python. The PEP author is responsible for building
consensus within the community and documenting dissenting opinions.
Read the rest of PEP 1 for the details of the PEP editorial process, style, and
format. PEPs are kept in the Python CVS tree on SourceForge, though they’re not
part of the Python 2.0 distribution, and are also available in HTML form from
http://www.python.org/peps/. As of September 2000, there are 25 PEPs, ranging
from PEP 201, “Lockstep Iteration”, to PEP 225, “Elementwise/Objectwise
Operators”.
The largest new feature in Python 2.0 is a new fundamental data type: Unicode
strings. Unicode uses 16-bit numbers to represent characters instead of the
8-bit number used by ASCII, meaning that 65,536 distinct characters can be
supported.
The final interface for Unicode support was arrived at through countless often-
stormy discussions on the python-dev mailing list, and mostly implemented by
Marc-André Lemburg, based on a Unicode string type implementation by Fredrik
Lundh. A detailed explanation of the interface was written up as PEP 100,
“Python Unicode Integration”. This article will simply cover the most
significant points about the Unicode interfaces.
In Python source code, Unicode strings are written as u"string". Arbitrary
Unicode characters can be written using a new escape sequence, \uHHHH, where
HHHH is a 4-digit hexadecimal number from 0000 to FFFF. The existing
\xHHHH escape sequence can also be used, and octal escapes can be used for
characters up to U+01FF, which is represented by \777.
Unicode strings, just like regular strings, are an immutable sequence type.
They can be indexed and sliced, but not modified in place. Unicode strings have
an encode([encoding]) method that returns an 8-bit string in the desired
encoding. Encodings are named by strings, such as 'ascii', 'utf-8',
'iso-8859-1', or whatever. A codec API is defined for implementing and
registering new encodings that are then available throughout a Python program.
If an encoding isn’t specified, the default encoding is usually 7-bit ASCII,
though it can be changed for your Python installation by calling the
sys.setdefaultencoding(encoding) function in a customised version of
site.py.
Combining 8-bit and Unicode strings always coerces to Unicode, using the default
ASCII encoding; the result of 'a'+u'bc' is u'abc'.
New built-in functions have been added, and existing built-ins modified to
support Unicode:
unichr(ch) returns a Unicode string 1 character long, containing the
character ch.
ord(u), where u is a 1-character regular or Unicode string, returns the
number of the character as an integer.
unicode(string[,encoding][,errors]) creates a Unicode string
from an 8-bit string. encoding is a string naming the encoding to use. The
errors parameter specifies the treatment of characters that are invalid for
the current encoding; passing 'strict' as the value causes an exception to
be raised on any encoding error, while 'ignore' causes errors to be silently
ignored and 'replace' uses U+FFFD, the official replacement character, in
case of any problems.
The exec statement, and various built-ins such as eval(),
getattr(), and setattr() will also accept Unicode strings as well as
regular strings. (It’s possible that the process of fixing this missed some
built-ins; if you find a built-in function that accepts strings but doesn’t
accept Unicode strings at all, please report it as a bug.)
A new module, unicodedata, provides an interface to Unicode character
properties. For example, unicodedata.category(u'A') returns the 2-character
string ‘Lu’, the ‘L’ denoting it’s a letter, and ‘u’ meaning that it’s
uppercase. unicodedata.bidirectional(u'\u0660') returns ‘AN’, meaning that
U+0660 is an Arabic number.
The codecs module contains functions to look up existing encodings and
register new ones. Unless you want to implement a new encoding, you’ll most
often use the codecs.lookup(encoding) function, which returns a
4-element tuple: (encode_func, decode_func, stream_reader, stream_writer).
encode_func is a function that takes a Unicode string, and returns a 2-tuple
(string,length). string is an 8-bit string containing a portion (perhaps
all) of the Unicode string converted into the given encoding, and length tells
you how much of the Unicode string was converted.
decode_func is the opposite of encode_func, taking an 8-bit string and
returning a 2-tuple (ustring,length), consisting of the resulting Unicode
string ustring and the integer length telling how much of the 8-bit string
was consumed.
stream_reader is a class that supports decoding input from a stream.
stream_reader(file_obj) returns an object that supports the read(),
readline(), and readlines() methods. These methods will all
translate from the given encoding and return Unicode strings.
stream_writer, similarly, is a class that supports encoding output to a
stream. stream_writer(file_obj) returns an object that supports the
write() and writelines() methods. These methods expect Unicode
strings, translating them to the given encoding on output.
For example, the following code writes a Unicode string into a file, encoding
it as UTF-8:
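# A sketch built from the lookup() interface described above;
# the sample string and output filename are illustrative.
import codecs

unistr = u'\u0660\u2000ab ...'

(UTF8_encode, UTF8_decode,
 UTF8_streamreader, UTF8_streamwriter) = codecs.lookup('UTF-8')

output = UTF8_streamwriter(open('/tmp/output', 'wb'))
output.write(unistr)
output.close()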
Unicode-aware regular expressions are available through the re module,
which has a new underlying implementation called SRE written by Fredrik Lundh of
Secret Labs AB.
A -U command line option was added which causes the Python compiler to
interpret all string literals as Unicode string literals. This is intended to be
used in testing and future-proofing your Python code, since some future version
of Python may drop support for 8-bit strings and provide only Unicode strings.
Lists are a workhorse data type in Python, and many programs manipulate a list
at some point. Two common operations on lists are to loop over them, and either
pick out the elements that meet a certain criterion, or apply some function to
each element. For example, given a list of strings, you might want to pull out
all the strings containing a given substring, or strip off trailing whitespace
from each line.
The existing map() and filter() functions can be used for this
purpose, but they require a function as one of their arguments. This is fine if
there’s an existing built-in function that can be passed directly, but if there
isn’t, you have to create a little function to do the required work, and
Python’s scoping rules make the result ugly if the little function needs
additional information. Take the first example in the previous paragraph,
finding all the strings in the list containing a given substring. You could
write the following to do it:
# Given the list L, make a list of all strings
# containing the substring S.
sublist = filter(lambda s, substring=S:
                 string.find(s, substring) != -1,
                 L)
Because of Python’s scoping rules, a default argument is used so that the
anonymous function created by the lambda statement knows what
substring is being searched for. List comprehensions make this cleaner:
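sublist = [s for s in L if string.find(s, S) != -1]

List comprehensions have the form:

[ expression for expr in sequence1
             for expr2 in sequence2 ...
             for exprN in sequenceN
             if condition ]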
The for...in clauses contain the sequences to be
iterated over. The sequences do not have to be the same length, because they
are not iterated over in parallel, but from left to right; this is explained
more clearly in the following paragraphs. The elements of the generated list
will be the successive values of expression. The final if clause
is optional; if present, expression is only evaluated and added to the result
if condition is true.
To make the semantics very clear, a list comprehension is equivalent to the
following Python code:
for expr1 in sequence1:
    for expr2 in sequence2:
        ...
        for exprN in sequenceN:
            if (condition):
                # Append the value of
                # the expression to the
                # resulting list.
This means that when there are multiple for...in
clauses, the resulting list will be equal to the product of the lengths of all
the sequences. If you have two lists of length 3, the output list is 9 elements
long:
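>>> seq1 = 'abc'
>>> seq2 = (1, 2, 3)
>>> [(x, y) for x in seq1 for y in seq2]
[('a', 1), ('a', 2), ('a', 3), ('b', 1), ('b', 2), ('b', 3),
 ('c', 1), ('c', 2), ('c', 3)]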
To avoid introducing an ambiguity into Python’s grammar, if expression is
creating a tuple, it must be surrounded with parentheses. The first list
comprehension below is a syntax error, while the second one is correct:
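# Syntax error: would be ambiguous without the parentheses
[x, y for x in seq1 for y in seq2]

# Correct: the tuple expression is parenthesized
[(x, y) for x in seq1 for y in seq2]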
The idea of list comprehensions originally comes from the functional programming
language Haskell (http://www.haskell.org). Greg Ewing argued most effectively
for adding them to Python and wrote the initial list comprehension patch, which
was then discussed for a seemingly endless time on the python-dev mailing list
and kept up-to-date by Skip Montanaro.
Augmented assignment operators, another long-requested feature, have been added
to Python 2.0. Augmented assignment operators include +=, -=, *=,
and so forth. For example, the statement a+=2 increments the value of the
variable a by 2, equivalent to the slightly lengthier a=a+2.
The full list of supported assignment operators is +=, -=, *=,
/=, %=, **=, &=, |=, ^=, >>=, and <<=. Python
classes can override the augmented assignment operators by defining methods
named __iadd__(), __isub__(), etc. For example, the following
Number class stores a number and supports using += to create a new
instance with an incremented value.
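class Number:
    def __init__(self, value):
        self.value = value
    def __iadd__(self, increment):
        return Number(self.value + increment)

n = Number(5)
n += 3
print n.value      # prints 8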
The __iadd__() special method is called with the value of the increment,
and should return a new instance with an appropriately modified value; this
return value is bound as the new value of the variable on the left-hand side.
Augmented assignment operators were first introduced in the C programming
language, and most C-derived languages, such as awk, C++, Java, Perl,
and PHP also support them. The augmented assignment patch was implemented by
Thomas Wouters.
Until now string-manipulation functionality was in the string module,
which was usually a front-end for the strop module written in C. The
addition of Unicode posed a difficulty for the strop module, because the
functions would all need to be rewritten in order to accept either 8-bit or
Unicode strings. For functions such as string.replace(), which takes 3
string arguments, that means eight possible permutations, and correspondingly
complicated code.
Instead, Python 2.0 pushes the problem onto the string type, making string
manipulation functionality available through methods on both 8-bit strings and
Unicode strings.
One thing that hasn’t changed, a noteworthy April Fools’ joke notwithstanding,
is that Python strings are immutable. Thus, the string methods return new
strings, and do not modify the string on which they operate.
The old string module is still around for backwards compatibility, but it
mostly acts as a front-end to the new string methods.
Two methods which have no parallel in pre-2.0 versions, although they did exist
in JPython for quite some time, are startswith() and endswith().
s.startswith(t) is equivalent to s[:len(t)]==t, while
s.endswith(t) is equivalent to s[-len(t):]==t.
One other method which deserves special mention is join(). The
join() method of a string receives one parameter, a sequence of strings,
and is equivalent to the string.join() function from the old string
module, with the arguments reversed. In other words, s.join(seq) is
equivalent to the old string.join(seq,s).
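For example:

>>> s = ', '
>>> s.join(['red', 'green', 'blue'])
'red, green, blue'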
The C implementation of Python uses reference counting to implement garbage
collection. Every Python object maintains a count of the number of references
pointing to itself, and adjusts the count as references are created or
destroyed. Once the reference count reaches zero, the object is no longer
accessible, since you need to have a reference to an object to access it, and if
the count is zero, no references exist any longer.
Reference counting has some pleasant properties: it’s easy to understand and
implement, and the resulting implementation is portable, fairly fast, and reacts
well with other libraries that implement their own memory handling schemes. The
major problem with reference counting is that it sometimes doesn’t realise that
objects are no longer accessible, resulting in a memory leak. This happens when
there are cycles of references.
Consider the simplest possible cycle, a class instance which has a reference to
itself:
instance = SomeClass()
instance.myself = instance
After the above two lines of code have been executed, the reference count of
instance is 2; one reference is from the variable named 'instance', and
the other is from the myself attribute of the instance.
If the next line of code is del instance, what happens? The reference count
of instance is decreased by 1, so it has a reference count of 1; the
reference in the myself attribute still exists. Yet the instance is no
longer accessible through Python code, and it could be deleted. Several objects
can participate in a cycle if they have references to each other, causing all of
the objects to be leaked.
Python 2.0 fixes this problem by periodically executing a cycle detection
algorithm which looks for inaccessible cycles and deletes the objects involved.
A new gc module provides functions to perform a garbage collection,
obtain debugging statistics, and tune the collector’s parameters.
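A quick sketch of the interface:

import gc

print gc.collect()     # force a collection pass; returns the number
                       # of unreachable objects found
print gc.garbage       # uncollectable objects end up here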
Running the cycle detection algorithm takes some time, and therefore will result
in some additional overhead. It is hoped that after we’ve gotten experience
with the cycle collection from using 2.0, Python 2.1 will be able to minimize
the overhead with careful tuning. It’s not yet obvious how much performance is
lost, because benchmarking this is tricky and depends crucially on how often the
program creates and destroys objects. The detection of cycles can be disabled
when Python is compiled, if you can’t afford even a tiny speed penalty or
suspect that the cycle collection is buggy, by specifying the
--without-cycle-gc switch when running the configure
script.
Several people tackled this problem and contributed to a solution. An early
implementation of the cycle detection approach was written by Toby Kelsey. The
current algorithm was suggested by Eric Tiedemann during a visit to CNRI, and
Guido van Rossum and Neil Schemenauer wrote two different implementations, which
were later integrated by Neil. Lots of other people offered suggestions along
the way; the March 2000 archives of the python-dev mailing list contain most of
the relevant discussion, especially in the threads titled “Reference cycle
collection for Python” and “Finalization again”.
Various minor changes have been made to Python’s syntax and built-in functions.
None of the changes are very far-reaching, but they’re handy conveniences.
A new syntax makes it more convenient to call a given function with a tuple of
arguments and/or a dictionary of keyword arguments. In Python 1.5 and earlier,
you’d use the apply() built-in function: apply(f,args,kw) calls the
function f() with the argument tuple args and the keyword arguments in
the dictionary kw. apply() is the same in 2.0, but thanks to a patch
from Greg Ewing, f(*args, **kw) is now a shorter and clearer way to achieve
the same effect. This syntax is symmetrical with the syntax for defining
functions:
def f(*args, **kw):
    # args is a tuple of positional args,
    # kw is a dictionary of keyword args
    ...
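A minimal sketch of that symmetry, assuming a 2.0 interpreter where apply()
is still available:

def f(*args, **kw):
    return args, kw

args = (1, 2)
kw = {'x': 3}
assert f(*args, **kw) == apply(f, args, kw)   # both give ((1, 2), {'x': 3})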
The print statement can now have its output directed to a file-like
object by following the print with >>file, similar to the
redirection operator in Unix shells. Previously you’d either have to use the
write() method of the file-like object, which lacks the convenience and
simplicity of print, or you could assign a new value to
sys.stdout and then restore the old value. For sending output to standard
error, it’s much easier to write this:
print >> sys.stderr, "Warning: action field not supplied"
Modules can now be renamed on importing them, using the syntax import module
as name or from module import name as othername. The patch was submitted
by Thomas Wouters.
A new format style is available when using the % operator; ‘%r’ will insert
the repr() of its argument. This was added for symmetry with the existing
‘%s’ format style, which inserts the str() of its argument. For example,
'%r %s' % ('abc', 'abc') returns a string containing 'abc' abc.
Previously there was no way to implement a class that overrode Python’s built-in
in operator and implemented a custom version. obj in seq returns
true if obj is present in the sequence seq; Python computes this by simply
trying every index of the sequence until either obj is found or an
IndexError is encountered. Moshe Zadka contributed a patch which adds a
__contains__() magic method for providing a custom implementation for
in. Additionally, new built-in objects written in C can define what
in means for them via a new slot in the sequence protocol.
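A minimal sketch of a class supplying its own in behaviour (the class is
illustrative):

class EvenNumbers:
    def __contains__(self, obj):
        # membership is true for even integers
        return obj % 2 == 0

evens = EvenNumbers()
assert 4 in evens
assert not (5 in evens)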
Earlier versions of Python used a recursive algorithm for deleting objects.
Deeply nested data structures could cause the interpreter to fill up the C stack
and crash; Christian Tismer rewrote the deletion logic to fix this problem. On
a related note, comparing recursive objects recursed infinitely and crashed;
Jeremy Hylton rewrote the code to no longer crash, producing a useful result
instead. For example, after this code:
a = []
b = []
a.append(a)
b.append(b)
the comparison a == b returns true, because the two recursive data structures
are isomorphic. See the thread “trashcan and PR#7” in the April 2000 archives of
the python-dev mailing list for the discussion leading up to this
implementation, and some useful relevant links. Note that comparisons can now
also raise exceptions. In earlier versions of Python, a comparison operation
such as cmp(a,b) would always produce an answer, even if a user-defined
__cmp__() method encountered an error, since the resulting exception would
simply be silently swallowed.
Work has been done on porting Python to 64-bit Windows on the Itanium processor,
mostly by Trent Mick of ActiveState. (Confusingly, sys.platform is still
'win32' on Win64 because it seems that for ease of porting, MS Visual C++
treats code as 32 bit on Itanium.) PythonWin also supports Windows CE; see the
Python CE page at http://pythonce.sourceforge.net/ for more information.
Another new platform is Darwin/MacOS X; initial support for it is in Python 2.0.
Dynamic loading works, if you specify “configure --with-dyld --with-suffix=.x”.
Consult the README in the Python source distribution for more instructions.
An attempt has been made to alleviate one of Python’s warts, the often-confusing
NameError exception when code refers to a local variable before the
variable has been assigned a value. For example, the following code raises an
exception on the print statement in both 1.5.2 and 2.0; in 1.5.2 a
NameError exception is raised, while 2.0 raises a new
UnboundLocalError exception. UnboundLocalError is a subclass of
NameError, so any existing code that expects NameError to be
raised should still work.
def f():
    print "i=", i
    i = i + 1
f()
Two new exceptions, TabError and IndentationError, have been
introduced. They’re both subclasses of SyntaxError, and are raised when
Python code is found to be improperly indented.
A new built-in, zip(seq1, seq2, ...), has been added. zip()
returns a list of tuples where each tuple contains the i-th element from each of
the argument sequences. The difference between zip() and map(None, seq1,
seq2) is that map() pads the sequences with None if the
sequences aren’t all of the same length, while zip() truncates the
returned list to the length of the shortest argument sequence.
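For example, assuming 2.0 semantics, where both calls return lists:

assert zip([1, 2, 3], 'ab') == [(1, 'a'), (2, 'b')]                   # truncates
assert map(None, [1, 2, 3], 'ab') == [(1, 'a'), (2, 'b'), (3, None)]  # pads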
The int() and long() functions now accept an optional “base”
parameter when the first argument is a string. int('123',10) returns 123,
while int('123',16) returns 291. int(123,16) raises a
TypeError exception with the message “can’t convert non-string with
explicit base”.
A new variable holding more detailed version information has been added to the
sys module. sys.version_info is a tuple (major, minor, micro, level,
serial). For example, in a hypothetical 2.0.1beta1, sys.version_info
would be (2, 0, 1, 'beta', 1). level is a string such as "alpha",
"beta", or "final" for a final release.
Dictionaries have an odd new method, setdefault(key, default), which
behaves similarly to the existing get() method. However, if the key is
missing, setdefault() both returns the value of default as get()
would do, and also inserts it into the dictionary as the value for key. Thus,
the following lines of code:

if dict.has_key(key):
    return dict[key]
else:
    dict[key] = []
    return dict[key]

can be reduced to a single return dict.setdefault(key, []) statement.
The interpreter sets a maximum recursion depth in order to catch runaway
recursion before filling the C stack and causing a core dump or GPF.
Previously this limit was fixed when you compiled Python, but in 2.0 the maximum
recursion depth can be read and modified using sys.getrecursionlimit() and
sys.setrecursionlimit(). The default value is 1000, and a rough maximum
value for a given platform can be found by running a new script,
Misc/find_recursionlimit.py.
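A minimal sketch of reading and adjusting the limit:

import sys

print(sys.getrecursionlimit())   # 1000 by default
sys.setrecursionlimit(2000)      # permit deeper recursion, at the risk of
                                 # overflowing the C stack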
New Python releases try hard to be compatible with previous releases, and the
record has been pretty good. However, some changes are considered useful
enough, usually because they fix initial design decisions that turned out to be
actively mistaken, that breaking backward compatibility can’t always be avoided.
This section lists the changes in Python 2.0 that may cause old Python code to
break.
The change which will probably break the most code is tightening up the
arguments accepted by some methods. Some methods would take multiple arguments
and treat them as a tuple, particularly various list methods such as
append() and insert(). In earlier versions of Python, if L is
a list, L.append(1,2) appends the tuple (1,2) to the list. In Python
2.0 this causes a TypeError exception to be raised, with the message:
‘append requires exactly 1 argument; 2 given’. The fix is to simply add an
extra set of parentheses to pass both values as a tuple: L.append((1,2)).
The earlier versions of these methods were more forgiving because they used an
old function in Python’s C interface to parse their arguments; 2.0 modernizes
them to use PyArg_ParseTuple(), the current argument parsing function,
which provides more helpful error messages and treats multi-argument calls as
errors. If you absolutely must use 2.0 but can’t fix your code, you can edit
Objects/listobject.c and define the preprocessor symbol
NO_STRICT_LIST_APPEND to preserve the old behaviour; this isn’t recommended.
Some of the functions in the socket module are still forgiving in this
way. For example, socket.connect(('hostname', 25)) is the correct
form, passing a tuple representing an IP address, but socket.connect('hostname', 25) also works. socket.connect_ex() and socket.bind()
are similarly easy-going. 2.0alpha1 tightened these functions up, but because
the documentation actually used the erroneous multiple argument form, many
people wrote code which would break with the stricter checking. GvR backed out
the changes in the face of public reaction, so for the socket module, the
documentation was fixed and the multiple argument form is simply marked as
deprecated; it will be tightened up again in a future Python version.
The \x escape in string literals now takes exactly 2 hex digits. Previously
it would consume all the hex digits following the ‘x’ and take the lowest 8 bits
of the result, so \x123456 was equivalent to \x56.
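For example:

assert '\x41' == 'A'                   # exactly two hex digits are consumed
assert '\x123456' == '\x12' + '3456'   # the remaining digits are literal text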
The AttributeError and NameError exceptions have a more friendly
error message, whose text will be something like 'Spam' instance has no
attribute 'eggs' or name 'eggs' is not defined. Previously the error
message was just the missing attribute name eggs, and code written to take
advantage of this fact will break in 2.0.
Some work has been done to make integers and long integers a bit more
interchangeable. In 1.5.2, large-file support was added for Solaris, to allow
reading files larger than 2 GiB; this made the tell() method of file
objects return a long integer instead of a regular integer. Some code would
subtract two file offsets and attempt to use the result to multiply a sequence
or slice a string, but this raised a TypeError. In 2.0, long integers
can be used to multiply or slice a sequence, and it’ll behave as you’d
intuitively expect it to; 3L*'abc' produces ‘abcabcabc’, and
(0,1,2,3)[2L:4L] produces (2,3). Long integers can also be used in various
contexts where previously only integers were accepted, such as in the
seek() method of file objects, and in the formats supported by the %
operator (%d, %i, %x, etc.). For example, "%d" % 2L**64 will
produce the string 18446744073709551616.
The subtlest long integer change of all is that the str() of a long
integer no longer has a trailing ‘L’ character, though repr() still
includes it. The ‘L’ annoyed many people who wanted to print long integers that
looked just like regular integers, since they had to go out of their way to chop
off the character. This is no longer a problem in 2.0, but code which does
str(longval)[:-1] and assumes the ‘L’ is there, will now lose the final
digit.
Taking the repr() of a float now uses a different formatting precision
than str(). repr() uses the %.17g format string for C’s
sprintf(), while str() uses %.12g as before. The effect is that
repr() may occasionally show more decimal places than str(), for
certain numbers. For example, the number 8.1 can’t be represented exactly in
binary, so repr(8.1) is '8.0999999999999996', while str(8.1) is
'8.1'.
The -X command-line option, which turned all standard exceptions into
strings instead of classes, has been removed; the standard exceptions will now
always be classes. The exceptions module containing the standard
exceptions was translated from Python to a built-in C module, written by Barry
Warsaw and Fredrik Lundh.
Some of the changes are under the covers, and will only be apparent to people
writing C extension modules or embedding a Python interpreter in a larger
application. If you aren’t dealing with Python’s C API, you can safely skip
this section.
The version number of the Python C API was incremented, so C extensions compiled
for 1.5.2 must be recompiled in order to work with 2.0. On Windows, it’s not
possible for Python 2.0 to import a third party extension built for Python 1.5.x
due to how Windows DLLs work, so Python will raise an exception and the import
will fail.
Users of Jim Fulton’s ExtensionClass module will be pleased to find out that
hooks have been added so that ExtensionClasses are now supported by
isinstance() and issubclass(). This means you no longer have to
remember to write code such as if type(obj) == myExtensionClass, but can use
the more natural if isinstance(obj, myExtensionClass).
The Python/importdl.c file, which was a mass of #ifdefs to support
dynamic loading on many different platforms, was cleaned up and reorganised by
Greg Stein. importdl.c is now quite small, and platform-specific code
has been moved into a bunch of Python/dynload_*.c files. Another
cleanup: there were also a number of my*.h files in the Include/
directory that held various portability hacks; they’ve been merged into a single
file, Include/pyport.h.
Vladimir Marangozov’s long-awaited malloc restructuring was completed, to make
it easy to have the Python interpreter use a custom allocator instead of C’s
standard malloc(). For documentation, read the comments in
Include/pymem.h and Include/objimpl.h. For the lengthy
discussions during which the interface was hammered out, see the Web archives of
the ‘patches’ and ‘python-dev’ lists at python.org.
Recent versions of the GUSI development environment for MacOS support POSIX
threads. Therefore, Python’s POSIX threading support now works on the
Macintosh. Threading support using the user-space GNU pth library was also
contributed.
Threading support on Windows was enhanced, too. Windows supports thread locks
that use kernel objects only in case of contention; in the common case when
there’s no contention, they use simpler functions which are an order of
magnitude faster. A threaded version of Python 1.5.2 on NT is twice as slow as
an unthreaded version; with the 2.0 changes, the difference is only 10%. These
improvements were contributed by Yakov Markovitch.
Python 2.0’s source now uses only ANSI C prototypes, so compiling Python now
requires an ANSI C compiler, and can no longer be done using a compiler that
only supports K&R C.
Previously the Python virtual machine used 16-bit numbers in its bytecode,
limiting the size of source files. In particular, this affected the maximum
size of literal lists and dictionaries in Python source; occasionally people who
are generating Python code would run into this limit. A patch by Charles G.
Waldman raises the limit from 2**16 to 2**32.
Three new convenience functions intended for adding constants to a module’s
dictionary at module initialization time were added: PyModule_AddObject(),
PyModule_AddIntConstant(), and PyModule_AddStringConstant(). Each
of these functions takes a module object, a null-terminated C string containing
the name to be added, and a third argument for the value to be assigned to the
name. This third argument is, respectively, a Python object, a C long, or a C
string.
A wrapper API was added for Unix-style signal handlers. PyOS_getsig() gets
a signal handler and PyOS_setsig() will set a new handler.
Before Python 2.0, installing modules was a tedious affair – there was no way
to figure out automatically where Python is installed, or what compiler options
to use for extension modules. Software authors had to go through an arduous
ritual of editing Makefiles and configuration files, which only really work on
Unix and leave Windows and MacOS unsupported. Python users faced wildly
differing installation instructions which varied between different extension
packages, which made administering a Python installation something of a chore.
The SIG for distribution utilities, shepherded by Greg Ward, has created the
Distutils, a system to make package installation much easier. They form the
distutils package, a new part of Python’s standard library. In the best
case, installing a Python module from source will require the same steps: first
you simply unpack the tarball or zip archive, and then run “python setup.py
install”. The platform will be automatically detected, the compiler
will be recognized, C extension modules will be compiled, and the distribution
installed into the proper directory. Optional command-line arguments provide
more control over the installation process, and the distutils package offers many
places to override defaults – separating the build from the install, building
or installing in non-default directories, and more.
In order to use the Distutils, you need to write a setup.py script. For
the simple case, when the software contains only .py files, a minimal
setup.py can be just a few lines long:

from distutils.core import setup
setup(name = "foo", version = "1.0",
      py_modules = ["module1", "module2"])
The Distutils can also take care of creating source and binary distributions.
The “sdist” command, run by “python setup.py sdist”, builds a source
distribution such as foo-1.0.tar.gz. Adding new commands isn’t
difficult; “bdist_rpm” and “bdist_wininst” commands have already been
contributed to create an RPM distribution and a Windows installer for the
software, respectively. Commands to create other distribution formats such as
Debian packages and Solaris .pkg files are in various stages of
development.
All this is documented in a new manual, Distributing Python Modules, that
joins the basic set of Python documentation.
Python 1.5.2 included a simple XML parser in the form of the xmllib
module, contributed by Sjoerd Mullender. Since 1.5.2’s release, two different
interfaces for processing XML have become common: SAX2 (version 2 of the Simple
API for XML) provides an event-driven interface with some similarities to
xmllib, and the DOM (Document Object Model) provides a tree-based
interface, transforming an XML document into a tree of nodes that can be
traversed and modified. Python 2.0 includes a SAX2 interface and a stripped-
down DOM interface as part of the xml package. Here we will give a brief
overview of these new interfaces; consult the Python documentation or the source
code for complete details. The Python XML SIG is also working on improved
documentation.
SAX defines an event-driven interface for parsing XML. To use SAX, you must
write a SAX handler class. Handler classes inherit from various classes
provided by SAX, and override various methods that will then be called by the
XML parser. For example, the startElement() and endElement()
methods are called for every starting and end tag encountered by the parser, the
characters() method is called for every chunk of character data, and so
forth.
The advantage of the event-driven approach is that the whole document doesn’t
have to be resident in memory at any one time, which matters if you are
processing really huge documents. However, writing the SAX handler class can
get very complicated if you’re trying to modify the document structure in some
elaborate way.
For example, this little example program defines a handler that prints a message
for every starting and ending tag, and then parses the file hamlet.xml
using it:
from xml import sax

class SimpleHandler(sax.ContentHandler):
    def startElement(self, name, attrs):
        print 'Start of element:', name, attrs.keys()

    def endElement(self, name):
        print 'End of element:', name

# Create a parser object
parser = sax.make_parser()

# Tell it what handler to use
handler = SimpleHandler()
parser.setContentHandler(handler)

# Parse a file!
parser.parse('hamlet.xml')
The Document Object Model is a tree-based representation for an XML document. A
top-level Document instance is the root of the tree, and has a single
child which is the top-level Element instance. This Element
has children nodes representing character data and any sub-elements, which may
have further children of their own, and so forth. Using the DOM you can
traverse the resulting tree any way you like, access element and attribute
values, insert and delete nodes, and convert the tree back into XML.
The DOM is useful for modifying XML documents, because you can create a DOM
tree, modify it by adding new nodes or rearranging subtrees, and then produce a
new XML document as output. You can also construct a DOM tree manually and
convert it to XML, which can be a more flexible way of producing XML output than
simply writing <tag1>...</tag1> to a file.
The DOM implementation included with Python lives in the xml.dom.minidom
module. It’s a lightweight implementation of the Level 1 DOM with support for
XML namespaces. The parse() and parseString() convenience
functions are provided for generating a DOM tree:

from xml.dom import minidom
doc = minidom.parse('hamlet.xml')
doc is a Document instance. Document, like all the other
DOM classes such as Element and Text, is a subclass of the
Node base class. All the nodes in a DOM tree therefore support certain
common methods, such as toxml() which returns a string containing the XML
representation of the node and its children. Each class also has special
methods of its own; for example, Element and Document
instances have a method to find all child elements with a given tag name.
Continuing from the previous 2-line example:

perslist = doc.getElementsByTagName('PERSONA')
print perslist[0].toxml()
print perslist[1].toxml()
The root element of the document is available as doc.documentElement, and
its children can be easily modified by deleting, adding, or removing nodes:
root = doc.documentElement

# Remove the first child
root.removeChild(root.childNodes[0])

# Move the new first child to the end
root.appendChild(root.childNodes[0])

# Insert the new first child (originally,
# the third child) before the 20th child.
root.insertBefore(root.childNodes[0], root.childNodes[20])
Again, I will refer you to the Python documentation for a complete listing of
the different Node classes and their various methods.
The XML Special Interest Group has been working on XML-related Python code for a
while. Its code distribution, called PyXML, is available from the SIG’s Web
pages at http://www.python.org/sigs/xml-sig/. The PyXML distribution also used
the package name xml. If you’ve written programs that used PyXML, you’re
probably wondering about its compatibility with the 2.0 xml package.
The answer is that Python 2.0’s xml package isn’t compatible with PyXML,
but can be made compatible by installing a recent version of PyXML. Many
applications can get by with the XML support that is included with Python 2.0,
but more complicated applications will require the full PyXML package to be
installed. When installed, PyXML versions 0.6.0 or greater will replace the
xml package shipped with Python, and will be a strict superset of the
standard package, adding a bunch of additional features. Some of the additional
features in PyXML include:
4DOM, a full DOM implementation from FourThought, Inc.
The xmlproc validating parser, written by Lars Marius Garshol.
The sgmlop parser accelerator module, written by Fredrik Lundh.
Lots of improvements and bugfixes were made to Python’s extensive standard
library; some of the affected modules include readline,
ConfigParser, cgi, calendar, posix,
xmllib, aifc, chunk, wave, random, shelve,
and nntplib. Consult the CVS logs for the exact patch-by-patch details.
Brian Gallew contributed OpenSSL support for the socket module. OpenSSL
is an implementation of the Secure Socket Layer, which encrypts the data being
sent over a socket. When compiling Python, you can edit Modules/Setup
to include SSL support, which adds an additional function to the socket
module: socket.ssl(socket, keyfile, certfile), which takes a socket
object and returns an SSL socket. The httplib and urllib modules
were also changed to support https:// URLs, though no one has implemented
FTP or SMTP over SSL.
The httplib module has been rewritten by Greg Stein to support HTTP/1.1.
Backward compatibility with the 1.5 version of httplib is provided,
though using HTTP/1.1 features such as pipelining will require rewriting code to
use a different set of interfaces.
The Tkinter module now supports Tcl/Tk version 8.1, 8.2, or 8.3, and
support for the older 7.x versions has been dropped. The Tkinter module now
supports displaying Unicode strings in Tk widgets. Also, Fredrik Lundh
contributed an optimization which makes operations like create_line and
create_polygon much faster, especially when using lots of coordinates.
The curses module has been greatly extended, starting from Oliver
Andrich’s enhanced version, to provide many additional functions from ncurses
and SYSV curses, such as colour, alternative character set support, pads, and
mouse support. This means the module is no longer compatible with operating
systems that only have BSD curses, but there don’t seem to be any currently
maintained OSes that fall into this category.
As mentioned in the earlier discussion of 2.0’s Unicode support, the underlying
implementation of the regular expressions provided by the re module has
been changed. SRE, a new regular expression engine written by Fredrik Lundh and
partially funded by Hewlett Packard, supports matching against both 8-bit
strings and Unicode strings.
A number of new modules were added. We’ll simply list them with brief
descriptions; consult the 2.0 documentation for the details of a particular
module.
atexit: For registering functions to be called before the Python
interpreter exits. Code that currently sets sys.exitfunc directly should be
changed to use the atexit module instead, importing atexit and
calling atexit.register() with the function to be called on exit.
(Contributed by Skip Montanaro.)
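A minimal sketch (the handler name is illustrative):

import atexit

def goodbye():
    print('Exiting; any cleanup can go here.')

atexit.register(goodbye)   # goodbye() will be called when the interpreter exits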
codecs, encodings, unicodedata: Added as part of the new
Unicode support.
filecmp: Supersedes the old cmp, cmpcache and
dircmp modules, which have now become deprecated. (Contributed by Gordon
MacMillan and Moshe Zadka.)
gettext: This module provides internationalization (I18N) and
localization (L10N) support for Python programs by providing an interface to the
GNU gettext message catalog library. (Integrated by Barry Warsaw, from separate
contributions by Martin von Löwis, Peter Funk, and James Henstridge.)
linuxaudiodev: Support for the /dev/audio device on Linux, a
twin to the existing sunaudiodev module. (Contributed by Peter Bosch,
with fixes by Jeremy Hylton.)
mmap: An interface to memory-mapped files on both Windows and Unix. A
file’s contents can be mapped directly into memory, at which point it behaves
like a mutable string, so its contents can be read and modified. They can even
be passed to functions that expect ordinary strings, such as the re
module. (Contributed by Sam Rushing, with some extensions by A.M. Kuchling.)
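A minimal sketch for Unix (the filename is illustrative, and the file must
already exist):

import mmap, os

f = open('mydata.txt', 'r+b')
m = mmap.mmap(f.fileno(), os.path.getsize('mydata.txt'))
print(m[0:5])    # slice the mapped region like a string
m.close()
f.close()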
pyexpat: An interface to the Expat XML parser. (Contributed by Paul
Prescod.)
robotparser: Parse a robots.txt file, which is used for writing
Web spiders that politely avoid certain areas of a Web site. The parser accepts
the contents of a robots.txt file, builds a set of rules from it, and
can then answer questions about the fetchability of a given URL. (Contributed
by Skip Montanaro.)
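A minimal sketch (the URLs are illustrative):

import robotparser

rp = robotparser.RobotFileParser()
rp.set_url('http://www.example.com/robots.txt')
rp.read()    # fetch and parse the rules
print(rp.can_fetch('MySpider', 'http://www.example.com/private/'))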
tabnanny: A module/script to check Python source code for ambiguous
indentation. (Contributed by Tim Peters.)
UserString: A base class useful for deriving objects that behave like
strings.
webbrowser: A module that provides a platform independent way to launch
a web browser on a specific URL. For each platform, various browsers are tried
in a specific order. The user can alter which browser is launched by setting the
BROWSER environment variable. (Originally inspired by Eric S. Raymond’s patch
to urllib which added similar functionality, but the final module comes
from code originally implemented by Fred Drake as
Tools/idle/BrowserControl.py, and adapted for the standard library by
Fred.)
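A one-line sketch:

import webbrowser
webbrowser.open('http://www.python.org')   # launch the user's preferred browser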
_winreg: An interface to the Windows registry. _winreg is an
adaptation of functions that have been part of PythonWin since 1995, but has now
been added to the core distribution, and enhanced to support Unicode.
_winreg was written by Bill Tutt and Mark Hammond.
zipfile: A module for reading and writing ZIP-format archives. These
are archives produced by PKZIP on DOS/Windows or zip on
Unix, not to be confused with gzip-format files (which are
supported by the gzip module). (Contributed by James C. Ahlstrom.)
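A minimal sketch of reading an archive (the names are illustrative):

import zipfile

z = zipfile.ZipFile('myarchive.zip', 'r')
print(z.namelist())           # the archived member names
data = z.read('mydata.txt')   # extract one member as a string
z.close()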
imputil: A module that provides a simpler way for writing customised
import hooks, in comparison to the existing ihooks module. (Implemented
by Greg Stein, with much discussion on python-dev along the way.)
IDLE is the official Python cross-platform IDE, written using Tkinter. Python
2.0 includes IDLE 0.6, which adds a number of new features and improvements. A
partial list:
UI improvements and optimizations, especially in the area of syntax
highlighting and auto-indentation.
The class browser now shows more information, such as the top level functions
in a module.
Tab width is now a user settable option. When opening an existing Python file,
IDLE automatically detects the indentation conventions, and adapts.
There is now support for calling browsers on various platforms, used to open
the Python documentation in a browser.
IDLE now has a command line, which is largely similar to the vanilla Python
interpreter.
Call tips were added in many places.
IDLE can now be installed as a package.
In the editor window, there is now a line/column bar at the bottom.
Three new keystroke commands: Check module (Alt-F5), Import module (F5) and
Run script (Ctrl-F5).
A few modules have been dropped because they’re obsolete, or because there are
now better ways to do the same thing. The stdwin module is gone; it was
for a platform-independent windowing toolkit that’s no longer developed.
A number of modules have been moved to the lib-old subdirectory:
cmp, cmpcache, dircmp, dump, find,
grep, packmail, poly, util, whatsound,
zmod. If you have code which relies on a module that’s been moved to
lib-old, you can simply add that directory to sys.path to get them
back, but you’re encouraged to update any code that uses these modules.
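For example (the installation path is illustrative):

import sys
sys.path.append('/usr/local/lib/python2.0/lib-old')
import dircmp   # one of the relocated modules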
The authors would like to thank the following people for offering suggestions on
various drafts of this article: David Bolen, Mark Hammond, Gregg Hauser, Jeremy
Hylton, Fredrik Lundh, Detlef Lannert, Aahz Maruch, Skip Montanaro, Vladimir
Marangozov, Tobias Polzin, Guido van Rossum, Neil Schemenauer, and Russ Schmidt.
$ python3.2
Python 3.2 (py3k, Sep 12 2011, 12:21:02)
[GCC 3.4.6 20060404 (Red Hat 3.4.6-8)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>
Continuation prompts are needed when entering a multi-line construct, for example, this if statement:
>>> the_world_is_flat = 1
>>> if the_world_is_flat:
...     print("Be careful not to fall off!")
...
Be careful not to fall off!
>>> 'spam eggs'
'spam eggs'
>>> 'doesn\'t'
"doesn't"
>>> "doesn't"
"doesn't"
>>> '"Yes," he said.'
'"Yes," he said.'
>>> "\"Yes,\" he said."
'"Yes," he said.'
>>> '"Isn\'t," she said.'
'"Isn\'t," she said.'
>>> word[0] = 'x'
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: 'str' object does not support item assignment
>>> word[:1] = 'Splat'
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: 'str' object does not support slice assignment
>>> x = int(input("Please enter an integer: "))
Please enter an integer: 42
>>> if x < 0:
...     x = 0
...     print('Negative changed to zero')
... elif x == 0:
...     print('Zero')
... elif x == 1:
...     print('Single')
... else:
...     print('More')
...
More
The break statement, like in C, breaks out of the smallest enclosing for or
while loop.
Loop statements may have an else clause; it is executed when the loop
terminates through exhaustion of the list (with for) or when the condition
becomes false (with while), but not when the loop is terminated by a
break statement. The following prime-searching example demonstrates this:
>>> for n in range(2, 10):
...     for x in range(2, n):
...         if n % x == 0:
...             print(n, 'equals', x, '*', n//x)
...             break
...     else:
...         # loop fell through without finding a factor
...         print(n, 'is a prime number')
...
2 is a prime number
3 is a prime number
4 equals 2 * 2
5 is a prime number
6 equals 2 * 3
7 is a prime number
8 equals 2 * 4
9 equals 3 * 3
>>> for num in range(2, 10):
...     if num % 2 == 0:
...         print("Found an even number", num)
...         continue
...     print("Found a number", num)
Found an even number 2
Found a number 3
Found an even number 4
Found a number 5
Found an even number 6
Found a number 7
Found an even number 8
Found a number 9
def ask_ok(prompt, retries=4, complaint='Yes or no, please!'):
    while True:
        ok = input(prompt)
        if ok in ('y', 'ye', 'yes'):
            return True
        if ok in ('n', 'no', 'nop', 'nope'):
            return False
        retries = retries - 1
        if retries < 0:
            raise IOError('refusenik user')
        print(complaint)
def parrot(voltage, state='a stiff', action='voom', type='Norwegian Blue'):
    print("-- This parrot wouldn't", action, end=' ')
    print("if you put", voltage, "volts through it.")
    print("-- Lovely plumage, the", type)
    print("-- It's", state, "!")
It can be called in any of the following ways:
parrot(1000)
parrot(action='VOOOOOM', voltage=1000000)
parrot('a thousand', state='pushing up the daisies')
parrot('a million', 'bereft of life', 'jump')
def cheeseshop(kind, *arguments, **keywords):
    print("-- Do you have any", kind, "?")
    print("-- I'm sorry, we're all out of", kind)
    for arg in arguments:
        print(arg)
    print("-" * 40)
    keys = sorted(keywords.keys())
    for kw in keys:
        print(kw, ":", keywords[kw])
It could be called like this:
cheeseshop("Limburger","It's very runny, sir.","It's really very, VERY runny, sir.",shopkeeper="Michael Palin",client="John Cleese",sketch="Cheese Shop Sketch")
and of course it would print:
-- Do you have any Limburger ?
-- I'm sorry, we're all out of Limburger
It's very runny, sir.
It's really very, VERY runny, sir.
----------------------------------------
client : John Cleese
shopkeeper : Michael Palin
sketch : Cheese Shop Sketch
>>> def parrot(voltage, state='a stiff', action='voom'):
...     print("-- This parrot wouldn't", action, end=' ')
...     print("if you put", voltage, "volts through it.", end=' ')
...     print("E's", state, "!")
...
>>> d = {"voltage": "four million", "state": "bleedin' demised", "action": "VOOM"}
>>> parrot(**d)
-- This parrot wouldn't VOOM if you put four million volts through it. E's bleedin' demised !
>>> def my_function():
...     """Do nothing, but document it.
...
...     No, really, it doesn't do anything.
...     """
...     pass
...
>>> print(my_function.__doc__)
Do nothing, but document it.

    No, really, it doesn't do anything.
Function annotations are completely optional, arbitrary metadata information about user-defined functions. Neither Python itself nor the standard library use function annotations in any way; this section just shows the syntax. Third-party projects are free to use function annotations for documentation, type checking, and other uses.
Annotations are stored in the __annotations__ attribute of the function as a dictionary and have no effect on any other part of the function. Parameter annotations are defined by a colon after the parameter name, followed by an expression evaluating to the value of the annotation. Return annotations are defined by a literal ->, followed by an expression, between the parameter list and the colon denoting the end of the def statement. The following example has a positional argument, a keyword argument, and the return value annotated with nonsense:
>>> def f(ham: 42, eggs: int = 'spam') -> "Nothing to see here":
...     print("Annotations:", f.__annotations__)
...     print("Arguments:", ham, eggs)
...
>>> f('wonderful')
Annotations: {'eggs': <class 'int'>, 'return': 'Nothing to see here', 'ham': 42}
Arguments: wonderful spam
>>> [x, x**2 for x in vec]  # error - parens required for tuples
  File "<stdin>", line 1, in ?
    [x, x**2 for x in vec]
             ^
SyntaxError: invalid syntax
>>> [(x, x**2) for x in vec]
[(2, 4), (4, 16), (6, 36)]
>>> questions = ['name', 'quest', 'favorite color']
>>> answers = ['lancelot', 'the holy grail', 'blue']
>>> for q, a in zip(questions, answers):
...     print('What is your {0}? It is {1}.'.format(q, a))
...
What is your name? It is lancelot.
What is your quest? It is the holy grail.
What is your favorite color? It is blue.
>>> s = 'Hello, world.'
>>> str(s)
'Hello, world.'
>>> repr(s)
"'Hello, world.'"
>>> str(1.0/7.0)
'0.142857142857'
>>> repr(1.0/7.0)
'0.14285714285714285'
>>> x = 10 * 3.25
>>> y = 200 * 200
>>> s = 'The value of x is ' + repr(x) + ', and y is ' + repr(y) + '...'
>>> print(s)
The value of x is 32.5, and y is 40000...
>>> # The repr() of a string adds string quotes and backslashes:
... hello = 'hello, world\n'
>>> hellos = repr(hello)
>>> print(hellos)
'hello, world\n'
>>> # The argument to repr() may be any Python object:
... repr((x, y, ('spam', 'eggs')))
"(32.5, 40000, ('spam', 'eggs'))"
>>> import math
>>> print('The value of PI is approximately {}.'.format(math.pi))
The value of PI is approximately 3.14159265359.
>>> print('The value of PI is approximately {!r}.'.format(math.pi))
The value of PI is approximately 3.141592653589793.
An optional ':' and format specifier can follow the field name. This allows
greater control over how the value is formatted. The following example rounds
Pi to three places after the decimal point.
>>> import math
>>> print('The value of PI is approximately {0:.3f}.'.format(math.pi))
The value of PI is approximately 3.142.
>>> f = open('/tmp/workfile', 'rb+')
>>> f.write(b'0123456789abcdef')
16
>>> f.seek(5)      # Go to the 6th byte in the file
5
>>> f.read(1)
b'5'
>>> f.seek(-3, 2)  # Go to the 3rd byte before the end
13
>>> f.read(1)
b'd'
In text files (those opened without a b in the mode string), only seeks
relative to the beginning of the file are allowed (the exception being a seek
to the very end of the file with seek(0, 2)).
>>> 10 * (1/0)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ZeroDivisionError: int division or modulo by zero
>>> 4 + spam*3
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
NameError: name 'spam' is not defined
>>> '2' + 2
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: Can't convert 'int' object to str implicitly
>>> while True:
...     try:
...         x = int(input("Please enter a number: "))
...         break
...     except ValueError:
...         print("Oops! That was no valid number. Try again...")
...
import sys

try:
    f = open('myfile.txt')
    s = f.readline()
    i = int(s.strip())
except IOError as err:
    print("I/O error: {0}".format(err))
except ValueError:
    print("Could not convert data to an integer.")
except:
    print("Unexpected error:", sys.exc_info()[0])
    raise
>>> def this_fails():
...     x = 1/0
...
>>> try:
...     this_fails()
... except ZeroDivisionError as err:
...     print('Handling run-time error:', err)
...
Handling run-time error: int division or modulo by zero
class Error(Exception):
    """Base class for exceptions in this module."""
    pass

class InputError(Error):
    """Exception raised for errors in the input.

    Attributes:
        expression -- input expression in which the error occurred
        message -- explanation of the error
    """

    def __init__(self, expression, message):
        self.expression = expression
        self.message = message

class TransitionError(Error):
    """Raised when an operation attempts a state transition that's not
    allowed.

    Attributes:
        previous -- state at beginning of transition
        next -- attempted new state
        message -- explanation of why the specific transition is not
            allowed
    """

    def __init__(self, previous, next, message):
        self.previous = previous
        self.next = next
        self.message = message
def scope_test():
    def do_local():
        spam = "local spam"

    def do_nonlocal():
        nonlocal spam
        spam = "nonlocal spam"

    def do_global():
        global spam
        spam = "global spam"

    spam = "test spam"
    do_local()
    print("After local assignment:", spam)
    do_nonlocal()
    print("After nonlocal assignment:", spam)
    do_global()
    print("After global assignment:", spam)

scope_test()
print("In global scope:", spam)
class Employee:
    pass

john = Employee()  # Create an empty employee record

# Fill the fields of the record
john.name = 'John Doe'
john.dept = 'computer lab'
john.salary = 1000
class Reverse:
    "Iterator for looping over a sequence backwards"
    def __init__(self, data):
        self.data = data
        self.index = len(data)

    def __iter__(self):
        return self

    def __next__(self):
        if self.index == 0:
            raise StopIteration
        self.index = self.index - 1
        return self.data[self.index]

>>> rev = Reverse('spam')
>>> iter(rev)
<__main__.Reverse object at 0x00A1DB50>
>>> for char in rev:
...     print(char)
...
m
a
p
s
The re module provides regular expression tools for advanced string
processing. For complex matching and manipulation, regular expressions offer
succinct, optimized solutions:
>>> import re
>>> re.findall(r'\bf[a-z]*', 'which foot or hand fell fastest')
['foot', 'fell', 'fastest']
>>> re.sub(r'(\b[a-z]+) \1', r'\1', 'cat in the the hat')
'cat in the hat'
When only simple capabilities are needed, string methods are preferred because
they are easier to read and debug:
>>> 'tea for too'.replace('too', 'two')
'tea for two'
>>> # dates are easily constructed and formatted
>>> from datetime import date
>>> now = date.today()
>>> now
datetime.date(2003, 12, 2)
>>> now.strftime("%m-%d-%y. %d %b %Y is a %A on the %d day of %B.")
'12-02-03. 02 Dec 2003 is a Tuesday on the 02 day of December.'

>>> # dates support calendar arithmetic
>>> birthday = date(1964, 7, 31)
>>> age = now - birthday
>>> age.days
14368
>>> import zlib
>>> s = b'witch which has which witches wrist watch'
>>> len(s)
41
>>> t = zlib.compress(s)
>>> len(t)
37
>>> zlib.decompress(t)
b'witch which has which witches wrist watch'
>>> zlib.crc32(s)
226805979
def average(values):
    """Computes the arithmetic mean of a list of numbers.

    >>> print(average([20, 30, 70]))
    40.0
    """
    return sum(values) / len(values)

import doctest
doctest.testmod()   # automatically validate the embedded tests
>>> import textwrap
>>> doc = """The wrap() method is just like fill() except that it returns
... a list of strings instead of one big string with newlines to separate
... the wrapped lines."""
...
>>> print(textwrap.fill(doc, width=40))
The wrap() method is just like fill()
except that it returns a list of strings
instead of one big string with newlines
to separate the wrapped lines.
import threading, zipfile

class AsyncZip(threading.Thread):
    def __init__(self, infile, outfile):
        threading.Thread.__init__(self)
        self.infile = infile
        self.outfile = outfile
    def run(self):
        f = zipfile.ZipFile(self.outfile, 'w', zipfile.ZIP_DEFLATED)
        f.write(self.infile)
        f.close()
        print('Finished background zip of:', self.infile)

background = AsyncZip('mydata.txt', 'myarchive.zip')
background.start()
print('The main program continues to run in foreground.')

background.join()    # Wait for the background task to finish
print('Main program waited until background was done.')
>>> from collections import deque
>>> d = deque(["task1", "task2", "task3"])
>>> d.append("task4")
>>> print("Handling", d.popleft())
Handling task1

unsearched = deque([starting_node])
def breadth_first_search(unsearched):
    node = unsearched.popleft()
    for m in gen_moves(node):
        if is_goal(m):
            return m
        unsearched.append(m)
# I prefer vi-style editing:
set editing-mode vi

# Edit using a single line:
set horizontal-scroll-mode On

# Rebind some keys:
Meta-h: backward-kill-word
"\C-u": universal-argument
"\C-x\C-r": re-read-init-file
# Add auto-completion and a stored history file of commands to your Python
# interactive interpreter. Requires Python 2.0+, readline. Autocomplete is
# bound to the Esc key by default (you can change it - see readline docs).
#
# Store the file in ~/.pystartup, and set an environment variable to point
# to it:  "export PYTHONSTARTUP=/home/user/.pystartup" in bash.
#
# Note that PYTHONSTARTUP does *not* expand "~", so you have to put in the
# full path to your home directory.

import atexit
import os
import readline
import rlcompleter

historyPath = os.path.expanduser("~/.pyhistory")

def save_history(historyPath=historyPath):
    import readline
    readline.write_history_file(historyPath)

if os.path.exists(historyPath):
    readline.read_history_file(historyPath)

atexit.register(save_history)
del os, atexit, readline, rlcompleter, save_history, historyPath
>>> format(math.pi, '.12g')  # give 12 significant digits
'3.14159265359'
>>> format(math.pi, '.2f')   # give 2 digits after the point
'3.14'
>>> repr(math.pi)
'3.141592653589793'
The interpreter interface resembles that of the UNIX shell, but provides some
additional methods of invocation:
When called with standard input connected to a tty device, it prompts for
commands and executes them until an EOF (an end-of-file character, you can
produce that with Ctrl-D on UNIX or Ctrl-Z, Enter on Windows) is read.
When called with a file name argument or with a file as standard input, it
reads and executes a script from that file.
When called with a directory name argument, it reads and executes an
appropriately named script from that directory.
When called with -c command, it executes the Python statement(s) given as
command. Here command may contain multiple statements separated by
newlines. Leading whitespace is significant in Python statements!
When called with -m module-name, the given module is located on the
Python module path and executed as a script.
In non-interactive mode, the entire input is parsed before it is executed.
An interface option terminates the list of options consumed by the interpreter;
all consecutive arguments will end up in sys.argv – note that the first
element, subscript zero (sys.argv[0]), is a string reflecting the program’s
source.
Execute the Python code in command. command can be one or more
statements separated by newlines, with significant leading whitespace as in
normal module code.
If this option is given, the first element of sys.argv will be
"-c" and the current directory will be added to the start of
sys.path (allowing modules in that directory to be imported as top
level modules).
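For example (a plausible invocation):

python -c "import sys; print(sys.version_info[:2])"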
Search sys.path for the named module and execute its contents as
the __main__ module.
Since the argument is a module name, you must not give a file extension
(.py). The module-name should be a valid Python module name, but
the implementation may not always enforce this (e.g. it may allow you to
use a name that includes a hyphen).
Package names are also permitted. When a package name is supplied instead
of a normal module, the interpreter will execute <pkg>.__main__ as
the main module. This behaviour is deliberately similar to the handling
of directories and zipfiles that are passed to the interpreter as the
script argument.
Note
This option cannot be used with built-in modules and extension modules
written in C, since they do not have Python module files. However, it
can still be used for precompiled modules, even if the original source
file is not available.
If this option is given, the first element of sys.argv will be the
full path to the module file (while the module file is being located, the
first element will be set to "-m"). As with the -c option,
the current directory will be added to the start of sys.path.
Many standard library modules contain code that is invoked on their execution
as a script. An example is the timeit module:

python -m timeit -s 'setup here' 'benchmarked code here'
python -m timeit -h  # for details
Changed in version 3.1: Supply the package name to run a __main__ submodule.
-
Read commands from standard input (sys.stdin). If standard input is
a terminal, -i is implied.
If this option is given, the first element of sys.argv will be
"-" and the current directory will be added to the start of
sys.path.
<script>
Execute the Python code contained in script, which must be a filesystem
path (absolute or relative) referring to either a Python file, a directory
containing a __main__.py file, or a zipfile containing a
__main__.py file.
If this option is given, the first element of sys.argv will be the
script name as given on the command line.
If the script name refers directly to a Python file, the directory
containing that file is added to the start of sys.path, and the
file is executed as the __main__ module.
If the script name refers to a directory or zipfile, the script name is
added to the start of sys.path and the __main__.py file in
that location is executed as the __main__ module.
If no interface option is given, -i is implied, sys.argv[0] is
an empty string ("") and the current directory will be added to the
start of sys.path.
When a script is passed as first argument or the -c option is used,
enter interactive mode after executing the script or the command, even when
sys.stdin does not appear to be a terminal. The
PYTHONSTARTUP file is not read.
This can be useful to inspect global variables or a stack trace when a script
raises an exception. See also PYTHONINSPECT.
Force the binary layer of the stdin, stdout and stderr streams (which is
available as their buffer attribute) to be unbuffered. The text I/O
layer will still be line-buffered.
Print a message each time a module is initialized, showing the place
(filename or built-in module) from which it is loaded. When given twice
(-vv), print a message for each file that is checked for when
searching for a module. Also provides information on module cleanup at exit.
See also PYTHONVERBOSE.
Warning control. Python’s warning machinery by default prints warning
messages to sys.stderr. A typical warning message has the following
form:
file:line: category: message
By default, each warning is printed once for each source line where it
occurs. This option controls how often warnings are printed.
Multiple -W options may be given; when a warning matches more than
one option, the action for the last matching option is performed. Invalid
-W options are ignored (though, a warning message is printed about
invalid options when the first warning is issued).
Warnings can also be controlled from within a Python program using the
warnings module.
The simplest form of argument is one of the following action strings (or a
unique abbreviation):
ignore
Ignore all warnings.
default
Explicitly request the default behavior (printing each warning once per
source line).
all
Print a warning each time it occurs (this may generate many messages if a
warning is triggered repeatedly for the same source line, such as inside a
loop).
module
Print each warning only the first time it occurs in each module.
once
Print each warning only the first time it occurs in the program.
error
Raise an exception instead of printing a warning message.
The full form of argument is:
action:message:category:module:line
Here, action is as explained above but only applies to messages that match
the remaining fields. Empty fields match all values; trailing empty fields
may be omitted. The message field matches the start of the warning message
printed; this match is case-insensitive. The category field matches the
warning category. This must be a class name; the match tests whether the
actual warning category of the message is a subclass of the specified warning
category. The full class name must be given. The module field matches the
(fully-qualified) module name; this match is case-sensitive. The line
field matches the line number, where zero matches all line numbers and is
thus equivalent to an omitted line number.
Reserved for various implementation-specific options. CPython currently
defines none of them, but allows arbitrary values to be passed and
retrieved through the sys._xoptions dictionary.
Changed in version 3.2: It is now allowed to pass -X with CPython.
Change the location of the standard Python libraries. By default, the
libraries are searched in prefix/lib/pythonversion and
exec_prefix/lib/pythonversion, where prefix and
exec_prefix are installation-dependent directories, both defaulting
to /usr/local.
When PYTHONHOME is set to a single directory, its value replaces
both prefix and exec_prefix. To specify different values
for these, set PYTHONHOME to prefix:exec_prefix.
Augment the default search path for module files. The format is the same as
the shell’s PATH: one or more directory pathnames separated by
os.pathsep (e.g. colons on Unix or semicolons on Windows).
Non-existent directories are silently ignored.
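For example, in a Unix shell (the paths are illustrative):

export PYTHONPATH=/opt/mylibs:/home/user/pylib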
In addition to normal directories, individual PYTHONPATH entries
may refer to zipfiles containing pure Python modules (in either source or
compiled form). Extension modules cannot be imported from zipfiles.
The default search path is installation dependent, but generally begins with
prefix/lib/pythonversion (see PYTHONHOME above). It
is always appended to PYTHONPATH.
An additional directory will be inserted in the search path in front of
PYTHONPATH as described above under
Interface options. The search path can be manipulated from
within a Python program as the variable sys.path.
If this is the name of a readable file, the Python commands in that file are
executed before the first prompt is displayed in interactive mode. The file
is executed in the same namespace where interactive commands are executed so
that objects defined or imported in it can be used without qualification in
the interactive session. You can also change the prompts sys.ps1 and
sys.ps2 in this file.
Set this to a non-empty string to cause the time module to require
dates specified as strings to include 4-digit years, otherwise 2-digit years
are converted based on rules described in the time module
documentation.
If this is set to a non-empty string it is equivalent to specifying the
-O option. If set to an integer, it is equivalent to specifying
-O multiple times.
If this is set to a non-empty string it is equivalent to specifying the
-d option. If set to an integer, it is equivalent to specifying
-d multiple times.
If this is set to a non-empty string it is equivalent to specifying the
-v option. If set to an integer, it is equivalent to specifying
-v multiple times.
If this is set before running the interpreter, it overrides the encoding used
for stdin/stdout/stderr, in the syntax encodingname:errorhandler. The
:errorhandler part is optional and has the same meaning as in
str.encode().
For stderr, the :errorhandler part is ignored; the handler will always be
'backslashreplace'.
Viewing environment variables can also be done more straightforwardly: the
command prompt will expand strings wrapped in percent signs automatically:

C:\>echo %PATH%
See also “Creating Python extensions in C/C++ with SWIG and compiling them with
MinGW gcc under Windows” or “Installing Python extension with distutils
and without Microsoft Visual C++” by Sébastien Sauvage, 2003.
Python on a Macintosh running Mac OS X is in principle very similar to Python on
any other Unix platform, but there are a number of additional features such as
the IDE and the Package Manager that are worth pointing out.
Mac OS X 10.5 comes with Python 2.5.1 pre-installed by Apple. If you wish, you
are invited to install the most recent version of Python from the Python website
(http://www.python.org). A current “universal binary” build of Python, which
runs natively on the Mac’s new Intel and legacy PPC CPUs, is available there.
What you get after installing is a number of things:
A MacPython2.5 folder in your Applications folder. In here
you find IDLE, the development environment that is a standard part of official
Python distributions; PythonLauncher, which handles double-clicking Python
scripts from the Finder; and the “Build Applet” tool, which allows you to
package Python scripts as standalone applications on your system.
A framework /Library/Frameworks/Python.framework, which includes the
Python executable and libraries. The installer adds this location to your shell
path. To uninstall MacPython, you can simply remove these three things. A
symlink to the Python executable is placed in /usr/local/bin/.
The Apple-provided build of Python is installed in
/System/Library/Frameworks/Python.framework and /usr/bin/python,
respectively. You should never modify or delete these, as they are
Apple-controlled and are used by Apple- or third-party software. Remember that
if you choose to install a newer Python version from python.org, you will have
two different but functional Python installations on your computer, so it will
be important that your paths and usages are consistent with what you want to do.
IDLE includes a help menu that allows you to access Python documentation. If you
are completely new to Python you should start reading the tutorial introduction
in that document.
If you are familiar with Python on other Unix platforms you should read the
section on running Python scripts from the Unix shell.
Your best way to get started with Python on Mac OS X is through the IDLE
integrated development environment, see section The IDE and use the Help menu
when the IDE is running.
If you want to run Python scripts from the Terminal window command line or from
the Finder you first need an editor to create your script. Mac OS X comes with a
number of standard Unix command line editors, vim and
emacs among them. If you want a more Mac-like editor,
BBEdit or TextWrangler from Bare Bones Software (see
http://www.barebones.com/products/bbedit/index.shtml) are good choices, as is
TextMate (see http://macromates.com/). Other editors include
Gvim (http://macvim.org) and Aquamacs
(http://aquamacs.org/).
To run your script from the Terminal window you must make sure that
/usr/local/bin is in your shell search path.
To run your script from the Finder you have two options:
Drag it to PythonLauncher
Select PythonLauncher as the default application to open your
script (or any .py script) through the Finder’s Info window and double-click it.
PythonLauncher has various preferences to control how your script is
launched. Option-dragging allows you to change these for one invocation, or use
its Preferences menu to change things globally.
With older versions of Python, there is one Mac OS X quirk that you need to be
aware of: programs that talk to the Aqua window manager (in other words,
anything that has a GUI) need to be run in a special way. Use pythonw
instead of python to start such scripts.
With Python 2.5, you can use either python or pythonw.
Python on OS X honors all standard Unix environment variables such as
PYTHONPATH, but setting these variables for programs started from the
Finder is non-standard as the Finder does not read your .profile or
.cshrc at startup. You need to create a file ~/.MacOSX/environment.plist. See Apple’s Technical Document QA1067 for details.
There are several options for building GUI applications on the Mac with Python.
PyObjC is a Python binding to Apple’s Objective-C/Cocoa framework, which is
the foundation of most modern Mac development. Information on PyObjC is
available from http://pyobjc.sourceforge.net.
The standard Python GUI toolkit is tkinter, based on the cross-platform
Tk toolkit (http://www.tcl.tk). An Aqua-native version of Tk is bundled with OS
X by Apple, and the latest version can be downloaded and installed from
http://www.activestate.com; it can also be built from source.
wxPython is another popular cross-platform GUI toolkit that runs natively on
Mac OS X. Packages and documentation are available from http://www.wxpython.org.
The “Build Applet” tool that is placed in the MacPython 2.5 folder is fine for
packaging small Python scripts on your own machine to run as a standard Mac
application. This tool, however, is not robust enough to distribute Python
applications to other users.
The standard tool for deploying standalone Python applications on the Mac is
py2app. More information on installing and using py2app can be found
at http://undefined.org/python/#py2app.
Python can also be used to script other Mac applications via Apple’s Open
Scripting Architecture (OSA); see http://appscript.sourceforge.net. Appscript is
a high-level, user-friendly Apple event bridge that allows you to control
scriptable Mac OS X applications using ordinary Python scripts. Appscript makes
Python a serious alternative to Apple’s own AppleScript language for
automating your Mac. A related package, PyOSA, is an OSA language component
for the Python scripting language, allowing Python code to be executed by any
OSA-enabled application (Script Editor, Mail, iTunes, etc.). PyOSA makes Python
a full peer to AppleScript.
if 1900 < year < 2100 and 1 <= month <= 12 \
   and 1 <= day <= 31 and 0 <= hour < 24 \
   and 0 <= minute < 60 and 0 <= second < 60:   # Looks like a valid date
    return 1
month_names = ['Januari', 'Februari', 'Maart',      # These are the
               'April',   'Mei',      'Juni',       # Dutch names
               'Juli',    'Augustus', 'September',  # for the months
               'Oktober', 'November', 'December']   # of the year
def perm(l):
    # Compute the list of all permutations of l
    if len(l) <= 1:
        return [l]
    r = []
    for i in range(len(l)):
        s = l[:i] + l[i+1:]
        p = perm(s)
        for x in p:
            r.append(l[i:i+1] + x)
    return r
The following example shows various indentation errors:
 def perm(l):                       # error: first line indented
for i in range(len(l)):             # error: not indented
    s = l[:i] + l[i+1:]
        p = perm(l[:i] + l[i+1:])   # error: unexpected indent
        for x in p:
                r.append(l[i:i+1] + x)
            return r                # error: inconsistent dedent
Python 3.0 introduces additional characters from outside the ASCII range (see PEP 3131).
For these characters, the classification uses the version of the
Unicode Character Database as included in the unicodedata module.
Identifiers are unlimited in length. Case is significant.
identifier ::= id_start id_continue*
id_start ::= <all characters in general categories Lu, Ll, Lt, Lm, Lo, Nl, the underscore, and characters with the Other_ID_Start property>
id_continue ::= <all characters in id_start, plus characters in the categories Mn, Mc, Nd, Pc and others with the Other_ID_Continue property>
False class finally is return
None continue for lambda try
True def from nonlocal while
and del global not with
as elif if or yield
assert else import pass
break except in raise
(The value of an immutable container object that contains a reference to a mutable object can change when the latter’s value is changed; however the container is still considered immutable, because the collection of objects it contains cannot be changed. So, immutability is not strictly the same as having an unchangeable value, it is more subtle.)
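As a brief interactive illustration of this point:
>>> t = ([1, 2], 'abc')       # an immutable tuple holding a mutable list
>>> t[0].append(3)            # the list's value changes...
>>> t
([1, 2, 3], 'abc')
>>> t[0] = [9]                # ...but the tuple's items cannot be rebound
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment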
Objects are never explicitly destroyed; however, when they become unreachable
they may be garbage-collected. An implementation is allowed to postpone garbage
collection or omit it altogether — it is a matter of implementation quality
how garbage collection is implemented, as long as no objects are collected that
are still reachable.
These represent finite sets of objects indexed by arbitrary index sets. The
subscript notation a[k] selects the item indexed by k from the mapping
a; this can be used in expressions and as the target of assignments or
del statements. The built-in function len() returns the number
of items in a mapping.
Special read-only attributes: __self__ is the class instance object,
__func__ is the function object; __doc__ is the method’s
documentation (same as __func__.__doc__); __name__ is the
method name (same as __func__.__name__); __module__ is the
name of the module the method was defined in, or None if unavailable.
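For illustration, a small interactive sketch (the class Greeter and its method hello are invented for the example):
>>> class Greeter:
...     def hello(self):
...         """Return a greeting."""
...         return 'hello'
...
>>> g = Greeter()
>>> m = g.hello
>>> m.__self__ is g
True
>>> m.__func__ is Greeter.hello
True
>>> m.__doc__
'Return a greeting.'
>>> m.__name__
'hello'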
>>> class C(object):
... pass
...
>>> c = C()
>>> c.__len__ = lambda: 5
>>> len(c)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: object of type 'C' has no len()
Names refer to objects. Names are introduced by name binding operations.
Each occurrence of a name in the program text refers to the binding of
that name established in the innermost function block containing the use.
A block is a piece of Python program text that is executed as a unit.
The following are blocks: a module, a function body, and a class definition.
Each command typed interactively is a block. A script file (a file given as
standard input to the interpreter or specified as the first argument on the
interpreter command line) is a code block. A script command (a command specified
on the interpreter command line with the '-c' option) is a code block. The
string argument passed to the built-in functions eval() and exec()
is a code block.
A code block is executed in an execution frame. A frame contains some
administrative information (used for debugging) and determines where and how
execution continues after the code block’s execution has completed.
A scope defines the visibility of a name within a block. If a local
variable is defined in a block, its scope includes that block. If the
definition occurs in a function block, the scope extends to any blocks contained
within the defining one, unless a contained block introduces a different binding
for the name. The scope of names defined in a class block is limited to the
class block; it does not extend to the code blocks of methods – this includes
comprehensions and generator expressions since they are implemented using a
function scope. This means that the following will fail:
class A:
a = 42
b = list(a + i for i in range(10))
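One hedged workaround, if a class attribute really is needed inside the comprehension, is to pass it in through a default argument, since default values are evaluated in the class scope:
class A:
    a = 42
    # the default a=a is evaluated in the class scope, where a is visible
    b = (lambda a=a: [a + i for i in range(10)])()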
When a name is used in a code block, it is resolved using the nearest enclosing
scope. The set of all such scopes visible to a code block is called the block’s
environment.
If a name is bound in a block, it is a local variable of that block, unless
declared as nonlocal. If a name is bound at the module level, it is
a global variable. (The variables of the module code block are local and
global.) If a variable is used in a code block but not defined there, it is a
free variable.
When a name is not found at all, a NameError exception is raised. If the
name refers to a local variable that has not been bound, a
UnboundLocalError exception is raised. UnboundLocalError is a
subclass of NameError.
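A minimal sketch of the distinction:
>>> def f():
...     print(x)    # x is local to f (it is bound below), but not bound yet
...     x = 1
...
>>> f()
Traceback (most recent call last):
  ...
UnboundLocalError: local variable 'x' referenced before assignment
>>> undefined_name
Traceback (most recent call last):
  ...
NameError: name 'undefined_name' is not defined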
The following constructs bind names: formal parameters to functions,
import statements, class and function definitions (these bind the
class or function name in the defining block), and targets that are identifiers
if occurring in an assignment, for loop header, or after
as in a with statement or except clause.
The import statement
of the form from ... import * binds all names defined in the imported
module, except those beginning with an underscore. This form may only be used
at the module level.
A target occurring in a del statement is also considered bound for
this purpose (though the actual semantics are to unbind the name).
Each assignment or import statement occurs within a block defined by a class or
function definition or at the module level (the top-level code block).
If a name binding operation occurs anywhere within a code block, all uses of the
name within the block are treated as references to the current block. This can
lead to errors when a name is used within a block before it is bound. This rule
is subtle. Python lacks declarations and allows name binding operations to
occur anywhere within a code block. The local variables of a code block can be
determined by scanning the entire text of the block for name binding operations.
If the global statement occurs within a block, all uses of the name
specified in the statement refer to the binding of that name in the top-level
namespace. Names are resolved in the top-level namespace by searching the
global namespace, i.e. the namespace of the module containing the code block,
and the builtins namespace, the namespace of the module builtins. The
global namespace is searched first. If the name is not found there, the builtins
namespace is searched. The global statement must precede all uses of the name.
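For example:
>>> counter = 0
>>> def bump():
...     global counter     # all uses of counter below refer to the module-level name
...     counter += 1
...
>>> bump()
>>> counter
1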
The builtins namespace associated with the execution of a code block is actually
found by looking up the name __builtins__ in its global namespace; this
should be a dictionary or a module (in the latter case the module’s dictionary
is used). By default, when in the __main__ module, __builtins__ is
the built-in module builtins; when in any other module,
__builtins__ is an alias for the dictionary of the builtins module
itself. __builtins__ can be set to a user-created dictionary to create a
weak form of restricted execution.
CPython implementation detail: Users should not touch __builtins__; it is strictly an implementation
detail. Users wanting to override values in the builtins namespace should
import the builtins module and modify its
attributes appropriately.
The namespace for a module is automatically created the first time a module is
imported. The main module for a script is always called __main__.
The global statement has the same scope as a name binding operation
in the same block. If the nearest enclosing scope for a free variable contains
a global statement, the free variable is treated as a global.
A class definition is an executable statement that may use and define names.
These references follow the normal rules for name resolution. The namespace of
the class definition becomes the attribute dictionary of the class. Names
defined at the class scope are not visible in methods.
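A short sketch of the consequence: methods must reach class attributes through self or the class name, never as bare names:
>>> class C:
...     attr = 10
...     def get(self):
...         return self.attr    # a bare 'attr' here would raise NameError
...
>>> C().get()
10
>>> C.attr                      # the class namespace became the attribute dictionary
10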
There are several cases where Python statements are illegal when used in
conjunction with nested scopes that contain free variables.
If a variable is referenced in an enclosing scope, it is illegal to delete the
name. An error will be reported at compile time.
If the wild card form of import — import * — is used in a function and
the function contains or is a nested block with free variables, the compiler
will raise a SyntaxError.
The eval() and exec() functions do not have access to the full
environment for resolving names. Names may be resolved in the local and global
namespaces of the caller. Free variables are not resolved in the nearest
enclosing namespace, but in the global namespace. [1] The exec() and
eval() functions have optional arguments to override the global and local
namespace. If only one namespace is specified, it is used for both.
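For instance, the namespaces can be supplied explicitly:
>>> eval('x + y', {'x': 1}, {'y': 2})    # separate global and local namespaces
3
>>> eval('x + 1', {'x': 41})             # a single namespace serves as both
42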
Exceptions are a means of breaking out of the normal flow of control of a code
block in order to handle errors or other exceptional conditions. An exception
is raised at the point where the error is detected; it may be handled by the
surrounding code block or by any code block that directly or indirectly invoked
the code block where the error occurred.
The Python interpreter raises an exception when it detects a run-time error
(such as division by zero). A Python program can also explicitly raise an
exception with the raise statement. Exception handlers are specified
with the try ... except statement. The finally
clause of such a statement can be used to specify cleanup code which does not
handle the exception, but is executed whether an exception occurred or not in
the preceding code.
Python uses the “termination” model of error handling: an exception handler can
find out what happened and continue execution at an outer level, but it cannot
repair the cause of the error and retry the failing operation (except by
re-entering the offending piece of code from the top).
When an exception is not handled at all, the interpreter terminates execution of
the program, or returns to its interactive main loop. In either case, it prints
a stack backtrace, except when the exception is SystemExit.
Exceptions are identified by class instances. The except clause is
selected depending on the class of the instance: it must reference the class of
the instance or a base class thereof. The instance can be received by the
handler and can carry additional information about the exceptional condition.
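A small sketch (the exception classes are invented for the example); the handler matches because the raised class derives from the class named in the except clause:
>>> class AppError(Exception): pass
...
>>> class ConfigError(AppError): pass
...
>>> try:
...     raise ConfigError('missing key')
... except AppError as e:      # matches: ConfigError is a subclass of AppError
...     print(type(e).__name__, e)
...
ConfigError missing key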
Note
Exception messages are not part of the Python API. Their contents may change
from one version of Python to the next without warning and should not be
relied on by code which will run under multiple versions of the interpreter.
This chapter explains the meaning of the elements of expressions in Python.
Syntax Notes: In this and the following chapters, extended BNF notation will
be used to describe syntax, not lexical analysis. When (one alternative of) a
syntax rule has the form
name ::= othername
and no semantics are given, the semantics of this form of name are the same
as for othername.
When a description of an arithmetic operator below uses the phrase “the numeric
arguments are converted to a common type,” this means that the operator
implementation for built-in types works that way:
If either argument is a complex number, the other is converted to complex;
otherwise, if either argument is a floating point number, the other is
converted to floating point;
otherwise, both must be integers and no conversion is necessary.
Some additional rules apply for certain operators (e.g., a string left argument
to the ‘%’ operator). Extensions must define their own conversion behavior.
Atoms are the most basic elements of expressions. The simplest atoms are
identifiers or literals. Forms enclosed in parentheses, brackets or braces are
also categorized syntactically as atoms. The syntax for atoms is:
An identifier occurring as an atom is a name. See section Identifiers and keywords
for lexical definition and section Naming and binding for documentation of naming and
binding.
When the name is bound to an object, evaluation of the atom yields that object.
When a name is not bound, an attempt to evaluate it raises a NameError
exception.
Private name mangling: When an identifier that textually occurs in a class
definition begins with two or more underscore characters and does not end in two
or more underscores, it is considered a private name of that class.
Private names are transformed to a longer form before code is generated for
them. The transformation inserts the class name in front of the name, with
leading underscores removed, and a single underscore inserted in front of the
class name. For example, the identifier __spam occurring in a class named
Ham will be transformed to _Ham__spam. This transformation is
independent of the syntactical context in which the identifier is used. If the
transformed name is extremely long (longer than 255 characters), implementation
defined truncation may happen. If the class name consists only of underscores,
no transformation is done.
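To illustrate the transformation:
>>> class Ham:
...     def __init__(self):
...         self.__spam = 1     # stored under the mangled name _Ham__spam
...
>>> h = Ham()
>>> h._Ham__spam
1
>>> h.__spam                    # no mangling outside the class body
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Ham' object has no attribute '__spam'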
Evaluation of a literal yields an object of the given type (string, bytes,
integer, floating point number, complex number) with the given value. The value
may be approximated in the case of floating point and imaginary (complex)
literals. See section Literals for details.
With the exception of bytes literals, these all correspond to immutable data
types, and hence the object’s identity is less important than its value.
Multiple evaluations of literals with the same value (either the same occurrence
in the program text or a different occurrence) may obtain the same object or a
different object with the same value.
A parenthesized expression list yields whatever that expression list yields: if
the list contains at least one comma, it yields a tuple; otherwise, it yields
the single expression that makes up the expression list.
An empty pair of parentheses yields an empty tuple object. Since tuples are
immutable, the rules for literals apply (i.e., two occurrences of the empty
tuple may or may not yield the same object).
Note that tuples are not formed by the parentheses, but rather by use of the
comma operator. The exception is the empty tuple, for which parentheses are
required — allowing unparenthesized “nothing” in expressions would cause
ambiguities and allow common typos to pass uncaught.
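A quick illustration:
>>> t = 1, 2, 3       # the commas form the tuple, not the parentheses
>>> type(t)
<class 'tuple'>
>>> single = (4,)     # a one-item tuple still needs a comma
>>> empty = ()        # only the empty tuple requires parentheses
>>> len(single), len(empty)
(1, 0)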
The comprehension consists of a single expression followed by at least one
for clause and zero or more for or if clauses.
In this case, the elements of the new container are those that would be produced
by considering each of the for or if clauses a block,
nesting from left to right, and evaluating the expression to produce an element
each time the innermost block is reached.
Note that the comprehension is executed in a separate scope, so names assigned
to in the target list don’t “leak” in the enclosing scope.
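For example:
>>> x = 'outer'
>>> [x*x for x in range(3)]    # the comprehension runs in its own scope
[0, 1, 4]
>>> x                          # the loop variable did not leak
'outer'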
A list display yields a new list object, the contents being specified by either
a list of expressions or a comprehension. When a comma-separated list of
expressions is supplied, its elements are evaluated from left to right and
placed into the list object in that order. When a comprehension is supplied,
the list is constructed from the elements resulting from the comprehension.
A set display yields a new mutable set object, the contents being specified by
either a sequence of expressions or a comprehension. When a comma-separated
list of expressions is supplied, its elements are evaluated from left to right
and added to the set object. When a comprehension is supplied, the set is
constructed from the elements resulting from the comprehension.
An empty set cannot be constructed with {}; this literal constructs an empty
dictionary.
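That is:
>>> type({})        # an empty display yields a dictionary
<class 'dict'>
>>> type(set())     # an empty set must be written set()
<class 'set'>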
A dictionary display yields a new dictionary object.
If a comma-separated sequence of key/datum pairs is given, they are evaluated
from left to right to define the entries of the dictionary: each key object is
used as a key into the dictionary to store the corresponding datum. This means
that you can specify the same key multiple times in the key/datum list, and the
final dictionary’s value for that key will be the last one given.
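For example:
>>> {'k': 1, 'k': 2}     # the last datum for a duplicate key prevails
{'k': 2}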
A dict comprehension, in contrast to list and set comprehensions, needs two
expressions separated with a colon followed by the usual “for” and “if” clauses.
When the comprehension is run, the resulting key and value elements are inserted
in the new dictionary in the order they are produced.
Restrictions on the types of the key values are listed earlier in section
The standard type hierarchy. (To summarize, the key type should be hashable, which excludes
all mutable objects.) Clashes between duplicate keys are not detected; the last
datum (textually rightmost in the display) stored for a given key value
prevails.
A generator expression yields a new generator object. Its syntax is the same as
for comprehensions, except that it is enclosed in parentheses instead of
brackets or curly braces.
Variables used in the generator expression are evaluated lazily when the
__next__() method is called for the generator object (in the same fashion as
normal generators). However, the leftmost for clause is immediately
evaluated, so that an error produced by it can be seen before any other possible
error in the code that handles the generator expression. Subsequent
for clauses cannot be evaluated immediately since they may depend on
the previous for loop. For example: (x*y for x in range(10) for y in bar(x)).
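A brief sketch of that evaluation order, with bar as a stand-in function:
>>> def bar(x):
...     return range(x)
...
>>> gen = (x*y for x in range(10) for y in bar(x))   # range(10) is evaluated here
>>> (i for i in 1/0)       # an error in the leftmost iterable surfaces immediately
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: division by zero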
The parentheses can be omitted on calls with only one argument. See section
Calls for details.
The yield expression is only used when defining a generator function,
and can only be used in the body of a function definition. Using a
yield expression in a function definition is sufficient to cause that
definition to create a generator function instead of a normal function.
When a generator function is called, it returns an iterator known as a
generator. That generator then controls the execution of a generator function.
The execution starts when one of the generator’s methods is called. At that
time, the execution proceeds to the first yield expression, where it
is suspended again, returning the value of expression_list to
generator’s caller. By suspended we mean that all local state is retained,
including the current bindings of local variables, the instruction pointer, and
the internal evaluation stack. When the execution is resumed by calling one of
the generator’s methods, the function can proceed exactly as if the
yield expression was just another external call. The value of the
yield expression after resuming depends on the method which resumed
the execution.
All of this makes generator functions quite similar to coroutines; they yield
multiple times, they have more than one entry point and their execution can be
suspended. The only difference is that a generator function cannot control
where execution should continue after it yields; control is always
transferred to the generator's caller.
The yield statement is allowed in the try clause of a
try ... finally construct. If the generator is not
resumed before it is finalized (by reaching a zero reference count or by being
garbage collected), the generator-iterator’s close() method will be
called, allowing any pending finally clauses to execute.
The following generator’s methods can be used to control the execution of a
generator function:
Starts the execution of a generator function or resumes it at the last
executed yield expression. When a generator function is resumed
with a __next__() method, the current yield expression
always evaluates to None. The execution then continues to the next
yield expression, where the generator is suspended again, and the
value of the expression_list is returned to next()'s caller.
If the generator exits without yielding another value, a StopIteration
exception is raised.
This method is normally called implicitly, e.g. by a for loop, or
by the built-in next() function.
Resumes the execution and “sends” a value into the generator function. The
value argument becomes the result of the current yield
expression. The send() method returns the next value yielded by the
generator, or raises StopIteration if the generator exits without
yielding another value. When send() is called to start the generator,
it must be called with None as the argument, because there is no
yield expression that could receive the value.
Raises an exception of type type at the point where the generator was paused,
and returns the next value yielded by the generator function. If the generator
exits without yielding another value, a StopIteration exception is
raised. If the generator function does not catch the passed-in exception, or
raises a different exception, then that exception propagates to the caller.
Raises a GeneratorExit at the point where the generator function was
paused. If the generator function then raises StopIteration (by
exiting normally, or due to already being closed) or GeneratorExit (by
not catching the exception), close returns to its caller. If the generator
yields a value, a RuntimeError is raised. If the generator raises any
other exception, it is propagated to the caller. close() does nothing
if the generator has already exited due to an exception or normal exit.
Here is a simple example that demonstrates the behavior of generators and
generator functions:
>>> def echo(value=None):
... print("Execution starts when 'next()' is called for the first time.")
... try:
... while True:
... try:
... value = (yield value)
... except Exception as e:
... value = e
... finally:
... print("Don't forget to clean up when 'close()' is called.")
...
>>> generator = echo(1)
>>> print(next(generator))
Execution starts when 'next()' is called for the first time.
1
>>> print(next(generator))
None
>>> print(generator.send(2))
2
>>> generator.throw(TypeError, "spam")
TypeError('spam',)
>>> generator.close()
Don't forget to clean up when 'close()' is called.
The primary must evaluate to an object of a type that supports attribute
references, which most objects do. This object is then asked to produce the
attribute whose name is the identifier (which can be customized by overriding
the __getattr__() method). If this attribute is not available, the
exception AttributeError is raised. Otherwise, the type and value of the
object produced is determined by the object. Multiple evaluations of the same
attribute reference may yield different objects.
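A minimal sketch of customizing attribute access (the class and attribute names are invented):
>>> class Lazy:
...     def __getattr__(self, name):
...         # called only when normal attribute lookup fails
...         if name == 'answer':
...             return 42
...         raise AttributeError(name)
...
>>> Lazy().answer
42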
The primary must evaluate to an object that supports subscription, e.g. a list
or dictionary. User-defined objects can support subscription by defining a
__getitem__() method.
For built-in objects, there are two types of objects that support subscription:
If the primary is a mapping, the expression list must evaluate to an object
whose value is one of the keys of the mapping, and the subscription selects the
value in the mapping that corresponds to that key. (The expression list is a
tuple except if it has exactly one item.)
If the primary is a sequence, the expression (list) must evaluate to an integer
or a slice (as discussed in the following section).
The formal syntax makes no special provision for negative indices in
sequences; however, built-in sequences all provide a __getitem__()
method that interprets negative indices by adding the length of the sequence
to the index (so that x[-1] selects the last item of x). The
resulting value must be a nonnegative integer less than the number of items in
the sequence, and the subscription selects the item whose index is that value
(counting from zero). Since the support for negative indices and slicing
occurs in the object’s __getitem__() method, subclasses overriding
this method will need to explicitly add that support.
A string’s items are characters. A character is not a separate data type but a
string of exactly one character.
A slicing selects a range of items in a sequence object (e.g., a string, tuple
or list). Slicings may be used as expressions or as targets in assignment or
del statements. The syntax for a slicing:
There is ambiguity in the formal syntax here: anything that looks like an
expression list also looks like a slice list, so any subscription can be
interpreted as a slicing. Rather than further complicating the syntax, this is
disambiguated by defining that in this case the interpretation as a subscription
takes priority over the interpretation as a slicing (this is the case if the
slice list contains no proper slice).
The semantics for a slicing are as follows. The primary must evaluate to a
mapping object, and it is indexed (using the same __getitem__() method as
normal subscription) with a key that is constructed from the slice list, as
follows. If the slice list contains at least one comma, the key is a tuple
containing the conversion of the slice items; otherwise, the conversion of the
lone slice item is the key. The conversion of a slice item that is an
expression is that expression. The conversion of a proper slice is a slice
object (see section The standard type hierarchy) whose start, stop and
step attributes are the values of the expressions given as lower bound,
upper bound and stride, respectively, substituting None for missing
expressions.
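The correspondence between slice notation and slice objects can be seen directly:
>>> s = list(range(10))
>>> s[1:8:2]
[1, 3, 5, 7]
>>> s[slice(1, 8, 2)]        # the same key, constructed explicitly
[1, 3, 5, 7]
>>> sl = slice(1, 8, 2)
>>> sl.start, sl.stop, sl.step
(1, 8, 2)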
A trailing comma may be present after the positional and keyword arguments but
does not affect the semantics.
The primary must evaluate to a callable object (user-defined functions, built-in
functions, methods of built-in objects, class objects, methods of class
instances, and all objects having a __call__() method are callable). All
argument expressions are evaluated before the call is attempted. Please refer
to section Function definitions for the syntax of formal parameter lists.
If keyword arguments are present, they are first converted to positional
arguments, as follows. First, a list of unfilled slots is created for the
formal parameters. If there are N positional arguments, they are placed in the
first N slots. Next, for each keyword argument, the identifier is used to
determine the corresponding slot (if the identifier is the same as the first
formal parameter name, the first slot is used, and so on). If the slot is
already filled, a TypeError exception is raised. Otherwise, the value of
the argument is placed in the slot, filling it (even if the expression is
None, it fills the slot). When all arguments have been processed, the slots
that are still unfilled are filled with the corresponding default value from the
function definition. (Default values are calculated, once, when the function is
defined; thus, a mutable object such as a list or dictionary used as default
value will be shared by all calls that don’t specify an argument value for the
corresponding slot; this should usually be avoided.) If there are any unfilled
slots for which no default value is specified, a TypeError exception is
raised. Otherwise, the list of filled slots is used as the argument list for
the call.
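The shared-default pitfall mentioned above, together with the usual idiom for avoiding it (the function names are invented):
>>> def append_to(item, seq=[]):      # the default list is created once
...     seq.append(item)
...     return seq
...
>>> append_to(1)
[1]
>>> append_to(2)                      # the same list is reused across calls
[1, 2]
>>> def append_to_safe(item, seq=None):
...     if seq is None:
...         seq = []                  # a fresh list per call
...     seq.append(item)
...     return seq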
CPython implementation detail: An implementation may provide built-in functions whose positional parameters
do not have names, even if they are ‘named’ for the purpose of documentation,
and which therefore cannot be supplied by keyword. In CPython, this is the
case for functions implemented in C that use PyArg_ParseTuple() to
parse their arguments.
If there are more positional arguments than there are formal parameter slots, a
TypeError exception is raised, unless a formal parameter using the syntax
*identifier is present; in this case, that formal parameter receives a tuple
containing the excess positional arguments (or an empty tuple if there were no
excess positional arguments).
If any keyword argument does not correspond to a formal parameter name, a
TypeError exception is raised, unless a formal parameter using the syntax
**identifier is present; in this case, that formal parameter receives a
dictionary containing the excess keyword arguments (using the keywords as keys
and the argument values as corresponding values), or a (new) empty dictionary if
there were no excess keyword arguments.
If the syntax *expression appears in the function call, expression must
evaluate to an iterable. Elements from this iterable are treated as if they
were additional positional arguments; if there are positional arguments
x1, ..., xN, and expression evaluates to a sequence y1, ..., yM,
this is equivalent to a call with M+N positional arguments x1, ..., xN,
y1, ..., yM.
A consequence of this is that although the *expression syntax may appear
after some keyword arguments, it is processed before the keyword arguments
(and the **expression argument, if any – see below). So:
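>>> def f(a, b):
...     print(a, b)
...
>>> f(b=1, *(2,))
2 1
>>> f(a=1, *(2,))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: f() got multiple values for keyword argument 'a'
>>> f(1, *(2,))
1 2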
It is unusual for both keyword arguments and the *expression syntax to be
used in the same call, so in practice this confusion does not arise.
If the syntax **expression appears in the function call, expression must
evaluate to a mapping, the contents of which are treated as additional keyword
arguments. In the case of a keyword appearing in both expression and as an
explicit keyword argument, a TypeError exception is raised.
Formal parameters using the syntax *identifier or **identifier cannot be
used as positional argument slots or as keyword argument names.
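A compact sketch of both sides of the star syntax:
>>> def f(*args, **kwargs):       # collects excess positional and keyword arguments
...     return args, kwargs
...
>>> f(1, 2, key='v')
((1, 2), {'key': 'v'})
>>> f(*[1, 2], **{'key': 'v'})    # unpacking an iterable and a mapping in the call
((1, 2), {'key': 'v'})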
A call always returns some value, possibly None, unless it raises an
exception. How this value is computed depends on the type of the callable
object.
If it is—
a user-defined function:
The code block for the function is executed, passing it the argument list. The
first thing the code block will do is bind the formal parameters to the
arguments; this is described in section Function definitions. When the code block
executes a return statement, this specifies the return value of the
function call.
a built-in function or method:
The result is up to the interpreter; see Built-in Functions for the
descriptions of built-in functions and methods.
a class object:
A new instance of that class is returned.
a class instance method:
The corresponding user-defined function is called, with an argument list that is
one longer than the argument list of the call: the instance becomes the first
argument.
a class instance:
The class must define a __call__() method; the effect is then the same as
if that method was called.
Thus, in an unparenthesized sequence of power and unary operators, the operators
are evaluated from right to left (this does not constrain the evaluation order
for the operands): -1**2 results in -1.
The power operator has the same semantics as the built-in pow() function,
when called with two arguments: it yields its left argument raised to the power
of its right argument. The numeric arguments are first converted to a common
type, and the result is of that type.
For int operands, the result has the same type as the operands unless the second
argument is negative; in that case, all arguments are converted to float and a
float result is delivered. For example, 10**2 returns 100, but
10**-2 returns 0.01.
Raising 0.0 to a negative power results in a ZeroDivisionError.
Raising a negative number to a fractional power results in a complex
number. (In earlier versions it raised a ValueError.)
The unary - (minus) operator yields the negation of its numeric argument.
The unary + (plus) operator yields its numeric argument unchanged.
The unary ~ (invert) operator yields the bitwise inversion of its integer
argument. The bitwise inversion of x is defined as -(x+1). It only
applies to integral numbers.
In all three cases, if the argument does not have the proper type, a
TypeError exception is raised.
The binary arithmetic operations have the conventional priority levels. Note
that some of these operations also apply to certain non-numeric types. Apart
from the power operator, there are only two levels, one for multiplicative
operators and one for additive operators:
The * (multiplication) operator yields the product of its arguments. The
arguments must either both be numbers, or one argument must be an integer and
the other must be a sequence. In the former case, the numbers are converted to a
common type and then multiplied together. In the latter case, sequence
repetition is performed; a negative repetition factor yields an empty sequence.
The / (division) and // (floor division) operators yield the quotient of
their arguments. The numeric arguments are first converted to a common type.
Integer division yields a float, while floor division of integers results in an
integer; the result is that of mathematical division with the ‘floor’ function
applied to the result. Division by zero raises the ZeroDivisionError
exception.
The % (modulo) operator yields the remainder from the division of the first
argument by the second. The numeric arguments are first converted to a common
type. A zero right argument raises the ZeroDivisionError exception. The
arguments may be floating point numbers, e.g., 3.14 % 0.7 equals 0.34
(since 3.14 equals 4*0.7 + 0.34). The modulo operator always yields a
result with the same sign as its second operand (or zero); the absolute value of
the result is strictly smaller than the absolute value of the second operand
[1].
The floor division and modulo operators are connected by the following
identity: x == (x//y)*y + (x%y). Floor division and modulo are also
connected with the built-in function divmod(): divmod(x, y) == (x//y, x%y). [2].
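For example:
>>> 7 % 3, -7 % 3, 7 % -3      # the sign follows the second operand
(1, 2, -2)
>>> x, y = 17, 5
>>> x == (x//y)*y + (x%y)
True
>>> divmod(x, y) == (x//y, x%y)
True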
In addition to performing the modulo operation on numbers, the % operator is
also overloaded by string objects to perform old-style string formatting (also
known as interpolation). The syntax for string formatting is described in the
Python Library Reference, section Old String Formatting Operations.
The floor division operator, the modulo operator, and the divmod()
function are not defined for complex numbers. Instead, convert to a floating
point number using the abs() function if appropriate.
The + (addition) operator yields the sum of its arguments. The arguments
must either both be numbers or both sequences of the same type. In the former
case, the numbers are converted to a common type and then added together. In
the latter case, the sequences are concatenated.
The - (subtraction) operator yields the difference of its arguments. The
numeric arguments are first converted to a common type.
These operators accept integers as arguments. They shift the first argument to
the left or right by the number of bits given by the second argument.
A right shift by n bits is defined as division by pow(2,n). A left shift
by n bits is defined as multiplication with pow(2,n).
Note
In the current implementation, the right-hand operand is required
to be at most sys.maxsize. If the right-hand operand is larger than
sys.maxsize an OverflowError exception is raised.
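Equivalently:
>>> 1 << 4          # same as 1 * pow(2, 4)
16
>>> 256 >> 4        # same as 256 // pow(2, 4)
16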
Unlike C, all comparison operations in Python have the same priority, which is
lower than that of any arithmetic, shifting or bitwise operation. Also unlike
C, expressions like a < b < c have the interpretation that is conventional
in mathematics:
Comparisons can be chained arbitrarily, e.g., x < y <= z is equivalent to
x < y and y <= z, except that y is evaluated only once (but in both
cases z is not evaluated at all when x < y is found to be false).
Formally, if a, b, c, ..., y, z are expressions and op1, op2, ...,
opN are comparison operators, then a op1 b op2 c ... y opN z is equivalent
to a op1 b and b op2 c and ... y opN z, except that each expression is
evaluated at most once.
Note that a op1 b op2 c doesn’t imply any kind of comparison between a and
c, so that, e.g., x < y > z is perfectly legal (though perhaps not
pretty).
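A sketch showing that the middle expression is evaluated only once (middle is an invented helper):
>>> def middle():
...     print('evaluated')
...     return 5
...
>>> 1 < middle() <= 10
evaluated
True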
The operators <, >, ==, >=, <=, and != compare the
values of two objects. The objects need not have the same type. If both are
numbers, they are converted to a common type. Otherwise, the == and !=
operators always consider objects of different types to be unequal, while the
<, >, >= and <= operators raise a TypeError when
comparing objects of different types that do not implement these operators for
the given pair of types. You can control comparison behavior of objects of
non-built-in types by defining rich comparison methods like __gt__(),
described in section Basic customization.
Comparison of objects of the same type depends on the type:
Numbers are compared arithmetically.
The values float('NaN') and Decimal('NaN') are special.
They are identical to themselves (x is x) but are not equal to themselves
(x != x). Additionally, comparing any value to a not-a-number value
will return False. For example, both 3 < float('NaN') and
float('NaN') < 3 will return False.
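In an interactive session:
>>> nan = float('NaN')
>>> nan is nan
True
>>> nan == nan
False
>>> 3 < nan, nan < 3
(False, False)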
Bytes objects are compared lexicographically using the numeric values of their
elements.
Strings are compared lexicographically using the numeric equivalents (the
result of the built-in function ord()) of their characters. [3] Strings
and bytes objects cannot be compared.
Tuples and lists are compared lexicographically using comparison of
corresponding elements. This means that to compare equal, each element must
compare equal and the two sequences must be of the same type and have the same
length.
If not equal, the sequences are ordered the same as their first differing
elements. For example, [1, 2, x] <= [1, 2, y] has the same value as
x <= y. If the corresponding element does not exist, the shorter
sequence is ordered first (for example, [1, 2] < [1, 2, 3]).
Mappings (dictionaries) compare equal if and only if they have the same
(key, value) pairs. Order comparisons (<, <=, >=, >)
raise TypeError.
Sets and frozensets define comparison operators to mean subset and superset
tests. Those relations do not define total orderings (the two sets {1,2}
and {2,3} are not equal, nor subsets of one another, nor supersets of one
another). Accordingly, sets are not appropriate arguments for functions
which depend on total ordering. For example, min(), max(), and
sorted() produce undefined results given a list of sets as inputs.
Most other objects of built-in types compare unequal unless they are the same
object; the choice whether one object is considered smaller or larger than
another one is made arbitrarily but consistently within one execution of a
program.
Comparison of objects of differing types depends on whether either
of the types provide explicit support for the comparison. Most numeric types
can be compared with one another, but comparisons of float and
Decimal are not supported to avoid the inevitable confusion arising
from representation issues such as float('1.1') being inexactly represented
and therefore not exactly equal to Decimal('1.1') which is. When
cross-type comparison is not supported, the comparison method returns
NotImplemented. This can create the illusion of non-transitivity between
supported cross-type comparisons and unsupported comparisons. For example,
Decimal(2) == 2 and 2 == float(2) but Decimal(2) != float(2).
The operators in and not in test for membership. x in s evaluates to
true if x is a member of s, and false otherwise. x not in s returns the
negation of x in s. All built-in sequences and set types support this, as
do dictionaries, for which in tests whether the dictionary has a given key.
For container types such as list, tuple, set, frozenset, dict, or
collections.deque, the expression x in y is equivalent to
any(x is e or x == e for e in y).
For the string and bytes types, x in y is true if and only if x is a
substring of y. An equivalent test is y.find(x) != -1. Empty strings are
always considered to be a substring of any other string, so "" in "abc" will
return True.
For user-defined classes which define the __contains__() method, x in y
is true if and only if y.__contains__(x) is true.
For user-defined classes which do not define __contains__() but do define
__iter__(), x in y is true if some value z with x == z is
produced while iterating over y. If an exception is raised during the
iteration, it is as if in raised that exception.
Lastly, the old-style iteration protocol is tried: if a class defines
__getitem__(), x in y is true if and only if there is a non-negative
integer index i such that x == y[i], and all lower integer indices do not
raise IndexError exception. (If any other exception is raised, it is as
if in raised that exception).
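Sketches of the two user-defined hooks (the class names are invented):
>>> class Evens:
...     def __contains__(self, x):
...         return x % 2 == 0
...
>>> 4 in Evens()
True
>>> class Letters:
...     def __getitem__(self, i):
...         return 'abc'[i]        # raises IndexError past the end
...
>>> 'b' in Letters()               # found via the old-style iteration protocol
True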
The operator not in is defined to have the inverse truth value of
in.
The operators is and is not test for object identity: x is y is true
if and only if x and y are the same object. x is not y
yields the inverse truth value. [4]
In the context of Boolean operations, and also when expressions are used by
control flow statements, the following values are interpreted as false:
False, None, numeric zero of all types, and empty strings and containers
(including strings, tuples, lists, dictionaries, sets and frozensets). All
other values are interpreted as true. User-defined objects can customize their
truth value by providing a __bool__() method.
The operator not yields True if its argument is false, False
otherwise.
The expression x and y first evaluates x; if x is false, its value is
returned; otherwise, y is evaluated and the resulting value is returned.
The expression x or y first evaluates x; if x is true, its value is
returned; otherwise, y is evaluated and the resulting value is returned.
(Note that neither and nor or restrict the value and type
they return to False and True, but rather return the last evaluated
argument. This is sometimes useful, e.g., if s is a string that should be
replaced by a default value if it is empty, the expression s or 'foo' yields
the desired value. Because not has to invent a value anyway, it does
not bother to return a value of the same type as its argument, so e.g.,
not 'foo' yields False, not ''.)
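Interactively:
>>> '' or 'foo'        # the left operand is false, so the right one is returned
'foo'
>>> 'bar' and 'foo'    # the left operand is true, so evaluation continues
'foo'
>>> 0 and 1/0          # short-circuit: 1/0 is never evaluated
0
>>> not 'foo'
False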
Conditional expressions (sometimes called a “ternary operator”) have the lowest
priority of all Python operations.
The expression x if C else y first evaluates the condition, C (not x);
if C is true, x is evaluated and its value is returned; otherwise, y is
evaluated and its value is returned.
See PEP 308 for more details about conditional expressions.
Lambda forms (lambda expressions) have the same syntactic position as
expressions. They are a shorthand to create anonymous functions; the expression
lambda arguments: expression yields a function object. The unnamed object
behaves like a function object defined with
def <lambda>(arguments):
return expression
See section Function definitions for the syntax of parameter lists. Note that
functions created with lambda forms cannot contain statements or annotations.
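For example:
>>> add = lambda a, b: a + b    # roughly: def <lambda>(a, b): return a + b
>>> add(2, 3)
5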
An expression list containing at least one comma yields a tuple. The length of
the tuple is the number of expressions in the list. The expressions are
evaluated from left to right.
The trailing comma is required only to create a single tuple (a.k.a. a
singleton); it is optional in all other cases. A single expression without a
trailing comma doesn’t create a tuple, but rather yields the value of that
expression. (To create an empty tuple, use an empty pair of parentheses:
().)
Python evaluates expressions from left to right. Notice that while evaluating
an assignment, the right-hand side is evaluated before the left-hand side.
In the following lines, expressions will be evaluated in the arithmetic order of
their suffixes:
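expr1, expr2, expr3, expr4
(expr1, expr2, expr3, expr4)
{expr1: expr2, expr3: expr4}
expr1 + expr2 * (expr3 - expr4)
expr1(expr2, expr3, *expr4, **expr5)
expr3, expr4 = expr1, expr2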
The following table summarizes the operator precedences in Python, from lowest
precedence (least binding) to highest precedence (most binding). Operators in
the same box have the same precedence. Unless the syntax is explicitly given,
operators are binary. Operators in the same box group left to right (except for
comparisons, including tests, which all have the same precedence and chain from
left to right — see section Comparisons — and exponentiation, which
groups from right to left).
While abs(x%y)<abs(y) is true mathematically, for floats it may not be
true numerically due to roundoff. For example, and assuming a platform on which
a Python float is an IEEE 754 double-precision number, in order that
-1e-100 % 1e100 have the same sign as 1e100, the computed result is
-1e-100 + 1e100, which is numerically exactly equal to 1e100. The function
math.fmod() returns a result whose sign matches the sign of the
first argument instead, and so returns -1e-100 in this case. Which approach
is more appropriate depends on the application.
If x is very close to an exact integer multiple of y, it’s possible for
x//y to be one larger than (x-x%y)//y due to rounding. In such
cases, Python returns the latter result, in order to preserve that
divmod(x, y)[0] * y + x % y be very close to x.
While comparisons between strings make sense at the byte level, they may
be counter-intuitive to users. For example, the strings "\u00C7" and
"\u0327\u0043" compare differently, even though they both represent the
same unicode character (LATIN CAPITAL LETTER C WITH CEDILLA). To compare
strings in a human recognizable way, compare using
unicodedata.normalize().
Due to automatic garbage-collection, free lists, and the dynamic nature of
descriptors, you may notice seemingly unusual behaviour in certain uses of
the is operator, like those involving comparisons between instance
methods, or constants. Check their documentation for more info.
Simple statements are comprised within a single logical line. Several simple
statements may occur on a single line separated by semicolons. The syntax for
simple statements is:
Expression statements are used (mostly interactively) to compute and write a
value, or (usually) to call a procedure (a function that returns no meaningful
result; in Python, procedures return the value None). Other uses of
expression statements are allowed and occasionally useful. The syntax for an
expression statement is:
An expression statement evaluates the expression list (which may be a single
expression).
In interactive mode, if the value is not None, it is converted to a string
using the built-in repr() function and the resulting string is written to
standard output on a line by itself (except if the result is None, so that
procedure calls do not cause any output.)
(See section Primaries for the syntax definitions for the last three
symbols.)
An assignment statement evaluates the expression list (remember that this can be
a single expression or a comma-separated list, the latter yielding a tuple) and
assigns the single resulting object to each of the target lists, from left to
right.
Assignment is defined recursively depending on the form of the target (list).
When a target is part of a mutable object (an attribute reference, subscription
or slicing), the mutable object must ultimately perform the assignment and
decide about its validity, and may raise an exception if the assignment is
unacceptable. The rules observed by various types and the exceptions raised are
given with the definition of the object types (see section The standard type hierarchy).
Assignment of an object to a target list, optionally enclosed in parentheses or
square brackets, is recursively defined as follows.
If the target list is a single target: The object is assigned to that target.
If the target list is a comma-separated list of targets: The object must be an
iterable with the same number of items as there are targets in the target list,
and the items are assigned, from left to right, to the corresponding targets.
If the target list contains one target prefixed with an asterisk, called a
“starred” target: The object must be a sequence with at least as many items
as there are targets in the target list, minus one. The first items of the
sequence are assigned, from left to right, to the targets before the starred
target. The final items of the sequence are assigned to the targets after
the starred target. A list of the remaining items in the sequence is then
assigned to the starred target (the list can be empty).
Else: The object must be a sequence with the same number of items as there
are targets in the target list, and the items are assigned, from left to
right, to the corresponding targets.
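For example:
>>> first, *middle, last = [1, 2, 3, 4, 5]
>>> first, middle, last
(1, [2, 3, 4], 5)
>>> a, *rest = 'xy'
>>> a, rest
('x', ['y'])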
Assignment of an object to a single target is recursively defined as follows.
If the target is an identifier (name):
If the name does not occur in a global or nonlocal
statement in the current code block: the name is bound to the object in the
current local namespace.
Otherwise: the name is bound to the object in the global namespace or the
outer namespace determined by nonlocal, respectively.
The name is rebound if it was already bound. This may cause the reference
count for the object previously bound to the name to reach zero, causing the
object to be deallocated and its destructor (if it has one) to be called.
If the target is a target list enclosed in parentheses or in square brackets:
The object must be an iterable with the same number of items as there are
targets in the target list, and its items are assigned, from left to right,
to the corresponding targets.
If the target is an attribute reference: The primary expression in the
reference is evaluated. It should yield an object with assignable attributes;
if this is not the case, TypeError is raised. That object is then
asked to assign the assigned object to the given attribute; if it cannot
perform the assignment, it raises an exception (usually but not necessarily
AttributeError).
Note: If the object is a class instance and the attribute reference occurs on
both sides of the assignment operator, the RHS expression, a.x, can access
either an instance attribute or (if no instance attribute exists) a class
attribute. The LHS target a.x is always set as an instance attribute,
creating it if necessary. Thus, the two occurrences of a.x do not
necessarily refer to the same attribute: if the RHS expression refers to a
class attribute, the LHS creates a new instance attribute as the target of the
assignment:
class Cls:
x = 3 # class variable
inst = Cls()
inst.x = inst.x + 1 # writes inst.x as 4 leaving Cls.x as 3
This description does not necessarily apply to descriptor attributes, such as
properties created with property().
If the target is a subscription: The primary expression in the reference is
evaluated. It should yield either a mutable sequence object (such as a list)
or a mapping object (such as a dictionary). Next, the subscript expression is
evaluated.
If the primary is a mutable sequence object (such as a list), the subscript
must yield an integer. If it is negative, the sequence’s length is added to
it. The resulting value must be a nonnegative integer less than the
sequence’s length, and the sequence is asked to assign the assigned object to
its item with that index. If the index is out of range, IndexError is
raised (assignment to a subscripted sequence cannot add new items to a list).
If the primary is a mapping object (such as a dictionary), the subscript must
have a type compatible with the mapping’s key type, and the mapping is then
asked to create a key/datum pair which maps the subscript to the assigned
object. This can either replace an existing key/value pair with the same key
value, or insert a new key/value pair (if no key with the same value existed).
For user-defined objects, the __setitem__() method is called with
appropriate arguments.
If the target is a slicing: The primary expression in the reference is
evaluated. It should yield a mutable sequence object (such as a list). The
assigned object should be a sequence object of the same type. Next, the lower
and upper bound expressions are evaluated, insofar they are present; defaults
are zero and the sequence’s length. The bounds should evaluate to integers.
If either bound is negative, the sequence’s length is added to it. The
resulting bounds are clipped to lie between zero and the sequence’s length,
inclusive. Finally, the sequence object is asked to replace the slice with
the items of the assigned sequence. The length of the slice may be different
from the length of the assigned sequence, thus changing the length of the
target sequence, if the object allows it.
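For example:
>>> s = [1, 2, 3, 4, 5]
>>> s[1:3] = ['a', 'b', 'c']    # the slice and its replacement may differ in length
>>> s
[1, 'a', 'b', 'c', 4, 5]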
CPython implementation detail: In the current implementation, the syntax for targets is taken to be the same
as for expressions, and invalid syntax is rejected during the code generation
phase, causing less detailed error messages.
WARNING: Although the definition of assignment implies that overlaps between the
left-hand side and the right-hand side are 'safe' (for example a, b = b, a
swaps two variables), overlaps within the collection of assigned-to variables
are not safe! For instance, the following program prints [0, 2]:
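x = [0, 1]
i = 0
i, x[i] = 1, 2      # i is updated first, then x[i] -- now x[1] -- is set
print(x)            # [0, 2]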
(See section Primaries for the syntax definitions for the last three
symbols.)
An augmented assignment evaluates the target (which, unlike normal assignment
statements, cannot be an unpacking) and the expression list, performs the binary
operation specific to the type of assignment on the two operands, and assigns
the result to the original target. The target is only evaluated once.
An augmented assignment expression like x += 1 can be rewritten as x = x + 1 to achieve a similar, but not exactly equal effect. In the augmented
version, x is only evaluated once. Also, when possible, the actual operation
is performed in-place, meaning that rather than creating a new object and
assigning that to the target, the old object is modified instead.
With the exception of assigning to tuples and multiple targets in a single
statement, the assignment done by augmented assignment statements is handled the
same way as normal assignments. Similarly, with the exception of the possible
in-place behavior, the binary operation performed by augmented assignment is
the same as the normal binary operations.
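A sketch of the in-place behavior for a mutable type:
>>> a = [1, 2]
>>> b = a
>>> a += [3]        # in-place: the list object itself is modified
>>> b               # b names the same object, so it sees the change
[1, 2, 3]
>>> a = a + [4]     # plain assignment rebinds a to a new list
>>> b
[1, 2, 3]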
The simple form, assert expression, is equivalent to
if __debug__:
if not expression: raise AssertionError
The extended form, assert expression1, expression2, is equivalent to
if __debug__:
if not expression1: raise AssertionError(expression2)
These equivalences assume that __debug__ and AssertionError refer to
the built-in variables with those names. In the current implementation, the
built-in variable __debug__ is True under normal circumstances,
False when optimization is requested (command line option -O). The current
code generator emits no code for an assert statement when optimization is
requested at compile time. Note that it is unnecessary to include the source
code for the expression that failed in the error message; it will be displayed
as part of the stack trace.
Assignments to __debug__ are illegal. The value for the built-in variable
is determined when the interpreter starts.
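For example (the function is invented for the sketch; under -O the assert would be skipped entirely):
>>> def average(values):
...     assert len(values) > 0, 'values must be non-empty'
...     return sum(values) / len(values)
...
>>> average([])
Traceback (most recent call last):
  ...
AssertionError: values must be non-empty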
pass is a null operation — when it is executed, nothing happens.
It is useful as a placeholder when a statement is required syntactically, but no
code needs to be executed, for example:
def f(arg): pass # a function that does nothing (yet)
class C: pass # a class with no methods (yet)
Deletion is recursively defined in a way very similar to the way assignment is
defined. Rather than spelling it out in full detail, here are some hints.
Deletion of a target list recursively deletes each target, from left to right.
Deletion of a name removes the binding of that name from the local or global
namespace, depending on whether the name occurs in a global statement
in the same code block. If the name is unbound, a NameError exception
will be raised.
Deletion of attribute references, subscriptions and slicings is passed to the
primary object involved; deletion of a slicing is in general equivalent to
assignment of an empty slice of the right type (but even this is determined by
the sliced object).
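A brief sketch of the three kinds of deletion targets described above (names
are illustrative):

x = 1
del x                    # removes the binding of the name x

items = list(range(10))
del items[::2]           # slicing deletion is passed to the list object
print(items)             # [1, 3, 5, 7, 9]

class Box: pass
b = Box()
b.attr = 42
del b.attr               # attribute deletion is passed to the object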
Changed in version 3.2: Previously it was illegal to delete a name from the local namespace if it
occurs as a free variable in a nested block.
return may only occur syntactically nested in a function definition,
not within a nested class definition.
If an expression list is present, it is evaluated, else None is substituted.
return leaves the current function call with the expression list (or
None) as return value.
When return passes control out of a try statement with a
finally clause, that finally clause is executed before
really leaving the function.
In a generator function, the return statement is not allowed to
include an expression_list. In that context, a bare return
indicates that the generator is done and will cause StopIteration to be
raised.
The yield statement is only used when defining a generator function,
and is only used in the body of the generator function. Using a yield
statement in a function definition is sufficient to cause that definition to
create a generator function instead of a normal function.
When a generator function is called, it returns an iterator known as a generator
iterator, or more commonly, a generator. The body of the generator function is
executed by calling the next() function on the generator repeatedly until
it raises an exception.
When a yield statement is executed, the state of the generator is
frozen and the value of expression_list is returned to next()’s
caller. By “frozen” we mean that all local state is retained, including the
current bindings of local variables, the instruction pointer, and the internal
evaluation stack: enough information is saved so that the next time next()
is invoked, the function can proceed exactly as if the yield
statement were just another external call.
The yield statement is allowed in the try clause of a
try ... finally construct. If the generator is not
resumed before it is finalized (by reaching a zero reference count or by being
garbage collected), the generator-iterator’s close() method will be
called, allowing any pending finally clauses to execute.
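A minimal generator illustrating the frozen state between next() calls (the
names are illustrative):

def counter(n):
    i = 0
    while i < n:
        yield i          # state is frozen here between next() calls
        i += 1

gen = counter(3)
print(next(gen))         # 0
print(next(gen))         # 1, resumes with the bindings of i and n retained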
If no expressions are present, raise re-raises the last exception
that was active in the current scope. If no exception is active in the current
scope, a TypeError exception is raised indicating that this is an error
(if running under IDLE, a queue.Empty exception is raised instead).
Otherwise, raise evaluates the first expression as the exception
object. It must be either a subclass or an instance of BaseException.
If it is a class, the exception instance will be obtained when needed by
instantiating the class with no arguments.
The type of the exception is the exception instance’s class, the
value is the instance itself.
A traceback object is normally created automatically when an exception is raised
and attached to it as the __traceback__ attribute, which is writable.
You can create an exception and set your own traceback in one step using the
with_traceback() exception method (which returns the same exception
instance, with its traceback set to its argument), like so:
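# tracebackobj is an existing traceback object
raise Exception("foo occurred").with_traceback(tracebackobj)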
The from clause is used for exception chaining: if given, the second
expression must be another exception class or instance, which will then be
attached to the raised exception as the __cause__ attribute (which is
writable). If the raised exception is not handled, both exceptions will be
printed:
>>> try:
... print(1 / 0)
... except Exception as exc:
... raise RuntimeError("Something bad happened") from exc
...
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
ZeroDivisionError: int division or modulo by zero
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<stdin>", line 4, in <module>
RuntimeError: Something bad happened
A similar mechanism works implicitly if an exception is raised inside an
exception handler: the previous exception is then attached as the new
exception’s __context__ attribute:
>>> try:
... print(1 / 0)
... except:
... raise RuntimeError("Something bad happened")
...
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
ZeroDivisionError: int division or modulo by zero
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 4, in <module>
RuntimeError: Something bad happened
Additional information on exceptions can be found in section Exceptions,
and information about handling exceptions is in section The try statement.
continue may only occur syntactically nested in a for or
while loop, but not nested in a function or class definition or
finally clause within that loop. It continues with the next
cycle of the nearest enclosing loop.
When continue passes control out of a try statement with a
finally clause, that finally clause is executed before
really starting the next loop cycle.
Import statements are executed in two steps: (1) find a module, and initialize
it if necessary; (2) define a name or names in the local namespace (of the scope
where the import statement occurs). The statement comes in two
forms differing on whether it uses the from keyword. The first form
(without from) repeats these steps for each identifier in the list.
The form with from performs step (1) once, and then performs step
(2) repeatedly. For a reference implementation of step (1), see the
importlib module.
To understand how step (1) occurs, one must first understand how Python handles
hierarchical naming of modules. To help organize modules and provide a
hierarchy in naming, Python has a concept of packages. A package can contain
other packages and modules while modules cannot contain other modules or
packages. From a file system perspective, packages are directories and modules
are files. The original specification for packages is still available to read,
although minor details have changed since the writing of that document.
Once the name of the module is known (unless otherwise specified, the term
“module” will refer to both packages and modules), searching
for the module or package can begin. The first place checked is
sys.modules, the cache of all modules that have been imported
previously. If the module is found there then it is used in step (2) of import
unless None is found in sys.modules, in which case
ImportError is raised.
If the module is not found in the cache, then sys.meta_path is searched
(the specification for sys.meta_path can be found in PEP 302).
The object is a list of finder objects which are queried in order as to
whether they know how to load the module by calling their find_module()
method with the name of the module. If the module happens to be contained
within a package (as denoted by the existence of a dot in the name), then a
second argument to find_module() is given as the value of the
__path__ attribute from the parent package (everything up to the last
dot in the name of the module being imported). If a finder can find the module
it returns a loader (discussed later) or returns None.
If none of the finders on sys.meta_path are able to find the module
then some implicitly defined finders are queried. Implementations of Python
vary in what implicit meta path finders are defined. The one they all do
define, though, is one that handles sys.path_hooks,
sys.path_importer_cache, and sys.path.
The implicit finder searches for the requested module in the “paths” specified
in one of two places (“paths” do not have to be file system paths). If the
module being imported is supposed to be contained within a package then the
second argument passed to find_module(), __path__ on the parent
package, is used as the source of paths. If the module is not contained in a
package then sys.path is used as the source of paths.
Once the source of paths is chosen it is iterated over to find a finder that
can handle that path. The dict at sys.path_importer_cache caches
finders for paths and is checked for a finder. If the path does not have a
finder cached then sys.path_hooks is searched by calling each object in
the list with the path as its single argument; each call either returns a
finder or raises ImportError. If a finder is returned then it is cached in
sys.path_importer_cache and then used for that path entry. If no finder
can be found but the path exists then a value of None is
stored in sys.path_importer_cache to signify that an implicit,
file-based finder that handles modules stored as individual files should be
used for that path. If the path does not exist then a finder which always
returns None is placed in the cache for the path.
If no finder can find the module then ImportError is raised. Otherwise
some finder returned a loader whose load_module() method is called with
the name of the module to load (see PEP 302 for the original definition of
loaders). A loader has several responsibilities to perform on a module it
loads. First, if the module already exists in sys.modules (a
possibility if the loader is called outside of the import machinery) then it
is to use that module for initialization and not a new module. But if the
module does not exist in sys.modules then it is to be added to that
dict before initialization begins. If an error occurs during loading of the
module and it was added to sys.modules it is to be removed from the
dict. If an error occurs but the module was already in sys.modules it
is left in the dict.
The loader must set several attributes on the module. __name__ is to be
set to the name of the module. __file__ is to be the “path” to the file
unless the module is built-in (and thus listed in
sys.builtin_module_names) in which case the attribute is not set.
If what is being imported is a package then __path__ is to be set to a
list of paths to be searched when looking for modules and packages contained
within the package being imported. __package__ is optional but should
be set to the name of the package that contains the module or package (the
empty string is used for modules not contained in a package). __loader__ is
also optional but should be set to the loader object that is loading the
module.
If an error occurs during loading then the loader raises ImportError if
some other exception is not already being propagated. Otherwise the loader
returns the module that was loaded and initialized.
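The following sketch shows the shape of the finder/loader protocol described
above. It is a minimal illustration, not a realistic importer: the module name
'demo' and its single attribute are invented, and a real finder would perform
an actual search rather than matching one hard-coded name.

import sys
import types

class StubFinderLoader:
    """Illustrative PEP 302 finder/loader for a made-up module 'demo'."""

    def find_module(self, fullname, path=None):
        # Return a loader (here, ourselves) if we handle the module, else None.
        return self if fullname == 'demo' else None

    def load_module(self, fullname):
        # Reuse an existing module if called outside the import machinery.
        if fullname in sys.modules:
            return sys.modules[fullname]
        mod = types.ModuleType(fullname)
        mod.__file__ = '<stub>'
        mod.__loader__ = self
        sys.modules[fullname] = mod       # add before initialization begins
        try:
            mod.answer = 42               # "initialize" the module
        except Exception:
            del sys.modules[fullname]     # remove on error, per the rules above
            raise
        return mod

sys.meta_path.append(StubFinderLoader())
import demo
print(demo.answer)                        # 42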
When step (1) finishes without raising an exception, step (2) can begin.
The first form of import statement binds the module name in the local
namespace to the module object, and then goes on to import the next identifier,
if any. If the module name is followed by as, the name following
as is used as the local name for the module.
The from form does not bind the module name: it goes through the list
of identifiers, looks each one of them up in the module found in step (1), and
binds the name in the local namespace to the object thus found. As with the
first form of import, an alternate local name can be supplied by
specifying “as localname”. If a name is not found,
ImportError is raised. If the list of identifiers is replaced by a star
('*'), all public names defined in the module are bound in the local
namespace of the import statement.
The public names defined by a module are determined by checking the module’s
namespace for a variable named __all__; if defined, it must be a sequence of
strings which are names defined or imported by that module. The names given in
__all__ are all considered public and are required to exist. If __all__
is not defined, the set of public names includes all names found in the module’s
namespace which do not begin with an underscore character ('_').
__all__ should contain the entire public API. It is intended to avoid
accidentally exporting items that are not part of the API (such as library
modules which were imported and used within the module).
The from form with * may only occur in a module scope. The wild
card form of import — import * — is only allowed at the module level.
Attempting to use it in class or function definitions will raise a
SyntaxError.
When specifying what module to import you do not have to specify the absolute
name of the module. When a module or package is contained within another
package it is possible to make a relative import within the same top package
without having to mention the package name. By using leading dots in the
specified module or package after from you can specify how high to
traverse up the current package hierarchy without specifying exact names. One
leading dot means the current package where the module making the import
exists. Two dots means up one package level. Three dots is up two levels, etc.
So if you execute from . import mod from a module in the pkg package
then you will end up importing pkg.mod. If you execute from ..subpkg2
import mod from within pkg.subpkg1 you will import pkg.subpkg2.mod.
The specification for relative imports is contained within PEP 328.
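For instance, given a hypothetical package layout (all names invented for
illustration), the relative forms resolve as follows:

# Layout:
#   pkg/__init__.py
#   pkg/mod.py
#   pkg/mod2.py
#   pkg/subpkg1/__init__.py
#   pkg/subpkg1/helper.py
#   pkg/subpkg2/__init__.py
#   pkg/subpkg2/mod.py

# In pkg/mod2.py (a module directly inside pkg):
from . import mod             # imports pkg.mod

# In pkg/subpkg1/helper.py:
from ..subpkg2 import mod     # imports pkg.subpkg2.mod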
importlib.import_module() is provided to support applications that
determine which modules need to be loaded dynamically.
A future statement is a directive to the compiler that a particular
module should be compiled using syntax or semantics that will be available in a
specified future release of Python. The future statement is intended to ease
migration to future versions of Python that introduce incompatible changes to
the language. It allows use of the new features on a per-module basis before
the release in which the feature becomes standard.
A future statement must appear near the top of the module. The only lines that
can appear before a future statement are:
the module docstring (if any),
comments,
blank lines, and
other future statements.
The features recognized by Python 3.0 are absolute_import, division,
generators, unicode_literals, print_function, nested_scopes and
with_statement. They are all redundant because they are always enabled, and
only kept for backwards compatibility.
A future statement is recognized and treated specially at compile time: Changes
to the semantics of core constructs are often implemented by generating
different code. It may even be the case that a new feature introduces new
incompatible syntax (such as a new reserved word), in which case the compiler
may need to parse the module differently. Such decisions cannot be pushed off
until runtime.
For any given release, the compiler knows which feature names have been defined,
and raises a compile-time error if a future statement contains a feature not
known to it.
The direct runtime semantics are the same as for any import statement: there is
a standard module __future__, described later, and it will be imported in
the usual way at the time the future statement is executed.
The interesting runtime semantics depend on the specific feature enabled by the
future statement.
Note that there is nothing special about the statement:
import __future__ [as name]
That is not a future statement; it’s an ordinary import statement with no
special semantics or syntax restrictions.
Code compiled by calls to the built-in functions exec() and compile()
that occur in a module M containing a future statement will, by default,
use the new syntax or semantics associated with the future statement. This can
be controlled by optional arguments to compile() — see the documentation
of that function for details.
A future statement typed at an interactive interpreter prompt will take effect
for the rest of the interpreter session. If an interpreter is started with the
-i option, is passed a script name to execute, and the script includes
a future statement, it will be in effect in the interactive session started
after the script is executed.
The global statement is a declaration which holds for the entire
current code block. It means that the listed identifiers are to be interpreted
as globals. It would be impossible to assign to a global variable without
global, although free variables may refer to globals without being
declared global.
Names listed in a global statement must not be used in the same code
block textually preceding that global statement.
Names listed in a global statement must not be defined as formal
parameters or in a for loop control target, class
definition, function definition, or import statement.
CPython implementation detail: The current implementation does not enforce the latter two restrictions, but
programs should not abuse this freedom, as future implementations may enforce
them or silently change the meaning of the program.
Programmer’s note: the global is a directive to the parser. It
applies only to code parsed at the same time as the global statement.
In particular, a global statement contained in a string or code
object supplied to the built-in exec() function does not affect the code
block containing the function call, and code contained in such a string is
unaffected by global statements in the code containing the function
call. The same applies to the eval() and compile() functions.
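For example (names are illustrative):

counter = 0

def bump():
    global counter       # without this, counter += 1 would treat counter
    counter += 1         # as a local and raise UnboundLocalError

bump()
print(counter)           # 1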
The nonlocal statement causes the listed identifiers to refer to
previously bound variables in the nearest enclosing scope. This is important
because the default behavior for binding is to search the local namespace
first. The statement allows encapsulated code to rebind variables outside of
the local scope besides the global (module) scope.
Names listed in a nonlocal statement, unlike those listed in a
global statement, must refer to pre-existing bindings in an
enclosing scope (the scope in which a new binding should be created cannot
be determined unambiguously).
Names listed in a nonlocal statement must not collide with
pre-existing bindings in the local scope.
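For example, rebinding a variable in an enclosing function scope (names are
illustrative):

def make_counter():
    count = 0
    def increment():
        nonlocal count   # rebinds the enclosing count, not a new local
        count += 1
        return count
    return increment

inc = make_counter()
print(inc(), inc())      # 1 2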
The Python interpreter can get its input from a number of sources: from a script
passed to it as standard input or as program argument, typed in interactively,
from a module source file, etc. This chapter gives the syntax used in these
cases.
While a language specification need not prescribe how the language interpreter
is invoked, it is useful to have a notion of a complete Python program. A
complete Python program is executed in a minimally initialized environment: all
built-in and standard modules are available, but none have been initialized,
except for sys (various system services), builtins (built-in
functions, exceptions and None) and __main__. The latter is used to
provide the local and global namespace for execution of the complete program.
The syntax for a complete Python program is that for file input, described in
the next section.
The interpreter may also be invoked in interactive mode; in this case, it does
not read and execute a complete program but reads and executes one statement
(possibly compound) at a time. The initial environment is identical to that of
a complete program; each statement is executed in the namespace of
__main__.
Under Unix, a complete program can be passed to the interpreter in three forms:
with the -c string command line option, as a file passed as the
first command line argument, or as standard input. If the file or standard
input is a tty device, the interpreter enters interactive mode; otherwise, it
executes the file as a complete program.
Note that a (top-level) compound statement must be followed by a blank line in
interactive mode; this is needed to help the parser detect the end of the input.
The Python installers for the Windows platform usually include the entire
standard library and often also include many additional components. For
Unix-like operating systems Python is normally provided as a collection of
packages, so it may be necessary to use the packaging tools provided with the
operating system to obtain some or all of the optional components.
In addition to the standard library, there is a growing collection of
several thousand components (from individual programs and modules to
packages and entire application development frameworks), available from
the Python Package Index.
This means that whether you read the manual from front to back, or flip to an
arbitrary chapter out of boredom, you will get a reasonable and fairly complete
picture of the module covered by that chapter and of its applications. Of
course, you need not read this manual like a novel; you can consult the table
of contents, or search the index for a specific function, module, or term.
Finally, if you enjoy learning about random subjects, choose a random page
number (see module random) and read a section or two. Regardless of the order
in which you read the parts of this manual, it is best to start with the
Built-in Functions chapter, as the rest of the manual assumes familiarity with
that material.
Return a new array of bytes. The bytearray type is a mutable
sequence of integers in the range 0 <= x < 256. It has most of the usual
methods of mutable sequences, described in Mutable Sequence Types, as well
as most methods that the bytes type has, see Bytes and Byte Array Methods.
The optional source parameter can be used to initialize the array in a few
different ways:
If it is a string, you must also give the encoding (and optionally,
errors) parameters; bytearray() then converts the string to
bytes using str.encode().
If it is an integer, the array will have that size and will be
initialized with null bytes.
If it is an object conforming to the buffer interface, a read-only buffer
of the object will be used to initialize the bytes array.
If it is an iterable, it must be an iterable of integers in the range
0 <= x < 256, which are used as the initial contents of the array.
Without an argument, an array of size 0 is created.
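For example:

>>> bytearray('abc', 'ascii')     # from a string, encoding required
bytearray(b'abc')
>>> bytearray(3)                  # from an integer: three null bytes
bytearray(b'\x00\x00\x00')
>>> bytearray([65, 66, 67])       # from an iterable of integers
bytearray(b'ABC')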
Return a new “bytes” object, which is an immutable sequence of integers in
the range 0 <= x < 256. bytes is an immutable version of
bytearray – it has the same non-mutating methods and the same
indexing and slicing behavior.
Accordingly, constructor arguments are interpreted as for bytearray().
Bytes objects can also be created with literals, see String and Bytes literals.
Return True if the object argument appears callable,
False if not. If this returns true, it is still possible that a
call fails, but if it is false, calling object will never succeed.
Note that classes are callable (calling a class returns a new instance);
instances are callable if their class has a __call__() method.
New in version 3.2: This function was first removed in Python 3.0 and then brought back
in Python 3.2.
Return the string representing a character whose Unicode codepoint is the integer
i. For example, chr(97) returns the string 'a'. This is the
inverse of ord(). The valid range for the argument is from 0 through
1,114,111 (0x10FFFF in base 16). ValueError will be raised if i is
outside that range.
Note that on narrow Unicode builds, the result is a string of
length two for i greater than 65,535 (0xFFFF in hexadecimal).
A class method receives the class as implicit first argument, just like an
instance method receives the instance. To declare a class method, use this
idiom:
class C:
@classmethod
def f(cls, arg1, arg2, ...): ...
The @classmethod form is a function decorator – see the description
of function definitions in Function definitions for details.
It can be called either on the class (such as C.f()) or on an instance (such
as C().f()). The instance is ignored except for its class. If a class
method is called for a derived class, the derived class object is passed as the
implied first argument.
Class methods are different from C++ or Java static methods. If you want those,
see staticmethod() in this section.
For more information on class methods, consult the documentation on the standard
type hierarchy in The standard type hierarchy.
Compile the source into a code or AST object. Code objects can be executed
by exec() or eval(). source can either be a string or an AST
object. Refer to the ast module documentation for information on how
to work with AST objects.
The filename argument should give the file from which the code was read;
pass some recognizable value if it wasn’t read from a file ('<string>' is
commonly used).
The mode argument specifies what kind of code must be compiled; it can be
'exec' if source consists of a sequence of statements, 'eval' if it
consists of a single expression, or 'single' if it consists of a single
interactive statement (in the latter case, expression statements that
evaluate to something other than None will be printed).
The optional arguments flags and dont_inherit control which future
statements (see PEP 236) affect the compilation of source. If neither
is present (or both are zero) the code is compiled with those future
statements that are in effect in the code that is calling compile. If the
flags argument is given and dont_inherit is not (or is zero) then the
future statements specified by the flags argument are used in addition to
those that would be used anyway. If dont_inherit is a non-zero integer then
the flags argument is it – the future statements in effect around the call
to compile are ignored.
Future statements are specified by bits which can be bitwise ORed together to
specify multiple statements. The bitfield required to specify a given feature
can be found as the compiler_flag attribute on the _Feature
instance in the __future__ module.
The argument optimize specifies the optimization level of the compiler; the
default value of -1 selects the optimization level of the interpreter as
given by -O options. Explicit levels are 0 (no optimization;
__debug__ is true), 1 (asserts are removed, __debug__ is false)
or 2 (docstrings are removed too).
This function raises SyntaxError if the compiled source is invalid,
and TypeError if the source contains null bytes.
Note
When compiling a string with multi-line code in 'single' or
'eval' mode, input must be terminated by at least one newline
character. This is to facilitate detection of incomplete and complete
statements in the code module.
Changed in version 3.2: Allowed use of Windows and Mac newlines. Also input in 'exec' mode
does not have to end in a newline anymore. Added the optimize parameter.
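For example, compiling source once and then executing or evaluating it:

>>> code = compile('a = 1 + 2', '<string>', 'exec')
>>> ns = {}
>>> exec(code, ns)
>>> ns['a']
3
>>> eval(compile('1 + 2', '<string>', 'eval'))
3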
Create a complex number with the value real + imag*j or convert a string or
number to a complex number. If the first parameter is a string, it will be
interpreted as a complex number and the function must be called without a second
parameter. The second parameter can never be a string. Each argument may be any
numeric type (including complex). If imag is omitted, it defaults to zero and
the function serves as a numeric conversion function like int()
and float(). If both arguments are omitted, returns 0j.
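For example:

>>> complex(1, 2)
(1+2j)
>>> complex('1+2j')        # string form; no second argument allowed
(1+2j)
>>> complex(3)
(3+0j)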
This is a relative of setattr(). The arguments are an object and a
string. The string must be the name of one of the object’s attributes. The
function deletes the named attribute, provided the object allows it. For
example, delattr(x, 'foobar') is equivalent to del x.foobar.
dict([arg])
Create a new data dictionary, optionally with items taken from arg.
The dictionary type is described in Mapping Types — dict.
For other containers see the built in list, set, and
tuple classes, and the collections module.
Without arguments, return the list of names in the current local scope. With an
argument, attempt to return a list of valid attributes for that object.
If the object has a method named __dir__(), this method will be called and
must return the list of attributes. This allows objects that implement a custom
__getattr__() or __getattribute__() function to customize the way
dir() reports their attributes.
If the object does not provide __dir__(), the function tries its best to
gather information from the object’s __dict__ attribute, if defined, and
from its type object. The resulting list is not necessarily complete, and may
be inaccurate when the object has a custom __getattr__().
The default dir() mechanism behaves differently with different types of
objects, as it attempts to produce the most relevant, rather than complete,
information:
If the object is a module object, the list contains the names of the module’s
attributes.
If the object is a type or class object, the list contains the names of its
attributes, and recursively of the attributes of its bases.
Otherwise, the list contains the object’s attributes’ names, the names of its
class’s attributes, and recursively of the attributes of its class’s base
classes.
The resulting list is sorted alphabetically. For example:
>>> import struct
>>> dir() # show the names in the module namespace
['__builtins__', '__doc__', '__name__', 'struct']
>>> dir(struct) # show the names in the struct module
['Struct', '__builtins__', '__doc__', '__file__', '__name__',
'__package__', '_clearcache', 'calcsize', 'error', 'pack', 'pack_into',
'unpack', 'unpack_from']
>>> class Shape(object):
def __dir__(self):
return ['area', 'perimeter', 'location']
>>> s = Shape()
>>> dir(s)
['area', 'perimeter', 'location']
Note
Because dir() is supplied primarily as a convenience for use at an
interactive prompt, it tries to supply an interesting set of names more
than it tries to supply a rigorously or consistently defined set of names,
and its detailed behavior may change across releases. For example,
metaclass attributes are not in the result list when the argument is a
class.
Take two (non complex) numbers as arguments and return a pair of numbers
consisting of their quotient and remainder when using integer division. With
mixed operand types, the rules for binary arithmetic operators apply. For
integers, the result is the same as (a // b, a % b). For floating point
numbers the result is (q, a % b), where q is usually math.floor(a / b)
but may be 1 less than that. In any case q * b + a % b is very close to
a; if a % b is non-zero it has the same sign as b, and
0 <= abs(a % b) < abs(b).
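For example:

>>> divmod(7, 3)           # integers: same as (7 // 3, 7 % 3)
(2, 1)
>>> divmod(-7, 3)          # the remainder takes the sign of the divisor
(-3, 2)
>>> divmod(7.5, 2)
(3.0, 1.5)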
Return an enumerate object. iterable must be a sequence, an
iterator, or some other object which supports iteration. The
__next__() method of the iterator returned by enumerate() returns a
tuple containing a count (from start which defaults to 0) and the
values obtained from iterating over iterable.
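For example:

>>> seasons = ['Spring', 'Summer', 'Fall', 'Winter']
>>> list(enumerate(seasons))
[(0, 'Spring'), (1, 'Summer'), (2, 'Fall'), (3, 'Winter')]
>>> list(enumerate(seasons, start=1))
[(1, 'Spring'), (2, 'Summer'), (3, 'Fall'), (4, 'Winter')]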
The arguments are a string and optional globals and locals. If provided,
globals must be a dictionary. If provided, locals can be any mapping
object.
The expression argument is parsed and evaluated as a Python expression
(technically speaking, a condition list) using the globals and locals
dictionaries as global and local namespace. If the globals dictionary is
present and lacks ‘__builtins__’, the current globals are copied into globals
before expression is parsed. This means that expression normally has full
access to the standard builtins module and restricted environments are
propagated. If the locals dictionary is omitted it defaults to the globals
dictionary. If both dictionaries are omitted, the expression is executed in the
environment where eval() is called. The return value is the result of
the evaluated expression. Syntax errors are reported as exceptions. Example:
>>> x = 1
>>> eval('x+1')
2
This function can also be used to execute arbitrary code objects (such as
those created by compile()). In this case pass a code object instead
of a string. If the code object has been compiled with 'exec' as the
mode argument, eval()’s return value will be None.
Hints: dynamic execution of statements is supported by the exec()
function. The globals() and locals() functions
return the current global and local dictionary, respectively, which may be
useful to pass around for use by eval() or exec().
See ast.literal_eval() for a function that can safely evaluate strings
with expressions containing only literals.
This function supports dynamic execution of Python code. object must be
either a string or a code object. If it is a string, the string is parsed as
a suite of Python statements which is then executed (unless a syntax error
occurs). [1] If it is a code object, it is simply executed. In all cases,
the code that’s executed is expected to be valid as file input (see the
section “File input” in the Reference Manual). Be aware that the
return and yield statements may not be used outside of
function definitions even within the context of code passed to the
exec() function. The return value is None.
In all cases, if the optional parts are omitted, the code is executed in the
current scope. If only globals is provided, it must be a dictionary, which
will be used for both the global and the local variables. If globals and
locals are given, they are used for the global and local variables,
respectively. If provided, locals can be any mapping object.
If the globals dictionary does not contain a value for the key
__builtins__, a reference to the dictionary of the built-in module
builtins is inserted under that key. That way you can control what
builtins are available to the executed code by inserting your own
__builtins__ dictionary into globals before passing it to exec().
Note
The built-in functions globals() and locals() return the current
global and local dictionary, respectively, which may be useful to pass around
for use as the second and third argument to exec().
Note
The default locals act as described for function locals() below:
modifications to the default locals dictionary should not be attempted.
Pass an explicit locals dictionary if you need to see effects of the
code on locals after function exec() returns.
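For example, executing statements in an explicit globals dictionary:

>>> g = {}
>>> exec("x = 5\ny = x * 2", g)
>>> g['y']
10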
Construct an iterator from those elements of iterable for which function
returns true. iterable may be either a sequence, a container which
supports iteration, or an iterator. If function is None, the identity
function is assumed, that is, all elements of iterable that are false are
removed.
Note that filter(function, iterable) is equivalent to the generator
expression (item for item in iterable if function(item)) if function is
not None and (item for item in iterable if item) if function is
None.
See itertools.filterfalse() for the complementary function that returns
elements of iterable for which function returns false.
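For example:

>>> list(filter(str.isalpha, ['a', '1', 'b', '']))
['a', 'b']
>>> list(filter(None, [0, 1, '', 'x', None]))   # identity: drop false values
[1, 'x']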
If the argument is a string, it should contain a decimal number, optionally
preceded by a sign, and optionally embedded in whitespace. The optional
sign may be '+' or '-'; a '+' sign has no effect on the value
produced. The argument may also be a string representing a NaN
(not-a-number), or a positive or negative infinity. More precisely, the
input must conform to the following grammar after leading and trailing
whitespace characters are removed:
Here floatnumber is the form of a Python floating-point literal,
described in Floating point literals. Case is not significant, so, for example,
“inf”, “Inf”, “INFINITY” and “iNfINity” are all acceptable spellings for
positive infinity.
Otherwise, if the argument is an integer or a floating point number, a
floating point number with the same value (within Python’s floating point
precision) is returned. If the argument is outside the range of a Python
float, an OverflowError will be raised.
For a general Python object x, float(x) delegates to
x.__float__().
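For example:

>>> float('+1.23')
1.23
>>> float('   -12345\n')       # surrounding whitespace is ignored
-12345.0
>>> float('1e-003')
0.001
>>> float('-Infinity')
-inf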
Convert a value to a “formatted” representation, as controlled by
format_spec. The interpretation of format_spec will depend on the type
of the value argument, however there is a standard formatting syntax that
is used by most built-in types: Format Specification Mini-Language.
The default format_spec is an empty string which usually gives the same
effect as calling str(value).
A call to format(value, format_spec) is translated to
type(value).__format__(format_spec) which bypasses the instance
dictionary when searching for the value’s __format__() method. A
TypeError exception is raised if the method is not found or if either
the format_spec or the return value are not strings.
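For example:

>>> format(1234.5678, '.2f')       # fixed-point, two decimals
'1234.57'
>>> format(255, '#06x')            # hex with 0x prefix, zero-padded to width 6
'0x00ff'
>>> format('hi', '>5')             # right-align in a field of width 5
'   hi'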
frozenset([iterable])
Return a frozenset object, optionally with elements taken from iterable.
The frozenset type is described in Set Types — set, frozenset.
For other containers see the built in dict, list, and
tuple classes, and the collections module.
Return the value of the named attribute of object. name must be a string.
If the string is the name of one of the object’s attributes, the result is the
value of that attribute. For example, getattr(x, 'foobar') is equivalent to
x.foobar. If the named attribute does not exist, default is returned if
provided, otherwise AttributeError is raised.
Return a dictionary representing the current global symbol table. This is always
the dictionary of the current module (inside a function or method, this is the
module where it is defined, not the module from which it is called).
The arguments are an object and a string. The result is True if the
string is the name of one of the object’s attributes, False if not. (This
is implemented by calling getattr(object, name) and seeing whether it
raises an AttributeError or not.)
Return the hash value of the object (if it has one). Hash values are integers.
They are used to quickly compare dictionary keys during a dictionary lookup.
Numeric values that compare equal have the same hash value (even if they are of
different types, as is the case for 1 and 1.0).
Invoke the built-in help system. (This function is intended for interactive
use.) If no argument is given, the interactive help system starts on the
interpreter console. If the argument is a string, then the string is looked up
as the name of a module, function, class, method, keyword, or documentation
topic, and a help page is printed on the console. If the argument is any other
kind of object, a help page on the object is generated.
This function is added to the built-in namespace by the site module.
Convert an integer number to a hexadecimal string. The result is a valid Python
expression. If x is not a Python int object, it has to define an
__index__() method that returns an integer.
Note
To obtain a hexadecimal string representation for a float, use the
float.hex() method.
Return the “identity” of an object. This is an integer which
is guaranteed to be unique and constant for this object during its lifetime.
Two objects with non-overlapping lifetimes may have the same id()
value.
CPython implementation detail: This is the address of the object in memory.
If the prompt argument is present, it is written to standard output without
a trailing newline. The function then reads a line from input, converts it
to a string (stripping a trailing newline), and returns that. When EOF is
read, EOFError is raised. Example:
>>> s = input('--> ')
--> Monty Python's Flying Circus
>>> s
"Monty Python's Flying Circus"
If the readline module was loaded, then input() will use it
to provide elaborate line editing and history features.
Convert a number or string to an integer. If no arguments are given, return
0. If a number is given, return number.__int__(). Conversion of
floating point numbers to integers truncates towards zero. A string must be
a base-radix integer literal optionally preceded by ‘+’ or ‘-’ (with no space
in between) and optionally surrounded by whitespace. A base-n literal
consists of the digits 0 to n-1, with ‘a’ to ‘z’ (or ‘A’ to ‘Z’) having
values 10 to 35. The default base is 10. The allowed values are 0 and 2-36.
Base-2, -8, and -16 literals can be optionally prefixed with 0b/0B,
0o/0O, or 0x/0X, as with integer literals in code. Base 0
means to interpret exactly as a code literal, so that the actual base is 2,
8, 10, or 16, and so that int('010', 0) is not legal, while
int('010') is, as well as int('010', 8).
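For example:

>>> int('100', 2)       # base 2
4
>>> int('0x1f', 16)     # a prefix is allowed when it matches the base
31
>>> int('010', 8)
8
>>> int('0o10', 0)      # base 0: interpret exactly as a code literal
8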
Return true if the object argument is an instance of the classinfo
argument, or of a (direct or indirect) subclass thereof. If object is not
an object of the given type, the function always returns false. If
classinfo is not a class (type object), it may be a tuple of type objects,
or may recursively contain other such tuples (other sequence types are not
accepted). If classinfo is not a type or tuple of types and such tuples,
a TypeError exception is raised.
Return true if class is a subclass (direct or indirect) of classinfo. A
class is considered a subclass of itself. classinfo may be a tuple of class
objects, in which case every entry in classinfo will be checked. In any other
case, a TypeError exception is raised.
Return an iterator object. The first argument is interpreted very
differently depending on the presence of the second argument. Without a
second argument, object must be a collection object which supports the
iteration protocol (the __iter__() method), or it must support the
sequence protocol (the __getitem__() method with integer arguments
starting at 0). If it does not support either of those protocols,
TypeError is raised. If the second argument, sentinel, is given,
then object must be a callable object. The iterator created in this case
will call object with no arguments for each call to its __next__()
method; if the value returned is equal to sentinel, StopIteration
will be raised, otherwise the value will be returned.
One useful application of the second form of iter() is to read lines of
a file until a certain line is reached. The following example reads a file
until the readline() method returns an empty string:
with open('mydata.txt') as fp:
for line in iter(fp.readline, ''):
process_line(line)
Return a list whose items are the same and in the same order as iterable‘s
items. iterable may be either a sequence, a container that supports
iteration, or an iterator object. If iterable is already a list, a copy is
made and returned, similar to iterable[:]. For instance, list('abc')
returns ['a','b','c'] and list((1,2,3)) returns [1,2,3].
If no argument is given, returns a new empty list, [].
Update and return a dictionary representing the current local symbol table.
Free variables are returned by locals() when it is called in function
blocks, but not in class blocks.
Note
The contents of this dictionary should not be modified; changes may not
affect the values of local and free variables used by the interpreter.
Return an iterator that applies function to every item of iterable,
yielding the results. If additional iterable arguments are passed,
function must take that many arguments and is applied to the items from all
iterables in parallel. With multiple iterables, the iterator stops when the
shortest iterable is exhausted. For cases where the function inputs are
already arranged into argument tuples, see itertools.starmap().
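For example:

>>> list(map(abs, [-2, -1, 0, 1]))
[2, 1, 0, 1]
>>> list(map(pow, [2, 3, 4], [5, 2, 3]))   # multiple iterables in parallel
[32, 9, 64]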
With a single argument iterable, return the largest item of a non-empty
iterable (such as a string, tuple or list). With more than one argument, return
the largest of the arguments.
The optional keyword-only key argument specifies a one-argument ordering
function like that used for list.sort().
If multiple items are maximal, the function returns the first one
encountered. This is consistent with other sort-stability preserving tools
such as sorted(iterable, key=keyfunc, reverse=True)[0] and
heapq.nlargest(1, iterable, key=keyfunc).
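For example:

>>> max([3, 1, 4, 1, 5])
5
>>> max('apple', 'banana', 'cherry', key=len)   # first maximal item wins
'banana'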
memoryview(obj)
Return a “memory view” object created from the given argument. See
memoryview type for more information.
With a single argument iterable, return the smallest item of a non-empty
iterable (such as a string, tuple or list). With more than one argument, return
the smallest of the arguments.
The optional keyword-only key argument specifies a one-argument ordering
function like that used for list.sort().
If multiple items are minimal, the function returns the first one
encountered. This is consistent with other sort-stability preserving tools
such as sorted(iterable, key=keyfunc)[0] and
heapq.nsmallest(1, iterable, key=keyfunc).
Retrieve the next item from the iterator by calling its __next__()
method. If default is given, it is returned if the iterator is exhausted,
otherwise StopIteration is raised.
Return a new featureless object. object is a base for all classes.
It has the methods that are common to all instances of Python classes. This
function does not accept any arguments.
Note
object does not have a __dict__, so you can’t assign
arbitrary attributes to an instance of the object class.
Convert an integer number to an octal string. The result is a valid Python
expression. If x is not a Python int object, it has to define an
__index__() method that returns an integer.
Open file and return a corresponding stream. If the file cannot be opened,
an IOError is raised.
file is either a string or bytes object giving the pathname (absolute or
relative to the current working directory) of the file to be opened or
an integer file descriptor of the file to be wrapped. (If a file descriptor
is given, it is closed when the returned I/O object is closed, unless
closefd is set to False.)
mode is an optional string that specifies the mode in which the file is
opened. It defaults to 'r' which means open for reading in text mode.
Other common values are 'w' for writing (truncating the file if it
already exists), and 'a' for appending (which on some Unix systems,
means that all writes append to the end of the file regardless of the
current seek position). In text mode, if encoding is not specified the
encoding used is platform dependent. (For reading and writing raw bytes use
binary mode and leave encoding unspecified.) The available modes are:
Character   Meaning
'r'         open for reading (default)
'w'         open for writing, truncating the file first
'a'         open for writing, appending to the end of the file if it exists
'b'         binary mode
't'         text mode (default)
'+'         open a disk file for updating (reading and writing)
'U'         universal newlines mode (for backwards compatibility; should not
            be used in new code)
The default mode is 'r' (open for reading text, synonym of 'rt').
For binary read-write access, the mode 'w+b' opens and truncates the file
to 0 bytes. 'r+b' opens the file without truncation.
As mentioned in the Overview, Python distinguishes between binary
and text I/O. Files opened in binary mode (including 'b' in the mode
argument) return contents as bytes objects without any decoding. In
text mode (the default, or when 't' is included in the mode argument),
the contents of the file are returned as str, the bytes having been
first decoded using a platform-dependent encoding or using the specified
encoding if given.
Note
Python doesn’t depend on the underlying operating system’s notion of text
files; all the processing is done by Python itself, and is therefore
platform-independent.
buffering is an optional integer used to set the buffering policy. Pass 0
to switch buffering off (only allowed in binary mode), 1 to select line
buffering (only usable in text mode), and an integer > 1 to indicate the size
of a fixed-size chunk buffer. When no buffering argument is given, the
default buffering policy works as follows:
Binary files are buffered in fixed-size chunks; the size of the buffer is
chosen using a heuristic trying to determine the underlying device’s “block
size” and falling back on io.DEFAULT_BUFFER_SIZE. On many systems,
the buffer will typically be 4096 or 8192 bytes long.
“Interactive” text files (files for which isatty() returns True) use
line buffering. Other text files use the policy described above for binary
files.
encoding is the name of the encoding used to decode or encode the file.
This should only be used in text mode. The default encoding is platform
dependent (whatever locale.getpreferredencoding() returns), but any
encoding supported by Python can be used. See the codecs module for
the list of supported encodings.
errors is an optional string that specifies how encoding and decoding
errors are to be handled; this cannot be used in binary mode. Pass
'strict' to raise a ValueError exception if there is an encoding
error (the default of None has the same effect), or pass 'ignore' to
ignore errors. (Note that ignoring encoding errors can lead to data loss.)
'replace' causes a replacement marker (such as '?') to be inserted
where there is malformed data. When writing, 'xmlcharrefreplace'
(replace with the appropriate XML character reference) or
'backslashreplace' (replace with backslashed escape sequences) can be
used. Any other error handling name that has been registered with
codecs.register_error() is also valid.
newline controls how universal newlines works (it only applies to text
mode). It can be None, '', '\n', '\r', and '\r\n'. It
works as follows:
On input, if newline is None, universal newlines mode is enabled.
Lines in the input can end in '\n', '\r', or '\r\n', and these
are translated into '\n' before being returned to the caller. If it is
'', universal newline mode is enabled, but line endings are returned to
the caller untranslated. If it has any of the other legal values, input
lines are only terminated by the given string, and the line ending is
returned to the caller untranslated.
On output, if newline is None, any '\n' characters written are
translated to the system default line separator, os.linesep. If
newline is '', no translation takes place. If newline is any of
the other legal values, any '\n' characters written are translated to
the given string.
If closefd is False and a file descriptor rather than a filename was
given, the underlying file descriptor will be kept open when the file is
closed. If a filename is given closefd has no effect and must be True
(the default).
The type of file object returned by the open() function depends on the
mode. When open() is used to open a file in a text mode ('w',
'r', 'wt', 'rt', etc.), it returns a subclass of
io.TextIOBase (specifically io.TextIOWrapper). When used
to open a file in a binary mode with buffering, the returned class is a
subclass of io.BufferedIOBase. The exact class varies: in read
binary mode, it returns a io.BufferedReader; in write binary and
append binary modes, it returns a io.BufferedWriter, and in
read/write mode, it returns a io.BufferedRandom. When buffering is
disabled, the raw stream, a subclass of io.RawIOBase,
io.FileIO, is returned.
Given a string representing one Unicode character, return an integer
representing the Unicode code
point of that character. For example, ord('a') returns the integer 97
and ord('\u2020') returns 8224. This is the inverse of chr().
On wide Unicode builds, if the argument length is not one, a
TypeError will be raised. On narrow Unicode builds, strings
of length two are accepted when they form a UTF-16 surrogate pair.
Return x to the power y; if z is present, return x to the power y,
modulo z (computed more efficiently than pow(x, y) % z). The two-argument
form pow(x, y) is equivalent to using the power operator: x**y.
The arguments must have numeric types. With mixed operand types, the
coercion rules for binary arithmetic operators apply. For int
operands, the result has the same type as the operands (after coercion)
unless the second argument is negative; in that case, all arguments are
converted to float and a float result is delivered. For example, 10**2
returns 100, but 10**-2 returns 0.01. If the second argument is
negative, the third argument must be omitted. If z is present, x and y
must be of integer types, and y must be non-negative.
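For example:

>>> pow(10, 2)
100
>>> pow(10, -2)          # negative exponent: float result
0.01
>>> pow(2, 10, 100)      # same as 2**10 % 100, computed more efficiently
24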
Print object(s) to the stream file, separated by sep and followed by
end. sep, end and file, if present, must be given as keyword
arguments.
All non-keyword arguments are converted to strings like str() does and
written to the stream, separated by sep and followed by end. Both sep
and end must be strings; they can also be None, which means to use the
default values. If no object is given, print() will just write
end.
The file argument must be an object with a write(string) method; if it
is not present or None, sys.stdout will be used.
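For example:

>>> print('a', 'b', 'c', sep='-', end='.\n')
a-b-c.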
fget is a function for getting an attribute value, likewise fset is a
function for setting, and fdel a function for del’ing, an attribute. Typical
use is to define a managed attribute x:
class C:
def __init__(self):
self._x = None
def getx(self):
return self._x
def setx(self, value):
self._x = value
def delx(self):
del self._x
x = property(getx, setx, delx, "I'm the 'x' property.")
If c is an instance of C, c.x will invoke the getter,
c.x = value will invoke the setter and del c.x the deleter.
If given, doc will be the docstring of the property attribute. Otherwise, the
property will copy fget‘s docstring (if it exists). This makes it possible to
create read-only properties easily using property() as a decorator:
class Parrot:
def __init__(self):
self._voltage = 100000
@property
def voltage(self):
"""Get the current voltage."""
return self._voltage
turns the voltage() method into a “getter” for a read-only attribute
with the same name.
A property object has getter, setter, and deleter
methods usable as decorators that create a copy of the property with the
corresponding accessor function set to the decorated function. This is
best explained with an example:
class C:
def __init__(self):
self._x = None
@property
def x(self):
"""I'm the 'x' property."""
return self._x
@x.setter
def x(self, value):
self._x = value
@x.deleter
def x(self):
del self._x
This code is exactly equivalent to the first example. Be sure to give the
additional functions the same name as the original property (x in this
case.)
The returned property also has the attributes fget, fset, and
fdel corresponding to the constructor arguments.
This is a versatile function to create iterables yielding arithmetic
progressions. It is most often used in for loops. The arguments
must be integers. If the step argument is omitted, it defaults to 1.
If the start argument is omitted, it defaults to 0. The full form
returns an iterable of integers [start, start + step, start + 2*step, ...].
If step is positive, the last element is the largest start + i*step less
than stop; if step is negative, the last element is the smallest
start + i*step greater than stop. step must not be zero (or else
ValueError is raised). Example:
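>>> list(range(10))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> list(range(1, 11))
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> list(range(0, 30, 5))
[0, 5, 10, 15, 20, 25]
>>> list(range(0, -10, -1))
[0, -1, -2, -3, -4, -5, -6, -7, -8, -9]
>>> list(range(0))
[]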
Range objects implement the collections.Sequence ABC, and provide
features such as containment tests, element index lookup, slicing and
support for negative indices:
>>> r = range(0, 20, 2)
>>> r
range(0, 20, 2)
>>> 11 in r
False
>>> 10 in r
True
>>> r.index(10)
5
>>> r[5]
10
>>> r[:5]
range(0, 10, 2)
>>> r[-1]
18
Ranges containing absolute values larger than sys.maxsize are permitted
but some features (such as len()) will raise OverflowError.
Changed in version 3.2: Implement the Sequence ABC.
Support slicing and negative indices.
Test integers for membership in constant time instead of iterating
through all items.
Return a string containing a printable representation of an object. For many
types, this function makes an attempt to return a string that would yield an
object with the same value when passed to eval(), otherwise the
representation is a string enclosed in angle brackets that contains the name
of the type of the object together with additional information often
including the name and address of the object. A class can control what this
function returns for its instances by defining a __repr__() method.
Return a reverse iterator. seq must be an object which has
a __reversed__() method or supports the sequence protocol (the
__len__() method and the __getitem__() method with integer
arguments starting at 0).
Return the floating point value x rounded to n digits after the decimal
point. If n is omitted, it defaults to zero. Delegates to
x.__round__(n).
For the built-in types supporting round(), values are rounded to the
closest multiple of 10 to the power minus n; if two multiples are equally
close, rounding is done toward the even choice (so, for example, both
round(0.5) and round(-0.5) are 0, and round(1.5) is 2).
The return value is an integer if called with one argument, otherwise of the
same type as x.
Note
The behavior of round() for floats can be surprising: for example,
round(2.675,2) gives 2.67 instead of the expected 2.68.
This is not a bug: it’s a result of the fact that most decimal fractions
can’t be represented exactly as a float. See Floating Point Arithmetic:
Issues and Limitations for
more information.
set([iterable])
Return a new set, optionally with elements taken from iterable.
The set type is described in Set Types — set, frozenset.
This is the counterpart of getattr(). The arguments are an object, a
string and an arbitrary value. The string may name an existing attribute or a
new attribute. The function assigns the value to the attribute, provided the
object allows it. For example, setattr(x, 'foobar', 123) is equivalent to
x.foobar = 123.
Return a slice object representing the set of indices specified by
range(start, stop, step). The start and step arguments default to
None. Slice objects have read-only data attributes start,
stop and step which merely return the argument values (or their
default). They have no other explicit functionality; however they are used by
Numerical Python and other third party extensions. Slice objects are also
generated when extended indexing syntax is used. For example:
a[start:stop:step] or a[start:stop,i]. See itertools.islice()
for an alternate version that returns an iterator.
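For example:

>>> s = slice(1, 8, 2)
>>> 'abcdefghij'[s]            # same as 'abcdefghij'[1:8:2]
'bdfh'
>>> s.start, s.stop, s.step
(1, 8, 2)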
Return a new sorted list from the items in iterable.
Has two optional arguments which must be specified as keyword arguments.
key specifies a function of one argument that is used to extract a comparison
key from each list element: key=str.lower. The default value is None
(compare the elements directly).
reverse is a boolean value. If set to True, then the list elements are
sorted as if each comparison were reversed.
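For example:

>>> sorted([5, 2, 3, 1, 4])
[1, 2, 3, 4, 5]
>>> sorted(['banana', 'Apple', 'cherry'], key=str.lower)
['Apple', 'banana', 'cherry']
>>> sorted([5, 2, 3], reverse=True)
[5, 3, 2]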
A static method does not receive an implicit first argument. To declare a static
method, use this idiom:
class C:
@staticmethod
def f(arg1, arg2, ...): ...
The @staticmethod form is a function decorator – see the
description of function definitions in Function definitions for details.
It can be called either on the class (such as C.f()) or on an instance (such
as C().f()). The instance is ignored except for its class.
Static methods in Python are similar to those found in Java or C++. Also see
classmethod() for a variant that is useful for creating alternate class
constructors.
For more information on static methods, consult the documentation on the
standard type hierarchy in The standard type hierarchy.
Return a string version of an object, using one of the following modes:
If encoding and/or errors are given, str() will decode the
object which can either be a byte string or a character buffer using
the codec for encoding. The encoding parameter is a string giving
the name of an encoding; if the encoding is not known, LookupError
is raised. Error handling is done according to errors; this specifies the
treatment of characters which are invalid in the input encoding. If
errors is 'strict' (the default), a ValueError is raised on
errors, while a value of 'ignore' causes errors to be silently ignored,
and a value of 'replace' causes the official Unicode replacement character,
U+FFFD, to be used to replace input characters which cannot be decoded.
See also the codecs module.
When only object is given, this returns its nicely printable representation.
For strings, this is the string itself. The difference with repr(object)
is that str(object) does not always attempt to return a string that is
acceptable to eval(); its goal is to return a printable string.
With no arguments, this returns the empty string.
Objects can specify what str(object) returns by defining a __str__()
special method.
Sums start and the items of an iterable from left to right and returns the
total. start defaults to 0. The iterable’s items are normally numbers,
and the start value is not allowed to be a string.
For some use cases, there are good alternatives to sum().
The preferred, fast way to concatenate a sequence of strings is by calling
''.join(sequence). To add floating point values with extended precision,
see math.fsum(). To concatenate a series of iterables, consider using
itertools.chain().
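For example, math.fsum() avoids the loss of precision that sum() can
accumulate with floats:
>>> sum([0.1] * 10)
0.9999999999999999
>>> import math
>>> math.fsum([0.1] * 10)
1.0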
Return a proxy object that delegates method calls to a parent or sibling
class of type. This is useful for accessing inherited methods that have
been overridden in a class. The search order is the same as that used by
getattr() except that the type itself is skipped.
The __mro__ attribute of the type lists the method resolution
search order used by both getattr() and super(). The attribute
is dynamic and can change whenever the inheritance hierarchy is updated.
If the second argument is omitted, the super object returned is unbound. If
the second argument is an object, isinstance(obj, type) must be true. If
the second argument is a type, issubclass(type2, type) must be true (this
is useful for classmethods).
There are two typical use cases for super. In a class hierarchy with
single inheritance, super can be used to refer to parent classes without
naming them explicitly, thus making the code more maintainable. This use
closely parallels the use of super in other programming languages.
The second use case is to support cooperative multiple inheritance in a
dynamic execution environment. This use case is unique to Python and is
not found in statically compiled languages or languages that only support
single inheritance. This makes it possible to implement “diamond diagrams”
where multiple base classes implement the same method. Good design dictates
that this method have the same calling signature in every case (because the
order of calls is determined at runtime, because that order adapts
to changes in the class hierarchy, and because that order can include
sibling classes that are unknown prior to runtime).
For both use cases, a typical superclass call looks like this:
class C(B):
    def method(self, arg):
        super().method(arg)    # This does the same thing as:
                               # super(C, self).method(arg)
Note that super() is implemented as part of the binding process for
explicit dotted attribute lookups such as super().__getitem__(name).
It does so by implementing its own __getattribute__() method for searching
classes in a predictable order that supports cooperative multiple inheritance.
Accordingly, super() is undefined for implicit lookups using statements or
operators such as super()[name].
Also note that super() is not limited to use inside methods. The two
argument form specifies the arguments exactly and makes the appropriate
references. The zero argument form automatically searches the stack frame
for the class (__class__) and the first argument.
Return a tuple whose items are the same and in the same order as iterable’s
items. iterable may be a sequence, a container that supports iteration, or an
iterator object. If iterable is already a tuple, it is returned unchanged.
For instance, tuple('abc') returns ('a', 'b', 'c') and tuple([1, 2, 3])
returns (1, 2, 3). If no argument is given, returns a new empty tuple, ().
Return the type of an object. The return value is a type object and
generally the same object as returned by object.__class__.
The isinstance() built-in function is recommended for testing the type
of an object, because it takes subclasses into account.
With three arguments, type() functions as a constructor as detailed
below.
type(name, bases, dict)
Return a new type object. This is essentially a dynamic form of the
class statement. The name string is the class name and becomes the
__name__ attribute; the bases tuple itemizes the base classes and
becomes the __bases__ attribute; and the dict dictionary is the
namespace containing definitions for class body and becomes the __dict__
attribute. For example, the following two statements create identical
type objects:
>>> class X:
...     a = 1
...
>>> X = type('X', (object,), dict(a=1))
Make an iterator that aggregates elements from each of the iterables.
Returns an iterator of tuples, where the i-th tuple contains
the i-th element from each of the argument sequences or iterables. The
iterator stops when the shortest input iterable is exhausted. With a single
iterable argument, it returns an iterator of 1-tuples. With no arguments,
it returns an empty iterator. Equivalent to:
def zip(*iterables):
    # zip('ABCD', 'xy') --> Ax By
    sentinel = object()
    iterables = [iter(it) for it in iterables]
    while iterables:
        result = []
        for it in iterables:
            elem = next(it, sentinel)
            if elem is sentinel:
                return
            result.append(elem)
        yield tuple(result)
The left-to-right evaluation order of the iterables is guaranteed. This
makes possible an idiom for clustering a data series into n-length groups
using zip(*[iter(s)]*n).
zip() should only be used with unequal length inputs when you don’t
care about trailing, unmatched values from the longer iterables. If those
values are important, use itertools.zip_longest() instead.
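For example, the clustering idiom and zip_longest() side by side:
>>> s = 'ABCDEF'
>>> list(zip(*[iter(s)] * 2))
[('A', 'B'), ('C', 'D'), ('E', 'F')]
>>> from itertools import zip_longest
>>> list(zip_longest('ABCD', 'xy', fillvalue='-'))
[('A', 'x'), ('B', 'y'), ('C', '-'), ('D', '-')]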
zip() in conjunction with the * operator can be used to unzip a
list:
>>> x = [1, 2, 3]
>>> y = [4, 5, 6]
>>> zipped = zip(x, y)
>>> list(zipped)
[(1, 4), (2, 5), (3, 6)]
>>> x2, y2 = zip(*zip(x, y))
>>> x == list(x2) and y == list(y2)
True
This is an advanced function that is not needed in everyday Python
programming, unlike importlib.import_module().
This function is invoked by the import statement. It can be
replaced (by importing the builtins module and assigning to
builtins.__import__) in order to change semantics of the
import statement, but nowadays it is usually simpler to use import
hooks (see PEP 302). Direct use of __import__() is rare, except in
cases where you want to import a module whose name is only known at runtime.
The function imports the module name, potentially using the given globals
and locals to determine how to interpret the name in a package context.
The fromlist gives the names of objects or submodules that should be
imported from the module given by name. The standard implementation does
not use its locals argument at all, and uses its globals only to
determine the package context of the import statement.
level specifies whether to use absolute or relative imports. 0 (the
default) means only perform absolute imports. Positive values for
level indicate the number of parent directories to search relative to the
directory of the module calling __import__().
When the name variable is of the form package.module, normally, the
top-level package (the name up till the first dot) is returned, not the
module named by name. However, when a non-empty fromlist argument is
given, the module named by name is returned.
For example, the statement import spam results in bytecode resembling the
following code:
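spam = __import__('spam', globals(), locals(), [], 0)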
Note that the parser only accepts the Unix-style end of line convention.
If you are reading the code from a file, make sure to use newline conversion
mode to convert Windows or Mac-style newlines.
In the current implementation, local variable bindings cannot normally be
affected this way, but variables retrieved from other scopes (such as modules)
can be. This may change.
The sole value of the type NoneType. None is frequently used to
represent the absence of a value, as when default arguments are not passed to a
function. Assignments to None are illegal and raise a SyntaxError.
Special value which can be returned by the “rich comparison” special methods
(__eq__(), __lt__(), and friends), to indicate that the comparison
is not implemented with respect to the other type.
This constant is true if Python was not started with an -O option.
See also the assert statement.
Note
The names None, False, True and __debug__
cannot be reassigned (assignments to them, even as an attribute name, raise
SyntaxError), so they can be considered “true” constants.
The site module (which is imported automatically during startup, except
if the -S command-line option is given) adds several constants to the
built-in namespace. They are useful for the interactive interpreter shell and
should not be used in programs.
Objects that when printed, print a message like “Use quit() or Ctrl-D
(i.e. EOF) to exit”, and when called, raise SystemExit with the
specified exit code.
Objects that when printed, print a message like “Type license() to see the
full license text”, and when called, display the corresponding text in a
pager-like fashion (one screen at a time).
The following sections describe the standard types that are built into the
interpreter.
The principal built-in types are numerics, sequences, mappings, classes,
instances and exceptions.
Some operations are supported by several object types; in particular,
practically all objects can be compared, tested for truth value, and converted
to a string (with the repr() function or the slightly different
str() function). The latter function is implicitly used when an object is
written by the print() function.
Any object can be tested for truth value, for use in an if or
while condition or as operand of the Boolean operations below. The
following values are considered false:
None
False
zero of any numeric type, for example, 0, 0.0, 0j.
any empty sequence, for example, '', (), [].
any empty mapping, for example, {}.
instances of user-defined classes, if the class defines a __bool__() or
__len__() method, when that method returns the integer zero or
bool value False. [1]
All other values are considered true — so objects of many types are always
true.
Operations and built-in functions that have a Boolean result always return 0
or False for false and 1 or True for true, unless otherwise stated.
(Important exception: the Boolean operations or and and always return
one of their operands.)
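For example:
>>> bool([]), bool([0])
(False, True)
>>> 0 or 'default'
'default'
>>> 'first' and 'second'
'second'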
There are eight comparison operations in Python. They all have the same
priority (which is higher than that of the Boolean operations). Comparisons can
be chained arbitrarily; for example, x < y <= z is equivalent to
x < y and y <= z, except that y is evaluated only once (but in both cases
z is not evaluated at all when x < y is found to be false).
This table summarizes the comparison operations:
Operation    Meaning
<            strictly less than
<=           less than or equal
>            strictly greater than
>=           greater than or equal
==           equal
!=           not equal
is           object identity
is not       negated object identity
Objects of different types, except different numeric types, never compare equal.
Furthermore, some types (for example, function objects) support only a degenerate
notion of comparison where any two objects of that type are unequal. The <,
<=, > and >= operators will raise a TypeError exception when
comparing a complex number with another built-in numeric type, when the objects
are of different types that cannot be compared, or in other cases where there is
no defined ordering.
Non-identical instances of a class normally compare as non-equal unless the
class defines the __eq__() method.
Instances of a class cannot be ordered with respect to other instances of the
same class, or other types of object, unless the class defines enough of the
methods __lt__(), __le__(), __gt__(), and __ge__() (in
general, __lt__() and __eq__() are sufficient, if you want the
conventional meanings of the comparison operators).
The behavior of the is and is not operators cannot be
customized; also they can be applied to any two objects and never raise an
exception.
Two more operations with the same syntactic priority, in and
not in, are supported only by sequence types (below).
There are three distinct numeric types: integers, floating
point numbers, and complex numbers. In addition, Booleans are a
subtype of integers. Integers have unlimited precision. Floating point
numbers are usually implemented using double in C; information
about the precision and internal representation of floating point
numbers for the machine on which your program is running is available
in sys.float_info. Complex numbers have a real and imaginary
part, which are each a floating point number. To extract these parts
from a complex number z, use z.real and z.imag. (The standard
library includes additional numeric types: fractions.Fraction, which holds
rationals, and decimal.Decimal, which holds floating-point numbers with
user-definable precision.)
Numbers are created by numeric literals or as the result of built-in functions
and operators. Unadorned integer literals (including hex, octal and binary
numbers) yield integers. Numeric literals containing a decimal point or an
exponent sign yield floating point numbers. Appending 'j' or 'J' to a
numeric literal yields an imaginary number (a complex number with a zero real
part) which you can add to an integer or float to get a complex number with real
and imaginary parts.
Python fully supports mixed arithmetic: when a binary arithmetic operator has
operands of different numeric types, the operand with the “narrower” type is
widened to that of the other, where integer is narrower than floating point,
which is narrower than complex. Comparisons between numbers of mixed type use
the same rule. [2] The constructors int(), float(), and
complex() can be used to produce numbers of a specific type.
All numeric types (except complex) support the following operations, sorted by
ascending priority (operations in the same box have the same priority; all
numeric operations have a higher priority than comparison operations):
Also referred to as integer division. The resultant value is a whole
integer, though the result’s type is not necessarily int. The result is
always rounded towards minus infinity: 1//2 is 0, (-1)//2 is
-1, 1//(-2) is -1, and (-1)//(-2) is 0.
Not for complex numbers. Instead convert to floats using abs() if
appropriate.
Conversion from floating point to integer may round or truncate
as in C; see functions floor() and ceil() in the math module
for well-defined conversions.
float also accepts the strings “nan” and “inf” with an optional prefix “+”
or “-” for Not a Number (NaN) and positive or negative infinity.
Python defines pow(0, 0) and 0**0 to be 1, as is common for
programming languages.
The numeric literals accepted include the digits 0 to 9 or any
Unicode equivalent (code points with the Nd property).
Integers support additional operations that make sense only for bit-strings.
Negative numbers are treated as their 2’s complement value (this assumes a
sufficiently large number of bits that no overflow occurs during the operation).
The priorities of the binary bitwise operations are all lower than the numeric
operations and higher than the comparisons; the unary operation ~ has the
same priority as the other unary numeric operations (+ and -).
This table lists the bit-string operations sorted in ascending priority
(operations in the same box have the same priority):
Operation    Result                             Notes
x | y        bitwise or of x and y
x ^ y        bitwise exclusive or of x and y
x & y        bitwise and of x and y
x << n       x shifted left by n bits           (1)(2)
x >> n       x shifted right by n bits          (1)(3)
~x           the bits of x inverted
Notes:
(1) Negative shift counts are illegal and cause a ValueError to be raised.
(2) A left shift by n bits is equivalent to multiplication by pow(2, n)
    without overflow check.
(3) A right shift by n bits is equivalent to division by pow(2, n) without
    overflow check.
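For example:
>>> 5 << 2          # same as 5 * 2**2
20
>>> 20 >> 2         # same as 20 // 2**2
5
>>> ~5              # same as -(5 + 1)
-6
>>> 0b1100 & 0b1010, 0b1100 | 0b1010, 0b1100 ^ 0b1010
(8, 14, 6)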
Return the number of bits necessary to represent an integer in binary,
excluding the sign and leading zeros:
>>> n = -37
>>> bin(n)
'-0b100101'
>>> n.bit_length()
6
More precisely, if x is nonzero, then x.bit_length() is the
unique positive integer k such that 2**(k-1) <= abs(x) < 2**k.
Equivalently, when abs(x) is small enough to have a correctly
rounded logarithm, then k = 1 + int(log(abs(x), 2)).
If x is zero, then x.bit_length() returns 0.
Equivalent to:
def bit_length(self):
    s = bin(self)        # binary representation:  bin(-37) --> '-0b100101'
    s = s.lstrip('-0b')  # remove leading zeros and minus sign
    return len(s)        # len('100101') --> 6
The integer is represented using length bytes. An OverflowError
is raised if the integer is not representable with the given number of
bytes.
The byteorder argument determines the byte order used to represent the
integer. If byteorder is "big", the most significant byte is at the
beginning of the byte array. If byteorder is "little", the most
significant byte is at the end of the byte array. To request the native
byte order of the host system, use sys.byteorder as the byte order
value.
The signed argument determines whether two’s complement is used to
represent the integer. If signed is False and a negative integer is
given, an OverflowError is raised. The default value for signed
is False.
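For example:
>>> (1024).to_bytes(2, byteorder='big')
b'\x04\x00'
>>> (-1024).to_bytes(10, byteorder='big', signed=True)
b'\xff\xff\xff\xff\xff\xff\xff\xff\xfc\x00'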
The argument bytes must either support the buffer protocol or be an
iterable producing bytes. bytes and bytearray are
examples of built-in objects that support the buffer protocol.
The byteorder argument determines the byte order used to represent the
integer. If byteorder is "big", the most significant byte is at the
beginning of the byte array. If byteorder is "little", the most
significant byte is at the end of the byte array. To request the native
byte order of the host system, use sys.byteorder as the byte order
value.
The signed argument indicates whether two’s complement is used to
represent the integer.
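For example:
>>> int.from_bytes(b'\x00\x10', byteorder='big')
16
>>> int.from_bytes(b'\x00\x10', byteorder='little')
4096
>>> int.from_bytes(b'\xfc\x00', byteorder='big', signed=True)
-1024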
Return a pair of integers whose ratio is exactly equal to the
original float and with a positive denominator. Raises
OverflowError on infinities and a ValueError on
NaNs.
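For example:
>>> (0.25).as_integer_ratio()
(1, 4)
>>> (0.1).as_integer_ratio()
(3602879701896397, 36028797018963968)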
Two methods support conversion to
and from hexadecimal strings. Since Python’s floats are stored
internally as binary numbers, converting a float to or from a
decimal string usually involves a small rounding error. In
contrast, hexadecimal strings allow exact representation and
specification of floating-point numbers. This can be useful when
debugging, and in numerical work.
Return a representation of a floating-point number as a hexadecimal
string. For finite floating-point numbers, this representation
will always include a leading 0x and a trailing p and
exponent.
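A hexadecimal string takes the form:
[sign] ['0x'] integer ['.' fraction] ['p' exponent]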
where the optional sign may be either + or -, integer
and fraction are strings of hexadecimal digits, and exponent
is a decimal integer with an optional leading sign. Case is not
significant, and there must be at least one hexadecimal digit in
either the integer or the fraction. This syntax is similar to the
syntax specified in section 6.4.4.2 of the C99 standard, and also to
the syntax used in Java 1.5 onwards. In particular, the output of
float.hex() is usable as a hexadecimal floating-point literal in
C or Java code, and hexadecimal strings produced by C’s %a format
character or Java’s Double.toHexString are accepted by
float.fromhex().
Note that the exponent is written in decimal rather than hexadecimal,
and that it gives the power of 2 by which to multiply the coefficient.
For example, the hexadecimal string 0x3.a7p10 represents the
floating-point number (3+10./16+7./16**2)*2.0**10, or
3740.0:
>>> float.fromhex('0x3.a7p10')
3740.0
Applying the reverse conversion to 3740.0 gives a different
hexadecimal string representing the same number:
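>>> float.hex(3740.0)
'0x1.d380000000000p+11'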
For numbers x and y, possibly of different types, it’s a requirement
that hash(x) == hash(y) whenever x == y (see the __hash__()
method documentation for more details). For ease of implementation and
efficiency across a variety of numeric types (including int,
float, decimal.Decimal and fractions.Fraction)
Python’s hash for numeric types is based on a single mathematical function
that’s defined for any rational number, and hence applies to all instances of
int and fractions.Fraction, and all finite instances of
float and decimal.Decimal. Essentially, this function is
given by reduction modulo P for a fixed prime P. The value of P is
made available to Python as the modulus attribute of
sys.hash_info.
CPython implementation detail: Currently, the prime used is P = 2**31 - 1
on machines with 32-bit C longs and P = 2**61 - 1 on machines with 64-bit
C longs.
Here are the rules in detail:
If x = m/n is a nonnegative rational number and n is not divisible
by P, define hash(x) as m * invmod(n, P) % P, where invmod(n, P)
gives the inverse of n modulo P.
If x = m/n is a nonnegative rational number and n is
divisible by P (but m is not) then n has no inverse
modulo P and the rule above doesn’t apply; in this case define
hash(x) to be the constant value sys.hash_info.inf.
If x = m/n is a negative rational number, define hash(x)
as -hash(-x). If the resulting hash is -1, replace it with
-2.
The particular values sys.hash_info.inf, -sys.hash_info.inf
and sys.hash_info.nan are used as hash values for positive
infinity, negative infinity, or nans (respectively). (All hashable
nans have the same hash value.)
For a complex number z, the hash values of the real and imaginary parts
are combined by computing hash(z.real) + sys.hash_info.imag * hash(z.imag),
reduced modulo 2**sys.hash_info.width so that it lies in
range(-2**(sys.hash_info.width - 1), 2**(sys.hash_info.width - 1)). Again,
if the result is -1, it’s replaced with -2.
To clarify the above rules, here’s some example Python code,
equivalent to the builtin hash, for computing the hash of a rational
number, float, or complex:
import sys, math

def hash_fraction(m, n):
    """Compute the hash of a rational number m / n.

    Assumes m and n are integers, with n positive.
    Equivalent to hash(fractions.Fraction(m, n)).

    """
    P = sys.hash_info.modulus
    # Remove common factors of P.  (Unnecessary if m and n already coprime.)
    while m % P == n % P == 0:
        m, n = m // P, n // P

    if n % P == 0:
        hash_ = sys.hash_info.inf
    else:
        # Fermat's Little Theorem: pow(n, P-1, P) is 1, so
        # pow(n, P-2, P) gives the inverse of n modulo P.
        hash_ = (abs(m) % P) * pow(n, P - 2, P) % P
    if m < 0:
        hash_ = -hash_
    if hash_ == -1:
        hash_ = -2
    return hash_

def hash_float(x):
    """Compute the hash of a float x."""
    if math.isnan(x):
        return sys.hash_info.nan
    elif math.isinf(x):
        return sys.hash_info.inf if x > 0 else -sys.hash_info.inf
    else:
        return hash_fraction(*x.as_integer_ratio())

def hash_complex(z):
    """Compute the hash of a complex number z."""
    hash_ = hash_float(z.real) + sys.hash_info.imag * hash_float(z.imag)
    # do a signed reduction modulo 2**sys.hash_info.width
    M = 2**(sys.hash_info.width - 1)
    hash_ = (hash_ & (M - 1)) - (hash_ & M)
    if hash_ == -1:
        hash_ = -2
    return hash_
Python supports a concept of iteration over containers. This is implemented
using two distinct methods; these are used to allow user-defined classes to
support iteration. Sequences, described below in more detail, always support
the iteration methods.
One method needs to be defined for container objects to provide iteration
support:
Return an iterator object. The object is required to support the iterator
protocol described below. If a container supports different types of
iteration, additional methods can be provided to specifically request
iterators for those iteration types. (An example of an object supporting
multiple forms of iteration would be a tree structure which supports both
breadth-first and depth-first traversal.) This method corresponds to the
tp_iter slot of the type structure for Python objects in the Python/C
API.
The iterator objects themselves are required to support the following two
methods, which together form the iterator protocol:
Return the iterator object itself. This is required to allow both containers
and iterators to be used with the for and in statements.
This method corresponds to the tp_iter slot of the type structure for
Python objects in the Python/C API.
Return the next item from the container. If there are no further items, raise
the StopIteration exception. This method corresponds to the
tp_iternext slot of the type structure for Python objects in the
Python/C API.
Python defines several iterator objects to support iteration over general and
specific sequence types, dictionaries, and other more specialized forms. The
specific types are not important beyond their implementation of the iterator
protocol.
Once an iterator’s __next__() method raises StopIteration, it must
continue to do so on subsequent calls. Implementations that do not obey this
property are deemed broken.
Python’s generators provide a convenient way to implement the iterator
protocol. If a container object’s __iter__() method is implemented as a
generator, it will automatically return an iterator object (technically, a
generator object) supplying the __iter__() and __next__() methods.
More information about generators can be found in the documentation for
the yield expression.
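A minimal sketch (the Countdown class is hypothetical) of an __iter__()
method written as a generator:
>>> class Countdown:
...     def __init__(self, n):
...         self.n = n
...     def __iter__(self):      # a generator supplies __iter__() and __next__()
...         n = self.n
...         while n > 0:
...             yield n
...             n -= 1
...
>>> list(Countdown(3))
[3, 2, 1]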
There are six sequence types: strings, byte sequences (bytes objects),
byte arrays (bytearray objects), lists, tuples, and range objects. For
other containers see the built in dict and set classes, and
the collections module.
Strings contain Unicode characters. Their literals are written in single or
double quotes: 'xyzzy', "frobozz". See String and Bytes literals for more about
string literals. In addition to the functionality described here, there are
also string-specific methods described in the String Methods section.
Bytes and bytearray objects contain single bytes – the former is immutable
while the latter is a mutable sequence. Bytes objects can be constructed using
the constructor, bytes(), and from literals; use a b prefix with normal
string syntax: b'xyzzy'. To construct byte arrays, use the
bytearray() function.
While string objects are sequences of characters (represented by strings of
length 1), bytes and bytearray objects are sequences of integers (between 0
and 255), representing the ASCII value of single bytes. That means that for
a bytes or bytearray object b, b[0] will be an integer, while
b[0:1] will be a bytes or bytearray object of length 1. The
representation of bytes objects uses the literal format (b'...') since it
is generally more useful than e.g. bytes([50,19,100]). You can always
convert a bytes object into a list of integers using list(b).
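For example:
>>> b = b'xyz'
>>> b[0]
120
>>> b[0:1]
b'x'
>>> list(b)
[120, 121, 122]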
Also, while in previous Python versions, byte strings and Unicode strings
could be exchanged for each other rather freely (barring encoding issues),
strings and bytes are now completely separate concepts. There’s no implicit
en-/decoding if you pass an object of the wrong type. A string always
compares unequal to a bytes or bytearray object.
Lists are constructed with square brackets, separating items with commas:
[a, b, c]. Tuples are constructed by the comma operator (not within square
brackets), with or without enclosing parentheses, but an empty tuple must have
the enclosing parentheses, such as a, b, c or (). A single item tuple
must have a trailing comma, such as (d,).
Objects of type range are created using the range() function. They don’t
support concatenation or repetition, and using min() or max() on
them is inefficient.
Most sequence types support the following operations. The in and not in
operations have the same priorities as the comparison operations. The + and
* operations have the same priority as the corresponding numeric operations.
[3] Additional methods are provided for Mutable Sequence Types.
This table lists the sequence operations sorted in ascending priority
(operations in the same box have the same priority). In the table, s and t
are sequences of the same type; n, i, j and k are integers.
Operation       Result                                        Notes
x in s          True if an item of s is equal to x,           (1)
                else False
x not in s      False if an item of s is equal to x,          (1)
                else True
s + t           the concatenation of s and t                  (6)
s * n, n * s    n shallow copies of s concatenated            (2)
s[i]            i’th item of s, origin 0                      (3)
s[i:j]          slice of s from i to j                        (3)(4)
s[i:j:k]        slice of s from i to j with step k            (3)(5)
len(s)          length of s
min(s)          smallest item of s
max(s)          largest item of s
s.index(i)      index of the first occurrence of i in s
s.count(i)      total number of occurrences of i in s
Sequence types also support comparisons. In particular, tuples and lists are
compared lexicographically by comparing corresponding elements. This means that
to compare equal, every element must compare equal and the two sequences must be
of the same type and have the same length. (For full details see
Comparisons in the language reference.)
Notes:
(1) When s is a string object, the in and not in operations act like a
    substring test.
(2) Values of n less than 0 are treated as 0 (which yields an empty
    sequence of the same type as s). Note also that the copies are shallow;
    nested structures are not copied. This often haunts new Python
    programmers; consider:
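>>> lists = [[]] * 3
>>> lists
[[], [], []]
>>> lists[0].append(3)
>>> lists
[[3], [3], [3]]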
What has happened is that [[]] is a one-element list containing an empty
list, so all three elements of [[]]*3 are (pointers to) this single empty
list. Modifying any of the elements of lists modifies this single list.
You can create a list of different lists this way:
>>> lists = [[] for i in range(3)]
>>> lists[0].append(3)
>>> lists[1].append(5)
>>> lists[2].append(7)
>>> lists
[[3], [5], [7]]
(3) If i or j is negative, the index is relative to the end of the string:
    len(s) + i or len(s) + j is substituted. But note that -0 is
    still 0.
(4) The slice of s from i to j is defined as the sequence of items with
    index k such that i <= k < j. If i or j is greater than len(s),
    use len(s). If i is omitted or None, use 0. If j is omitted or
    None, use len(s). If i is greater than or equal to j, the slice
    is empty.
(5) The slice of s from i to j with step k is defined as the sequence of
    items with index x = i + n*k such that 0 <= n < (j-i)/k. In other
    words, the indices are i, i+k, i+2*k, i+3*k and so on, stopping when
    j is reached (but never including j). If i or j is greater than
    len(s), use len(s). If i or j are omitted or None, they become
    “end” values (which end depends on the sign of k). Note, k cannot be
    zero. If k is None, it is treated like 1.
(6) CPython implementation detail: If s and t are both strings, some
    Python implementations such as CPython can usually perform an in-place
    optimization for assignments of the form s = s + t or s += t. When
    applicable, this optimization makes quadratic run-time much less likely.
    This optimization is both version and implementation dependent. For
    performance sensitive code, it is preferable to use the str.join()
    method which assures consistent linear concatenation performance across
    versions and implementations.
Return the number of non-overlapping occurrences of substring sub in the
range [start, end]. Optional arguments start and end are
interpreted as in slice notation.
Return an encoded version of the string as a bytes object. Default encoding
is 'utf-8'. errors may be given to set a different error handling scheme.
The default for errors is 'strict', meaning that encoding errors raise
a UnicodeError. Other possible
values are 'ignore', 'replace', 'xmlcharrefreplace',
'backslashreplace' and any other name registered via
codecs.register_error(), see section Codec Base Classes. For a
list of possible encodings, see section Standard Encodings.
Changed in version 3.1: Support for keyword arguments added.
Return True if the string ends with the specified suffix, otherwise return
False. suffix can also be a tuple of suffixes to look for. With optional
start, test beginning at that position. With optional end, stop comparing
at that position.
Return a copy of the string where all tab characters are replaced by one or
more spaces, depending on the current column and the given tab size. The
column number is reset to zero after each newline occurring in the string.
If tabsize is not given, a tab size of 8 characters is assumed. This
doesn’t understand other non-printing characters or escape sequences.
Return the lowest index in the string where substring sub is found, such
that sub is contained in the slice s[start:end]. Optional arguments
start and end are interpreted as in slice notation. Return -1 if
sub is not found.
Note
The find() method should be used only if you need to know the
position of sub. To check if sub is a substring or not, use the
in operator:
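>>> 'Py' in 'Python'
True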
Perform a string formatting operation. The string on which this method is
called can contain literal text or replacement fields delimited by braces
{}. Each replacement field contains either the numeric index of a
positional argument, or the name of a keyword argument. Returns a copy of
the string where each replacement field is replaced with the string value of
the corresponding argument.
>>> "The sum of 1 + 2 is {0}".format(1+2)
'The sum of 1 + 2 is 3'
See Format String Syntax for a description of the various formatting options
that can be specified in format strings.
Similar to str.format(**mapping), except that mapping is
used directly and not copied to a dict. This is useful
if for example mapping is a dict subclass:
>>> class Default(dict):
...     def __missing__(self, key):
...         return key
...
>>> '{name} was born in {country}'.format_map(Default(name='Guido'))
'Guido was born in country'
Return true if all characters in the string are alphanumeric and there is at
least one character, false otherwise. A character c is alphanumeric if one
of the following returns True: c.isalpha(), c.isdecimal(),
c.isdigit(), or c.isnumeric().
Return true if all characters in the string are alphabetic and there is at least
one character, false otherwise. Alphabetic characters are those characters defined
in the Unicode character database as “Letter”, i.e., those with general category
property being one of “Lm”, “Lt”, “Lu”, “Ll”, or “Lo”. Note that this is different
from the “Alphabetic” property defined in the Unicode Standard.
Return true if all characters in the string are decimal
characters and there is at least one character, false
otherwise. Decimal characters are those from general category “Nd”. This
category includes digit characters, and all characters that can be used to
form decimal-radix numbers, e.g. U+0660, ARABIC-INDIC DIGIT ZERO.
Return true if all characters in the string are digits and there is at least one
character, false otherwise. Digits include decimal characters and digits that need
special handling, such as the compatibility superscript digits. Formally, a digit
is a character that has the property value Numeric_Type=Digit or Numeric_Type=Decimal.
Return true if all cased characters in the string are lowercase and there is at
least one cased character, false otherwise. Cased characters are those with
general category property being one of “Lu”, “Ll”, or “Lt” and lowercase characters
are those with general category property “Ll”.
Return true if all characters in the string are numeric
characters, and there is at least one character, false
otherwise. Numeric characters include digit characters, and all characters
that have the Unicode numeric value property, e.g. U+2155,
VULGAR FRACTION ONE FIFTH. Formally, numeric characters are those with the property
value Numeric_Type=Digit, Numeric_Type=Decimal or Numeric_Type=Numeric.
Return true if all characters in the string are printable or the string is
empty, false otherwise. Nonprintable characters are those characters defined
in the Unicode character database as “Other” or “Separator”, excepting the
ASCII space (0x20) which is considered printable. (Note that printable
characters in this context are those which should not be escaped when
repr() is invoked on a string. It has no bearing on the handling of
strings written to sys.stdout or sys.stderr.)
Return true if there are only whitespace characters in the string and there is
at least one character, false otherwise. Whitespace characters are those
characters defined in the Unicode character database as “Other” or “Separator”
and those with bidirectional property being one of “WS”, “B”, or “S”.
Return true if the string is a titlecased string and there is at least one
character, for example uppercase characters may only follow uncased characters
and lowercase characters only cased ones. Return false otherwise.
Return true if all cased characters in the string are uppercase and there is at
least one cased character, false otherwise. Cased characters are those with
general category property being one of “Lu”, “Ll”, or “Lt” and uppercase characters
are those with general category property “Lu”.
Return a string which is the concatenation of the strings in
iterable. A TypeError will be raised if there are
any non-string values in iterable, including bytes objects. The
separator between elements is the string providing this method.
Return the string left justified in a string of length width. Padding is done
using the specified fillchar (default is a space). The original string is
returned if width is less than len(s).
Return a copy of the string with leading characters removed. The chars
argument is a string specifying the set of characters to be removed. If omitted
or None, the chars argument defaults to removing whitespace. The chars
argument is not a prefix; rather, all combinations of its values are stripped:
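>>> '   spacious   '.lstrip()
'spacious   '
>>> 'www.example.com'.lstrip('cmowz.')
'example.com'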
This static method returns a translation table usable for str.translate().
If there is only one argument, it must be a dictionary mapping Unicode
ordinals (integers) or characters (strings of length 1) to Unicode ordinals,
strings (of arbitrary lengths) or None. Character keys will then be
converted to ordinals.
If there are two arguments, they must be strings of equal length, and in the
resulting dictionary, each character in x will be mapped to the character at
the same position in y. If there is a third argument, it must be a string,
whose characters will be mapped to None in the result.
Split the string at the first occurrence of sep, and return a 3-tuple
containing the part before the separator, the separator itself, and the part
after the separator. If the separator is not found, return a 3-tuple containing
the string itself, followed by two empty strings.
Return a copy of the string with all occurrences of substring old replaced by
new. If the optional argument count is given, only the first count
occurrences are replaced.
Return the highest index in the string where substring sub is found, such
that sub is contained within s[start:end]. Optional arguments start
and end are interpreted as in slice notation. Return -1 on failure.
Return the string right justified in a string of length width. Padding is done
using the specified fillchar (default is a space). The original string is
returned if width is less than len(s).
Split the string at the last occurrence of sep, and return a 3-tuple
containing the part before the separator, the separator itself, and the part
after the separator. If the separator is not found, return a 3-tuple containing
two empty strings, followed by the string itself.
Return a list of the words in the string, using sep as the delimiter string.
If maxsplit is given, at most maxsplit splits are done, the rightmost
ones. If sep is not specified or None, any whitespace string is a
separator. Except for splitting from the right, rsplit() behaves like
split() which is described in detail below.
Return a copy of the string with trailing characters removed. The chars
argument is a string specifying the set of characters to be removed. If omitted
or None, the chars argument defaults to removing whitespace. The chars
argument is not a suffix; rather, all combinations of its values are stripped:
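>>> '   spacious   '.rstrip()
'   spacious'
>>> 'mississippi'.rstrip('ipz')
'mississ'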
Return a list of the words in the string, using sep as the delimiter
string. If maxsplit is given, at most maxsplit splits are done (thus,
the list will have at most maxsplit+1 elements). If maxsplit is not
specified, then there is no limit on the number of splits (all possible
splits are made).
If sep is given, consecutive delimiters are not grouped together and are
deemed to delimit empty strings (for example, '1,,2'.split(',') returns
['1', '', '2']). The sep argument may consist of multiple characters
(for example, '1<>2<>3'.split('<>') returns ['1', '2', '3']).
Splitting an empty string with a specified separator returns [''].
If sep is not specified or is None, a different splitting algorithm is
applied: runs of consecutive whitespace are regarded as a single separator,
and the result will contain no empty strings at the start or end if the
string has leading or trailing whitespace. Consequently, splitting an empty
string or a string consisting of just whitespace with a None separator
returns [].
For example, '1 2 3'.split() returns ['1', '2', '3'], and
'1 2 3'.split(None, 1) returns ['1', '2 3'].
Return a list of the lines in the string, breaking at line boundaries. Line
breaks are not included in the resulting list unless keepends is given and
true.
Return True if string starts with the prefix, otherwise return False.
prefix can also be a tuple of prefixes to look for. With optional start,
test string beginning at that position. With optional end, stop comparing
string at that position.
Return a copy of the string with the leading and trailing characters removed.
The chars argument is a string specifying the set of characters to be removed.
If omitted or None, the chars argument defaults to removing whitespace.
The chars argument is not a prefix or suffix; rather, all combinations of its
values are stripped:
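>>> '   spacious   '.strip()
'spacious'
>>> 'www.example.com'.strip('cmowz.')
'example'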
Return a titlecased version of the string where words start with an uppercase
character and the remaining characters are lowercase.
The algorithm uses a simple language-independent definition of a word as
groups of consecutive letters. The definition works in many contexts but
it means that apostrophes in contractions and possessives form word
boundaries, which may not be the desired result:
>>> "they're bill's friends from the UK".title()
"They'Re Bill'S Friends From The Uk"
A workaround for apostrophes can be constructed using regular expressions:
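>>> import re
>>> def titlecase(s):
...     return re.sub(r"[A-Za-z]+('[A-Za-z]+)?",
...                   lambda mo: mo.group(0)[0].upper() +
...                              mo.group(0)[1:].lower(),
...                   s)
...
>>> titlecase("they're bill's friends.")
"They're Bill's Friends."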
Return a copy of the string where all characters have been mapped through
map, which must be a dictionary of Unicode ordinals (integers) to Unicode
ordinals, strings or None. Unmapped characters are left untouched.
Characters mapped to None are deleted.
You can use str.maketrans() to create a translation map from
character-to-character mappings in different formats.
Note
An even more flexible approach is to create a custom character mapping
codec using the codecs module (see encodings.cp1251 for an
example).
Return the numeric string left filled with zeros in a string of length
width. A sign prefix is handled correctly. The original string is
returned if width is less than len(s).
The formatting operations described here are obsolete and may go away in future
versions of Python. Use the new String Formatting in new code.
String objects have one unique built-in operation: the % operator (modulo).
This is also known as the string formatting or interpolation operator.
Given format % values (where format is a string), % conversion
specifications in format are replaced with zero or more elements of values.
The effect is similar to using sprintf() in the C language.
If format requires a single argument, values may be a single non-tuple
object. [4] Otherwise, values must be a tuple with exactly the number of
items specified by the format string, or a single mapping object (for example, a
dictionary).
A conversion specifier contains two or more characters and has the following
components, which must occur in this order:
The '%' character, which marks the start of the specifier.
Mapping key (optional), consisting of a parenthesised sequence of characters
(for example, (somename)).
Conversion flags (optional), which affect the result of some conversion
types.
Minimum field width (optional). If specified as an '*' (asterisk), the
actual width is read from the next element of the tuple in values, and the
object to convert comes after the minimum field width and optional precision.
Precision (optional), given as a '.' (dot) followed by the precision. If
specified as '*' (an asterisk), the actual precision is read from the next
element of the tuple in values, and the value to convert comes after the
precision.
Length modifier (optional).
Conversion type.
When the right argument is a dictionary (or other mapping type), then the
formats in the string must include a parenthesised mapping key into that
dictionary inserted immediately after the '%' character. The mapping key
selects the value to be formatted from the mapping. For example:
>>> print('%(language)s has %(number)03d quote types.' %
...       {'language': "Python", "number": 2})
Python has 002 quote types.
In this case no * specifiers may occur in a format (since they require a
sequential parameter list).
The conversion flag characters are:
Flag    Meaning
'#'     The value conversion will use the “alternate form” (where defined
        below).
'0'     The conversion will be zero padded for numeric values.
'-'     The converted value is left adjusted (overrides the '0'
        conversion if both are given).
' '     (a space) A blank should be left before a positive number (or empty
        string) produced by a signed conversion.
'+'     A sign character ('+' or '-') will precede the conversion
        (overrides a “space” flag).
A length modifier (h, l, or L) may be present, but is ignored as it
is not necessary for Python – so e.g. %ld is identical to %d.
The conversion types are:
Conversion   Meaning                                              Notes
'd'          Signed integer decimal.
'i'          Signed integer decimal.
'o'          Signed octal value.                                  (1)
'u'          Obsolete type – it is identical to 'd'.              (7)
'x'          Signed hexadecimal (lowercase).                      (2)
'X'          Signed hexadecimal (uppercase).                      (2)
'e'          Floating point exponential format (lowercase).       (3)
'E'          Floating point exponential format (uppercase).       (3)
'f'          Floating point decimal format.                       (3)
'F'          Floating point decimal format.                       (3)
'g'          Floating point format. Uses lowercase exponential
             format if exponent is less than -4 or not less than
             precision, decimal format otherwise.                 (4)
'G'          Floating point format. Uses uppercase exponential
             format if exponent is less than -4 or not less than
             precision, decimal format otherwise.                 (4)
'c'          Single character (accepts integer or single
             character string).
'r'          String (converts any Python object using repr()).   (5)
's'          String (converts any Python object using str()).    (5)
'a'          String (converts any Python object using
             ascii()).                                            (5)
'%'          No argument is converted, results in a '%'
             character in the result.
Notes:
(1) The alternate form causes a leading zero ('0') to be inserted between
    left-hand padding and the formatting of the number if the leading
    character of the result is not already a zero.
(2) The alternate form causes a leading '0x' or '0X' (depending on whether
    the 'x' or 'X' format was used) to be inserted between left-hand
    padding and the formatting of the number if the leading character of the
    result is not already a zero.
(3) The alternate form causes the result to always contain a decimal point,
    even if no digits follow it. The precision determines the number of
    digits after the decimal point and defaults to 6.
(4) The alternate form causes the result to always contain a decimal point,
    and trailing zeroes are not removed as they would otherwise be. The
    precision determines the number of significant digits before and after
    the decimal point and defaults to 6.
(5) If precision is N, the output is truncated to N characters.
(7) See PEP 237.
The range type is an immutable sequence which is commonly used for
looping. The advantage of the range type is that a range
object will always take the same amount of memory, no matter the size of the
range it represents.
Range objects have relatively little behavior: they support indexing,
containment tests, iteration, the len() function, and the following methods:
List and bytearray objects support additional operations that allow in-place
modification of the object. Other mutable sequence types (when added to the
language) should also support these operations. Strings and tuples are
immutable sequence types: such objects cannot be modified once created. The
following operations are defined on mutable sequence types (where x is an
arbitrary object).
Note that while lists allow their items to be of any type, bytearray object
“items” are all integers in the range 0 <= x < 256.
Operation                 Result                                    Notes
s[i] = x                  item i of s is replaced by x
s[i:j] = t                slice of s from i to j is replaced
                          by the contents of the iterable t
del s[i:j]                same as s[i:j] = []
s[i:j:k] = t              the elements of s[i:j:k] are
                          replaced by those of t                    (1)
del s[i:j:k]              removes the elements of s[i:j:k]
                          from the list
s.append(x)               same as s[len(s):len(s)] = [x]
s.extend(x)               same as s[len(s):len(s)] = x              (2)
s.count(x)                return number of i’s for which
                          s[i] == x
s.index(x[, i[, j]])      return smallest k such that
                          s[k] == x and i <= k < j                  (3)
s.insert(i, x)            same as s[i:i] = [x]                      (4)
s.pop([i])                same as x = s[i]; del s[i]; return x      (5)
s.remove(x)               same as del s[s.index(x)]                 (3)
s.reverse()               reverses the items of s in place          (6)
s.sort([key[, reverse]])  sort the items of s in place              (6), (7), (8)
Notes:
(1) t must have the same length as the slice it is replacing.
(2) x can be any iterable object.
(3) Raises ValueError when x is not found in s. When a negative index
    is passed as the second or third parameter to the index() method, the
    sequence length is added, as for slice indices. If it is still negative,
    it is truncated to zero, as for slice indices.
(4) When a negative index is passed as the first parameter to the
    insert() method, the sequence length is added, as for slice indices.
    If it is still negative, it is truncated to zero, as for slice indices.
(5) The optional argument i defaults to -1, so that by default the last
    item is removed and returned.
(6) The sort() and reverse() methods modify the sequence in place for
    economy of space when sorting or reversing a large sequence. To remind
    you that they operate by side effect, they don’t return the sorted or
    reversed sequence.
(7) The sort() method takes optional arguments for controlling the
    comparisons. Each must be specified as a keyword argument.
    key specifies a function of one argument that is used to extract a
    comparison key from each list element: key=str.lower. The default
    value is None. Use functools.cmp_to_key() to convert an old-style
    cmp function to a key function.
    reverse is a boolean value. If set to True, then the list elements
    are sorted as if each comparison were reversed.
(8) The sort() method is guaranteed to be stable. A sort is stable if it
    guarantees not to change the relative order of elements that compare
    equal — this is helpful for sorting in multiple passes (for example,
    sort by department, then by salary grade).
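For example, a sketch with hypothetical (name, grade) records: sorting first
by the secondary key and then by the primary key leaves equal-grade records
in name order, because the sort is stable:
>>> data = [('bob', 2), ('amy', 1), ('cal', 2), ('dee', 1)]
>>> data.sort()                       # secondary key: name
>>> data.sort(key=lambda r: r[1])     # primary key: grade
>>> data
[('amy', 1), ('dee', 1), ('bob', 2), ('cal', 2)]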
CPython implementation detail: While a list is being sorted, the effect of attempting to mutate, or even
inspect, the list is undefined. The C implementation of Python makes the
list appear empty for the duration, and raises ValueError if it can
detect that the list has been mutated during a sort.
Bytes and bytearray objects, being “strings of bytes”, have all methods found on
strings, with the exception of encode(), format() and
isidentifier(), which do not make sense with these types. For converting
the objects to strings, they have a decode() method.
Wherever one of these methods needs to interpret the bytes as characters
(e.g. the is...() methods), the ASCII character set is assumed.
Note
The methods on bytes and bytearray objects don’t accept strings as their
arguments, just as the methods on strings don’t accept bytes as their
arguments. For example, you have to write
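>>> 'abc'.replace('a', 'f')
'fbc'
>>> b'abc'.replace(b'a', b'f')
b'fbc'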
Return a string decoded from the given bytes. Default encoding is
'utf-8'. errors may be given to set a different
error handling scheme. The default for errors is 'strict', meaning
that encoding errors raise a UnicodeError. Other possible values are
'ignore', 'replace' and any other name registered via
codecs.register_error(), see section Codec Base Classes. For a
list of possible encodings, see section Standard Encodings.
Changed in version 3.1: Added support for keyword arguments.
The bytes and bytearray types have an additional class method:
This bytes class method returns a bytes or bytearray object,
decoding the given string object. The string must contain two hexadecimal
digits per byte, spaces are ignored.
>>> bytes.fromhex('f0 f1f2 ')
b'\xf0\xf1\xf2'
The maketrans and translate methods differ in semantics from the versions
available on strings:
Return a copy of the bytes or bytearray object where all bytes occurring in
the optional argument delete are removed, and the remaining bytes have been
mapped through the given translation table, which must be a bytes object of
length 256.
You can use the bytes.maketrans() method to create a translation table.
Set the table argument to None for translations that only delete
characters:
>>> b'read this short text'.translate(None, b'aeiou')
b'rd ths shrt txt'
This static method returns a translation table usable for
bytes.translate() that will map each character in from into the
character at the same position in to; from and to must be bytes objects
and have the same length.
A set object is an unordered collection of distinct hashable objects.
Common uses include membership testing, removing duplicates from a sequence, and
computing mathematical operations such as intersection, union, difference, and
symmetric difference.
(For other containers see the built in dict, list,
and tuple classes, and the collections module.)
Like other collections, sets support x in set, len(set), and
for x in set. Being an unordered collection, sets do not record element
position or order of insertion. Accordingly, sets do not support indexing,
slicing, or other sequence-like behavior.
There are currently two built-in set types, set and frozenset.
The set type is mutable — the contents can be changed using methods
like add() and remove(). Since it is mutable, it has no hash value
and cannot be used as either a dictionary key or as an element of another set.
The frozenset type is immutable and hashable — its contents cannot be
altered after it is created; it can therefore be used as a dictionary key or as
an element of another set.
Non-empty sets (not frozensets) can be created by placing a comma-separated
list of elements within braces, for example: {'jack', 'sjoerd'}, in addition
to the set constructor.
Return a new set or frozenset object whose elements are taken from
iterable. The elements of a set must be hashable. To represent sets of
sets, the inner sets must be frozenset objects. If iterable is
not specified, a new empty set is returned.
Instances of set and frozenset provide the following
operations:
Note, the non-operator versions of union(), intersection(),
difference(), and symmetric_difference(), issubset(), and
issuperset() methods will accept any iterable as an argument. In
contrast, their operator based counterparts require their arguments to be
sets. This precludes error-prone constructions like set('abc') & 'cbs'
in favor of the more readable set('abc').intersection('cbs').
Both set and frozenset support set to set comparisons. Two
sets are equal if and only if every element of each set is contained in the
other (each is a subset of the other). A set is less than another set if and
only if the first set is a proper subset of the second set (is a subset, but
is not equal). A set is greater than another set if and only if the first set
is a proper superset of the second set (is a superset, but is not equal).
Instances of set are compared to instances of frozenset
based on their members. For example, set('abc') == frozenset('abc')
returns True and so does set('abc') in set([frozenset('abc')]).
The subset and equality comparisons do not generalize to a complete ordering
function. For example, any two disjoint sets are not equal and are not
subsets of each other, so all of the following return False: a<b,
a==b, or a>b.
Since sets only define partial ordering (subset relationships), the output of
the list.sort() method is undefined for lists of sets.
Set elements, like dictionary keys, must be hashable.
Binary operations that mix set instances with frozenset
return the type of the first operand. For example:
frozenset('ab') | set('bc') returns an instance of frozenset.
The following table lists operations available for set that do not
apply to immutable instances of frozenset:
Note, the elem argument to the __contains__(), remove(), and
discard() methods may be a set. To support searching for an equivalent
frozenset, the elem set is temporarily mutated during the search and then
restored. During the search, the elem set should not be read or mutated
since it does not have a meaningful value.
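A few of the mutating methods in action (sorted() is used below because
iteration order is arbitrary):
>>> s = set('ab')
>>> s.add('c')        # add a single element
>>> s.discard('z')    # removing an absent element is a no-op
>>> s.remove('a')     # remove() would raise KeyError for an absent element
>>> sorted(s)
['b', 'c']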
A mapping object maps hashable values to arbitrary objects.
Mappings are mutable objects. There is currently only one standard mapping
type, the dictionary. (For other containers see the built-in
list, set, and tuple classes, and the
collections module.)
A dictionary’s keys are almost arbitrary values. Values that are not
hashable, that is, values containing lists, dictionaries or other
mutable types (that are compared by value rather than by object identity) may
not be used as keys. Numeric types used for keys obey the normal rules for
numeric comparison: if two numbers compare equal (such as 1 and 1.0)
then they can be used interchangeably to index the same dictionary entry. (Note
however, that since computers store floating-point numbers as approximations it
is usually unwise to use them as dictionary keys.)
Dictionaries can be created by placing a comma-separated list of key:value
pairs within braces, for example: {'jack': 4098, 'sjoerd': 4127} or {4098: 'jack', 4127: 'sjoerd'}, or by the dict constructor.
Return a new dictionary initialized from an optional positional argument or
from a set of keyword arguments. If no arguments are given, return a new
empty dictionary. If the positional argument arg is a mapping object,
return a dictionary mapping the same keys to the same values as does the
mapping object. Otherwise the positional argument must be a sequence, a
container that supports iteration, or an iterator object. The elements of
the argument must each also be of one of those kinds, and each must in turn
contain exactly two objects. The first is used as a key in the new
dictionary, and the second as the key’s value. If a given key is seen more
than once, the last value associated with it is retained in the new
dictionary.
If keyword arguments are given, the keywords themselves with their associated
values are added as items to the dictionary. If a key is specified both in
the positional argument and as a keyword argument, the value associated with
the keyword is retained in the dictionary. For example, these all return a
dictionary equal to {"one":1,"two":2}:
dict(one=1,two=2)
dict({'one':1,'two':2})
dict(zip(('one','two'),(1,2)))
dict([['two',2],['one',1]])
The first example only works for keys that are valid Python identifiers; the
others work with any valid keys.
These are the operations that dictionaries support (and therefore, custom
mapping types should support too):
len(d)
Return the number of items in the dictionary d.
d[key]
Return the item of d with key key. Raises a KeyError if key is
not in the map.
If a subclass of dict defines a method __missing__() and key is not
present, the d[key] operation calls that method with the key key as
argument. The d[key] operation then returns or raises whatever is
returned or raised by the __missing__(key) call. No other operations or
methods invoke __missing__(). If __missing__() is not defined,
KeyError is raised.
__missing__() must be a method; it cannot be an instance variable:
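>>> class Counter(dict):
...     def __missing__(self, key):
...         return 0
...
>>> c = Counter()
>>> c['red']
0
>>> c['red'] += 1
>>> c['red']
1
In this sketch, __missing__() supplies a default of zero for absent keys,
so the subscript operation never raises KeyError (compare
collections.Counter, which uses the same hook).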
Return the value for key if key is in the dictionary, else default.
If default is not given, it defaults to None, so that this method
never raises a KeyError.
If key is in the dictionary, remove it and return its value, else return
default. If default is not given and key is not in the dictionary,
a KeyError is raised.
Remove and return an arbitrary (key, value) pair from the dictionary.
popitem() is useful to destructively iterate over a dictionary, as
often used in set algorithms. If the dictionary is empty, calling
popitem() raises a KeyError.
Update the dictionary with the key/value pairs from other, overwriting
existing keys. Return None.
update() accepts either another dictionary object or an iterable of
key/value pairs (as tuples or other iterables of length two). If keyword
arguments are specified, the dictionary is then updated with those
key/value pairs: d.update(red=1, blue=2).
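For example, a mapping argument and keyword arguments may be combined:
>>> d = {'red': 1}
>>> d.update({'blue': 2}, green=3)
>>> d == {'red': 1, 'blue': 2, 'green': 3}
True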
The objects returned by dict.keys(), dict.values() and
dict.items() are view objects. They provide a dynamic view on the
dictionary’s entries, which means that when the dictionary changes, the view
reflects these changes.
Dictionary views can be iterated over to yield their respective data, and
support membership tests:
len(dictview)
Return the number of entries in the dictionary.
iter(dictview)
Return an iterator over the keys, values or items (represented as tuples of
(key, value)) in the dictionary.
Keys and values are iterated over in an arbitrary order which is non-random,
varies across Python implementations, and depends on the dictionary’s history
of insertions and deletions. If keys, values and items views are iterated
over with no intervening modifications to the dictionary, the order of items
will directly correspond. This allows the creation of (value, key) pairs
using zip(): pairs = zip(d.values(), d.keys()). Another way to
create the same list is pairs = [(v, k) for (k, v) in d.items()].
Iterating views while adding or deleting entries in the dictionary may raise
a RuntimeError or fail to iterate over all entries.
x in dictview
Return True if x is in the underlying dictionary’s keys, values or
items (in the latter case, x should be a (key, value) tuple).
Keys views are set-like since their entries are unique and hashable. If all
values are hashable, so that (key, value) pairs are unique and hashable,
then the items view is also set-like. (Values views are not treated as set-like
since the entries are generally not unique.) For set-like views, all of the
operations defined for the abstract base class collections.Set are
available (for example, ==, <, or ^).
An example of dictionary view usage:
>>> dishes = {'eggs': 2, 'sausage': 1, 'bacon': 1, 'spam': 500}
>>> keys = dishes.keys()
>>> values = dishes.values()
>>> # iteration
>>> n = 0
>>> for val in values:
...     n += val
...
>>> print(n)
504
>>> # keys and values are iterated over in the same order
>>> list(keys)
['eggs', 'bacon', 'sausage', 'spam']
>>> list(values)
[2, 1, 1, 500]
>>> # view objects are dynamic and reflect dict changes
>>> del dishes['eggs']
>>> del dishes['sausage']
>>> list(keys)
['spam', 'bacon']
>>> # set operations
>>> keys & {'eggs', 'bacon', 'salad'}
{'bacon'}
>>> keys ^ {'sausage', 'juice'}
{'juice', 'sausage', 'bacon', 'spam'}
memoryview objects allow Python code to access the internal data
of an object that supports the buffer protocol without
copying. Memory is generally interpreted as simple bytes.
Create a memoryview that references obj. obj must support the
buffer protocol. Built-in objects that support the buffer protocol include
bytes and bytearray.
A memoryview has the notion of an element, which is the
atomic memory unit handled by the originating object obj. For many
simple types such as bytes and bytearray, an element
is a single byte, but other types such as array.array may have
bigger elements.
len(view) returns the total number of elements in the memoryview,
view. The itemsize attribute will give you the
number of bytes in a single element.
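For example (a sketch assuming a platform where the array type code 'i'
denotes a four-byte C int):
>>> import array
>>> a = array.array('i', [1, 2, 3])
>>> m = memoryview(a)
>>> len(m)
3
>>> m.itemsize
4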
A memoryview supports slicing to expose its data. Taking a single
index will return a single element as a bytes object. Full
slicing will result in a subview:
>>> v = memoryview(b'abcefg')
>>> v[1]
b'b'
>>> v[-1]
b'g'
>>> v[1:4]
<memory at 0x77ab28>
>>> bytes(v[1:4])
b'bce'
If the object the memoryview is over supports changing its data, the
memoryview supports slice assignment:
>>> data = bytearray(b'abcefg')
>>> v = memoryview(data)
>>> v.readonly
False
>>> v[0] = b'z'
>>> data
bytearray(b'zbcefg')
>>> v[1:4] = b'123'
>>> data
bytearray(b'a123fg')
>>> v[2] = b'spam'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: cannot modify size of memoryview object
Notice how the size of the memoryview object cannot be changed.
Release the underlying buffer exposed by the memoryview object. Many
objects take special actions when a view is held on them (for example,
a bytearray would temporarily forbid resizing); therefore,
calling release() is handy to remove these restrictions (and free any
dangling resources) as soon as possible.
After this method has been called, any further operation on the view
raises a ValueError (except release() itself which can
be called multiple times):
>>> m = memoryview(b'abc')
>>> m.release()
>>> m[0]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: operation forbidden on released memoryview object
The context management protocol can be used for a similar effect,
using the with statement:
>>> with memoryview(b'abc') as m:
... m[0]
...
b'a'
>>> m[0]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: operation forbidden on released memoryview object
New in version 3.2.
There are also several read-only attributes available, such as itemsize and readonly used in the examples above.
Python’s with statement supports the concept of a runtime context
defined by a context manager. This is implemented using a pair of methods
that allow user-defined classes to define a runtime context that is entered
before the statement body is executed and exited when the statement ends:
Enter the runtime context and return either this object or another object
related to the runtime context. The value returned by this method is bound to
the identifier in the as clause of with statements using
this context manager.
An example of a context manager that returns itself is a file object.
File objects return themselves from __enter__() to allow open() to be
used as the context expression in a with statement.
An example of a context manager that returns a related object is the one
returned by decimal.localcontext(). These managers set the active
decimal context to a copy of the original decimal context and then return the
copy. This allows changes to be made to the current decimal context in the body
of the with statement without affecting code outside the
with statement.
Exit the runtime context and return a Boolean flag indicating if any exception
that occurred should be suppressed. If an exception occurred while executing the
body of the with statement, the arguments contain the exception type,
value and traceback information. Otherwise, all three arguments are None.
Returning a true value from this method will cause the with statement
to suppress the exception and continue execution with the statement immediately
following the with statement. Otherwise the exception continues
propagating after this method has finished executing. Exceptions that occur
during execution of this method will replace any exception that occurred in the
body of the with statement.
The exception passed in should never be reraised explicitly; instead, this
method should return a false value to indicate that the method completed
successfully and does not want to suppress the raised exception. This allows
context management code (such as the utilities in contextlib) to easily
detect whether or not an __exit__() method has actually failed.
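As a minimal sketch of the protocol (the class name suppress_keyerror is
hypothetical), the manager below suppresses only KeyError by returning a
true value from __exit__():
class suppress_keyerror:
    def __enter__(self):
        # Nothing to set up; return the manager itself.
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # A true return value suppresses the exception; any other
        # exception type continues to propagate.
        return exc_type is KeyError

with suppress_keyerror():
    {}['missing']        # the KeyError raised here is swallowed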
Python defines several context managers to support easy thread synchronisation,
prompt closure of files or other objects, and simpler manipulation of the active
decimal arithmetic context. The specific types are not treated specially beyond
their implementation of the context management protocol. See the
contextlib module for some examples.
Python’s generators and the contextlib.contextmanager decorator
provide a convenient way to implement these protocols. If a generator function is
decorated with the contextlib.contextmanager decorator, it will return a
context manager implementing the necessary __enter__() and
__exit__() methods, rather than the iterator produced by an undecorated
generator function.
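A sketch of the decorator in use (the tag() generator is a made-up
example, not part of the standard library):
from contextlib import contextmanager

@contextmanager
def tag(name):
    print('<{}>'.format(name))        # runs as part of __enter__()
    try:
        yield                         # the with-block body executes here
    finally:
        print('</{}>'.format(name))   # runs as part of __exit__()

with tag('h1'):
    print('hello')     # prints <h1>, hello, </h1> in order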
Note that there is no specific slot for any of these methods in the type
structure for Python objects in the Python/C API. Extension types wanting to
define these methods must provide them as a normal Python accessible method.
Compared to the overhead of setting up the runtime context, the overhead of a
single class dictionary lookup is negligible.
The only special operation on a module is attribute access: m.name, where
m is a module and name accesses a name defined in m‘s symbol table.
Module attributes can be assigned to. (Note that the import
statement is not, strictly speaking, an operation on a module object; import foo does not require a module object named foo to exist, rather it requires
an (external) definition for a module named foo somewhere.)
A special attribute of every module is __dict__. This is the dictionary
containing the module’s symbol table. Modifying this dictionary will actually
change the module’s symbol table, but direct assignment to the __dict__
attribute is not possible (you can write m.__dict__['a'] = 1, which defines
m.a to be 1, but you can’t write m.__dict__ = {}). Modifying
__dict__ directly is not recommended.
Modules built into the interpreter are written like this: <module 'sys' (built-in)>. If loaded from a file, they are written as <module 'os' from '/usr/local/lib/pythonX.Y/os.pyc'>.
Function objects are created by function definitions. The only operation on a
function object is to call it: func(argument-list).
There are really two flavors of function objects: built-in functions and
user-defined functions. Both support the same operation (to call the function),
but the implementation is different, hence the different object types.
Methods are functions that are called using the attribute notation. There are
two flavors: built-in methods (such as append() on lists) and class
instance methods. Built-in methods are described with the types that support
them.
If you access a method (a function defined in a class namespace) through an
instance, you get a special object: a bound method (also called
instance method) object. When called, it will add the self argument
to the argument list. Bound methods have two special read-only attributes:
m.__self__ is the object on which the method operates, and m.__func__ is
the function implementing the method. Calling m(arg-1, arg-2, ..., arg-n)
is completely equivalent to calling m.__func__(m.__self__, arg-1, arg-2, ..., arg-n).
Like function objects, bound method objects support getting arbitrary
attributes. However, since method attributes are actually stored on the
underlying function object (meth.__func__), setting method attributes on
bound methods is disallowed. Attempting to set a method attribute results in a
TypeError being raised. In order to set a method attribute, you need to
explicitly set it on the underlying function object:
class C:
def method(self):
pass
c = C()
c.method.__func__.whoami = 'my name is c'
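Reading the attribute back works through the bound method as well, since
attribute reads on a bound method fall through to the underlying function:
>>> c.method.whoami
'my name is c'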
Code objects are used by the implementation to represent “pseudo-compiled”
executable Python code such as a function body. They differ from function
objects because they don’t contain a reference to their global execution
environment. Code objects are returned by the built-in compile() function
and can be extracted from function objects through their __code__
attribute. See also the code module.
A code object can be executed or evaluated by passing it (instead of a source
string) to the exec() or eval() built-in functions.
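For example:
>>> code = compile('x + 1', '<string>', 'eval')
>>> eval(code, {'x': 41})
42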
Type objects represent the various object types. An object’s type is accessed
by the built-in function type(). There are no special operations on
types. The standard module types defines names for all standard built-in
types.
This object is returned by functions that don’t explicitly return a value. It
supports no special operations. There is exactly one null object, named
None (a built-in name).
This object is commonly used by slicing (see Slicings). It supports no
special operations. There is exactly one ellipsis object, named
Ellipsis (a built-in name).
This object is returned from comparisons and binary operations when they are
asked to operate on types they don’t support. See Comparisons for more
information.
Boolean values are the two constant objects False and True. They are
used to represent truth values (although other values can also be considered
false or true). In numeric contexts (for example when used as the argument to
an arithmetic operator), they behave like the integers 0 and 1, respectively.
The built-in function bool() can be used to cast any value to a Boolean,
if the value can be interpreted as a truth value (see section Truth Value
Testing above).
The implementation adds a few special read-only attributes to several object
types, where they are relevant. Some of these are not reported by the
dir() built-in function.
This method can be overridden by a metaclass to customize the method
resolution order for its instances. It is called at class instantiation, and
its result is stored in __mro__.
Each new-style class keeps a list of weak references to its immediate
subclasses. This method returns a list of all those references still alive.
Example:
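>>> int.__subclasses__()
[<class 'bool'>]
(The exact list depends on which subclasses of int exist in the running
interpreter; bool is always present.)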
In Python, all exceptions must be instances of a class that derives from
BaseException. In a try statement with an except
clause that mentions a particular class, that clause also handles any exception
classes derived from that class (but not exception classes from which it is
derived). Two exception classes that are not related via subclassing are never
equivalent, even if they have the same name.
The built-in exceptions listed below can be generated by the interpreter or
built-in functions. Except where mentioned, they have an “associated value”
indicating the detailed cause of the error. This may be a string or a tuple of
several items of information (e.g., an error code and a string explaining the
code). The associated value is usually passed as arguments to the exception
class’s constructor.
User code can raise built-in exceptions. This can be used to test an exception
handler or to report an error condition “just like” the situation in which the
interpreter raises the same exception; but beware that there is nothing to
prevent user code from raising an inappropriate error.
The built-in exception classes can be sub-classed to define new exceptions;
programmers are encouraged to at least derive new exceptions from the
Exception class and not BaseException. More information on
defining exceptions is available in the Python Tutorial under
User-defined Exceptions.
The following exceptions are used mostly as base classes for other exceptions.
The base class for all built-in exceptions. It is not meant to be directly
inherited by user-defined classes (for that, use Exception). If
bytes() or str() is called on an instance of this class, the
representation of the argument(s) to the instance is returned, or the empty
string when there were no arguments.
The tuple of arguments given to the exception constructor. Some built-in
exceptions (like IOError) expect a certain number of arguments and
assign a special meaning to the elements of this tuple, while others are
usually called only with a single string giving an error message.
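For example:
>>> try:
...     raise ValueError('bad value', 42)
... except ValueError as exc:
...     print(exc.args)
...
('bad value', 42)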
The base class for the exceptions that are raised when a key or index used on
a mapping or sequence is invalid: IndexError, KeyError. This
can be raised directly by codecs.lookup().
The base class for exceptions that can occur outside the Python system:
IOError, OSError. When exceptions of this type are created with a
2-tuple, the first item is available on the instance’s errno attribute
(it is assumed to be an error number), and the second item is available on the
strerror attribute (it is usually the associated error message). The
tuple itself is also available on the args attribute.
When an EnvironmentError exception is instantiated with a 3-tuple, the
first two items are available as above, while the third item is available on the
filename attribute. However, for backwards compatibility, the
args attribute contains only a 2-tuple of the first two constructor
arguments.
The filename attribute is None when this exception is created with
other than 3 arguments. The errno and strerror attributes are
also None when the instance was created with other than 2 or 3 arguments.
In this last case, args contains the verbatim constructor arguments as a
tuple.
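A sketch of these attributes in action (the errno value and message shown
are platform-dependent):
>>> try:
...     open('/no/such/file')
... except IOError as exc:
...     print(exc.errno, exc.strerror, exc.filename)
...
2 No such file or directory /no/such/file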
The following exceptions are the exceptions that are usually raised.
Raised when an attribute reference (see Attribute references) or
assignment fails. (When an object does not support attribute references or
attribute assignments at all, TypeError is raised.)
Raised when the built-in function input() hits an end-of-file condition
(EOF) without reading any data. (N.B.: the read() and readline()
methods of file objects return an empty string when they hit EOF.)
Raised when a floating point operation fails. This exception is always defined,
but can only be raised when Python is configured with the
--with-fpectl option, or the WANT_SIGFPE_HANDLER symbol is
defined in the pyconfig.h file.
Raised when an I/O operation (such as the built-in print() or
open() functions or a method of a file object) fails for an
I/O-related reason, e.g., “file not found” or “disk full”.
This class is derived from EnvironmentError. See the discussion above
for more information on exception instance attributes.
Raised when a sequence subscript is out of range. (Slice indices are
silently truncated to fall in the allowed range; if an index is not an
integer, TypeError is raised.)
Raised when the user hits the interrupt key (normally Control-C or
Delete). During execution, a check for interrupts is made
regularly. The exception inherits from BaseException so as to not be
accidentally caught by code that catches Exception and thus prevent
the interpreter from exiting.
Raised when an operation runs out of memory but the situation may still be
rescued (by deleting some objects). The associated value is a string indicating
what kind of (internal) operation ran out of memory. Note that because of the
underlying memory management architecture (C’s malloc() function), the
interpreter may not always be able to completely recover from this situation; it
nevertheless raises an exception so that a stack traceback can be printed, in
case a run-away program was the cause.
Raised when a local or global name is not found. This applies only to
unqualified names. The associated value is an error message that includes the
name that could not be found.
This exception is derived from RuntimeError. In user defined base
classes, abstract methods should raise this exception when they require derived
classes to override the method.
This exception is derived from EnvironmentError. It is raised when a
function returns a system-related error (not for illegal argument types or
other incidental errors). The errno attribute is a numeric error
code from errno, and the strerror attribute is the
corresponding string, as would be printed by the C function perror().
See the module errno, which contains names for the error codes defined
by the underlying operating system.
For exceptions that involve a file system path (such as chdir() or
unlink()), the exception instance will contain a third attribute,
filename, which is the file name passed to the function.
Raised when the result of an arithmetic operation is too large to be
represented. This cannot occur for integers (which would rather raise
MemoryError than give up). Because of the lack of standardization of
floating point exception handling in C, most floating point operations also
aren’t checked.
This exception is raised when a weak reference proxy, created by the
weakref.proxy() function, is used to access an attribute of the referent
after it has been garbage collected. For more information on weak references,
see the weakref module.
Raised when an error is detected that doesn’t fall in any of the other
categories. The associated value is a string indicating what precisely went
wrong. (This exception is mostly a relic from a previous version of the
interpreter; it is not used very much any more.)
Raised when the parser encounters a syntax error. This may occur in an
import statement, in a call to the built-in functions exec()
or eval(), or when reading the initial script or standard input
(also interactively).
Instances of this class have attributes filename, lineno,
offset and text for easier access to the details. str()
of the exception instance returns only the message.
Raised when the interpreter finds an internal error, but the situation does not
look so serious to cause it to abandon all hope. The associated value is a
string indicating what went wrong (in low-level terms).
You should report this to the author or maintainer of your Python interpreter.
Be sure to report the version of the Python interpreter (sys.version; it is
also printed at the start of an interactive Python session), the exact error
message (the exception’s associated value) and if possible the source of the
program that triggered the error.
This exception is raised by the sys.exit() function. When it is not
handled, the Python interpreter exits; no stack traceback is printed. If the
associated value is an integer, it specifies the system exit status (passed
to C’s exit() function); if it is None, the exit status is zero;
if it has another type (such as a string), the object’s value is printed and
the exit status is one.
Instances have an attribute code which is set to the proposed exit
status or error message (defaulting to None). Also, this exception derives
directly from BaseException and not Exception, since it is not
technically an error.
A call to sys.exit() is translated into an exception so that clean-up
handlers (finally clauses of try statements) can be
executed, and so that a debugger can execute a script without running the risk
of losing control. The os._exit() function can be used if it is
absolutely positively necessary to exit immediately (for example, in the child
process after a call to fork()).
The exception inherits from BaseException instead of Exception so
that it is not accidentally caught by code that catches Exception. This
allows the exception to properly propagate up and cause the interpreter to exit.
Raised when an operation or function is applied to an object of inappropriate
type. The associated value is a string giving details about the type mismatch.
Raised when a reference is made to a local variable in a function or method, but
no value has been bound to that variable. This is a subclass of
NameError.
Raised when a built-in operation or function receives an argument that has the
right type but an inappropriate value, and the situation is not described by a
more precise exception such as IndexError.
Raised when a Windows-specific error occurs or when the error number does not
correspond to an errno value. The winerror and
strerror values are created from the return values of the
GetLastError() and FormatMessage() functions from the Windows
Platform API. The errno value maps the winerror value to
corresponding errno.h values. This is a subclass of OSError.
Raised when the second argument of a division or modulo operation is zero. The
associated value is a string indicating the type of the operands and the
operation.
The following exceptions are used as warning categories; see the warnings
module for more information.
The modules described in this chapter provide a wide range of string
manipulation operations.
In addition, Python’s built-in string classes support the sequence type methods
described in the Sequence Types — str, bytes, bytearray, list, tuple, range section, and also the string-specific methods
described in the String Methods section. To output formatted strings,
see the String Formatting section. Also, see the re module for
string functions based on regular expressions.
A string containing all ASCII characters that are considered whitespace.
This includes the characters space, tab, linefeed, return, formfeed, and
vertical tab.
The built-in string class provides the ability to do complex variable
substitutions and value formatting via the format() method described in
PEP 3101. The Formatter class in the string module allows
you to create and customize your own string formatting behaviors using the same
implementation as the built-in format() method.
format() is the primary API method. It takes a format template
string, and an arbitrary set of positional and keyword arguments.
format() is just a wrapper that calls vformat().
This function does the actual work of formatting. It is exposed as a
separate function for cases where you want to pass in a predefined
dictionary of arguments, rather than unpacking and repacking the
dictionary as individual arguments using the *args and **kwds
syntax. vformat() does the work of breaking up the format template
string into character data and replacement fields. It calls the various
methods described below.
In addition, the Formatter defines a number of methods that are
intended to be replaced by subclasses:
Loop over the format_string and return an iterable of tuples
(literal_text, field_name, format_spec, conversion). This is used
by vformat() to break the string into either literal text, or
replacement fields.
The values in the tuple conceptually represent a span of literal text
followed by a single replacement field. If there is no literal text
(which can happen if two replacement fields occur consecutively), then
literal_text will be a zero-length string. If there is no replacement
field, then the values of field_name, format_spec and conversion
will be None.
Given field_name as returned by parse() (see above), convert it to
an object to be formatted. Returns a tuple (obj, used_key). The default
version takes strings of the form defined in PEP 3101, such as
“0[name]” or “label.title”. args and kwargs are as passed in to
vformat(). The return value used_key has the same meaning as the
key parameter to get_value().
Retrieve a given field value. The key argument will be either an
integer or a string. If it is an integer, it represents the index of the
positional argument in args; if it is a string, then it represents a
named argument in kwargs.
The args parameter is set to the list of positional arguments to
vformat(), and the kwargs parameter is set to the dictionary of
keyword arguments.
For compound field names, these functions are only called for the first
component of the field name; subsequent components are handled through
normal attribute and indexing operations.
So for example, the field expression ‘0.name’ would cause
get_value() to be called with a key argument of 0. The name
attribute will be looked up after get_value() returns by calling the
built-in getattr() function.
If the index or keyword refers to an item that does not exist, then an
IndexError or KeyError should be raised.
Implement checking for unused arguments if desired. The arguments to this
function are the set of all argument keys that were actually referred to in
the format string (integers for positional arguments, and strings for
named arguments), and a reference to the args and kwargs that was
passed to vformat. The set of unused args can be calculated from these
parameters. check_unused_args() is assumed to raise an exception if
the check fails.
Converts the value (returned by get_field()) given a conversion type
(as in the tuple returned by the parse() method). The default
version understands ‘r’ (repr) and ‘s’ (str) conversion types.
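As a sketch of how these hooks fit together, the hypothetical subclass below
overrides get_value() so that a missing keyword argument formats as a
placeholder instead of raising KeyError:
from string import Formatter

class DefaultingFormatter(Formatter):
    # Hypothetical subclass: unknown keyword arguments format as '<missing>'.
    def get_value(self, key, args, kwargs):
        if isinstance(key, str) and key not in kwargs:
            return '<missing>'
        return Formatter.get_value(self, key, args, kwargs)

print(DefaultingFormatter().format('{a} and {b}', a='spam'))
# prints: spam and <missing>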
The str.format() method and the Formatter class share the same
syntax for format strings (although in the case of Formatter,
subclasses can define their own format string syntax).
Format strings contain “replacement fields” surrounded by curly braces {}.
Anything that is not contained in braces is considered literal text, which is
copied unchanged to the output. If you need to include a brace character in the
literal text, it can be escaped by doubling: {{ and }}.
The grammar for a replacement field is as follows:
In less formal terms, the replacement field can start with a field_name that specifies
the object whose value is to be formatted and inserted
into the output instead of the replacement field.
The field_name is optionally followed by a conversion field, which is
preceded by an exclamation point '!', and a format_spec, which is preceded
by a colon ':'. These specify a non-default format for the replacement value.
The field_name itself begins with an arg_name that is either a number or a
keyword. If it’s a number, it refers to a positional argument, and if it’s a keyword,
it refers to a named keyword argument. If the numerical arg_names in a format string
are 0, 1, 2, ... in sequence, they can all be omitted (not just some)
and the numbers 0, 1, 2, ... will be automatically inserted in that order.
Because arg_name is not quote-delimited, it is not possible to specify arbitrary
dictionary keys (e.g., the strings '10' or ':-]') within a format string.
The arg_name can be followed by any number of index or
attribute expressions. An expression of the form '.name' selects the named
attribute using getattr(), while an expression of the form '[index]'
does an index lookup using __getitem__().
Changed in version 3.1: The positional argument specifiers can be omitted, so '{}{}' is
equivalent to '{0}{1}'.
Some simple format string examples:
"First, thou shalt count to {0}" # References first positional argument
"Bring me a {}" # Implicitly references the first positional argument
"From {} to {}" # Same as "From {0} to {1}"
"My quest is {name}" # References keyword argument 'name'
"Weight in tons {0.weight}" # 'weight' attribute of first positional arg
"Units destroyed: {players[0]}" # First element of keyword argument 'players'.
The conversion field causes a type coercion before formatting. Normally, the
job of formatting a value is done by the __format__() method of the value
itself. However, in some cases it is desirable to force a type to be formatted
as a string, overriding its own definition of formatting. By converting the
value to a string before calling __format__(), the normal formatting logic
is bypassed.
Three conversion flags are currently supported: '!s' which calls str()
on the value, '!r' which calls repr() and '!a' which calls
ascii().
Some examples:
"Harold's a clever {0!s}" # Calls str() on the argument first
"Bring out the holy {name!r}" # Calls repr() on the argument first
"More {!a}" # Calls ascii() on the argument first
The format_spec field contains a specification of how the value should be
presented, including such details as field width, alignment, padding, decimal
precision and so on. Each value type can define its own “formatting
mini-language” or interpretation of the format_spec.
Most built-in types support a common formatting mini-language, which is
described in the next section.
A format_spec field can also include nested replacement fields within it.
These nested replacement fields can contain only a field name; conversion flags
and format specifications are not allowed. The replacement fields within the
format_spec are substituted before the format_spec string is interpreted.
This allows the formatting of a value to be dynamically specified.
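For example, here the alignment and width are themselves supplied as
arguments:
>>> '{0:{align}{width}}'.format('centered', align='^', width=12)
'  centered  '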
“Format specifications” are used within replacement fields contained within a
format string to define how individual values are presented (see
Format String Syntax). They can also be passed directly to the built-in
format() function. Each formattable type may define how the format
specification is to be interpreted.
Most built-in types implement the following options for format specifications,
although some of the formatting options are only supported by the numeric types.
A general convention is that an empty format string ("") produces
the same result as if you had called str() on the value. A
non-empty format string typically modifies the result.
The general form of a standard format specifier is:
format_spec ::= [[fill]align][sign][#][0][width][,][.precision][type]
fill ::= <a character other than '}'>
align ::= "<" | ">" | "=" | "^"
sign ::= "+" | "-" | " "
width ::= integer
precision ::= integer
type ::= "b" | "c" | "d" | "e" | "E" | "f" | "F" | "g" | "G" | "n" | "o" | "s" | "x" | "X" | "%"
The fill character can be any character other than ‘{‘ or ‘}’. The presence
of a fill character is signaled by the character following it, which must be
one of the alignment options. If the second character of format_spec is not
a valid alignment option, then it is assumed that both the fill character and
the alignment option are absent.
The meaning of the various alignment options is as follows:
Option   Meaning
'<'      Forces the field to be left-aligned within the available space
         (this is the default for most objects).
'>'      Forces the field to be right-aligned within the available space
         (this is the default for numbers).
'='      Forces the padding to be placed after the sign (if any) but
         before the digits. This is used for printing fields in the form
         '+000000120'. This alignment option is only valid for numeric
         types.
'^'      Forces the field to be centered within the available space.
Note that unless a minimum field width is defined, the field width will always
be the same size as the data to fill it, so that the alignment option has no
meaning in this case.
The sign option is only valid for number types, and can be one of the
following:
Option   Meaning
'+'      Indicates that a sign should be used for both positive as well
         as negative numbers.
'-'      Indicates that a sign should be used only for negative numbers
         (this is the default behavior).
space    Indicates that a leading space should be used on positive
         numbers, and a minus sign on negative numbers.
The '#' option causes the “alternate form” to be used for the
conversion. The alternate form is defined differently for different
types. This option is only valid for integer, float, complex and
Decimal types. For integers, when binary, octal, or hexadecimal output
is used, this option adds the prefix respective '0b', '0o', or
'0x' to the output value. For floats, complex and Decimal the
alternate form causes the result of the conversion to always contain a
decimal-point character, even if no digits follow it. Normally, a
decimal-point character appears in the result of these conversions
only if a digit follows it. In addition, for 'g' and 'G'
conversions, trailing zeros are not removed from the result.
The ',' option signals the use of a comma for a thousands separator.
For a locale aware separator, use the 'n' integer presentation type
instead.
Changed in version 3.1: Added the ',' option (see also PEP 378).
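For example:
>>> '{:,}'.format(1234567890)
'1,234,567,890'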
width is a decimal integer defining the minimum field width. If not
specified, then the field width will be determined by the content.
If the width field is preceded by a zero ('0') character, this enables
zero-padding. This is equivalent to an alignment type of '=' and a fill
character of '0'.
The precision is a decimal number indicating how many digits should be
displayed after the decimal point for a floating point value formatted with
'f' and 'F', or before and after the decimal point for a floating point
value formatted with 'g' or 'G'. For non-number types the field
indicates the maximum field size - in other words, how many characters will be
used from the field content. The precision is not allowed for integer values.
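For example:
>>> '{:08.3f}'.format(3.14159)    # zero-padding, three digits after the point
'0003.142'
>>> '{:.5}'.format('xylophone')   # maximum field size for a string
'xylop'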
Finally, the type determines how the data should be presented.
The available string presentation types are:
Type     Meaning
's'      String format. This is the default type for strings and may be
         omitted.
None     The same as 's'.
The available integer presentation types are:
Type     Meaning
'b'      Binary format. Outputs the number in base 2.
'c'      Character. Converts the integer to the corresponding unicode
         character before printing.
'd'      Decimal Integer. Outputs the number in base 10.
'o'      Octal format. Outputs the number in base 8.
'x'      Hex format. Outputs the number in base 16, using lower-case
         letters for the digits above 9.
'X'      Hex format. Outputs the number in base 16, using upper-case
         letters for the digits above 9.
'n'      Number. This is the same as 'd', except that it uses the
         current locale setting to insert the appropriate number
         separator characters.
None     The same as 'd'.
In addition to the above presentation types, integers can be formatted
with the floating point presentation types listed below (except
'n' and None). When doing so, float() is used to convert the
integer to a floating point number before formatting.
The available presentation types for floating point and decimal values are:
Type     Meaning
'e'      Exponent notation. Prints the number in scientific notation
         using the letter 'e' to indicate the exponent.
'E'      Exponent notation. Same as 'e' except it uses an upper case
         'E' as the separator character.
'f'      Fixed point. Displays the number as a fixed-point number.
'F'      Fixed point. Same as 'f', but converts nan to NAN and
         inf to INF.
'g'      General format. For a given precision p >= 1, this rounds the
         number to p significant digits and then formats the result in
         either fixed-point format or in scientific notation, depending
         on its magnitude.
         The precise rules are as follows: suppose that the result
         formatted with presentation type 'e' and precision p-1
         would have exponent exp. Then if -4 <= exp < p, the number
         is formatted with presentation type 'f' and precision
         p-1-exp. Otherwise, the number is formatted with presentation
         type 'e' and precision p-1. In both cases insignificant
         trailing zeros are removed from the significand, and the
         decimal point is also removed if there are no remaining digits
         following it.
         Positive and negative infinity, positive and negative zero, and
         nans are formatted as inf, -inf, 0, -0 and nan
         respectively, regardless of the precision.
         A precision of 0 is treated as equivalent to a precision of 1.
'G'      General format. Same as 'g' except switches to 'E' if the
         number gets too large. The representations of infinity and NaN
         are uppercased, too.
'n'      Number. This is the same as 'g', except that it uses the
         current locale setting to insert the appropriate number
         separator characters.
'%'      Percentage. Multiplies the number by 100 and displays in fixed
         ('f') format, followed by a percent sign.
None     Similar to 'g', except with at least one digit past the
         decimal point and a default precision of 12. This is intended
         to match str(), except you can add the other format
         modifiers.
This section contains examples of the new format syntax and comparison with
the old %-formatting.
In most cases the syntax is similar to the old %-formatting, with the
addition of the {} and with : used instead of %.
For example, '%03.2f' can be translated to '{:03.2f}'.
The new format syntax also supports new and different options, shown in the
following examples.
Accessing arguments by position:
>>> '{0}, {1}, {2}'.format('a', 'b', 'c')
'a, b, c'
>>> '{}, {}, {}'.format('a', 'b', 'c') # 3.1+ only
'a, b, c'
>>> '{2}, {1}, {0}'.format('a', 'b', 'c')
'c, b, a'
>>> '{2}, {1}, {0}'.format(*'abc') # unpacking argument sequence
'c, b, a'
>>> '{0}{1}{0}'.format('abra', 'cad') # arguments' indices can be repeated
'abracadabra'
>>> c = 3-5j
>>> ('The complex number {0} is formed from the real part {0.real} '
... 'and the imaginary part {0.imag}.').format(c)
'The complex number (3-5j) is formed from the real part 3.0 and the imaginary part -5.0.'
>>> class Point:
... def __init__(self, x, y):
... self.x, self.y = x, y
... def __str__(self):
... return 'Point({self.x}, {self.y})'.format(self=self)
...
>>> str(Point(4, 2))
'Point(4, 2)'
>>> '{:<30}'.format('left aligned')
'left aligned '
>>> '{:>30}'.format('right aligned')
' right aligned'
>>> '{:^30}'.format('centered')
' centered '
>>> '{:*^30}'.format('centered') # use '*' as a fill char
'***********centered***********'
Replacing %+f, %-f, and %f and specifying a sign:
>>> '{:+f}; {:+f}'.format(3.14, -3.14) # show it always
'+3.140000; -3.140000'
>>> '{: f}; {: f}'.format(3.14, -3.14) # show a space for positive numbers
' 3.140000; -3.140000'
>>> '{:-f}; {:-f}'.format(3.14, -3.14) # show only the minus -- same as '{:f}; {:f}'
'3.140000; -3.140000'
Replacing %x and %o and converting the value to different bases:
>>> # format also supports binary numbers
>>> "int: {0:d}; hex: {0:x}; oct: {0:o}; bin: {0:b}".format(42)
'int: 42; hex: 2a; oct: 52; bin: 101010'
>>> # with 0x, 0o, or 0b as prefix:
>>> "int: {0:d}; hex: {0:#x}; oct: {0:#o}; bin: {0:#b}".format(42)
'int: 42; hex: 0x2a; oct: 0o52; bin: 0b101010'
Templates provide simpler string substitutions as described in PEP 292.
Instead of the normal %-based substitutions, Templates support $-based substitutions, using the following rules:
$$ is an escape; it is replaced with a single $.
$identifier names a substitution placeholder matching a mapping key of
"identifier". By default, "identifier" must spell a Python
identifier. The first non-identifier character after the $ character
terminates this placeholder specification.
${identifier} is equivalent to $identifier. It is required when valid
identifier characters follow the placeholder but are not part of the
placeholder, such as "${noun}ification".
Any other appearance of $ in the string will result in a ValueError
being raised.
The string module provides a Template class that implements
these rules. The methods of Template are:
Performs the template substitution, returning a new string. mapping is
any dictionary-like object with keys that match the placeholders in the
template. Alternatively, you can provide keyword arguments, where the
keywords are the placeholders. When both mapping and kwds are given
and there are duplicates, the placeholders from kwds take precedence.
Like substitute(), except that if placeholders are missing from
mapping and kwds, instead of raising a KeyError exception, the
original placeholder will appear in the resulting string intact. Also,
unlike with substitute(), any other appearances of the $ will
simply return $ instead of raising ValueError.
While other exceptions may still occur, this method is called “safe”
because it always tries to return a usable string instead of
raising an exception. In another sense, safe_substitute() may be
anything other than safe, since it will silently ignore malformed
templates containing dangling delimiters, unmatched braces, or
placeholders that are not valid Python identifiers.
Template instances also provide one public data attribute:
This is the object passed to the constructor’s template argument. In
general, you shouldn’t change it, but read-only access is not enforced.
Here is an example of how to use a Template:
>>> from string import Template
>>> s = Template('$who likes $what')
>>> s.substitute(who='tim', what='kung pao')
'tim likes kung pao'
>>> d = dict(who='tim')
>>> Template('Give $who $100').substitute(d)
Traceback (most recent call last):
[...]
ValueError: Invalid placeholder in string: line 1, col 10
>>> Template('$who likes $what').substitute(d)
Traceback (most recent call last):
[...]
KeyError: 'what'
>>> Template('$who likes $what').safe_substitute(d)
'tim likes $what'
Advanced usage: you can derive subclasses of Template to customize the
placeholder syntax, delimiter character, or the entire regular expression used
to parse template strings. To do this, you can override these class attributes:
delimiter – This is the literal string describing a placeholder introducing
delimiter. The default value is $. Note that this should not be a
regular expression, as the implementation will call re.escape() on this
string as needed.
idpattern – This is the regular expression describing the pattern for
non-braced placeholders (the braces will be added automatically as
appropriate). The default value is the regular expression
[_a-z][_a-z0-9]*.
flags – The regular expression flags that will be applied when compiling
the regular expression used for recognizing substitutions. The default value
is re.IGNORECASE. Note that re.VERBOSE will always be added to the
flags, so custom idpatterns must follow conventions for verbose regular
expressions.
New in version 3.2.
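For instance, a minimal sketch of a subclass that swaps the delimiter (the
name PercentTemplate is hypothetical):
from string import Template

class PercentTemplate(Template):
    # Hypothetical subclass: use '%' instead of '$' as the delimiter.
    delimiter = '%'

print(PercentTemplate('%who likes %what').substitute(who='tim', what='kung pao'))
# prints: tim likes kung pao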
Alternatively, you can provide the entire regular expression pattern by
overriding the class attribute pattern. If you do this, the value must be a
regular expression object with four named capturing groups. The capturing
groups correspond to the rules given above, along with the invalid placeholder
rule:
escaped – This group matches the escape sequence, e.g. $$, in the
default pattern.
named – This group matches the unbraced placeholder name; it should not
include the delimiter in the capturing group.
braced – This group matches the brace enclosed placeholder name; it should
not include either the delimiter or braces in the capturing group.
invalid – This group matches any other delimiter pattern (usually a single
delimiter), and it should appear last in the regular expression.
Split the argument into words using str.split(), capitalize each word
using str.capitalize(), and join the capitalized words using
str.join(). If the optional second argument sep is absent
or None, runs of whitespace characters are replaced by a single space
and leading and trailing whitespace are removed, otherwise sep is used to
split and join the words.
This module provides regular expression matching operations similar to
those found in Perl.
Both patterns and strings to be searched can be Unicode strings as well as
8-bit strings. However, Unicode strings and 8-bit strings cannot be mixed:
that is, you cannot match a Unicode string with a byte pattern or
vice-versa; similarly, when asking for a substitution, the replacement
string must be of the same type as both the pattern and the search string.
Regular expressions use the backslash character ('\') to indicate
special forms or to allow special characters to be used without invoking
their special meaning. This collides with Python’s usage of the same
character for the same purpose in string literals; for example, to match
a literal backslash, one might have to write '\\\\' as the pattern
string, because the regular expression must be \\, and each
backslash must be expressed as \\ inside a regular Python string
literal.
The solution is to use Python’s raw string notation for regular expression
patterns; backslashes are not handled in any special way in a string literal
prefixed with 'r'. So r"\n" is a two-character string containing
'\' and 'n', while "\n" is a one-character string containing a
newline. Usually patterns will be expressed in Python code using this raw
string notation.
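For example, both of the following find the single backslash in the target
string; the raw-string form is easier to read:
>>> import re
>>> bool(re.search('\\\\', r'C:\some\path'))   # pattern written as a regular string
True
>>> bool(re.search(r'\\', r'C:\some\path'))    # the same pattern as a raw string
True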
It is important to note that most regular expression operations are available as
module-level functions and methods on
compiled regular expressions. The functions are shortcuts
that don’t require you to compile a regex object first, but miss some
fine-tuning parameters.
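For example, the two calls below are equivalent; compiling once is usually
preferable when the same expression is reused:
>>> import re
>>> re.findall(r'\d+', '12 drummers, 11 pipers')   # module-level shortcut
['12', '11']
>>> pattern = re.compile(r'\d+')                   # reusable compiled form
>>> pattern.findall('12 drummers, 11 pipers')
['12', '11']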
See also
Mastering Regular Expressions
Book on regular expressions by Jeffrey Friedl, published by O’Reilly. The
second edition of the book no longer covers Python at all, but the first
edition covered writing good regular expression patterns in great detail.
A regular expression (or RE) specifies a set of strings that matches it; the
functions in this module let you check if a particular string matches a given
regular expression (or if a given regular expression matches a particular
string, which comes down to the same thing).
Regular expressions can be concatenated to form new regular expressions; if A
and B are both regular expressions, then AB is also a regular expression.
In general, if a string p matches A and another string q matches B, the
string pq will match AB. This holds unless A or B contain low precedence
operations; boundary conditions between A and B; or have numbered group
references. Thus, complex expressions can easily be constructed from simpler
primitive expressions like the ones described here. For details of the theory
and implementation of regular expressions, consult the Friedl book referenced
above, or almost any textbook about compiler construction.
A brief explanation of the format of regular expressions follows. For further
information and a gentler presentation, consult the Regular Expression HOWTO.
Regular expressions can contain both special and ordinary characters. Most
ordinary characters, like 'A', 'a', or '0', are the simplest regular
expressions; they simply match themselves. You can concatenate ordinary
characters, so last matches the string 'last'. (In the rest of this
section, we’ll write RE’s in this special style, usually without quotes, and
strings to be matched 'in single quotes'.)
Some characters, like '|' or '(', are special. Special
characters either stand for classes of ordinary characters, or affect
how the regular expressions around them are interpreted. Regular
expression pattern strings may not contain null bytes, but can specify
the null byte using the \number notation, e.g., '\x00'.
The special characters are:
'.'
(Dot.) In the default mode, this matches any character except a newline. If
the DOTALL flag has been specified, this matches any character
including a newline.
'^'
(Caret.) Matches the start of the string, and in MULTILINE mode also
matches immediately after each newline.
'$'
Matches the end of the string or just before the newline at the end of the
string, and in MULTILINE mode also matches before a newline. foo
matches both ‘foo’ and ‘foobar’, while the regular expression foo$ matches
only ‘foo’. More interestingly, searching for foo.$ in 'foo1\nfoo2\n'
matches ‘foo2’ normally, but ‘foo1’ in MULTILINE mode; searching for
a single $ in 'foo\n' will find two (empty) matches: one just before
the newline, and one at the end of the string.
'*'
Causes the resulting RE to match 0 or more repetitions of the preceding RE, as
many repetitions as are possible. ab* will match ‘a’, ‘ab’, or ‘a’ followed
by any number of ‘b’s.
'+'
Causes the resulting RE to match 1 or more repetitions of the preceding RE.
ab+ will match ‘a’ followed by any non-zero number of ‘b’s; it will not
match just ‘a’.
'?'
Causes the resulting RE to match 0 or 1 repetitions of the preceding RE.
ab? will match either ‘a’ or ‘ab’.
*?, +?, ??
The '*', '+', and '?' qualifiers are all greedy; they match
as much text as possible. Sometimes this behaviour isn’t desired; if the RE
<.*> is matched against '<H1>title</H1>', it will match the entire
string, and not just '<H1>'. Adding '?' after the qualifier makes it
perform the match in non-greedy or minimal fashion; as few
characters as possible will be matched. Using .*? in the previous
expression will match only '<H1>'.
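For example:
>>> import re
>>> re.match('<.*>', '<H1>title</H1>').group()
'<H1>title</H1>'
>>> re.match('<.*?>', '<H1>title</H1>').group()
'<H1>'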
{m}
Specifies that exactly m copies of the previous RE should be matched; fewer
matches cause the entire RE not to match. For example, a{6} will match
exactly six 'a' characters, but not five.
{m,n}
Causes the resulting RE to match from m to n repetitions of the preceding
RE, attempting to match as many repetitions as possible. For example,
a{3,5} will match from 3 to 5 'a' characters. Omitting m specifies a
lower bound of zero, and omitting n specifies an infinite upper bound. As an
example, a{4,}b will match aaaab or a thousand 'a' characters
followed by a b, but not aaab. The comma may not be omitted or the
modifier would be confused with the previously described form.
{m,n}?
Causes the resulting RE to match from m to n repetitions of the preceding
RE, attempting to match as few repetitions as possible. This is the
non-greedy version of the previous qualifier. For example, on the
6-character string 'aaaaaa', a{3,5} will match 5 'a' characters,
while a{3,5}? will only match 3 characters.
'\'
Either escapes special characters (permitting you to match characters like
'*', '?', and so forth), or signals a special sequence; special
sequences are discussed below.
If you’re not using a raw string to express the pattern, remember that Python
also uses the backslash as an escape sequence in string literals; if the escape
sequence isn’t recognized by Python’s parser, the backslash and subsequent
character are included in the resulting string. However, if Python would
recognize the resulting sequence, the backslash should be repeated twice. This
is complicated and hard to understand, so it’s highly recommended that you use
raw strings for all but the simplest expressions.
[]
Used to indicate a set of characters. Characters can be listed individually, or
a range of characters can be indicated by giving two characters and separating
them by a '-'. Special characters are not active inside sets. For example,
[akm$] will match any of the characters 'a', 'k',
'm', or '$'; [a-z] will match any lowercase letter, and
[a-zA-Z0-9] matches any letter or digit. Character classes such
as \w or \S (defined below) are also acceptable inside a
range, although the characters they match depend on whether
ASCII or LOCALE mode is in force. If you want to
include a ']' or a '-' inside a set, precede it with a
backslash, or place it as the first character. The pattern []]
will match ']', for example.
You can match the characters not within a range by complementing the set.
This is indicated by including a '^' as the first character of the set;
'^' elsewhere will simply match the '^' character. For example,
[^5] will match any character except '5', and [^^] will match any
character except '^'.
Note that inside [] the special forms and special characters lose
their meanings and only the syntaxes described here are valid. For
example, +, *, (, ), and so on are treated as
literals inside [], and backreferences cannot be used inside
[].
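For example, reusing the sets described above:
>>> import re
>>> re.findall(r'[akm$]', 'mark$')
['m', 'a', 'k', '$']
>>> re.search(r'[]]', 'a]b').group()
']'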
'|'
A|B, where A and B can be arbitrary REs, creates a regular expression that
will match either A or B. An arbitrary number of REs can be separated by the
'|' in this way. This can be used inside groups (see below) as well. As
the target string is scanned, REs separated by '|' are tried from left to
right. When one pattern completely matches, that branch is accepted. This means
that once A matches, B will not be tested further, even if it would
produce a longer overall match. In other words, the '|' operator is never
greedy. To match a literal '|', use \|, or enclose it inside a
character class, as in [|].
(...)
Matches whatever regular expression is inside the parentheses, and indicates the
start and end of a group; the contents of a group can be retrieved after a match
has been performed, and can be matched later in the string with the \number
special sequence, described below. To match the literals '(' or ')',
use \( or \), or enclose them inside a character class: [(][)].
(?...)
This is an extension notation (a '?' following a '(' is not meaningful
otherwise). The first character after the '?' determines what the meaning
and further syntax of the construct is. Extensions usually do not create a new
group; (?P<name>...) is the only exception to this rule. Following are the
currently supported extensions.
(?aiLmsux)
(One or more letters from the set 'a', 'i', 'L', 'm',
's', 'u', 'x'.) The group matches the empty string; the
letters set the corresponding flags: re.A (ASCII-only matching),
re.I (ignore case), re.L (locale dependent),
re.M (multi-line), re.S (dot matches all),
and re.X (verbose), for the entire regular expression. (The
flags are described in Module Contents.) This
is useful if you wish to include the flags as part of the regular
expression, instead of passing a flag argument to the
re.compile() function.
Note that the (?x) flag changes how the expression is parsed. It should be
used first in the expression string, or after one or more whitespace characters.
If there are non-whitespace characters before the flag, the results are
undefined.
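For instance, embedding the equivalent of re.I in the pattern itself:
>>> import re
>>> re.match('(?i)spam', 'SPAM and eggs').group()
'SPAM'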
(?:...)
A non-capturing version of regular parentheses. Matches whatever regular
expression is inside the parentheses, but the substring matched by the group
cannot be retrieved after performing a match or referenced later in the
pattern.
(?P<name>...)
Similar to regular parentheses, but the substring matched by the group is
accessible within the rest of the regular expression via the symbolic group
name name. Group names must be valid Python identifiers, and each group
name must be defined only once within a regular expression. A symbolic group
is also a numbered group, just as if the group were not named. So the group
named id in the example below can also be referenced as the numbered group
1.
For example, if the pattern is (?P<id>[a-zA-Z_]\w*), the group can be
referenced by its name in arguments to methods of match objects, such as
m.group('id') or m.end('id'), and also by name in the regular
expression itself (using (?P=id)) and replacement text given to
.sub() (using \g<id>).
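A short session showing both forms of access (the input string is an arbitrary illustration):
>>> import re
>>> m = re.match(r'(?P<id>[a-zA-Z_]\w*)', 'spam123 = 1')
>>> m.group('id')
'spam123'
>>> m.group(1)
'spam123'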
(?P=name)
Matches whatever text was matched by the earlier group named name.
(?#...)
A comment; the contents of the parentheses are simply ignored.
(?=...)
Matches if ... matches next, but doesn’t consume any of the string. This is
called a lookahead assertion. For example, Isaac(?=Asimov) will match
'Isaac' only if it’s followed by 'Asimov'.
(?!...)
Matches if ... doesn’t match next. This is a negative lookahead assertion.
For example, Isaac(?!Asimov) will match 'Isaac' only if it’s not
followed by 'Asimov'.
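Both assertions at work (the second search finds no match, so returns None and nothing is printed):
>>> import re
>>> re.search('Isaac(?=Asimov)', 'IsaacAsimov').group()
'Isaac'
>>> re.search('Isaac(?!Asimov)', 'IsaacAsimov')
>>> re.search('Isaac(?!Asimov)', 'IsaacNewton').group()
'Isaac'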
(?<=...)
Matches if the current position in the string is preceded by a match for ...
that ends at the current position. This is called a positive lookbehind
assertion. (?<=abc)def will find a match in abcdef, since the
lookbehind will back up 3 characters and check if the contained pattern matches.
The contained pattern must only match strings of some fixed length, meaning that
abc or a|b are allowed, but a* and a{3,4} are not. Note that
patterns which start with positive lookbehind assertions will never match at the
beginning of the string being searched; you will most likely want to use the
search() function rather than the match() function:
>>> import re
>>> m = re.search('(?<=abc)def', 'abcdef')
>>> m.group(0)
'def'
This example looks for a word following a hyphen:
>>> m = re.search(r'(?<=-)\w+', 'spam-egg')
>>> m.group(0)
'egg'
(?<!...)
Matches if the current position in the string is not preceded by a match for
.... This is called a negative lookbehind assertion. Similar to
positive lookbehind assertions, the contained pattern must only match strings of
some fixed length. Patterns which start with negative lookbehind assertions may
match at the beginning of the string being searched.
(?(id/name)yes-pattern|no-pattern)
Will try to match with yes-pattern if the group with given id or
name exists, and with no-pattern if it doesn’t. no-pattern is
optional and can be omitted. For example,
(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$) is a poor email matching pattern, which
will match with '<user@host.com>' as well as 'user@host.com', but
not with '<user@host.com' nor 'user@host.com>'.
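A quick check of the pattern from above; match() returns None for the unbalanced forms:
>>> import re
>>> pat = re.compile(r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)')
>>> pat.match('<user@host.com>').group(2)
'user@host.com'
>>> pat.match('user@host.com').group(2)
'user@host.com'
>>> print(pat.match('<user@host.com'))
None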
The special sequences consist of '\' and a character from the list below.
If the ordinary character is not on the list, then the resulting RE will match
the second character. For example, \$ matches the character '$'.
\number
Matches the contents of the group of the same number. Groups are numbered
starting from 1. For example, (.+) \1 matches 'the the' or '55 55',
but not 'thethe' (note the space after the group). This special sequence
can only be used to match one of the first 99 groups. If the first digit of
number is 0, or number is 3 octal digits long, it will not be interpreted as
a group match, but as the character with octal value number. Inside the
'[' and ']' of a character class, all numeric escapes are treated as
characters.
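For instance:
>>> import re
>>> re.match(r'(.+) \1', 'the the').group()
'the the'
>>> print(re.match(r'(.+) \1', 'the end'))
None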
\A
Matches only at the start of the string.
\b
Matches the empty string, but only at the beginning or end of a word.
A word is defined as a sequence of Unicode alphanumeric or underscore
characters, so the end of a word is indicated by whitespace or a
non-alphanumeric, non-underscore Unicode character. Note that
formally, \b is defined as the boundary between a \w and a
\W character (or vice versa). By default Unicode alphanumerics
are the ones used, but this can be changed by using the ASCII
flag. Inside a character range, \b represents the backspace
character, for compatibility with Python’s string literals.
\B
Matches the empty string, but only when it is not at the beginning or end of a
word. This is just the opposite of \b, so word characters are
Unicode alphanumerics or the underscore, although this can be changed
by using the ASCII flag.
\d
For Unicode (str) patterns:
Matches any Unicode decimal digit (that is, any character in
Unicode character category [Nd]). This includes [0-9], and
also many other digit characters. If the ASCII flag is
used only [0-9] is matched (but the flag affects the entire
regular expression, so in such cases using an explicit [0-9]
may be a better choice).
For 8-bit (bytes) patterns:
Matches any decimal digit; this is equivalent to [0-9].
\D
Matches any character which is not a Unicode decimal digit. This is
the opposite of \d. If the ASCII flag is used this
becomes the equivalent of [^0-9] (but the flag affects the entire
regular expression, so in such cases using an explicit [^0-9] may
be a better choice).
\s
For Unicode (str) patterns:
Matches Unicode whitespace characters (which includes
[\t\n\r\f\v], and also many other characters, for example the
non-breaking spaces mandated by typography rules in many
languages). If the ASCII flag is used, only
[\t\n\r\f\v] is matched (but the flag affects the entire
regular expression, so in such cases using an explicit
[\t\n\r\f\v] may be a better choice).
For 8-bit (bytes) patterns:
Matches characters considered whitespace in the ASCII character set;
this is equivalent to [\t\n\r\f\v].
\S
Matches any character which is not a Unicode whitespace character. This is
the opposite of \s. If the ASCII flag is used this
becomes the equivalent of [^\t\n\r\f\v] (but the flag affects the entire
regular expression, so in such cases using an explicit [^\t\n\r\f\v] may
be a better choice).
\w
For Unicode (str) patterns:
Matches Unicode word characters; this includes most characters
that can be part of a word in any language, as well as numbers and
the underscore. If the ASCII flag is used, only
[a-zA-Z0-9_] is matched (but the flag affects the entire
regular expression, so in such cases using an explicit
[a-zA-Z0-9_] may be a better choice).
For 8-bit (bytes) patterns:
Matches characters considered alphanumeric in the ASCII character set;
this is equivalent to [a-zA-Z0-9_].
\W
Matches any character which is not a Unicode word character. This is
the opposite of \w. If the ASCII flag is used this
becomes the equivalent of [^a-zA-Z0-9_] (but the flag affects the
entire regular expression, so in such cases using an explicit
[^a-zA-Z0-9_] may be a better choice).
\Z
Matches only at the end of the string.
Most of the standard escapes supported by Python string literals are also
accepted by the regular expression parser:
\a \b \f \n
\r \t \v \x
\\
Octal escapes are included in a limited form: If the first digit is a 0, or if
there are three octal digits, it is considered an octal escape. Otherwise, it is
a group reference. As for string literals, octal escapes are always at most
three digits in length.
Python offers two different primitive operations based on regular expressions:
match checks for a match only at the beginning of the string, while
search checks for a match anywhere in the string (this is what Perl does
by default).
Note that match may differ from search even when using a regular expression
beginning with '^': '^' matches only at the start of the string, or in
MULTILINE mode also immediately following a newline. The “match”
operation succeeds only if the pattern matches at the start of the string
regardless of mode, or at the starting position given by the optional pos
argument regardless of whether a newline precedes it.
>>> re.match("c", "abcdef") # No match
>>> re.search("c", "abcdef") # Match
<_sre.SRE_Match object at ...>
The module defines several functions, constants, and an exception. Some of the
functions are simplified versions of the full featured methods for compiled
regular expressions. Most non-trivial applications always use the compiled
form.
re.compile(pattern, flags=0)
Compile a regular expression pattern into a regular expression object, which
can be used for matching using its match() and search() methods,
described below.
The expression’s behaviour can be modified by specifying a flags value.
Values can be any of the following variables, combined using bitwise OR (the
| operator).
The sequence
prog = re.compile(pattern)
result = prog.match(string)
is equivalent to
result = re.match(pattern, string)
but using re.compile() and saving the resulting regular expression
object for reuse is more efficient when the expression will be used several
times in a single program.
Note
The compiled versions of the most recent patterns passed to
re.match(), re.search() or re.compile() are cached, so
programs that use only a few regular expressions at a time needn’t worry
about compiling regular expressions.
re.A
re.ASCII
Make \w, \W, \b, \B, \d, \D, \s and \S
perform ASCII-only matching instead of full Unicode matching. This is only
meaningful for Unicode patterns, and is ignored for byte patterns.
Note that for backward compatibility, the re.U flag still
exists (as well as its synonym re.UNICODE and its embedded
counterpart (?u)), but these are redundant in Python 3 since
matches are Unicode by default for strings (and Unicode matching
isn’t allowed for bytes).
re.I
re.IGNORECASE
Perform case-insensitive matching; expressions like [A-Z] will match
lowercase letters, too. This is not affected by the current locale
and works for Unicode characters as expected.
re.L
re.LOCALE
Make \w, \W, \b, \B, \s and \S dependent on the
current locale. The use of this flag is discouraged as the locale mechanism
is very unreliable, and it only handles one “culture” at a time anyway;
you should use Unicode matching instead, which is the default in Python 3
for Unicode (str) patterns.
re.M
re.MULTILINE
When specified, the pattern character '^' matches at the beginning of the
string and at the beginning of each line (immediately following each newline);
and the pattern character '$' matches at the end of the string and at the
end of each line (immediately preceding each newline). By default, '^'
matches only at the beginning of the string, and '$' only at the end of the
string and immediately before the newline (if any) at the end of the string.
re.X
re.VERBOSE
This flag allows you to write regular expressions that look nicer. Whitespace
within the pattern is ignored, except when in a character class or preceded by
an unescaped backslash, and, when a line contains a '#' that is neither in a
character class nor preceded by an unescaped backslash, all characters from the
leftmost such '#' through the end of the line are ignored.
That means that the two following regular expression objects that match a
decimal number are functionally equal:
a = re.compile(r"""\d + # the integral part
\. # the decimal point
\d * # some fractional digits""", re.X)
b = re.compile(r"\d+\.\d*")
re.search(pattern, string, flags=0)
Scan through string looking for a location where the regular expression
pattern produces a match, and return a corresponding match object. Return None if no position in the string matches the
pattern; note that this is different from finding a zero-length match at some
point in the string.
re.match(pattern, string, flags=0)
If zero or more characters at the beginning of string match the regular
expression pattern, return a corresponding match object. Return None if the string does not match the pattern;
note that this is different from a zero-length match.
Note
If you want to locate a match anywhere in string, use search()
instead.
re.split(pattern, string, maxsplit=0, flags=0)
Split string by the occurrences of pattern. If capturing parentheses are
used in pattern, then the text of all groups in the pattern are also returned
as part of the resulting list. If maxsplit is nonzero, at most maxsplit
splits occur, and the remainder of the string is returned as the final element
of the list.
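A few representative calls (expected output under Python 3.2):
>>> import re
>>> re.split(r'\W+', 'Words, words, words.')
['Words', 'words', 'words', '']
>>> re.split(r'(\W+)', 'Words, words, words.')
['Words', ', ', 'words', ', ', 'words', '.', '']
>>> re.split(r'\W+', 'Words, words, words.', 1)
['Words', 'words, words.']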
If there are capturing groups in the separator and it matches at the start of
the string, the result will start with an empty string. The same holds for
the end of the string:
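For instance:
>>> re.split(r'(\W+)', '...words, words...')
['', '...', 'words', ', ', 'words', '...', '']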
That way, separator components are always found at the same relative
indices within the result list (e.g., if there’s one capturing group
in the separator, the 0th, the 2nd and so forth).
Note that split will never split a string on an empty pattern match.
For example:
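Both of the following return the input unsplit, since every match is empty (behaviour as of Python 3.2):
>>> re.split('x*', 'foo')
['foo']
>>> re.split('(?m)^$', 'foo\n\nbar\n')
['foo\n\nbar\n']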
re.findall(pattern, string, flags=0)
Return all non-overlapping matches of pattern in string, as a list of
strings. The string is scanned left-to-right, and matches are returned in
the order found. If one or more groups are present in the pattern, return a
list of groups; this will be a list of tuples if the pattern has more than
one group. Empty matches are included in the result unless they touch the
beginning of another match.
re.finditer(pattern, string, flags=0)
Return an iterator yielding match objects over
all non-overlapping matches for the RE pattern in string. The string
is scanned left-to-right, and matches are returned in the order found. Empty
matches are included in the result unless they touch the beginning of another
match.
re.sub(pattern, repl, string, count=0, flags=0)
Return the string obtained by replacing the leftmost non-overlapping occurrences
of pattern in string by the replacement repl. If the pattern isn’t found,
string is returned unchanged. repl can be a string or a function; if it is
a string, any backslash escapes in it are processed. That is, \n is
converted to a single newline character, \r is converted to a carriage return, and
so forth. Unknown escapes such as \j are left alone. Backreferences, such
as \6, are replaced with the substring matched by group 6 in the pattern.
For example:
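One illustrative substitution, rewriting a Python function header as a C prototype:
>>> import re
>>> re.sub(r'def\s+([a-zA-Z_][a-zA-Z_0-9]*)\s*\(\s*\):',
...        r'static PyObject*\npy_\1(void)\n{',
...        'def myfunc():')
'static PyObject*\npy_myfunc(void)\n{'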
If repl is a function, it is called for every non-overlapping occurrence of
pattern. The function takes a single match object argument, and returns the
replacement string. For example:
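A small function that turns single dashes into spaces and double dashes into single dashes (an arbitrary illustration):
>>> def dashrepl(matchobj):
...     if matchobj.group(0) == '-': return ' '
...     else: return '-'
>>> re.sub('-{1,2}', dashrepl, 'pro----gram-files')
'pro--gram files'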
The optional argument count is the maximum number of pattern occurrences to be
replaced; count must be a non-negative integer. If omitted or zero, all
occurrences will be replaced. Empty matches for the pattern are replaced only
when not adjacent to a previous match, so sub('x*', '-', 'abc') returns
'-a-b-c-'.
In addition to character escapes and backreferences as described above,
\g<name> will use the substring matched by the group named name, as
defined by the (?P<name>...) syntax. \g<number> uses the corresponding
group number; \g<2> is therefore equivalent to \2, but isn’t ambiguous
in a replacement such as \g<2>0. \20 would be interpreted as a
reference to group 20, not a reference to group 2 followed by the literal
character '0'. The backreference \g<0> substitutes in the entire
substring matched by the RE.
Changed in version 3.1: Added the optional flags argument.
re.escape(string)
Return string with all non-alphanumerics backslashed; this is useful if you
want to match an arbitrary literal string that may have regular expression
metacharacters in it.
exception re.error
Exception raised when a string passed to one of the functions here is not a
valid regular expression (for example, it might contain unmatched parentheses)
or when some other error occurs during compilation or matching. It is never an
error if a string contains no match for a pattern.
regex.search(string[, pos[, endpos]])
Scan through string looking for a location where this regular expression
produces a match, and return a corresponding match object. Return None if no position in the string matches the
pattern; note that this is different from finding a zero-length match at some
point in the string.
The optional second parameter pos gives an index in the string where the
search is to start; it defaults to 0. This is not completely equivalent to
slicing the string; the '^' pattern character matches at the real beginning
of the string and at positions just after a newline, but not necessarily at the
index where the search is to start.
The optional parameter endpos limits how far the string will be searched; it
will be as if the string is endpos characters long, so only the characters
from pos to endpos-1 will be searched for a match. If endpos is less
than pos, no match will be found; otherwise, if rx is a compiled regular
expression object, rx.search(string, 0, 50) is equivalent to
rx.search(string[:50], 0).
>>> pattern = re.compile("d")
>>> pattern.search("dog") # Match at index 0
<_sre.SRE_Match object at ...>
>>> pattern.search("dog", 1) # No match; search doesn't include the "d"
regex.match(string[, pos[, endpos]])
If zero or more characters at the beginning of string match this regular
expression, return a corresponding match object.
Return None if the string does not match the pattern; note that this is
different from a zero-length match.
The optional pos and endpos parameters have the same meaning as for the
search() method.
Note
If you want to locate a match anywhere in string, use
search() instead.
>>> pattern = re.compile("o")
>>> pattern.match("dog") # No match as "o" is not at the start of "dog".
>>> pattern.match("dog", 1) # Match as "o" is the 2nd character of "dog".
<_sre.SRE_Match object at ...>
regex.findall(string[, pos[, endpos]])
Similar to the findall() function, using the compiled pattern, but
also accepts optional pos and endpos parameters that limit the search
region like for match().
regex.finditer(string[, pos[, endpos]])
Similar to the finditer() function, using the compiled pattern, but
also accepts optional pos and endpos parameters that limit the search
region like for match().
regex.groupindex
A dictionary mapping any symbolic group names defined by (?P<id>) to group
numbers. The dictionary is empty if no symbolic groups were used in the
pattern.
Match objects always have a boolean value of True, so that you can test
whether e.g. match() resulted in a match with a simple if statement. They
support the following methods and attributes:
match.expand(template)
Return the string obtained by doing backslash substitution on the template
string template, as done by the sub() method.
Escapes such as \n are converted to the appropriate characters,
and numeric backreferences (\1, \2) and named backreferences
(\g<1>, \g<name>) are replaced by the contents of the
corresponding group.
match.group([group1, ...])
Returns one or more subgroups of the match. If there is a single argument, the
result is a single string; if there are multiple arguments, the result is a
tuple with one item per argument. Without arguments, group1 defaults to zero
(the whole match is returned). If a groupN argument is zero, the corresponding
return value is the entire matching string; if it is in the inclusive range
[1..99], it is the string matching the corresponding parenthesized group. If a
group number is negative or larger than the number of groups defined in the
pattern, an IndexError exception is raised. If a group is contained in a
part of the pattern that did not match, the corresponding result is None.
If a group is contained in a part of the pattern that matched multiple times,
the last match is returned.
>>> m = re.match(r"(\w+) (\w+)", "Isaac Newton, physicist")
>>> m.group(0) # The entire match
'Isaac Newton'
>>> m.group(1) # The first parenthesized subgroup.
'Isaac'
>>> m.group(2) # The second parenthesized subgroup.
'Newton'
>>> m.group(1, 2) # Multiple arguments give us a tuple.
('Isaac', 'Newton')
If the regular expression uses the (?P<name>...) syntax, the groupN
arguments may also be strings identifying groups by their group name. If a
string argument is not used as a group name in the pattern, an IndexError
exception is raised.
match.groups(default=None)
Return a tuple containing all the subgroups of the match, from 1 up to however
many groups are in the pattern. The default argument is used for groups that
did not participate in the match; it defaults to None.
For example:
>>> m = re.match(r"(\d+)\.(\d+)", "24.1632")
>>> m.groups()
('24', '1632')
If we make the decimal place and everything after it optional, not all groups
might participate in the match. These groups will default to None unless
the default argument is given:
>>> m = re.match(r"(\d+)\.?(\d+)?", "24")
>>> m.groups() # Second group defaults to None.
('24', None)
>>> m.groups('0') # Now, the second group defaults to '0'.
('24', '0')
match.groupdict(default=None)
Return a dictionary containing all the named subgroups of the match, keyed by
the subgroup name. The default argument is used for groups that did not
participate in the match; it defaults to None. For example:
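A short session (the pattern and input are arbitrary illustrations):
>>> m = re.match(r'(?P<first_name>\w+) (?P<last_name>\w+)', 'Malcolm Reynolds')
>>> m.groupdict()
{'first_name': 'Malcolm', 'last_name': 'Reynolds'}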
match.start([group])
match.end([group])
Return the indices of the start and end of the substring matched by group;
group defaults to zero (meaning the whole matched substring). Return -1 if
group exists but did not contribute to the match. For a match object m, and
a group g that did contribute to the match, the substring matched by group g
(equivalent to m.group(g)) is
m.string[m.start(g):m.end(g)]
Note that m.start(group) will equal m.end(group) if group matched a
null string. For example, after m = re.search('b(c?)', 'cba'),
m.start(0) is 1, m.end(0) is 2, m.start(1) and m.end(1) are both
2, and m.start(2) raises an IndexError exception.
An example that will remove remove_this from email addresses:
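One possibility:
>>> email = 'tony@tiremove_thisger.net'
>>> m = re.search('remove_this', email)
>>> email[:m.start()] + email[m.end():]
'tony@tiger.net'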
match.span([group])
For a match m, return the 2-tuple (m.start(group), m.end(group)). Note
that if group did not contribute to the match, this is (-1, -1).
group defaults to zero, the entire match.
match.pos
The value of pos which was passed to the search() or
match() method of a regex object. This
is the index into the string at which the RE engine started looking for a
match.
match.endpos
The value of endpos which was passed to the search() or
match() method of a regex object. This
is the index into the string beyond which the RE engine will not go.
match.lastindex
The integer index of the last matched capturing group, or None if no group
was matched at all. For example, the expressions (a)b, ((a)(b)), and
((ab)) will have lastindex==1 if applied to the string 'ab', while
the expression (a)(b) will have lastindex==2, if applied to the same
string.
In this example, we’ll use the following helper function to display match
objects a little more gracefully:
def displaymatch(match):
    if match is None:
        return None
    return '<Match: %r, groups=%r>' % (match.group(), match.groups())
Suppose you are writing a poker program where a player’s hand is represented as
a 5-character string with each character representing a card, “a” for ace, “k”
for king, “q” for queen, “j” for jack, “0” for 10, and “1” through “9”
representing the card with that value.
To see if a given string is a valid hand, one could do the following:
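One possible check, using a pattern consistent with the hand encoding above and the displaymatch() helper (match() returns None for invalid hands, so nothing is printed):
>>> import re
>>> valid = re.compile(r'[0-9akqj]{5}$')
>>> displaymatch(valid.match('ak05q'))  # Valid.
"<Match: 'ak05q', groups=()>"
>>> displaymatch(valid.match('ak05e'))  # Invalid.
>>> displaymatch(valid.match('727ak'))  # Valid.
"<Match: '727ak', groups=()>"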
That last hand, "727ak", contained a pair, or two of the same valued cards.
To match this with a regular expression, one could use backreferences as such:
>>> pair = re.compile(r".*(.).*\1")
>>> displaymatch(pair.match("717ak")) # Pair of 7s.
"<Match: '717', groups=('7',)>"
>>> displaymatch(pair.match("718ak")) # No pairs.
>>> displaymatch(pair.match("354aa")) # Pair of aces.
"<Match: '354aa', groups=('a',)>"
To find out what card the pair consists of, one could use the
group() method of the match object in the following manner:
>>> pair.match("717ak").group(1)
'7'
# Error because re.match() returns None, which doesn't have a group() method:
>>> pair.match("718ak").group(1)
Traceback (most recent call last):
File "<pyshell#23>", line 1, in <module>
re.match(r".*(.).*\1", "718ak").group(1)
AttributeError: 'NoneType' object has no attribute 'group'
>>> pair.match("354aa").group(1)
'a'
Python does not currently have an equivalent to scanf(). Regular
expressions are generally more powerful, though also more verbose, than
scanf() format strings. The table below offers some more-or-less
equivalent mappings between scanf() format tokens and regular
expressions.
scanf() Token        Regular Expression
%c                   .
%5c                  .{5}
%d                   [-+]?\d+
%e, %E, %f, %g       [-+]?(\d+(\.\d*)?|\.\d+)([eE][-+]?\d+)?
%i                   [-+]?(0[xX][\dA-Fa-f]+|0[0-7]*|\d+)
%o                   0[0-7]*
%s                   \S+
%u                   \d+
%x, %X               0[xX][\dA-Fa-f]+
To extract the filename and numbers from a string like
/usr/sbin/sendmail - 0 errors, 4 warnings
you would use a scanf() format like
%s - %d errors, %d warnings
The equivalent regular expression would be
(\S+) - (\d+) errors, (\d+) warnings
If you create regular expressions that require the engine to perform a lot of
recursion, you may encounter a RuntimeError exception with the message
maximum recursion limit exceeded. For example,
>>> s = 'Begin ' + 1000*'a very long string ' + 'end'
>>> re.match('Begin (\w| )*? end', s).end()
Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "/usr/local/lib/python3.2/re.py", line 132, in match
return _compile(pattern, flags).match(string)
RuntimeError: maximum recursion limit exceeded
You can often restructure your regular expression to avoid recursion.
Simple uses of the *? pattern are special-cased to avoid recursion. Thus,
the above regular expression can avoid recursion by being recast as Begin [a-zA-Z0-9_ ]*?end. As a further benefit, such regular expressions will run
faster than their recursive equivalents.
In a nutshell, match() only attempts to match a pattern at the beginning
of a string where search() will match a pattern anywhere in a string.
For example:
>>> re.match("o", "dog") # No match as "o" is not the first letter of "dog".
>>> re.search("o", "dog") # Match as search() looks everywhere in the string.
<_sre.SRE_Match object at ...>
Note
The following applies only to regular expression objects like those created
with re.compile("pattern"), not the primitives re.match(pattern, string) or re.search(pattern, string).
match() has an optional second parameter that gives an index in the string
where the search is to start:
>>> pattern = re.compile("o")
>>> pattern.match("dog") # No match as "o" is not at the start of "dog."
# Equivalent to the above expression as 0 is the default starting index:
>>> pattern.match("dog", 0)
# Match as "o" is the 2nd character of "dog" (index 0 is the first):
>>> pattern.match("dog", 1)
<_sre.SRE_Match object at ...>
>>> pattern.match("dog", 2) # No match as "o" is not the 3rd character of "dog."
split() splits a string into a list delimited by the passed pattern. The
method is invaluable for converting textual data into data structures that can be
easily read and modified by Python as demonstrated in the following example that
creates a phonebook.
First, here is the input. Normally it may come from a file, here we are using
triple-quoted string syntax:
>>> input = """Ross McFluff: 834.345.1254 155 Elm Street
...
... Ronald Heathmore: 892.345.3428 436 Finley Avenue
... Frank Burger: 925.541.7625 662 South Dogwood Way
...
...
... Heather Albrecht: 548.326.4584 919 Park Place"""
The entries are separated by one or more newlines. Now we convert the string
into a list with each nonempty line having its own entry:
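Splitting on runs of newlines does this:
>>> entries = re.split('\n+', input)
>>> entries
['Ross McFluff: 834.345.1254 155 Elm Street',
 'Ronald Heathmore: 892.345.3428 436 Finley Avenue',
 'Frank Burger: 925.541.7625 662 South Dogwood Way',
 'Heather Albrecht: 548.326.4584 919 Park Place']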
Finally, split each entry into a list with first name, last name, telephone
number, and address. We use the maxsplit parameter of split()
because the address has spaces, our splitting pattern, in it:
>>> [re.split(":? ", entry, 3) for entry in entries]
[['Ross', 'McFluff', '834.345.1254', '155 Elm Street'],
['Ronald', 'Heathmore', '892.345.3428', '436 Finley Avenue'],
['Frank', 'Burger', '925.541.7625', '662 South Dogwood Way'],
['Heather', 'Albrecht', '548.326.4584', '919 Park Place']]
The :? pattern matches the colon after the last name, so that it does not
occur in the result list. With a maxsplit of 4, we could separate the
house number from the street name:
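For instance:
>>> [re.split(':? ', entry, 4) for entry in entries]
[['Ross', 'McFluff', '834.345.1254', '155', 'Elm Street'],
 ['Ronald', 'Heathmore', '892.345.3428', '436', 'Finley Avenue'],
 ['Frank', 'Burger', '925.541.7625', '662', 'South Dogwood Way'],
 ['Heather', 'Albrecht', '548.326.4584', '919', 'Park Place']]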
sub() replaces every occurrence of a pattern with a string or the
result of a function. This example demonstrates using sub() with
a function to “munge” text, or randomize the order of all the characters
in each word of a sentence except for the first and last characters:
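One way to do this (since the shuffle is random, the exact output will differ from run to run):
>>> import random
>>> def repl(m):
...     inner_word = list(m.group(2))
...     random.shuffle(inner_word)
...     return m.group(1) + ''.join(inner_word) + m.group(3)
>>> text = 'Professor Abdolmalek, please report your absences promptly.'
>>> re.sub(r'(\w)(\w+)(\w)', repl, text)
'Poefsrosr Aealmlobdk, pslaee reorpt your abnseces plmrptoy.'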
findall() matches all occurrences of a pattern, not just the first
one as search() does. For example, if one was a writer and wanted to
find all of the adverbs in some text, he or she might use findall() in
the following manner:
>>> text = "He was carefully disguised but captured quickly by police."
>>> re.findall(r"\w+ly", text)
['carefully', 'quickly']
If one wants more information about all matches of a pattern than the matched
text, finditer() is useful as it provides match objects instead of strings. Continuing with the previous example, if
one was a writer who wanted to find all of the adverbs and their positions in
some text, he or she would use finditer() in the following manner:
>>> text = "He was carefully disguised but captured quickly by police."
>>> for m in re.finditer(r"\w+ly", text):
... print('%02d-%02d: %s' % (m.start(), m.end(), m.group(0)))
07-16: carefully
40-47: quickly
Raw string notation (r"text") keeps regular expressions sane. Without it,
every backslash ('\') in a regular expression would have to be prefixed with
another one to escape it. For example, the two following lines of code are
functionally identical:
>>> re.match(r"\W(.)\1\W", " ff ")
<_sre.SRE_Match object at ...>
>>> re.match("\\W(.)\\1\\W", " ff ")
<_sre.SRE_Match object at ...>
When one wants to match a literal backslash, it must be escaped in the regular
expression. With raw string notation, this means r"\\". Without raw string
notation, one must use "\\\\", making the following lines of code
functionally identical:
>>> re.match(r"\\", r"\\")
<_sre.SRE_Match object at ...>
>>> re.match("\\\\", r"\\")
<_sre.SRE_Match object at ...>
A tokenizer or scanner
analyzes a string to categorize groups of characters. This is a useful first
step in writing a compiler or interpreter.
The text categories are specified with regular expressions. The technique is
to combine those into a single master regular expression and to loop over
successive matches:
import collections
import re

Token = collections.namedtuple('Token', ['typ', 'value', 'line', 'column'])

def tokenize(s):
    keywords = {'IF', 'THEN', 'ENDIF', 'FOR', 'NEXT', 'GOSUB', 'RETURN'}
    token_specification = [
        ('NUMBER',  r'\d+(\.\d*)?'),  # Integer or decimal number
        ('ASSIGN',  r':='),           # Assignment operator
        ('END',     r';'),            # Statement terminator
        ('ID',      r'[A-Za-z]+'),    # Identifiers
        ('OP',      r'[+*\/\-]'),     # Arithmetic operators
        ('NEWLINE', r'\n'),           # Line endings
        ('SKIP',    r'[ \t]'),        # Skip over spaces and tabs
    ]
    tok_regex = '|'.join('(?P<%s>%s)' % pair for pair in token_specification)
    get_token = re.compile(tok_regex).match
    line = 1
    pos = line_start = 0
    mo = get_token(s)
    while mo is not None:
        typ = mo.lastgroup
        if typ == 'NEWLINE':
            line_start = pos
            line += 1
        elif typ != 'SKIP':
            val = mo.group(typ)
            if typ == 'ID' and val in keywords:
                typ = val
            yield Token(typ, val, line, mo.start() - line_start)
        pos = mo.end()
        mo = get_token(s, pos)
    if pos != len(s):
        raise RuntimeError('Unexpected character %r on line %d' % (s[pos], line))

statements = '''
    IF quantity THEN
        total := total + price * quantity;
        tax := price * 0.05;
    ENDIF;
'''

for token in tokenize(statements):
    print(token)
This module performs conversions between Python values and C structs represented
as Python bytes objects. This can be used in handling binary data
stored in files or from network connections, among other sources. It uses
Format Strings as compact descriptions of the layout of the C
structs and the intended conversion to/from Python values.
Note
By default, the result of packing a given C struct includes pad bytes in
order to maintain proper alignment for the C types involved; similarly,
alignment is taken into account when unpacking. This behavior is chosen so
that the bytes of a packed struct correspond exactly to the layout in memory
of the corresponding C struct. To handle platform-independent data formats
or omit implicit pad bytes, use standard size and alignment instead of
native size and alignment: see Byte Order, Size, and Alignment for details.
struct.pack(fmt, v1, v2, ...)
Return a bytes object containing the values v1, v2, ... packed according
to the format string fmt. The arguments must match the values required by
the format exactly.
struct.pack_into(fmt, buffer, offset, v1, v2, ...)
Pack the values v1, v2, ... according to the format string fmt and
write the packed bytes into the writable buffer buffer starting at
position offset. Note that offset is a required argument.
struct.unpack(fmt, buffer)
Unpack from the buffer buffer (presumably packed by pack(fmt, ...))
according to the format string fmt. The result is a tuple even if it
contains exactly one item. The buffer must contain exactly the amount of
data required by the format (len(bytes) must equal calcsize(fmt)).
struct.unpack_from(fmt, buffer, offset=0)
Unpack from buffer starting at position offset, according to the format
string fmt. The result is a tuple even if it contains exactly one
item. buffer must contain at least the amount of data required by the
format (len(buffer[offset:]) must be at least calcsize(fmt)).
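A short session using standard sizes (a '>' prefix, explained below) so the results are platform-independent:
>>> from struct import pack, unpack, calcsize
>>> pack('>hhl', 1, 2, 3)
b'\x00\x01\x00\x02\x00\x00\x00\x03'
>>> unpack('>hhl', b'\x00\x01\x00\x02\x00\x00\x00\x03')
(1, 2, 3)
>>> calcsize('>hhl')
8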
Format strings are the mechanism used to specify the expected layout when
packing and unpacking data. They are built up from Format Characters,
which specify the type of data being packed/unpacked. In addition, there are
special characters for controlling the Byte Order, Size, and Alignment.
By default, C types are represented in the machine’s native format and byte
order, and properly aligned by skipping pad bytes if necessary (according to the
rules used by the C compiler).
Alternatively, the first character of the format string can be used to indicate
the byte order, size and alignment of the packed data, according to the
following table:
Character   Byte order               Size       Alignment
@           native                   native     native
=           native                   standard   none
<           little-endian            standard   none
>           big-endian               standard   none
!           network (= big-endian)   standard   none
If the first character is not one of these, '@' is assumed.
Native byte order is big-endian or little-endian, depending on the host
system. For example, Intel x86 and AMD64 (x86-64) are little-endian;
Motorola 68000 and PowerPC G5 are big-endian; ARM and Intel Itanium feature
switchable endianness (bi-endian). Use sys.byteorder to check the
endianness of your system.
Native size and alignment are determined using the C compiler’s
sizeof expression. This is always combined with native byte order.
Standard size depends only on the format character; see the table in
the Format Characters section.
Note the difference between '@' and '=': both use native byte order, but
the size and alignment of the latter is standardized.
The form '!' is available for those poor souls who claim they can’t remember
whether network byte order is big-endian or little-endian.
There is no way to indicate non-native byte order (force byte-swapping); use the
appropriate choice of '<' or '>'.
Notes:
Padding is only automatically added between successive structure members.
No padding is added at the beginning or the end of the encoded struct.
No padding is added when using non-native size and alignment, e.g.
with ‘<’, ‘>’, ‘=’, and ‘!’.
To align the end of a structure to the alignment requirement of a
particular type, end the format with the code for that type with a repeat
count of zero. See Examples.
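For instance (the sizes shown assume a typical 64-bit platform where a C long is 8 bytes; native results vary by platform):
>>> from struct import calcsize
>>> calcsize('@llh')     # 8 + 8 + 2, no trailing padding
18
>>> calcsize('@llh0l')   # '0l' pads the end to long alignment
24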
Format characters have the following meaning; the conversion between C and
Python values should be obvious given their types. The ‘Standard size’ column
refers to the size of the packed value in bytes when using standard size; that
is, when the format string starts with one of '<', '>', '!' or
'='. When using native size, the size of the packed value is
platform-dependent.
Format   C Type               Python type         Standard size   Notes
x        pad byte             no value
c        char                 bytes of length 1   1
b        signed char          integer             1               (1), (3)
B        unsigned char        integer             1               (3)
?        _Bool                bool                1               (1)
h        short                integer             2               (3)
H        unsigned short       integer             2               (3)
i        int                  integer             4               (3)
I        unsigned int         integer             4               (3)
l        long                 integer             4               (3)
L        unsigned long        integer             4               (3)
q        long long            integer             8               (2), (3)
Q        unsigned long long   integer             8               (2), (3)
f        float                float               4               (4)
d        double               float               8               (4)
s        char[]               bytes
p        char[]               bytes
P        void *               integer                             (5)
Notes:
(1) The '?' conversion code corresponds to the _Bool type defined by
C99. If this type is not available, it is simulated using a char. In
standard mode, it is always represented by one byte.
(2) The 'q' and 'Q' conversion codes are available in native mode only if
the platform C compiler supports C long long, or, on Windows,
__int64. They are always available in standard modes.
(3) When attempting to pack a non-integer using any of the integer conversion
codes, if the non-integer has a __index__() method then that method is
called to convert the argument to an integer before packing.
Changed in version 3.2: Use of the __index__() method for non-integers is new in 3.2.
(4) For the 'f' and 'd' conversion codes, the packed representation uses
the IEEE 754 binary32 (for 'f') or binary64 (for 'd') format,
regardless of the floating-point format used by the platform.
(5) The 'P' format character is only available for the native byte ordering
(selected as the default or with the '@' byte order character). The byte
order character '=' chooses to use little- or big-endian ordering based
on the host system. The struct module does not interpret this as native
ordering, so the 'P' format is not available.
A format character may be preceded by an integral repeat count. For example,
the format string '4h' means exactly the same as 'hhhh'.
Whitespace characters between formats are ignored; a count and its format must
not contain whitespace though.
For the 's' format character, the count is interpreted as the length of the
bytes, not a repeat count like for the other format characters; for example,
'10s' means a single 10-byte string, while '10c' means 10 characters.
If a count is not given, it defaults to 1. For packing, the string is
truncated or padded with null bytes as appropriate to make it fit. For
unpacking, the resulting bytes object always has exactly the specified number
of bytes. As a special case, '0s' means a single, empty string (while
'0c' means 0 characters).
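Packing with the 's' format, truncating and padding as described:
>>> from struct import pack
>>> pack('5s', b'hello, world')
b'hello'
>>> pack('10s', b'hi')
b'hi\x00\x00\x00\x00\x00\x00\x00\x00'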
When packing a value x using one of the integer formats ('b',
'B', 'h', 'H', 'i', 'I', 'l', 'L',
'q', 'Q'), if x is outside the valid range for that format
then struct.error is raised.
Changed in version 3.1: In 3.0, some of the integer formats wrapped out-of-range values and
raised DeprecationWarning instead of struct.error.
The 'p' format character encodes a “Pascal string”, meaning a short
variable-length string stored in a fixed number of bytes, given by the count.
The first byte stored is the length of the string, or 255, whichever is
smaller. The bytes of the string follow. If the string passed in to
pack() is too long (longer than the count minus 1), only the leading
count-1 bytes of the string are stored. If the string is shorter than
count-1, it is padded with null bytes so that exactly count bytes in all
are used. Note that for unpack(), the 'p' format character consumes
count bytes, but that the string returned can never contain more than 255
bytes.
For the '?' format character, the return value is either True or
False. When packing, the truth value of the argument object is used.
Either 0 or 1 in the native or standard bool representation will be packed, and
any non-zero value will be True when unpacking.
class struct.Struct(format)
Return a new Struct object which writes and reads binary data according to
the format string format. Creating a Struct object once and calling its
methods is more efficient than calling the struct functions with the
same format since the format string only needs to be compiled once.
Compiled Struct objects support the following methods and attributes:
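The methods mirror the module-level functions (pack(), unpack(), pack_into(), unpack_from()), plus format and size attributes. For instance:
>>> from struct import Struct
>>> record = Struct('>hhl')
>>> record.size
8
>>> record.pack(1, 2, 3)
b'\x00\x01\x00\x02\x00\x00\x00\x03'
>>> record.unpack(b'\x00\x01\x00\x02\x00\x00\x00\x03')
(1, 2, 3)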
This module provides classes and functions for comparing sequences. It
can be used for example, for comparing files, and can produce difference
information in various formats, including HTML and context and unified
diffs. For comparing directories and files, see also, the filecmp module.
This is a flexible class for comparing pairs of sequences of any type, so long
as the sequence elements are hashable. The basic algorithm predates, and is a
little fancier than, an algorithm published in the late 1980’s by Ratcliff and
Obershelp under the hyperbolic name “gestalt pattern matching.” The idea is to
find the longest contiguous matching subsequence that contains no “junk”
elements (the Ratcliff and Obershelp algorithm doesn’t address junk). The same
idea is then applied recursively to the pieces of the sequences to the left and
to the right of the matching subsequence. This does not yield minimal edit
sequences, but does tend to yield matches that “look right” to people.
Timing: The basic Ratcliff-Obershelp algorithm is cubic time in the worst
case and quadratic time in the expected case. SequenceMatcher is
quadratic time for the worst case and has expected-case behavior dependent in a
complicated way on how many elements the sequences have in common; best case
time is linear.
Automatic junk heuristic: SequenceMatcher supports a heuristic that
automatically treats certain sequence items as junk. The heuristic counts how many
times each individual item appears in the sequence. If an item’s duplicates (after
the first one) account for more than 1% of the sequence and the sequence is at least
200 items long, this item is marked as “popular” and is treated as junk for
the purpose of sequence matching. This heuristic can be turned off by setting
the autojunk argument to False when creating the SequenceMatcher.
This is a class for comparing sequences of lines of text, and producing
human-readable differences or deltas. Differ uses SequenceMatcher
both to compare sequences of lines, and to compare sequences of characters
within similar (near-matching) lines.
Each line of a Differ delta begins with a two-letter code:
Code   Meaning
'-'    line unique to sequence 1
'+'    line unique to sequence 2
' '    line common to both sequences
'?'    line not present in either input sequence
Lines beginning with '?' attempt to guide the eye to intraline differences,
and were not present in either input sequence. These lines can be confusing if
the sequences contain tab characters.
class difflib.HtmlDiff(tabsize=8, wrapcolumn=None, linejunk=None, charjunk=IS_CHARACTER_JUNK)
This class can be used to create an HTML table (or a complete HTML file
containing the table) showing a side by side, line by line comparison of text
with inter-line and intra-line change highlights. The table can be generated in
either full or contextual difference mode.
tabsize is an optional keyword argument to specify tab stop spacing and
defaults to 8.
wrapcolumn is an optional keyword to specify column number where lines are
broken and wrapped, defaults to None where lines are not wrapped.
linejunk and charjunk are optional keyword arguments passed into ndiff()
(used by HtmlDiff to generate the side by side HTML differences). See
ndiff() documentation for argument default values and descriptions.
make_file(fromlines, tolines, fromdesc='', todesc='', context=False, numlines=5)
Compares fromlines and tolines (lists of strings) and returns a string which
is a complete HTML file containing a table showing line by line differences with
inter-line and intra-line changes highlighted.
fromdesc and todesc are optional keyword arguments to specify from/to file
column header strings (both default to an empty string).
context and numlines are both optional keyword arguments. Set context to
True when contextual differences are to be shown, else the default is
False to show the full files. numlines defaults to 5. When context
is True, numlines controls the number of context lines which surround the
difference highlights. When context is False, numlines controls the
number of lines which are shown before a difference highlight when using the
“next” hyperlinks (setting to zero would cause the “next” hyperlinks to place
the next difference highlight at the top of the browser without any leading
context).
make_table(fromlines, tolines, fromdesc='', todesc='', context=False, numlines=5)
Compares fromlines and tolines (lists of strings) and returns a string which
is a complete HTML table showing line by line differences with inter-line and
intra-line changes highlighted.
The arguments for this method are the same as those for the make_file()
method.
Tools/scripts/diff.py is a command-line front-end to this class and
contains a good example of its use.
difflib.context_diff(a, b, fromfile='', tofile='', fromfiledate='', tofiledate='', n=3, lineterm='\n')
Compare a and b (lists of strings); return a delta (a generator
generating the delta lines) in context diff format.
Context diffs are a compact way of showing just the lines that have changed plus
a few lines of context. The changes are shown in a before/after style. The
number of context lines is set by n which defaults to three.
By default, the diff control lines (those with *** or ---) are created
with a trailing newline. This is helpful so that inputs created from
file.readlines() result in diffs that are suitable for use with
file.writelines() since both the inputs and outputs have trailing
newlines.
For inputs that do not have trailing newlines, set the lineterm argument to
"" so that the output will be uniformly newline free.
The context diff format normally has a header for filenames and modification
times. Any or all of these may be specified using strings for fromfile,
tofile, fromfiledate, and tofiledate. The modification times are normally
expressed in the ISO 8601 format. If not specified, the
strings default to blanks.
difflib.get_close_matches(word, possibilities, n=3, cutoff=0.6)
Return a list of the best “good enough” matches. word is a sequence for which
close matches are desired (typically a string), and possibilities is a list of
sequences against which to match word (typically a list of strings).
Optional argument n (default 3) is the maximum number of close matches to
return; n must be greater than 0.
Optional argument cutoff (default 0.6) is a float in the range [0, 1].
Possibilities that don’t score at least that similar to word are ignored.
The best (no more than n) matches among the possibilities are returned in a
list, sorted by similarity score, most similar first.
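For instance:
>>> from difflib import get_close_matches
>>> get_close_matches('appel', ['ape', 'apple', 'peach', 'puppy'])
['apple', 'ape']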
difflib.ndiff(a, b, linejunk=None, charjunk=IS_CHARACTER_JUNK)
Compare a and b (lists of strings); return a Differ-style
delta (a generator generating the delta lines).
Optional keyword parameters linejunk and charjunk are for filter functions
(or None):
linejunk: A function that accepts a single string argument, and returns
true if the string is junk, or false if not. The default is None. There
is also a module-level function IS_LINE_JUNK(), which filters out lines
without visible characters, except for at most one pound character ('#')
– however the underlying SequenceMatcher class does a dynamic
analysis of which lines are so frequent as to constitute noise, and this
usually works better than using this function.
charjunk: A function that accepts a character (a string of length 1), and
returns true if the character is junk, or false if not. The default is the
module-level function IS_CHARACTER_JUNK(), which filters out whitespace
characters (a blank or tab; note: bad idea to include newline in this!).
Tools/scripts/ndiff.py is a command-line front-end to this function.
>>> diff = ndiff('one\ntwo\nthree\n'.splitlines(1),
... 'ore\ntree\nemu\n'.splitlines(1))
>>> print(''.join(diff), end="")
- one
? ^
+ ore
? ^
- two
- three
? -
+ tree
+ emu
difflib.restore(sequence, which)
Return one of the two sequences that generated a delta.
Given a sequence produced by Differ.compare() or ndiff(), extract
lines originating from file 1 or 2 (parameter which), stripping off line
prefixes.
Example:
>>> diff = ndiff('one\ntwo\nthree\n'.splitlines(1),
... 'ore\ntree\nemu\n'.splitlines(1))
>>> diff = list(diff) # materialize the generated delta into a list
>>> print(''.join(restore(diff, 1)), end="")
one
two
three
>>> print(''.join(restore(diff, 2)), end="")
ore
tree
emu
difflib.unified_diff(a, b, fromfile='', tofile='', fromfiledate='', tofiledate='', n=3, lineterm='\n')
Compare a and b (lists of strings); return a delta (a generator
generating the delta lines) in unified diff format.
Unified diffs are a compact way of showing just the lines that have changed plus
a few lines of context. The changes are shown in an inline style (instead of
separate before/after blocks). The number of context lines is set by n which
defaults to three.
By default, the diff control lines (those with ---, +++, or @@) are
created with a trailing newline. This is helpful so that inputs created from
file.readlines() result in diffs that are suitable for use with
file.writelines() since both the inputs and outputs have trailing
newlines.
For inputs that do not have trailing newlines, set the lineterm argument to
"" so that the output will be uniformly newline free.
The context diff format normally has a header for filenames and modification
times. Any or all of these may be specified using strings for fromfile,
tofile, fromfiledate, and tofiledate. The modification times are normally
expressed in the ISO 8601 format. If not specified, the
strings default to blanks.
difflib.IS_LINE_JUNK(line)
Return true for ignorable lines. The line line is ignorable if line is
blank or contains a single '#', otherwise it is not ignorable. Used as a
default for parameter linejunk in ndiff() in older versions.
difflib.IS_CHARACTER_JUNK(ch)
Return true for ignorable characters. The character ch is ignorable if ch
is a space or tab, otherwise it is not ignorable. Used as a default for
parameter charjunk in ndiff().
class difflib.SequenceMatcher(isjunk=None, a='', b='', autojunk=True)
Optional argument isjunk must be None (the default) or a one-argument
function that takes a sequence element and returns true if and only if the
element is “junk” and should be ignored. Passing None for isjunk is
equivalent to passing lambda x: 0; in other words, no elements are ignored.
For example, pass:
lambda x: x in " \t"
if you’re comparing lines as sequences of characters, and don’t want to synch up
on blanks or hard tabs.
The optional arguments a and b are sequences to be compared; both default to
empty strings. The elements of both sequences must be hashable.
The optional argument autojunk can be used to disable the automatic junk
heuristic.
New in version 3.2: The autojunk parameter.
SequenceMatcher objects get three data attributes: bjunk is the
set of elements of b for which isjunk is True; bpopular is the set of
non-junk elements considered popular by the heuristic (if it is not
disabled); b2j is a dict mapping the remaining elements of b to a list
of positions where they occur. All three are reset whenever b is reset
with set_seqs() or set_seq2().
New in version 3.2: The bjunk and bpopular attributes.
SequenceMatcher computes and caches detailed information about the
second sequence, so if you want to compare one sequence against many
sequences, use set_seq2() to set the commonly used sequence once and
call set_seq1() repeatedly, once for each of the other sequences.
find_longest_match(alo, ahi, blo, bhi)
Find longest matching block in a[alo:ahi] and b[blo:bhi].
If isjunk was omitted or None, find_longest_match() returns
(i,j,k) such that a[i:i+k] is equal to b[j:j+k], where alo<=i<=i+k<=ahi and blo<=j<=j+k<=bhi. For all (i',j',k') meeting those conditions, the additional conditions k>=k', i<=i', and if i==i', j<=j' are also met. In other words, of
all maximal matching blocks, return one that starts earliest in a, and
of all those maximal matching blocks that start earliest in a, return
the one that starts earliest in b.
If isjunk was provided, first the longest matching block is determined
as above, but with the additional restriction that no junk element appears
in the block. Then that block is extended as far as possible by matching
(only) junk elements on both sides. So the resulting block never matches
on junk except as identical junk happens to be adjacent to an interesting
match.
Here’s the same example as before, but considering blanks to be junk. That
prevents 'abcd' from matching the 'abcd' at the tail end of the
second sequence directly. Instead only the 'abcd' can match, and
matches the leftmost 'abcd' in the second sequence:
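Both variants side by side (first without junk, then treating blanks as junk):
>>> from difflib import SequenceMatcher
>>> s = SequenceMatcher(None, ' abcd', 'abcd abcd')
>>> s.find_longest_match(0, 5, 0, 9)
Match(a=0, b=4, size=5)
>>> s = SequenceMatcher(lambda x: x == ' ', ' abcd', 'abcd abcd')
>>> s.find_longest_match(0, 5, 0, 9)
Match(a=1, b=0, size=4)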
get_matching_blocks()
Return list of triples describing matching subsequences. Each triple is of
the form (i, j, n), and means that a[i:i+n] == b[j:j+n]. The
triples are monotonically increasing in i and j.
The last triple is a dummy, and has the value (len(a), len(b), 0). It
is the only triple with n == 0. If (i, j, n) and (i', j', n')
are adjacent triples in the list, and the second is not the last triple in
the list, then i+n != i' or j+n != j'; in other words, adjacent
triples always describe non-adjacent equal blocks.
get_opcodes()
Return list of 5-tuples describing how to turn a into b. Each tuple is
of the form (tag, i1, i2, j1, j2). The first tuple has
i1 == j1 == 0, and remaining tuples have i1 equal to the i2 from the
preceding tuple, and, likewise, j1 equal to the previous j2.
The tag values are strings, with these meanings:
Value       Meaning
'replace'   a[i1:i2] should be replaced by b[j1:j2].
'delete'    a[i1:i2] should be deleted. Note that j1 == j2 in this case.
'insert'    b[j1:j2] should be inserted at a[i1:i1]. Note that i1 == i2 in this case.
'equal'     a[i1:i2] == b[j1:j2] (the sub-sequences are equal).
For example:
>>> a = "qabxcd"
>>> b = "abycdf"
>>> s = SequenceMatcher(None, a, b)
>>> for tag, i1, i2, j1, j2 in s.get_opcodes():
...     print('{:7} a[{}:{}] --> b[{}:{}] {!r:>8} --> {!r}'.format(
...         tag, i1, i2, j1, j2, a[i1:i2], b[j1:j2]))
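Run against the strings above, this loop is expected to print:
delete  a[0:1] --> b[0:0]      'q' --> ''
equal   a[1:3] --> b[1:3]     'ab' --> 'ab'
replace a[3:4] --> b[3:4]      'x' --> 'y'
equal   a[4:6] --> b[4:6]     'cd' --> 'cd'
insert  a[6:6] --> b[6:7]       '' --> 'f'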
get_grouped_opcodes(n=3)
Return a generator of groups with up to n lines of context.
Starting with the groups returned by get_opcodes(), this method
splits out smaller change clusters and eliminates intervening ranges which
have no changes.
The groups are returned in the same format as get_opcodes().
ratio()
Return a measure of the sequences’ similarity as a float in the range [0,
1].
Where T is the total number of elements in both sequences, and M is the
number of matches, this is 2.0*M / T. Note that this is 1.0 if the
sequences are identical, and 0.0 if they have nothing in common.
The three methods that return the ratio of matching to total characters can give
different results due to differing levels of approximation, although
quick_ratio() and real_quick_ratio() are always at least as large as
ratio():
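For instance:
>>> s = SequenceMatcher(None, 'abcd', 'bcde')
>>> s.ratio()
0.75
>>> s.quick_ratio()
0.75
>>> s.real_quick_ratio()
1.0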
This example compares two strings, considering blanks to be “junk”:
>>> s = SequenceMatcher(lambda x: x == " ",
... "private Thread currentThread;",
... "private volatile Thread currentThread;")
ratio() returns a float in [0, 1], measuring the similarity of the
sequences. As a rule of thumb, a ratio() value over 0.6 means the
sequences are close matches:
>>> print(round(s.ratio(), 3))
0.866
If you’re only interested in where the sequences match,
get_matching_blocks() is handy:
>>> for block in s.get_matching_blocks():
... print("a[%d] and b[%d] match for %d elements" % block)
a[0] and b[0] match for 8 elements
a[8] and b[17] match for 21 elements
a[29] and b[38] match for 0 elements
Note that the last tuple returned by get_matching_blocks() is always a
dummy, (len(a),len(b),0), and this is the only case in which the last
tuple element (number of elements matched) is 0.
If you want to know how to change the first sequence into the second, use
get_opcodes():
>>> for opcode in s.get_opcodes():
... print("%6s a[%d:%d] b[%d:%d]" % opcode)
equal a[0:8] b[0:8]
insert a[8:8] b[8:17]
equal a[8:29] b[17:38]
Note that Differ-generated deltas make no claim to be minimal
diffs. To the contrary, minimal diffs are often counter-intuitive, because they
synch up anywhere possible, sometimes on accidental matches 100 pages apart.
Restricting synch points to contiguous matches preserves some notion of
locality, at the occasional cost of producing a longer diff.
class difflib.Differ(linejunk=None, charjunk=None)
Optional keyword parameters linejunk and charjunk are for filter functions
(or None):
linejunk: A function that accepts a single string argument, and returns true
if the string is junk. The default is None, meaning that no line is
considered junk.
charjunk: A function that accepts a single character argument (a string of
length 1), and returns true if the character is junk. The default is None,
meaning that no character is considered junk.
Differ objects are used (deltas generated) via a single method, compare():
Compare two sequences of lines, and generate the delta (a sequence of lines).
Each sequence must contain individual single-line strings ending with newlines.
Such sequences can be obtained from the readlines() method of file-like
objects. The delta generated also consists of newline-terminated strings, ready
to be printed as-is via the writelines() method of a file-like object.
This example compares two texts. First we set up the texts, sequences of
individual single-line strings ending with newlines (such sequences can also be
obtained from the readlines() method of file-like objects):
>>> text1 = ''' 1. Beautiful is better than ugly.
... 2. Explicit is better than implicit.
... 3. Simple is better than complex.
... 4. Complex is better than complicated.
... '''.splitlines(1)
>>> len(text1)
4
>>> text1[0][-1]
'\n'
>>> text2 = ''' 1. Beautiful is better than ugly.
... 3. Simple is better than complex.
... 4. Complicated is better than complex.
... 5. Flat is better than nested.
... '''.splitlines(1)
Next we instantiate a Differ object:
>>> d = Differ()
Note that when instantiating a Differ object we may pass functions to
filter out line and character “junk.” See the Differ() constructor for
details.
Finally, we compare the two:
>>> result = list(d.compare(text1, text2))
result is a list of strings, so let’s pretty-print it:
>>> from pprint import pprint
>>> pprint(result)
[' 1. Beautiful is better than ugly.\n',
'- 2. Explicit is better than implicit.\n',
'- 3. Simple is better than complex.\n',
'+ 3. Simple is better than complex.\n',
'? ++\n',
'- 4. Complex is better than complicated.\n',
'? ^ ---- ^\n',
'+ 4. Complicated is better than complex.\n',
'? ++++ ^ ^\n',
'+ 5. Flat is better than nested.\n']
As a single multi-line string it looks like this:
>>> import sys
>>> sys.stdout.writelines(result)
1. Beautiful is better than ugly.
- 2. Explicit is better than implicit.
- 3. Simple is better than complex.
+ 3. Simple is better than complex.
? ++
- 4. Complex is better than complicated.
? ^ ---- ^
+ 4. Complicated is better than complex.
? ++++ ^ ^
+ 5. Flat is better than nested.
This example shows how to use difflib to create a diff-like utility.
It is also contained in the Python source distribution, as
Tools/scripts/diff.py.
""" Command line interface to difflib.py providing diffs in four formats:
* ndiff: lists every line and highlights interline changes.
* context: highlights clusters of changes in a before/after format.
* unified: highlights clusters of changes in an inline format.
* html: generates side by side comparison with change highlights.
"""
import sys, os, time, difflib, optparse
def main():
    # Configure the option parser
    usage = "usage: %prog [options] fromfile tofile"
    parser = optparse.OptionParser(usage)
    parser.add_option("-c", action="store_true", default=False,
                      help='Produce a context format diff (default)')
    parser.add_option("-u", action="store_true", default=False,
                      help='Produce a unified format diff')
    hlp = 'Produce HTML side by side diff (can use -c and -l in conjunction)'
    parser.add_option("-m", action="store_true", default=False, help=hlp)
    parser.add_option("-n", action="store_true", default=False,
                      help='Produce a ndiff format diff')
    parser.add_option("-l", "--lines", type="int", default=3,
                      help='Set number of context lines (default 3)')
    (options, args) = parser.parse_args()

    if len(args) == 0:
        parser.print_help()
        sys.exit(1)
    if len(args) != 2:
        parser.error("need to specify both a fromfile and tofile")

    n = options.lines
    fromfile, tofile = args  # as specified in the usage string

    # we're passing these as arguments to the diff function
    fromdate = time.ctime(os.stat(fromfile).st_mtime)
    todate = time.ctime(os.stat(tofile).st_mtime)
    fromlines = open(fromfile, 'U').readlines()
    tolines = open(tofile, 'U').readlines()

    if options.u:
        diff = difflib.unified_diff(fromlines, tolines, fromfile, tofile,
                                    fromdate, todate, n=n)
    elif options.n:
        diff = difflib.ndiff(fromlines, tolines)
    elif options.m:
        diff = difflib.HtmlDiff().make_file(fromlines, tolines, fromfile,
                                            tofile, context=options.c,
                                            numlines=n)
    else:
        diff = difflib.context_diff(fromlines, tolines, fromfile, tofile,
                                    fromdate, todate, n=n)

    # we're using writelines because diff is a generator
    sys.stdout.writelines(diff)

if __name__ == '__main__':
    main()
The textwrap module provides two convenience functions, wrap() and
fill(), as well as TextWrapper, the class that does all the work,
and a utility function dedent(). If you’re just wrapping or filling one
or two text strings, the convenience functions should be good enough;
otherwise, you should use an instance of TextWrapper for efficiency.
Wraps the single paragraph in text, and returns a single string containing the
wrapped paragraph. fill() is shorthand for
"\n".join(wrap(text, ...))
In particular, fill() accepts exactly the same keyword arguments as
wrap().
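For instance (a small doctest; the sample sentence is arbitrary):

>>> import textwrap
>>> textwrap.fill("The quick brown fox jumped over the lazy dog.", width=20)
'The quick brown fox\njumped over the lazy\ndog.'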
Both wrap() and fill() work by creating a TextWrapper
instance and calling a single method on it. That instance is not reused, so for
applications that wrap/fill many text strings, it will be more efficient for you
to create your own TextWrapper object.
Text is preferably wrapped on whitespaces and right after the hyphens in
hyphenated words; only then will long words be broken if necessary, unless
TextWrapper.break_long_words is set to false.
An additional utility function, dedent(), is provided to remove
indentation from strings that have unwanted whitespace to the left of the text.
Remove any common leading whitespace from every line in text.
This can be used to make triple-quoted strings line up with the left edge of the
display, while still presenting them in the source code in indented form.
Note that tabs and spaces are both treated as whitespace, but they are not
equal: the lines "hello" and "\thello" are considered to have no
common leading whitespace.
For example:
from textwrap import dedent

def test():
    # end first line with \ to avoid the empty line!
    s = '''\
    hello
      world
    '''
    print(repr(s))          # prints '    hello\n      world\n    '
    print(repr(dedent(s)))  # prints 'hello\n  world\n'
You can re-use the same TextWrapper object many times, and you can
change any of its options through direct assignment to instance attributes
between uses.
The TextWrapper instance attributes (and keyword arguments to the
constructor) are as follows:
width (default: 70) The maximum length of wrapped lines. As long as there
are no individual words in the input text longer than width,
TextWrapper guarantees that no output line will be longer than
width characters.
replace_whitespace (default: True) If true, each whitespace character (as
defined by string.whitespace) remaining after tab expansion will be replaced
by a single space.
Note
If expand_tabs is false and replace_whitespace is true,
each tab character will be replaced by a single space, which is not
the same as tab expansion.
Note
If replace_whitespace is false, newlines may appear in the
middle of a line and cause strange output. For this reason, text should
be split into paragraphs (using str.splitlines() or similar)
which are wrapped separately.
drop_whitespace (default: True) If true, whitespace that, after wrapping,
happens to end up at the beginning or end of a line is dropped (leading
whitespace in the first line is always preserved, though).
fix_sentence_endings (default: False) If true, TextWrapper attempts
to detect sentence endings and ensure that sentences are always separated by
exactly two spaces. This is generally desired for text in a monospaced font.
However, the sentence detection algorithm is imperfect: it assumes that a
sentence ending consists of a lowercase letter followed by one of '.',
'!', or '?', possibly followed by one of '"' or "'",
followed by a space. One problem with this algorithm is that it is
unable to detect the difference between “Dr.” in “[...] Dr. Frankenstein's
monster [...]” and “Spot.” in “[...] See Spot. See Spot run [...]”.
Since the sentence detection algorithm relies on string.lowercase for
the definition of “lowercase letter,” and a convention of using two spaces
after a period to separate sentences on the same line, it is specific to
English-language texts.
break_long_words (default: True) If true, then words longer than
width will be broken in order to ensure that no lines are longer than
width. If it is false, long words will not be broken, and some lines
may be longer than width. (Long words will be put on a line by
themselves, in order to minimize the amount by which width is exceeded.)
break_on_hyphens (default: True) If true, wrapping will occur preferably
on whitespaces and right after hyphens in compound words, as it is customary
in English. If false, only whitespaces will be considered as potentially
good places for line breaks, but you need to set break_long_words to
false if you want truly insecable words. Default behaviour in previous
versions was to always allow breaking hyphenated words.
TextWrapper also provides two public methods, analogous to the
module-level convenience functions:
wrap(text) Wraps the single paragraph in text (a string) so every line
is at most width characters long. All wrapping options are taken
from instance attributes of the TextWrapper instance. Returns a list
of output lines, without final newlines.
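A minimal sketch of such reuse (sample text and widths are arbitrary):

>>> import textwrap
>>> w = textwrap.TextWrapper(width=12)
>>> w.wrap("Hello there, world")
['Hello there,', 'world']
>>> w.width = 11                    # retune the instance between uses
>>> w.fill("Hello there, world")
'Hello\nthere,\nworld'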
This module defines base classes for standard Python codecs (encoders and
decoders) and provides access to the internal Python codec registry which
manages the codec and error handling lookup process.
Register a codec search function. Search functions are expected to take one
argument, the encoding name in all lower case letters, and return a
CodecInfo object having the following attributes:
name The name of the encoding;
encode The stateless encoding function;
decode The stateless decoding function;
incrementalencoder An incremental encoder class or factory function;
incrementaldecoder An incremental decoder class or factory function;
streamwriter A stream writer class or factory function;
streamreader A stream reader class or factory function.
The various functions or classes take the following arguments:
encode and decode: These must be functions or methods which have the same
interface as the encode()/decode() methods of Codec instances (see
Codec Interface). The functions/methods are expected to work in a stateless
mode.
incrementalencoder and incrementaldecoder: These have to be factory
functions providing the following interface:
factory(errors='strict')
The factory functions must return objects providing the interfaces defined by
the base classes IncrementalEncoder and IncrementalDecoder,
respectively. Incremental codecs can maintain state.
streamreader and streamwriter: These have to be factory functions providing
the following interface:
factory(stream, errors='strict')
The factory functions must return objects providing the interfaces defined by
the base classes StreamWriter and StreamReader, respectively.
Stream codecs can maintain state.
Possible values for errors are
'strict': raise an exception in case of an encoding error
'replace': replace malformed data with a suitable replacement marker,
such as '?' or '\ufffd'
'ignore': ignore malformed data and continue without further notice
'xmlcharrefreplace': replace with the appropriate XML character
reference (for encoding only)
'backslashreplace': replace with backslashed escape sequences (for
encoding only)
'surrogateescape': replace with surrogate U+DCxx, see PEP 383
as well as any other error handling name defined via register_error().
In case a search function cannot find a given encoding, it should return
None.
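As a sketch, a search function might simply map a private alias (the name
'mylatin' here is made up) onto an existing codec’s CodecInfo:

import codecs

def search(encoding_name):
    # the registry passes the requested name in lower case;
    # returning None means "this search function does not know it"
    if encoding_name == 'mylatin':
        return codecs.lookup('iso8859-15')   # reuse an existing CodecInfo
    return None

codecs.register(search)
print('4\u20ac'.encode('mylatin'))           # b'4\xa4' (Latin-9 has the euro sign)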
Looks up the codec info in the Python codec registry and returns a
CodecInfo object as defined above.
Encodings are first looked up in the registry’s cache. If not found, the list of
registered search functions is scanned. If no CodecInfo object is
found, a LookupError is raised. Otherwise, the CodecInfo object
is stored in the cache and returned to the caller.
To simplify access to the various codecs, the module provides these additional
functions which use lookup() for the codec lookup:
Register the error handling function error_handler under the name name.
error_handler will be called during encoding and decoding in case of an error,
when name is specified as the errors parameter.
For encoding error_handler will be called with a UnicodeEncodeError
instance, which contains information about the location of the error. The error
handler must either raise this or a different exception or return a tuple with a
replacement for the unencodable part of the input and a position where encoding
should continue. The encoder will encode the replacement and continue encoding
the original input at the specified position. Negative position values will be
treated as being relative to the end of the input string. If the resulting
position is out of bound an IndexError will be raised.
Decoding and translating works similar, except UnicodeDecodeError or
UnicodeTranslateError will be passed to the handler and that the
replacement from the error handler will be put into the output directly.
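A sketch of a custom encoding handler (the handler name 'question' is made
up for this example):

import codecs

def question_handler(exc):
    # replace each unencodable character with one '?', resume at exc.end
    if isinstance(exc, UnicodeEncodeError):
        return ('?' * (exc.end - exc.start), exc.end)
    raise exc

codecs.register_error('question', question_handler)
print('sp\u00e4m'.encode('ascii', 'question'))   # b'sp?m'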
Implements the replace error handling: malformed data is replaced with a
suitable replacement character such as '?' in bytestrings and
'\ufffd' in Unicode strings.
Open an encoded file using the given mode and return a wrapped version
providing transparent encoding/decoding. The default file mode is 'r'
meaning to open the file in read mode.
Note
The wrapped version’s methods will accept and return strings only. Bytes
arguments will be rejected.
Note
Files are always opened in binary mode, even if no binary mode was
specified. This is done to avoid data loss due to encodings using 8-bit
values. This means that no automatic conversion of b'\n' is done
on reading and writing.
encoding specifies the encoding which is to be used for the file.
errors may be given to define the error handling. It defaults to 'strict'
which causes a ValueError to be raised in case an encoding error occurs.
buffering has the same meaning as for the built-in open() function. It
defaults to line buffered.
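A short sketch (the file name is arbitrary):

import codecs

f = codecs.open('example.txt', 'w', encoding='utf-8')
f.write('Gr\u00fc\u00dfe\n')        # strings go in, encoded bytes reach the disk
f.close()

f = codecs.open('example.txt', encoding='utf-8')
print(f.read())                     # and come back out decoded
f.close()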
Return a wrapped version of file which provides transparent encoding
translation.
Bytes written to the wrapped file are interpreted according to the given
data_encoding and then written to the original file as bytes using the
file_encoding.
If file_encoding is not given, it defaults to data_encoding.
errors may be given to define the error handling. It defaults to
'strict', which causes ValueError to be raised in case an encoding
error occurs.
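For example, with an in-memory byte stream (a sketch):

import codecs, io

backend = io.BytesIO()
f = codecs.EncodedFile(backend, data_encoding='latin-1',
                       file_encoding='utf-8')
f.write(b'Gr\xfc\xdfe')             # Latin-1 bytes in ...
print(backend.getvalue())           # ... UTF-8 bytes out: b'Gr\xc3\xbc\xc3\x9fe'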
Uses an incremental encoder to iteratively encode the input provided by
iterator. This function is a generator. errors (as well as any
other keyword argument) is passed through to the incremental encoder.
Uses an incremental decoder to iteratively decode the input provided by
iterator. This function is a generator. errors (as well as any
other keyword argument) is passed through to the incremental decoder.
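For example, round-tripping a few chunks:

>>> import codecs
>>> encoded = list(codecs.iterencode(['sp', 'am'], 'utf-8'))
>>> encoded
[b'sp', b'am']
>>> ''.join(codecs.iterdecode(encoded, 'utf-8'))
'spam'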
The module also provides the following constants which are useful for reading
and writing to platform dependent files:
These constants define various encodings of the Unicode byte order mark (BOM)
used in UTF-16 and UTF-32 data streams to indicate the byte order used in the
stream or file and in UTF-8 as a Unicode signature. BOM_UTF16 is either
BOM_UTF16_BE or BOM_UTF16_LE depending on the platform’s
native byte order, BOM is an alias for BOM_UTF16,
BOM_LE for BOM_UTF16_LE and BOM_BE for
BOM_UTF16_BE. The others represent the BOM in UTF-8 and UTF-32
encodings.
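A quick look at a few of these constants:

>>> import codecs
>>> codecs.BOM_UTF8
b'\xef\xbb\xbf'
>>> codecs.BOM_UTF16_LE
b'\xff\xfe'
>>> codecs.BOM == codecs.BOM_UTF16
True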
The codecs module defines a set of base classes which define the
interface and can also be used to easily write your own codecs for use in
Python.
Each codec has to define four interfaces to make it usable as a codec in Python:
stateless encoder, stateless decoder, stream reader and stream writer. The
stream reader and writers typically reuse the stateless encoder/decoder to
implement the file protocols.
The Codec class defines the interface for stateless encoders/decoders.
To simplify and standardize error handling, the encode() and
decode() methods may implement different error handling schemes by
providing the errors string argument. The following string values are defined
and implemented by all standard Python codecs:
'strict': Raise UnicodeError (or a subclass); this is the default.
'ignore': Ignore the character and continue with the next.
'replace': Replace with a suitable replacement character; Python will use
the official U+FFFD REPLACEMENT CHARACTER for the built-in Unicode codecs
on decoding and '?' on encoding.
'xmlcharrefreplace': Replace with the appropriate XML character reference
(only for encoding).
'backslashreplace': Replace with backslashed escape sequences (only for
encoding).
'surrogateescape': Replace byte with surrogate U+DCxx, as defined in
PEP 383.
In addition, the following error handler is specific to a single codec:

'surrogatepass' (utf-8 codec only): Allow encoding and decoding of
surrogate codes in UTF-8.
New in version 3.1: The 'surrogateescape' and 'surrogatepass' error handlers.
Encodes the object input and returns a tuple (output object, length consumed).
Encoding converts a string object to a bytes object using a particular
character set encoding (e.g., cp1252 or iso-8859-1).
errors defines the error handling to apply. It defaults to 'strict'
handling.
The method may not store state in the Codec instance. Use
StreamWriter for codecs which have to keep state in order to make
encoding/decoding efficient.
The encoder must be able to handle zero length input and return an empty object
of the output object type in this situation.
Decodes the object input and returns a tuple (output object, length
consumed). Decoding converts a bytes object encoded using a particular
character set encoding to a string object.
input must be a bytes object or one which provides the read-only character
buffer interface – for example, buffer objects and memory mapped files.
errors defines the error handling to apply. It defaults to 'strict'
handling.
The method may not store state in the Codec instance. Use
StreamReader for codecs which have to keep state in order to make
encoding/decoding efficient.
The decoder must be able to handle zero length input and return an empty object
of the output object type in this situation.
The IncrementalEncoder and IncrementalDecoder classes provide
the basic interface for incremental encoding and decoding. Encoding/decoding the
input isn’t done with one call to the stateless encoder/decoder function, but
with multiple calls to the encode()/decode() method of the
incremental encoder/decoder. The incremental encoder/decoder keeps track of the
encoding/decoding process during method calls.
The joined output of calls to the encode()/decode() method is the
same as if all the single inputs were joined into one, and this input was
encoded/decoded with the stateless encoder/decoder.
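A sketch of why statefulness matters: a multi-byte character may arrive
split across feeds, and the incremental decoder buffers it until complete:

>>> import codecs
>>> dec = codecs.getincrementaldecoder('utf-8')()
>>> dec.decode(b'\xe2\x82')       # first two bytes of '€': buffered
''
>>> dec.decode(b'\xac')           # the final byte completes the character
'€'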
The IncrementalEncoder class is used for encoding an input in multiple
steps. It defines the following methods which every incremental encoder must
define in order to be compatible with the Python codec registry.
All incremental encoders must provide this constructor interface. They are free
to add additional keyword arguments, but only the ones defined here are used by
the Python codec registry.
The IncrementalEncoder may implement different error handling schemes
by providing the errors keyword argument. These parameters are predefined:
'strict' Raise ValueError (or a subclass); this is the default.
'ignore' Ignore the character and continue with the next.
'replace' Replace with a suitable replacement character
'xmlcharrefreplace' Replace with the appropriate XML character reference
'backslashreplace' Replace with backslashed escape sequences.
The errors argument will be assigned to an attribute of the same name.
Assigning to this attribute makes it possible to switch between different error
handling strategies during the lifetime of the IncrementalEncoder
object.
The set of allowed values for the errors argument can be extended with
register_error().
Encodes object (taking the current state of the encoder into account)
and returns the resulting encoded object. If this is the last call to
encode(), final must be true (the default is false).
Return the current state of the encoder which must be an integer. The
implementation should make sure that 0 is the most common state. (States
that are more complicated than integers can be converted into an integer by
marshaling/pickling the state and encoding the bytes of the resulting string
into an integer).
The IncrementalDecoder class is used for decoding an input in multiple
steps. It defines the following methods which every incremental decoder must
define in order to be compatible with the Python codec registry.
All incremental decoders must provide this constructor interface. They are free
to add additional keyword arguments, but only the ones defined here are used by
the Python codec registry.
The IncrementalDecoder may implement different error handling schemes
by providing the errors keyword argument. These parameters are predefined:
'strict' Raise ValueError (or a subclass); this is the default.
'ignore' Ignore the character and continue with the next.
'replace' Replace with a suitable replacement character.
The errors argument will be assigned to an attribute of the same name.
Assigning to this attribute makes it possible to switch between different error
handling strategies during the lifetime of the IncrementalDecoder
object.
The set of allowed values for the errors argument can be extended with
register_error().
Decodes object (taking the current state of the decoder into account)
and returns the resulting decoded object. If this is the last call to
decode(), final must be true (the default is false). If final is
true the decoder must decode the input completely and must flush all
buffers. If this isn’t possible (e.g. because of incomplete byte sequences
at the end of the input) it must initiate error handling just like in the
stateless case (which might raise an exception).
Return the current state of the decoder. This must be a tuple with two
items, the first must be the buffer containing the still undecoded
input. The second must be an integer and can be additional state
info. (The implementation should make sure that 0 is the most common
additional state info.) If this additional state info is 0 it must be
possible to set the decoder to the state which has no input buffered and
0 as the additional state info, so that feeding the previously
buffered input to the decoder returns it to the previous state without
producing any output. (Additional state info that is more complicated than
integers can be converted into an integer by marshaling/pickling the info
and encoding the bytes of the resulting string into an integer.)
Set the state of the decoder to state. state must be a decoder state
returned by getstate().
The StreamWriter and StreamReader classes provide generic
working interfaces which can be used to implement new encoding submodules very
easily. See encodings.utf_8 for an example of how this is done.
The StreamWriter class is a subclass of Codec and defines the
following methods which every stream writer must define in order to be
compatible with the Python codec registry.
All stream writers must provide this constructor interface. They are free to add
additional keyword arguments, but only the ones defined here are used by the
Python codec registry.
stream must be a file-like object open for writing binary data.
The StreamWriter may implement different error handling schemes by
providing the errors keyword argument. These parameters are predefined:
'strict' Raise ValueError (or a subclass); this is the default.
'ignore' Ignore the character and continue with the next.
'replace' Replace with a suitable replacement character
'xmlcharrefreplace' Replace with the appropriate XML character reference
'backslashreplace' Replace with backslashed escape sequences.
The errors argument will be assigned to an attribute of the same name.
Assigning to this attribute makes it possible to switch between different error
handling strategies during the lifetime of the StreamWriter object.
The set of allowed values for the errors argument can be extended with
register_error().
Flushes and resets the codec buffers used for keeping state.
Calling this method should ensure that the data on the output is put into
a clean state that allows appending of new fresh data without having to
rescan the whole stream to recover state.
In addition to the above methods, the StreamWriter must also inherit
all other methods and attributes from the underlying stream.
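In practice these classes are usually obtained from the registry and wrapped
around an existing byte stream; a sketch with an in-memory stream:

>>> import codecs, io
>>> raw = io.BytesIO()
>>> writer = codecs.getwriter('utf-8')(raw)
>>> writer.write('Gr\u00fc\u00dfe')
>>> raw.getvalue()
b'Gr\xc3\xbc\xc3\x9fe'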
The StreamReader class is a subclass of Codec and defines the
following methods which every stream reader must define in order to be
compatible with the Python codec registry.
All stream readers must provide this constructor interface. They are free to add
additional keyword arguments, but only the ones defined here are used by the
Python codec registry.
stream must be a file-like object open for reading (binary) data.
The StreamReader may implement different error handling schemes by
providing the errors keyword argument. These parameters are defined:
'strict' Raise ValueError (or a subclass); this is the default.
'ignore' Ignore the character and continue with the next.
'replace' Replace with a suitable replacement character.
The errors argument will be assigned to an attribute of the same name.
Assigning to this attribute makes it possible to switch between different error
handling strategies during the lifetime of the StreamReader object.
The set of allowed values for the errors argument can be extended with
register_error().
Decodes data from the stream and returns the resulting object.
chars indicates the number of characters to read from the
stream. read() will never return more than chars characters, but
it might return fewer, if there are not enough characters available.
size indicates the approximate maximum number of bytes to read from the
stream for decoding purposes. The decoder can modify this setting as
appropriate. The default value -1 indicates to read and decode as much as
possible. size is intended to prevent having to decode huge files in
one step.
firstline indicates that it would be sufficient to only return the first
line, if there are decoding errors on later lines.
The method should use a greedy read strategy meaning that it should read
as much data as is allowed within the definition of the encoding and the
given size, e.g. if optional encoding endings or state markers are
available on the stream, these should be read too.
The StreamReaderWriter allows wrapping streams which work in both read
and write modes.
The design is such that one can use the factory functions returned by the
lookup() function to construct the instance.
class codecs.StreamReaderWriter(stream, Reader, Writer, errors)
Creates a StreamReaderWriter instance. stream must be a file-like
object. Reader and Writer must be factory functions or classes providing the
StreamReader and StreamWriter interface resp. Error handling
is done in the same way as defined for the stream readers and writers.
StreamReaderWriter instances define the combined interfaces of
StreamReader and StreamWriter classes. They inherit all other
methods and attributes from the underlying stream.
The StreamRecoder provides a frontend/backend view of encoding data
which is sometimes useful when dealing with different encoding environments.
The design is such that one can use the factory functions returned by the
lookup() function to construct the instance.
class codecs.StreamRecoder(stream, encode, decode, Reader, Writer, errors)
Creates a StreamRecoder instance which implements a two-way conversion:
encode and decode work on the frontend (the input to read() and output
of write()) while Reader and Writer work on the backend (reading and
writing to the stream).
You can use these objects to do transparent direct recodings from e.g. Latin-1
to UTF-8 and back.
stream must be a file-like object.
encode, decode must adhere to the Codec interface. Reader,
Writer must be factory functions or classes providing objects of the
StreamReader and StreamWriter interface respectively.
encode and decode are needed for the frontend translation, Reader and
Writer for the backend translation.
Error handling is done in the same way as defined for the stream readers and
writers.
StreamRecoder instances define the combined interfaces of
StreamReader and StreamWriter classes. They inherit all other
methods and attributes from the underlying stream.
Strings are stored internally as sequences of codepoints (to be precise
as Py_UNICODE arrays). Depending on the way Python is compiled (either
via --without-wide-unicode or --with-wide-unicode, with the
former being the default) Py_UNICODE is either a 16-bit or 32-bit data
type. Once a string object is used outside of CPU and memory, CPU endianness
and how these arrays are stored as bytes become an issue. Transforming a
string object into a sequence of bytes is called encoding and recreating the
string object from the sequence of bytes is known as decoding. There are many
different methods for how this transformation can be done (these methods are
also called encodings). The simplest method is to map the codepoints 0-255 to
the bytes 0x0-0xff. This means that a string object that contains
codepoints above U+00FF can’t be encoded with this method (which is called
'latin-1' or 'iso-8859-1'). str.encode() will raise a
UnicodeEncodeError that looks like this: UnicodeEncodeError:'latin-1'codeccan'tencodecharacter'\u1234'inposition3:ordinalnotinrange(256).
There’s another group of encodings (the so called charmap encodings) that choose
a different subset of all Unicode code points and how these codepoints are
mapped to the bytes 0x0-0xff. To see how this is done simply open
e.g. encodings/cp1252.py (which is an encoding that is used primarily on
Windows). There’s a string constant with 256 characters that shows you which
character is mapped to which byte value.
All of these encodings can only encode 256 of the 65536 (or 1114111) codepoints
defined in Unicode. A simple and straightforward way that can store each Unicode
code point, is to store each codepoint as two consecutive bytes. There are two
possibilities: Store the bytes in big endian or in little endian order. These
two encodings are called UTF-16-BE and UTF-16-LE respectively. Their
disadvantage is that if e.g. you use UTF-16-BE on a little endian machine you
will always have to swap bytes on encoding and decoding. UTF-16 avoids this
problem: Bytes will always be in natural endianness. When these bytes are read
by a CPU with a different endianness, then bytes have to be swapped though. To
be able to detect the endianness of a UTF-16 byte sequence, there’s the so
called BOM (the “Byte Order Mark”). This is the Unicode character U+FEFF.
This character will be prepended to every UTF-16 byte sequence. The byte swapped
version of this character (0xFFFE) is an illegal character that may not
appear in a Unicode text. So when the first character in a UTF-16 byte sequence
appears to be a U+FFFE the bytes have to be swapped on decoding.
Unfortunately up to Unicode 4.0 the character U+FEFF had a second purpose as
a ZERO WIDTH NO-BREAK SPACE: a character that has no width and doesn’t allow
a word to be split. It can e.g. be used to give hints to a ligature algorithm.
With Unicode 4.0 using U+FEFF as a ZERO WIDTH NO-BREAK SPACE has been
deprecated (with U+2060 (WORD JOINER) assuming this role). Nevertheless
Unicode software still must be able to handle U+FEFF in both roles: as a BOM
it’s a device to determine the storage layout of the encoded bytes, and
vanishes once the byte sequence has been decoded into a string; as a ZERO
WIDTH NO-BREAK SPACE it’s a normal character that will be decoded like any
other.
There’s another encoding that is able to encode the full range of Unicode
characters: UTF-8. UTF-8 is an 8-bit encoding, which means there are no issues
with byte order in UTF-8. Each byte in a UTF-8 byte sequence consists of two
parts: Marker bits (the most significant bits) and payload bits. The marker bits
are a sequence of zero to six 1 bits followed by a 0 bit. Unicode characters are
encoded like this (with x being payload bits, which when concatenated give the
Unicode character):
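The byte patterns are (in the classic UTF-8 definition, which allowed
sequences of up to six bytes; modern Unicode caps sequences at four):

0xxxxxxx
110xxxxx 10xxxxxx
1110xxxx 10xxxxxx 10xxxxxx
11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx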
The least significant bit of the Unicode character is the rightmost x bit.
As UTF-8 is an 8-bit encoding no BOM is required and any U+FEFF character in
the decoded string (even if it’s the first character) is treated as a ZERO
WIDTH NO-BREAK SPACE.
Without external information it’s impossible to reliably determine which
encoding was used for encoding a string. Each charmap encoding can
decode any random byte sequence. However that’s not possible with UTF-8, as
UTF-8 byte sequences have a structure that doesn’t allow arbitrary byte
sequences. To increase the reliability with which a UTF-8 encoding can be
detected, Microsoft invented a variant of UTF-8 (that Python 2.5 calls
"utf-8-sig") for its Notepad program: Before any of the Unicode characters
is written to the file, a UTF-8 encoded BOM (which looks like this as a byte
sequence: 0xef, 0xbb, 0xbf) is written. As it’s rather improbable
that any charmap encoded file starts with these byte values (which would e.g.
map to
LATIN SMALL LETTER I WITH DIAERESIS
RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
INVERTED QUESTION MARK
in iso-8859-1), this increases the probability that a utf-8-sig encoding can be
correctly guessed from the byte sequence. So here the BOM is not used to be able
to determine the byte order used for generating the byte sequence, but as a
signature that helps in guessing the encoding. On encoding the utf-8-sig codec
will write 0xef, 0xbb, 0xbf as the first three bytes to the file. On
decoding utf-8-sig will skip those three bytes if they appear as the first three
bytes in the file.
Python comes with a number of codecs built-in, either implemented as C functions
or with dictionaries as mapping tables. The following table lists the codecs by
name, together with a few common aliases, and the languages for which the
encoding is likely used. Neither the list of aliases nor the list of languages
is meant to be exhaustive. Notice that spelling alternatives that only differ in
case or use a hyphen instead of an underscore are also valid aliases; therefore,
e.g. 'utf-8' is a valid alias for the 'utf_8' codec.
Many of the character sets support the same languages. They vary in individual
characters (e.g. whether the EURO SIGN is supported or not), and in the
assignment of characters to code positions. For the European languages in
particular, the following variants typically exist:
an ISO 8859 codeset
a Microsoft Windows code page, which is typically derived from an 8859
codeset, but replaces control characters with additional graphic characters
an IBM EBCDIC code page
an IBM PC code page, which is ASCII compatible
Codec        Aliases                              Languages
ascii        646, us-ascii                        English
big5         big5-tw, csbig5                      Traditional Chinese
big5hkscs    big5-hkscs, hkscs                    Traditional Chinese
cp037        IBM037, IBM039                       English
cp424        EBCDIC-CP-HE, IBM424                 Hebrew
cp437        437, IBM437                          English
cp500        EBCDIC-CP-BE, EBCDIC-CP-CH, IBM500   Western Europe
cp720                                             Arabic
cp737                                             Greek
cp775        IBM775                               Baltic languages
cp850        850, IBM850                          Western Europe
cp852        852, IBM852                          Central and Eastern Europe
cp855        855, IBM855                          Bulgarian, Byelorussian,
                                                  Macedonian, Russian, Serbian
cp856                                             Hebrew
cp857        857, IBM857                          Turkish
cp858        858, IBM858                          Western Europe
cp860        860, IBM860                          Portuguese
cp861        861, CP-IS, IBM861                   Icelandic
cp862        862, IBM862                          Hebrew
cp863        863, IBM863                          Canadian
cp864        IBM864                               Arabic
cp865        865, IBM865                          Danish, Norwegian
cp866        866, IBM866                          Russian
cp869        869, CP-GR, IBM869                   Greek
cp874                                             Thai
cp875                                             Greek
cp932        932, ms932, mskanji, ms-kanji        Japanese
cp949        949, ms949, uhc                      Korean
cp950        950, ms950                           Traditional Chinese
cp1006                                            Urdu
cp1026       ibm1026                              Turkish
cp1140       ibm1140                              Western Europe
cp1250       windows-1250                         Central and Eastern Europe
cp1251       windows-1251                         Bulgarian, Byelorussian,
                                                  Macedonian, Russian, Serbian
A number of predefined codecs are specific to Python, so their codec names
have no meaning outside Python; among them:

Codec               Purpose
raw_unicode_escape  Produce a string that is suitable as raw Unicode
                    literal in Python source code
undefined           Raise an exception for all conversions. Can be used
                    as the system encoding if no automatic coercion
                    between byte and Unicode strings is desired.
unicode_escape      Produce a string that is suitable as Unicode literal
                    in Python source code
unicode_internal    Return the internal representation of the operand
The following codecs provide bytes-to-bytes mappings.

Codec          Aliases                    Purpose
base64_codec   base64, base-64            Convert operand to MIME base64
bz2_codec      bz2                        Compress the operand using bz2
hex_codec      hex                        Convert operand to hexadecimal
                                          representation, with two digits
                                          per byte
quopri_codec   quopri, quoted-printable,  Convert operand to MIME quoted
               quotedprintable            printable
uu_codec       uu                         Convert the operand using uuencode
zlib_codec     zip, zlib                  Compress the operand using gzip
The following codecs provide string-to-string mappings.

Codec    Aliases   Purpose
rot_13   rot13     Returns the Caesar-cypher encryption of the operand
New in version 3.2: bytes-to-bytes and string-to-string codecs.
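These codecs are not text encodings, so they are reached through the codec
machinery rather than str.encode(); for example, using the stateless encoder
from lookup(), which returns the usual (output, length consumed) pair:

>>> import codecs
>>> codecs.lookup('hex_codec').encode(b'spam')
(b'7370616d', 4)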
encodings.idna — Internationalized Domain Names in Applications
This module implements RFC 3490 (Internationalized Domain Names in
Applications) and RFC 3492 (Nameprep: A Stringprep Profile for
Internationalized Domain Names (IDN)). It builds upon the punycode encoding
and stringprep.
These RFCs together define a protocol to support non-ASCII characters in domain
names. A domain name containing non-ASCII characters (such as
www.Alliancefrançaise.nu) is converted into an ASCII-compatible encoding
(ACE, such as www.xn--alliancefranaise-npb.nu). The ACE form of the domain
name is then used in all places where arbitrary characters are not allowed by
the protocol, such as DNS queries, HTTP Host fields, and so
on. This conversion is carried out in the application, if possible invisibly to
the user: The application should transparently convert Unicode domain labels to
IDNA on the wire, and convert back ACE labels to Unicode before presenting them
to the user.
Python supports this conversion in several ways: the idna codec performs
conversion between Unicode and ACE, separating an input string into labels
based on the separator characters defined in section 3.1 (1) of RFC 3490
and converting each label to ACE as required, and conversely separating an input
byte string into labels based on the . separator and converting any ACE
labels found into unicode. Furthermore, the socket module
transparently converts Unicode host names to ACE, so that applications need not
be concerned about converting host names themselves when they pass them to the
socket module. On top of that, modules that have host names as function
parameters, such as http.client and ftplib, accept Unicode host
names (http.client then also transparently sends an IDNA hostname in the
Host field if it sends that field at all).
When receiving host names from the wire (such as in reverse name lookup), no
automatic conversion to Unicode is performed: Applications wishing to present
such host names to the user should decode them to Unicode.
The module encodings.idna also implements the nameprep procedure, which
performs certain normalizations on host names, to achieve case-insensitivity of
international domain names, and to unify similar characters. The nameprep
functions can be used directly if desired.
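For example, using the sample domain from above:

>>> 'www.Alliancefrançaise.nu'.encode('idna')
b'www.xn--alliancefranaise-npb.nu'
>>> b'www.xn--alliancefranaise-npb.nu'.decode('idna')
'www.alliancefrançaise.nu'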
The encodings.mbcs codec encodes the operand according to the ANSI
codepage (CP_ACP). This codec only supports 'strict' and 'replace'
error handlers to encode, and 'strict' and 'ignore' error handlers to
decode.
Availability: Windows only.
Changed in version 3.2: Before 3.2, the errors argument was ignored;
'replace' was always used to encode, and 'ignore' to decode.
The encodings.utf_8_sig module implements a variant of the UTF-8 codec:
on encoding, a UTF-8 encoded BOM will be prepended to the UTF-8 encoded
bytes. For the stateful encoder this is only done once (on the first write
to the byte stream). On decoding, an optional UTF-8 encoded BOM at the
start of the data will be skipped.
This module provides access to the Unicode Character Database (UCD) which
defines character properties for all Unicode characters. The data contained in
this database is compiled from the UCD version 6.0.0.
The module uses the same names and symbols as defined by Unicode
Standard Annex #44, “Unicode Character Database”. It defines the
following functions:
Returns the decimal value assigned to the character chr as an integer.
If no such value is defined, default is returned, or, if not given,
ValueError is raised.
Returns the digit value assigned to the character chr as an integer.
If no such value is defined, default is returned, or, if not given,
ValueError is raised.
Returns the numeric value assigned to the character chr as a float.
If no such value is defined, default is returned, or, if not given,
ValueError is raised.
Returns the mirrored property assigned to the character chr as an
integer. Returns 1 if the character has been identified as a “mirrored”
character in bidirectional text, 0 otherwise.
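For example:

>>> import unicodedata
>>> unicodedata.decimal('9')
9
>>> unicodedata.numeric('¼')
0.25
>>> unicodedata.mirrored('(')
1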
Return the normal form form for the Unicode string unistr. Valid values for
form are ‘NFC’, ‘NFKC’, ‘NFD’, and ‘NFKD’.
The Unicode standard defines various normalization forms of a Unicode string,
based on the definition of canonical equivalence and compatibility equivalence.
In Unicode, several characters can be expressed in various ways. For example,
the character U+00C7 (LATIN CAPITAL LETTER C WITH CEDILLA) can also be
expressed as the sequence U+0043 (LATIN CAPITAL LETTER C) U+0327 (COMBINING
CEDILLA).
For each character, there are two normal forms: normal form C and normal form D.
Normal form D (NFD) is also known as canonical decomposition, and translates
each character into its decomposed form. Normal form C (NFC) first applies a
canonical decomposition, then composes pre-combined characters again.
In addition to these two forms, there are two additional normal forms based on
compatibility equivalence. In Unicode, certain characters are supported which
normally would be unified with other characters. For example, U+2160 (ROMAN
NUMERAL ONE) is really the same thing as U+0049 (LATIN CAPITAL LETTER I).
However, it is supported in Unicode for compatibility with existing character
sets (e.g. gb2312).
The normal form KD (NFKD) will apply the compatibility decomposition, i.e.
replace all compatibility characters with their equivalents. The normal form KC
(NFKC) first applies the compatibility decomposition, followed by the canonical
composition.
Even if two unicode strings are normalized and look the same to
a human reader, if one has combining characters and the other
doesn’t, they may not compare equal.
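A short doctest making this concrete:

>>> from unicodedata import normalize
>>> single = '\u00c7'            # LATIN CAPITAL LETTER C WITH CEDILLA
>>> combined = 'C\u0327'         # 'C' plus COMBINING CEDILLA
>>> single == combined
False
>>> normalize('NFD', single) == combined
True
>>> normalize('NFC', combined) == single
True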
In addition, the module exposes the following constant:
This is an object that has the same methods as the entire module, but uses the
Unicode database version 3.2 instead, for applications that require this
specific version of the Unicode database (such as IDNA).
When identifying things (such as host names) in the internet, it is often
necessary to compare such identifications for “equality”. Exactly how this
comparison is executed may depend on the application domain, e.g. whether it
should be case-insensitive or not. It may be also necessary to restrict the
possible identifications, to allow only identifications consisting of
“printable” characters.
RFC 3454 defines a procedure for “preparing” Unicode strings in internet
protocols. Before passing strings onto the wire, they are processed with the
preparation procedure, after which they have a certain normalized form. The RFC
defines a set of tables, which can be combined into profiles. Each profile must
define which tables it uses, and what other optional parts of the stringprep
procedure are part of the profile. One example of a stringprep profile is
nameprep, which is used for internationalized domain names.
The module stringprep only exposes the tables from RFC 3454. As these
tables would be very large to represent as dictionaries or lists, the module
uses the Unicode character database internally. The module source code itself
was generated using the mkstringprep.py utility.
As a result, these tables are exposed as functions, not as data structures.
There are two kinds of tables in the RFC: sets and mappings. For a set,
stringprep provides the “characteristic function”, i.e. a function that
returns true if the parameter is part of the set. For mappings, it provides the
mapping function: given the key, it returns the associated value. Below is a
list of all functions available in the module.
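Two representative examples (in_table_a1() is the set of code points
unassigned in Unicode 3.2, and map_table_b2() is the case-folding map
used by nameprep):

>>> import stringprep
>>> stringprep.in_table_a1('\u0221')
True
>>> stringprep.map_table_b2('A')
'a'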
The modules described in this chapter provide a variety of specialized data
types such as dates and times, fixed-type arrays, heap queues, synchronized
queues, and sets.
Python also provides some built-in data types, in particular,
dict, list, set and frozenset, and
tuple. The str class is used to hold
Unicode strings, and the bytes class is used to hold binary data.
The following modules are documented in this chapter:
The datetime module supplies classes for manipulating dates and times in
both simple and complex ways. While date and time arithmetic is supported, the
focus of the implementation is on efficient attribute extraction for output
formatting and manipulation. For related
functionality, see also the time and calendar modules.
There are two kinds of date and time objects: “naive” and “aware”. This
distinction refers to whether the object has any notion of time zone, daylight
saving time, or other kind of algorithmic or political time adjustment. Whether
a naive datetime object represents Coordinated Universal Time (UTC),
local time, or time in some other timezone is purely up to the program, just
like it’s up to the program whether a particular number represents metres,
miles, or mass. Naive datetime objects are easy to understand and to
work with, at the cost of ignoring some aspects of reality.
For applications requiring more, datetime and time objects
have an optional time zone information attribute, tzinfo, that can be
set to an instance of a subclass of the abstract tzinfo class. These
tzinfo objects capture information about the offset from UTC time, the
time zone name, and whether Daylight Saving Time is in effect. Note that only
one concrete tzinfo class, the timezone class, is supplied by the
datetime module. The timezone class can represent simple
timezones with fixed offset from UTC such as UTC itself or North American EST and
EDT timezones. Supporting timezones at whatever level of detail is
required is up to the application. The rules for time adjustment across the
world are more political than rational, change frequently, and there is no
standard suitable for every application aside from UTC.
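A brief sketch of the distinction (the fixed offset chosen is arbitrary):

>>> from datetime import datetime, timezone, timedelta
>>> datetime.now().tzinfo is None           # naive
True
>>> datetime.now(timezone.utc).tzinfo       # aware
datetime.timezone.utc
>>> tz = timezone(timedelta(hours=-5))      # a fixed-offset timezone
>>> datetime(2011, 1, 1, tzinfo=tz).utcoffset()
datetime.timedelta(-1, 68400)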
The datetime module exports the following constants: MINYEAR, the
smallest year number allowed in a date or datetime object (1), and
MAXYEAR, the largest (9999).
class datetime.date
An idealized naive date, assuming the current Gregorian calendar always was, and
always will be, in effect. Attributes: year, month, and
day.
class datetime.time
An idealized time, independent of any particular day, assuming that every day
has exactly 24*60*60 seconds (there is no notion of “leap seconds” here).
Attributes: hour, minute, second, microsecond,
and tzinfo.
class datetime.tzinfo
An abstract base class for time zone information objects. These are used by the
datetime and time classes to provide a customizable notion of
time adjustment (for example, to account for time zone and/or daylight saving
time).
An object d of type time or datetime may be naive or aware.
d is aware if d.tzinfo is not None and d.tzinfo.utcoffset(d) does
not return None. If d.tzinfo is None, or if d.tzinfo is not
None but d.tzinfo.utcoffset(d) returns None, d is naive.
The distinction between naive and aware doesn’t apply to timedelta
objects.
Subclass relationships:
object
timedelta
tzinfo
timezone
time
date
datetime
A timedelta object represents a duration, the difference between two
dates or times.
class datetime.timedelta(days=0, seconds=0, microseconds=0, milliseconds=0, minutes=0, hours=0, weeks=0)
All arguments are optional and default to 0. Arguments may be integers
or floats, and may be positive or negative.
Only days, seconds and microseconds are stored internally. Arguments are
converted to those units:
A millisecond is converted to 1000 microseconds.
A minute is converted to 60 seconds.
An hour is converted to 3600 seconds.
A week is converted to 7 days.
and days, seconds and microseconds are then normalized so that the
representation is unique, with
0 <= microseconds < 1000000
0 <= seconds < 3600*24 (the number of seconds in one day)
-999999999 <= days <= 999999999
If any argument is a float and there are fractional microseconds, the fractional
microseconds left over from all arguments are combined and their sum is rounded
to the nearest microsecond. If no argument is a float, the conversion and
normalization processes are exact (no information is lost).
If the normalized value of days lies outside the indicated range,
OverflowError is raised.
Note that normalization of negative values may be surprising at first. For
example,
>>> from datetime import timedelta
>>> d = timedelta(microseconds=-1)
>>> (d.days, d.seconds, d.microseconds)
(-1, 86399, 999999)
-timedelta.max is not representable as a timedelta object.
String representations of timedelta objects are normalized
similarly to their internal representation. This leads to somewhat
unusual results for negative timedeltas. For example:
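>>> timedelta(hours=-5)
datetime.timedelta(-1, 68400)
>>> print(timedelta(hours=-5))
-1 day, 19:00:00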
In addition to the operations listed above timedelta objects support
certain additions and subtractions with date and datetime
objects (see below).
Changed in version 3.2: Floor division and true division of a timedelta object by another
timedelta object are now supported, as are remainder operations and
the divmod() function. True division and multiplication of a
timedelta object by a float object are now supported.
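For instance:

>>> from datetime import timedelta
>>> timedelta(hours=1) / timedelta(minutes=5)
12.0
>>> timedelta(minutes=1) * 1.5
datetime.timedelta(0, 90)
>>> divmod(timedelta(hours=25), timedelta(days=1))
(1, datetime.timedelta(0, 3600))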
Comparisons of timedelta objects are supported with the
timedelta object representing the smaller duration considered to be the
smaller timedelta. In order to stop mixed-type comparisons from falling back to
the default comparison by object address, when a timedelta object is
compared to an object of a different type, TypeError is raised unless the
comparison is == or !=. The latter cases return False or
True, respectively.
timedelta objects are hashable (usable as dictionary keys), support
efficient pickling, and in Boolean contexts, a timedelta object is
considered to be true if and only if it isn’t equal to timedelta(0).
A date object represents a date (year, month and day) in an idealized
calendar, the current Gregorian calendar indefinitely extended in both
directions. January 1 of year 1 is called day number 1, January 2 of year 1 is
called day number 2, and so on. This matches the definition of the “proleptic
Gregorian” calendar in Dershowitz and Reingold’s book Calendrical Calculations,
where it’s the base calendar for all computations. See the book for algorithms
for converting between proleptic Gregorian ordinals and many other calendar
systems.
Return the local date corresponding to the POSIX timestamp, such as is returned
by time.time(). This may raise ValueError, if the timestamp is out
of the range of values supported by the platform C localtime() function.
It’s common for this to be restricted to years from 1970 through 2038. Note
that on non-POSIX systems that include leap seconds in their notion of a
timestamp, leap seconds are ignored by fromtimestamp().
Return the date corresponding to the proleptic Gregorian ordinal, where January
1 of year 1 has ordinal 1. ValueError is raised unless
1 <= ordinal <= date.max.toordinal(). For any date d,
date.fromordinal(d.toordinal()) == d.
The day attribute is between 1 and the number of days in the given month of
the given year.
Supported operations:

date2 = date1 + timedelta
    date2 is timedelta.days days removed from date1. (1)
date2 = date1 - timedelta
    Computes date2 such that date2 + timedelta == date1. (2)
timedelta = date1 - date2
    (3)
date1 < date2
    date1 is considered less than date2 when date1 precedes date2
    in time. (4)
Notes:

(1) date2 is moved forward in time if timedelta.days > 0, or backward if
timedelta.days < 0. Afterward date2 - date1 == timedelta.days.
timedelta.seconds and timedelta.microseconds are ignored.
OverflowError is raised if date2.year would be smaller than
MINYEAR or larger than MAXYEAR.

(2) This isn’t quite equivalent to date1 + (-timedelta), because -timedelta in
isolation can overflow in cases where date1 - timedelta does not.
timedelta.seconds and timedelta.microseconds are ignored.

(3) This is exact, and cannot overflow. timedelta.seconds and
timedelta.microseconds are 0, and date2 + timedelta == date1 after.

(4) In other words, date1 < date2 if and only if
date1.toordinal() < date2.toordinal(). In order to stop comparison from
falling back to the default scheme of comparing object addresses, date
comparison normally raises TypeError if the other comparand isn’t also a
date object. However, NotImplemented is returned instead if the other
comparand has a timetuple() attribute. This hook gives other kinds of date
objects a chance at implementing mixed-type comparison. If not, when a date
object is compared to an object of a different type, TypeError is raised
unless the comparison is == or !=. The latter cases return
False or True, respectively.
Dates can be used as dictionary keys. In Boolean contexts, all date
objects are considered to be true.
Return a date with the same value, except for those parameters given new
values by whichever keyword arguments are specified. For example, if
d == date(2002, 12, 31), then d.replace(day=26) == date(2002, 12, 26).
Return a time.struct_time such as returned by time.localtime().
The hours, minutes and seconds are 0, and the DST flag is -1. d.timetuple()
is equivalent to time.struct_time((d.year, d.month, d.day, 0, 0, 0,
d.weekday(), yday, -1)), where
yday = d.toordinal() - date(d.year, 1, 1).toordinal() + 1 is the day
number within the current year starting with 1 for January 1st.
Return the proleptic Gregorian ordinal of the date, where January 1 of year 1
has ordinal 1. For any date object d,
date.fromordinal(d.toordinal()) == d.
Return the day of the week as an integer, where Monday is 0 and Sunday is 6.
For example, date(2002, 12, 4).weekday() == 2, a Wednesday. See also
isoweekday().
Return the day of the week as an integer, where Monday is 1 and Sunday is 7.
For example, date(2002, 12, 4).isoweekday() == 3, a Wednesday. See also
weekday(), isocalendar().
The ISO year consists of 52 or 53 full weeks; a week starts on a Monday and
ends on a Sunday. The first week of an ISO year is the first (Gregorian)
calendar week of a year containing a Thursday. This is called week number 1,
and the ISO year of that Thursday is the same as its Gregorian year.
For example, 2004 begins on a Thursday, so the first week of ISO year 2004
begins on Monday, 29 Dec 2003 and ends on Sunday, 4 Jan 2004, so that
date(2003, 12, 29).isocalendar() == (2004, 1, 1) and
date(2004, 1, 4).isocalendar() == (2004, 1, 7).
Return a string representing the date, for example
date(2002, 12, 4).ctime() == 'Wed Dec  4 00:00:00 2002'. d.ctime() is
equivalent to time.ctime(time.mktime(d.timetuple())) on platforms where the
native C ctime() function (which time.ctime() invokes, but which
date.ctime() does not invoke) conforms to the C standard.
Return a string representing the date, controlled by an explicit format string.
Format codes referring to hours, minutes or seconds will see 0 values. See
section strftime() and strptime() Behavior.
>>> from datetime import date
>>> d = date.fromordinal(730920) # 730920th day after 1. 1. 0001
>>> d
datetime.date(2002, 3, 11)
>>> t = d.timetuple()
>>> for i in t:
... print(i)
2002 # year
3 # month
11 # day
0
0
0
0 # weekday (0 = Monday)
70 # 70th day in the year
-1
>>> ic = d.isocalendar()
>>> for i in ic:
... print(i)
2002 # ISO year
11 # ISO week number
1 # ISO day number ( 1 = Monday )
>>> d.isoformat()
'2002-03-11'
>>> d.strftime("%d/%m/%y")
'11/03/02'
>>> d.strftime("%A %d. %B %Y")
'Monday 11. March 2002'
A datetime object is a single object containing all the information
from a date object and a time object. Like a date
object, datetime assumes the current Gregorian calendar extended in
both directions; like a time object, datetime assumes there are exactly
3600*24 seconds in every day.
Constructor:
class datetime.datetime(year, month, day, hour=0, minute=0, second=0, microsecond=0, tzinfo=None)
The year, month and day arguments are required. tzinfo may be None, or an
instance of a tzinfo subclass. The remaining arguments may be integers,
in the following ranges:
MINYEAR <= year <= MAXYEAR
1 <= month <= 12
1 <= day <= number of days in the given month and year
0 <= hour < 24
0 <= minute < 60
0 <= second < 60
0 <= microsecond < 1000000
If an argument outside those ranges is given, ValueError is raised.
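A quick illustration (the dates below are arbitrary):

>>> from datetime import datetime
>>> datetime(2011, 3, 14, 15, 9, 26)
datetime.datetime(2011, 3, 14, 15, 9, 26)
>>> datetime(2011, 2, 30)
Traceback (most recent call last):
  ...
ValueError: day is out of range for month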
Return the current local date and time. If optional argument tz is None
or not specified, this is like today(), but, if possible, supplies more
precision than can be gotten from going through a time.time() timestamp
(for example, this may be possible on platforms supplying the C
gettimeofday() function).
Else tz must be an instance of a tzinfo subclass, and the
current date and time are converted to tz’s time zone. In this case the
result is equivalent to tz.fromutc(datetime.utcnow().replace(tzinfo=tz)).
See also today(), utcnow().
Return the current UTC date and time, with tzinfo None. This is like
now(), but returns the current UTC date and time, as a naive
datetime object. An aware current UTC datetime can be obtained by
calling datetime.now(timezone.utc). See also now().
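For example, contrasting the two calls (exact timestamps will vary):

>>> from datetime import datetime, timezone
>>> datetime.utcnow().tzinfo is None      # naive
True
>>> datetime.now(timezone.utc).tzinfo     # aware
datetime.timezone.utc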
Return the local date and time corresponding to the POSIX timestamp, such as is
returned by time.time(). If optional argument tz is None or not
specified, the timestamp is converted to the platform’s local date and time, and
the returned datetime object is naive.
Else tz must be an instance of a tzinfo subclass, and the
timestamp is converted to tz’s time zone. In this case the result is
equivalent to
tz.fromutc(datetime.utcfromtimestamp(timestamp).replace(tzinfo=tz)).
fromtimestamp() may raise ValueError, if the timestamp is out of
the range of values supported by the platform C localtime() or
gmtime() functions. It’s common for this to be restricted to years in
1970 through 2038. Note that on non-POSIX systems that include leap seconds in
their notion of a timestamp, leap seconds are ignored by fromtimestamp(),
and then it’s possible to have two timestamps differing by a second that yield
identical datetime objects. See also utcfromtimestamp().
Return the UTC datetime corresponding to the POSIX timestamp, with
tzinfo None. This may raise ValueError, if the timestamp is
out of the range of values supported by the platform C gmtime() function.
It’s common for this to be restricted to years in 1970 through 2038. See also
fromtimestamp().
Return the datetime corresponding to the proleptic Gregorian ordinal,
where January 1 of year 1 has ordinal 1. ValueError is raised unless 1 <= ordinal <= datetime.max.toordinal(). The hour, minute, second and
microsecond of the result are all 0, and tzinfo is None.
Return a new datetime object whose date components are equal to the
given date object’s, and whose time components and tzinfo
attributes are equal to the given time object’s. For any
datetime object d,
d == datetime.combine(d.date(), d.timetz()). If date is a
datetime object, its time components and tzinfo attributes
are ignored.
Return a datetime corresponding to date_string, parsed according to
format. This is equivalent to datetime(*(time.strptime(date_string, format)[0:6])). ValueError is raised if the date_string and format
can’t be parsed by time.strptime() or if it returns a value which isn’t a
time tuple. See section strftime() and strptime() Behavior.
datetime2 is a duration of timedelta removed from datetime1, moving forward in
time if timedelta.days > 0, or backward if timedelta.days < 0. The
result has the same tzinfo attribute as the input datetime, and
datetime2 - datetime1 == timedelta after. OverflowError is raised if
datetime2.year would be smaller than MINYEAR or larger than
MAXYEAR. Note that no time zone adjustments are done even if the
input is an aware object.
Computes the datetime2 such that datetime2 + timedelta == datetime1. As for
addition, the result has the same tzinfo attribute as the input
datetime, and no time zone adjustments are done even if the input is aware.
This isn’t quite equivalent to datetime1 + (-timedelta), because -timedelta
in isolation can overflow in cases where datetime1 - timedelta does not.
Subtraction of a datetime from a datetime is defined only if
both operands are naive, or if both are aware. If one is aware and the other is
naive, TypeError is raised.
If both are naive, or both are aware and have the same tzinfo attribute,
the tzinfo attributes are ignored, and the result is a timedelta
object t such that datetime2 + t == datetime1. No time zone adjustments
are done in this case.
If both are aware and have different tzinfo attributes, a-b acts
as if a and b were first converted to naive UTC datetimes. The
result is (a.replace(tzinfo=None) - a.utcoffset()) - (b.replace(tzinfo=None) - b.utcoffset()) except that the implementation never overflows.
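For example, two aware datetimes denoting the same instant in different zones subtract to zero (a small sketch using the fixed-offset timezone class):

>>> from datetime import datetime, timedelta, timezone
>>> a = datetime(2011, 1, 1, 12, 0, tzinfo=timezone.utc)
>>> b = datetime(2011, 1, 1, 7, 0, tzinfo=timezone(timedelta(hours=-5)))
>>> a - b      # same moment in time
datetime.timedelta(0)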
datetime1 is considered less than datetime2 when datetime1 precedes
datetime2 in time.
If one comparand is naive and the other is aware, TypeError is raised.
If both comparands are aware, and have the same tzinfo attribute, the
common tzinfo attribute is ignored and the base datetimes are
compared. If both comparands are aware and have different tzinfo
attributes, the comparands are first adjusted by subtracting their UTC
offsets (obtained from self.utcoffset()).
Note
In order to stop comparison from falling back to the default scheme of comparing
object addresses, datetime comparison normally raises TypeError if the
other comparand isn’t also a datetime object. However,
NotImplemented is returned instead if the other comparand has a
timetuple() attribute. This hook gives other kinds of date objects a
chance at implementing mixed-type comparison. If not, when a datetime
object is compared to an object of a different type, TypeError is raised
unless the comparison is == or !=. The latter cases return
False or True, respectively.
datetime objects can be used as dictionary keys. In Boolean contexts,
all datetime objects are considered to be true.
Return a datetime with the same attributes, except for those attributes given
new values by whichever keyword arguments are specified. Note that
tzinfo=None can be specified to create a naive datetime from an aware
datetime with no conversion of date and time data.
Return a datetime object with new tzinfo attribute tz,
adjusting the date and time data so the result is the same UTC time as
self, but in tz’s local time.
tz must be an instance of a tzinfo subclass, and its
utcoffset() and dst() methods must not return None. self must
be aware (self.tzinfo must not be None, and self.utcoffset() must
not return None).
If self.tzinfo is tz, self.astimezone(tz) is equal to self: no
adjustment of date or time data is performed. Else the result is local
time in time zone tz, representing the same UTC time as self: after
astz = dt.astimezone(tz), astz - astz.utcoffset() will usually have
the same date and time data as dt - dt.utcoffset(). The discussion
of class tzinfo explains the cases at Daylight Saving Time transition
boundaries where this cannot be achieved (an issue only if tz models both
standard and daylight time).
If you merely want to attach a time zone object tz to a datetime dt without
adjustment of date and time data, use dt.replace(tzinfo=tz). If you
merely want to remove the time zone object from an aware datetime dt without
conversion of date and time data, use dt.replace(tzinfo=None).
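For example (a small sketch; the +02:00 offset is arbitrary):

>>> from datetime import datetime, timedelta, timezone
>>> dt = datetime(2011, 6, 1, 9, 30)
>>> aware = dt.replace(tzinfo=timezone(timedelta(hours=2)))  # attach, no conversion
>>> aware.isoformat()
'2011-06-01T09:30:00+02:00'
>>> aware.replace(tzinfo=None) == dt                         # strip, no conversion
True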
def astimezone(self, tz):
if self.tzinfo is tz:
return self
# Convert self to UTC, and attach the new time zone object.
utc = (self - self.utcoffset()).replace(tzinfo=tz)
# Convert from UTC to tz's local time.
return tz.fromutc(utc)
If tzinfo is None, returns None, else returns
self.tzinfo.utcoffset(self), and raises an exception if the latter doesn’t
return None, or a timedelta object representing a whole number of
minutes with magnitude less than one day.
If tzinfo is None, returns None, else returns
self.tzinfo.dst(self), and raises an exception if the latter doesn’t return
None, or a timedelta object representing a whole number of minutes
with magnitude less than one day.
Return a time.struct_time such as returned by time.localtime().
d.timetuple() is equivalent to time.struct_time((d.year, d.month, d.day,
d.hour, d.minute, d.second, d.weekday(), yday, dst)), where
yday = d.toordinal() - date(d.year, 1, 1).toordinal() + 1 is the day number within
the current year starting with 1 for January 1st. The tm_isdst flag
of the result is set according to the dst() method: if tzinfo is
None or dst() returns None, tm_isdst is set to -1;
else if dst() returns a non-zero value, tm_isdst is set to 1;
else tm_isdst is set to 0.
If datetime instance d is naive, this is the same as
d.timetuple() except that tm_isdst is forced to 0 regardless of what
d.dst() returns. DST is never in effect for a UTC time.
If d is aware, d is normalized to UTC time, by subtracting
d.utcoffset(), and a time.struct_time for the
normalized time is returned. tm_isdst is forced to 0. Note
that an OverflowError may be raised if d.year was
MINYEAR or MAXYEAR and UTC adjustment spills over a year
boundary.
Return a string representing the date and time in ISO 8601 format,
YYYY-MM-DDTHH:MM:SS.mmmmmm or, if microsecond is 0,
YYYY-MM-DDTHH:MM:SS.
If utcoffset() does not return None, a 6-character string is
appended, giving the UTC offset in (signed) hours and minutes:
YYYY-MM-DDTHH:MM:SS.mmmmmm+HH:MM or, if microsecond is 0,
YYYY-MM-DDTHH:MM:SS+HH:MM.
The optional argument sep (default 'T') is a one-character separator,
placed between the date and time portions of the result. For example:
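>>> from datetime import datetime
>>> datetime(2002, 12, 25, 12, 30, 45).isoformat(' ')
'2002-12-25 12:30:45'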
Return a string representing the date and time, for example
datetime(2002, 12, 4, 20, 30, 40).ctime() == 'Wed Dec  4 20:30:40 2002'. d.ctime() is
equivalent to time.ctime(time.mktime(d.timetuple())) on platforms where the
native C ctime() function (which time.ctime() invokes, but which
datetime.ctime() does not invoke) conforms to the C standard.
Return a string representing the date and time, controlled by an explicit format
string. See section strftime() and strptime() Behavior.
Examples of working with datetime objects:
>>> from datetime import datetime, date, time
>>> # Using datetime.combine()
>>> d = date(2005, 7, 14)
>>> t = time(12, 30)
>>> datetime.combine(d, t)
datetime.datetime(2005, 7, 14, 12, 30)
>>> # Using datetime.now() or datetime.utcnow()
>>> datetime.now()
datetime.datetime(2007, 12, 6, 16, 29, 43, 79043) # GMT +1
>>> datetime.utcnow()
datetime.datetime(2007, 12, 6, 15, 29, 43, 79060)
>>> # Using datetime.strptime()
>>> dt = datetime.strptime("21/11/06 16:30", "%d/%m/%y %H:%M")
>>> dt
datetime.datetime(2006, 11, 21, 16, 30)
>>> # Using datetime.timetuple() to get tuple of all attributes
>>> tt = dt.timetuple()
>>> for it in tt:
... print(it)
...
2006 # year
11 # month
21 # day
16 # hour
30 # minute
0 # second
1 # weekday (0 = Monday)
325 # number of days since 1st January
-1 # dst - method tzinfo.dst() returned None
>>> # Date in ISO format
>>> ic = dt.isocalendar()
>>> for it in ic:
... print(it)
...
2006 # ISO year
47 # ISO week
2 # ISO weekday
>>> # Formatting datetime
>>> dt.strftime("%A, %d. %B %Y %I:%M%p")
'Tuesday, 21. November 2006 04:30PM'
Using datetime with tzinfo:
>>> from datetime import timedelta, datetime, tzinfo
>>> class GMT1(tzinfo):
... def __init__(self): # DST starts last Sunday in March
... d = datetime(dt.year, 4, 1) # ends last Sunday in October
... self.dston = d - timedelta(days=d.weekday() + 1)
... d = datetime(dt.year, 11, 1)
... self.dstoff = d - timedelta(days=d.weekday() + 1)
... def utcoffset(self, dt):
... return timedelta(hours=1) + self.dst(dt)
... def dst(self, dt):
... if self.dston <= dt.replace(tzinfo=None) < self.dstoff:
... return timedelta(hours=1)
... else:
... return timedelta(0)
... def tzname(self,dt):
... return "GMT +1"
...
>>> class GMT2(tzinfo):
... def __init__(self):
... d = datetime(dt.year, 4, 1)
... self.dston = d - timedelta(days=d.weekday() + 1)
... d = datetime(dt.year, 11, 1)
... self.dstoff = d - timedelta(days=d.weekday() + 1)
... def utcoffset(self, dt):
... return timedelta(hours=1) + self.dst(dt)
... def dst(self, dt):
... if self.dston <= dt.replace(tzinfo=None) < self.dstoff:
... return timedelta(hours=2)
... else:
... return timedelta(0)
... def tzname(self,dt):
... return "GMT +2"
...
>>> gmt1 = GMT1()
>>> # Daylight Saving Time
>>> dt1 = datetime(2006, 11, 21, 16, 30, tzinfo=gmt1)
>>> dt1.dst()
datetime.timedelta(0)
>>> dt1.utcoffset()
datetime.timedelta(0, 3600)
>>> dt2 = datetime(2006, 6, 14, 13, 0, tzinfo=gmt1)
>>> dt2.dst()
datetime.timedelta(0, 3600)
>>> dt2.utcoffset()
datetime.timedelta(0, 7200)
>>> # Convert datetime to another time zone
>>> dt3 = dt2.astimezone(GMT2())
>>> dt3 # doctest: +ELLIPSIS
datetime.datetime(2006, 6, 14, 14, 0, tzinfo=<GMT2 object at 0x...>)
>>> dt2 # doctest: +ELLIPSIS
datetime.datetime(2006, 6, 14, 13, 0, tzinfo=<GMT1 object at 0x...>)
>>> dt2.utctimetuple() == dt3.utctimetuple()
True
The smallest possible difference between non-equal time objects,
timedelta(microseconds=1), although note that arithmetic on time
objects is not supported.
The object passed as the tzinfo argument to the time constructor, or
None if none was passed.
Supported operations:
comparison of time to time, where a is considered less
than b when a precedes b in time. If one comparand is naive and the other
is aware, TypeError is raised. If both comparands are aware, and have
the same tzinfo attribute, the common tzinfo attribute is
ignored and the base times are compared. If both comparands are aware and
have different tzinfo attributes, the comparands are first adjusted by
subtracting their UTC offsets (obtained from self.utcoffset()). In order
to stop mixed-type comparisons from falling back to the default comparison by
object address, when a time object is compared to an object of a
different type, TypeError is raised unless the comparison is == or
!=. The latter cases return False or True, respectively.
hash, use as dict key
efficient pickling
in Boolean contexts, a time object is considered to be true if and
only if, after converting it to minutes and subtracting utcoffset() (or
0 if that’s None), the result is non-zero.
Return a time with the same value, except for those attributes given
new values by whichever keyword arguments are specified. Note that
tzinfo=None can be specified to create a naive time from an
aware time, without conversion of the time data.
Return a string representing the time in ISO 8601 format, HH:MM:SS.mmmmmm or, if
self.microsecond is 0, HH:MM:SS. If utcoffset() does not return None, a
6-character string is appended, giving the UTC offset in (signed) hours and
minutes: HH:MM:SS.mmmmmm+HH:MM or, if self.microsecond is 0, HH:MM:SS+HH:MM.
If tzinfo is None, returns None, else returns
self.tzinfo.utcoffset(None), and raises an exception if the latter doesn’t
return None or a timedelta object representing a whole number of
minutes with magnitude less than one day.
If tzinfo is None, returns None, else returns
self.tzinfo.dst(None), and raises an exception if the latter doesn’t return
None, or a timedelta object representing a whole number of minutes
with magnitude less than one day.
tzinfo is an abstract base class, meaning that this class should not be
instantiated directly. You need to derive a concrete subclass, and (at least)
supply implementations of the standard tzinfo methods needed by the
datetime methods you use. The datetime module supplies
a simple concrete subclass of tzinfo, timezone, which can represent
time zones with a fixed offset from UTC, such as UTC itself or North American EST and
EDT.
An instance of (a concrete subclass of) tzinfo can be passed to the
constructors for datetime and time objects. The latter objects
view their attributes as being in local time, and the tzinfo object
supports methods revealing offset of local time from UTC, the name of the time
zone, and DST offset, all relative to a date or time object passed to them.
Special requirement for pickling: A tzinfo subclass must have an
__init__() method that can be called with no arguments, else it can be
pickled but possibly not unpickled again. This is a technical requirement that
may be relaxed in the future.
A concrete subclass of tzinfo may need to implement the following
methods. Exactly which methods are needed depends on the uses made of aware
datetime objects. If in doubt, simply implement all of them.
Return offset of local time from UTC, in minutes east of UTC. If local time is
west of UTC, this should be negative. Note that this is intended to be the
total offset from UTC; for example, if a tzinfo object represents both
time zone and DST adjustments, utcoffset() should return their sum. If
the UTC offset isn’t known, return None. Else the value returned must be a
timedelta object specifying a whole number of minutes in the range
-1439 to 1439 inclusive (1440 = 24*60; the magnitude of the offset must be less
than one day). Most implementations of utcoffset() will probably look
like one of these two:
return CONSTANT # fixed-offset class
return CONSTANT + self.dst(dt) # daylight-aware class
If utcoffset() does not return None, dst() should not return
None either.
Return the daylight saving time (DST) adjustment, in minutes east of UTC, or
None if DST information isn’t known. Return timedelta(0) if DST is not
in effect. If DST is in effect, return the offset as a timedelta object
(see utcoffset() for details). Note that DST offset, if applicable, has
already been added to the UTC offset returned by utcoffset(), so there’s
no need to consult dst() unless you’re interested in obtaining DST info
separately. For example, datetime.timetuple() calls its tzinfo
attribute’s dst() method to determine how the tm_isdst flag
should be set, and tzinfo.fromutc() calls dst() to account for
DST changes when crossing time zones.
An instance tz of a tzinfo subclass that models both standard and
daylight times must be consistent in this sense:
tz.utcoffset(dt) - tz.dst(dt)
must return the same result for every datetime dt with dt.tzinfo == tz. For sane tzinfo subclasses, this expression yields the time
zone’s “standard offset”, which should not depend on the date or the time, but
only on geographic location. The implementation of datetime.astimezone()
relies on this, but cannot detect violations; it’s the programmer’s
responsibility to ensure it. If a tzinfo subclass cannot guarantee
this, it may be able to override the default implementation of
tzinfo.fromutc() to work correctly with astimezone() regardless.
Most implementations of dst() will probably look like one of these two:
def dst(self, dt):
# a fixed-offset class: doesn't account for DST
return timedelta(0)
or
def dst(self, dt):
# Code to set dston and dstoff to the time zone's DST
# transition times based on the input dt.year, and expressed
# in standard local time. Then
if dston <= dt.replace(tzinfo=None) < dstoff:
return timedelta(hours=1)
else:
return timedelta(0)
Return the time zone name corresponding to the datetime object dt, as
a string. Nothing about string names is defined by the datetime module,
and there’s no requirement that it mean anything in particular. For example,
“GMT”, “UTC”, “-500”, “-5:00”, “EDT”, “US/Eastern”, “America/New York” are all
valid replies. Return None if a string name isn’t known. Note that this is
a method rather than a fixed string primarily because some tzinfo
subclasses will wish to return different names depending on the specific value
of dt passed, especially if the tzinfo class is accounting for
daylight time.
These methods are called by a datetime or time object, in
response to their methods of the same names. A datetime object passes
itself as the argument, and a time object passes None as the
argument. A tzinfo subclass’s methods should therefore be prepared to
accept a dt argument of None, or of class datetime.
When None is passed, it’s up to the class designer to decide the best
response. For example, returning None is appropriate if the class wishes to
say that time objects don’t participate in the tzinfo protocols. It
may be more useful for utcoffset(None) to return the standard UTC offset, as
there is no other convention for discovering the standard offset.
When a datetime object is passed in response to a datetime
method, dt.tzinfo is the same object as self. tzinfo methods can
rely on this, unless user code calls tzinfo methods directly. The
intent is that the tzinfo methods interpret dt as being in local
time, without needing to worry about objects in other time zones.
There is one more tzinfo method that a subclass may wish to override:
This is called from the default datetime.astimezone()
implementation. When called from that, dt.tzinfo is self, and dt’s
date and time data are to be viewed as expressing a UTC time. The purpose
of fromutc() is to adjust the date and time data, returning an
equivalent datetime in self’s local time.
Most tzinfo subclasses should be able to inherit the default
fromutc() implementation without problems. It’s strong enough to handle
fixed-offset time zones, and time zones accounting for both standard and
daylight time, and the latter even if the DST transition times differ in
different years. An example of a time zone the default fromutc()
implementation may not handle correctly in all cases is one where the standard
offset (from UTC) depends on the specific date and time passed, which can happen
for political reasons. The default implementations of astimezone() and
fromutc() may not produce the result you want if the result is one of the
hours straddling the moment the standard offset changes.
Skipping code for error cases, the default fromutc() implementation acts
like:
def fromutc(self, dt):
    # raise ValueError if dt.tzinfo is not self
dtoff = dt.utcoffset()
dtdst = dt.dst()
# raise ValueError if dtoff is None or dtdst is None
delta = dtoff - dtdst # this is self's standard offset
if delta:
dt += delta # convert to standard local time
dtdst = dt.dst()
# raise ValueError if dtdst is None
if dtdst:
return dt + dtdst
else:
return dt
from datetime import tzinfo, timedelta, datetime
ZERO = timedelta(0)
HOUR = timedelta(hours=1)
# A UTC class.
class UTC(tzinfo):
"""UTC"""
def utcoffset(self, dt):
return ZERO
def tzname(self, dt):
return "UTC"
def dst(self, dt):
return ZERO
utc = UTC()
# A class building tzinfo objects for fixed-offset time zones.
# Note that FixedOffset(0, "UTC") is a different way to build a
# UTC tzinfo object.
class FixedOffset(tzinfo):
"""Fixed offset in minutes east from UTC."""
def __init__(self, offset, name):
self.__offset = timedelta(minutes=offset)
self.__name = name
def utcoffset(self, dt):
return self.__offset
def tzname(self, dt):
return self.__name
def dst(self, dt):
return ZERO
# A class capturing the platform's idea of local time.
import time as _time
STDOFFSET = timedelta(seconds=-_time.timezone)
if _time.daylight:
    DSTOFFSET = timedelta(seconds=-_time.altzone)
else:
DSTOFFSET = STDOFFSET
DSTDIFF = DSTOFFSET - STDOFFSET
class LocalTimezone(tzinfo):
def utcoffset(self, dt):
if self._isdst(dt):
return DSTOFFSET
else:
return STDOFFSET
def dst(self, dt):
if self._isdst(dt):
return DSTDIFF
else:
return ZERO
def tzname(self, dt):
return _time.tzname[self._isdst(dt)]
def _isdst(self, dt):
tt = (dt.year, dt.month, dt.day,
dt.hour, dt.minute, dt.second,
dt.weekday(), 0, 0)
stamp = _time.mktime(tt)
tt = _time.localtime(stamp)
return tt.tm_isdst > 0
Local = LocalTimezone()
# A complete implementation of current DST rules for major US time zones.
def first_sunday_on_or_after(dt):
days_to_go = 6 - dt.weekday()
if days_to_go:
dt += timedelta(days_to_go)
return dt
# US DST Rules
#
# This is a simplified (i.e., wrong for a few cases) set of rules for US
# DST start and end times. For a complete and up-to-date set of DST rules
# and timezone definitions, visit the Olson Database (or try pytz):
# http://www.twinsun.com/tz/tz-link.htm
# http://sourceforge.net/projects/pytz/ (might not be up-to-date)
#
# In the US, since 2007, DST starts at 2am (standard time) on the second
# Sunday in March, which is the first Sunday on or after Mar 8.
DSTSTART_2007 = datetime(1, 3, 8, 2)
# and ends at 2am (DST time; 1am standard time) on the first Sunday of Nov.
DSTEND_2007 = datetime(1, 11, 1, 1)
# From 1987 to 2006, DST used to start at 2am (standard time) on the first
# Sunday in April and to end at 2am (DST time; 1am standard time) on the last
# Sunday of October, which is the first Sunday on or after Oct 25.
DSTSTART_1987_2006 = datetime(1, 4, 1, 2)
DSTEND_1987_2006 = datetime(1, 10, 25, 1)
# From 1967 to 1986, DST used to start at 2am (standard time) on the last
# Sunday in April (the one on or after April 24) and to end at 2am (DST time;
# 1am standard time) on the last Sunday of October, which is the first Sunday
# on or after Oct 25.
DSTSTART_1967_1986 = datetime(1, 4, 24, 2)
DSTEND_1967_1986 = DSTEND_1987_2006
class USTimeZone(tzinfo):
def __init__(self, hours, reprname, stdname, dstname):
self.stdoffset = timedelta(hours=hours)
self.reprname = reprname
self.stdname = stdname
self.dstname = dstname
def __repr__(self):
return self.reprname
def tzname(self, dt):
if self.dst(dt):
return self.dstname
else:
return self.stdname
def utcoffset(self, dt):
return self.stdoffset + self.dst(dt)
def dst(self, dt):
if dt is None or dt.tzinfo is None:
# An exception may be sensible here, in one or both cases.
# It depends on how you want to treat them. The default
# fromutc() implementation (called by the default astimezone()
# implementation) passes a datetime with dt.tzinfo is self.
return ZERO
assert dt.tzinfo is self
# Find start and end times for US DST. For years before 1967, return
# ZERO for no DST.
if 2006 < dt.year:
dststart, dstend = DSTSTART_2007, DSTEND_2007
elif 1986 < dt.year < 2007:
dststart, dstend = DSTSTART_1987_2006, DSTEND_1987_2006
elif 1966 < dt.year < 1987:
dststart, dstend = DSTSTART_1967_1986, DSTEND_1967_1986
else:
return ZERO
start = first_sunday_on_or_after(dststart.replace(year=dt.year))
end = first_sunday_on_or_after(dstend.replace(year=dt.year))
# Can't compare naive to aware objects, so strip the timezone from
# dt first.
if start <= dt.replace(tzinfo=None) < end:
return HOUR
else:
return ZERO
Eastern = USTimeZone(-5, "Eastern", "EST", "EDT")
Central = USTimeZone(-6, "Central", "CST", "CDT")
Mountain = USTimeZone(-7, "Mountain", "MST", "MDT")
Pacific = USTimeZone(-8, "Pacific", "PST", "PDT")
Note that there are unavoidable subtleties twice per year in a tzinfo
subclass accounting for both standard and daylight time, at the DST transition
points. For concreteness, consider US Eastern (UTC -0500), where EDT begins the
minute after 1:59 (EST) on the second Sunday in March, and ends the minute after
1:59 (EDT) on the first Sunday in November:
When DST starts (the “start” line), the local wall clock leaps from 1:59 to
3:00. A wall time of the form 2:MM doesn’t really make sense on that day, so
astimezone(Eastern) won’t deliver a result with hour==2 on the day DST
begins. In order for astimezone() to make this guarantee, the
tzinfo.dst() method must consider times in the “missing hour” (2:MM for
Eastern) to be in daylight time.
When DST ends (the “end” line), there’s a potentially worse problem: there’s an
hour that can’t be spelled unambiguously in local wall time: the last hour of
daylight time. In Eastern, that’s times of the form 5:MM UTC on the day
daylight time ends. The local wall clock leaps from 1:59 (daylight time) back
to 1:00 (standard time) again. Local times of the form 1:MM are ambiguous.
astimezone() mimics the local clock’s behavior by mapping two adjacent UTC
hours into the same local hour then. In the Eastern example, UTC times of the
form 5:MM and 6:MM both map to 1:MM when converted to Eastern. In order for
astimezone() to make this guarantee, the tzinfo.dst() method must
consider times in the “repeated hour” to be in standard time. This is easily
arranged, as in the example, by expressing DST switch times in the time zone’s
standard local time.
Applications that can’t bear such ambiguities should avoid using hybrid
tzinfo subclasses; there are no ambiguities when using timezone,
or any other fixed-offset tzinfo subclass (such as a class representing
only EST (fixed offset -5 hours), or only EDT (fixed offset -4 hours)).
A timezone object represents a timezone that is defined by a
fixed offset from UTC. Note that objects of this class cannot be used
to represent timezone information in locations where different
offsets are used on different days of the year or where historical
changes have been made to civil time.
class datetime.timezone(offset[, name])
The offset argument must be specified as a timedelta
object representing the difference between the local time and UTC. It must
be strictly between -timedelta(hours=24) and
timedelta(hours=24) and represent a whole number of minutes,
otherwise ValueError is raised.
The name argument is optional. If specified it must be a string that
is used as the value returned by the tzname(dt) method. Otherwise,
tzname(dt) returns a string ‘UTCsHH:MM’, where s is the sign of
offset, HH and MM are two digits of offset.hours and
offset.minutes respectively.
Return the fixed value specified when the timezone instance is
constructed. The dt argument is ignored. The return value is a
timedelta instance equal to the difference between the
local time and UTC.
Return the fixed value specified when the timezone instance is
constructed or a string ‘UTCsHH:MM’, where s is the sign of
offset, HH and MM are two digits of offset.hours and
offset.minutes respectively.
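For example (the offset and name below are arbitrary):

>>> from datetime import timezone, timedelta
>>> est = timezone(timedelta(hours=-5), 'EST')
>>> est.tzname(None)
'EST'
>>> timezone(timedelta(hours=-5)).tzname(None)   # generated name
'UTC-05:00'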
date, datetime, and time objects all support a
strftime(format) method, to create a string representing the time under the
control of an explicit format string. Broadly speaking, d.strftime(fmt)
acts like the time module’s time.strftime(fmt, d.timetuple())
although not all objects support a timetuple() method.
Conversely, the datetime.strptime() class method creates a
datetime object from a string representing a date and time and a
corresponding format string. datetime.strptime(date_string, format) is
equivalent to datetime(*(time.strptime(date_string, format)[0:6])).
For time objects, the format codes for year, month, and day should not
be used, as time objects have no such values. If they’re used anyway, 1900
is substituted for the year, and 1 for the month and day.
For date objects, the format codes for hours, minutes, seconds, and
microseconds should not be used, as date objects have no such
values. If they’re used anyway, 0 is substituted for them.
For a naive object, the %z and %Z format codes are replaced by empty
strings.
For an aware object:
%z
utcoffset() is transformed into a 5-character string of the form +HHMM or
-HHMM, where HH is a 2-digit string giving the number of UTC offset hours, and
MM is a 2-digit string giving the number of UTC offset minutes. For example, if
utcoffset() returns timedelta(hours=-3, minutes=-30), %z is
replaced with the string '-0330'.
%Z
If tzname() returns None, %Z is replaced by an empty string.
Otherwise %Z is replaced by the returned value, which must be a string.
The full set of format codes supported varies across platforms, because Python
calls the platform C library’s strftime() function, and platform
variations are common.
The following is a list of all the format codes that the C standard (1989
version) requires, and these work on all platforms with a standard C
implementation. Note that the 1999 version of the C standard added additional
format codes.
Directive  Meaning                                                Notes
%a         Locale’s abbreviated weekday name.
%A         Locale’s full weekday name.
%b         Locale’s abbreviated month name.
%B         Locale’s full month name.
%c         Locale’s appropriate date and time representation.
%d         Day of the month as a decimal number [01,31].
%f         Microsecond as a decimal number [0,999999],            (1)
           zero-padded on the left.
%H         Hour (24-hour clock) as a decimal number [00,23].
%I         Hour (12-hour clock) as a decimal number [01,12].
%j         Day of the year as a decimal number [001,366].
%m         Month as a decimal number [01,12].
%M         Minute as a decimal number [00,59].
%p         Locale’s equivalent of either AM or PM.                (2)
%S         Second as a decimal number [00,59].                    (3)
%U         Week number of the year (Sunday as the first day of    (4)
           the week) as a decimal number [00,53]. All days in a
           new year preceding the first Sunday are considered
           to be in week 0.
%w         Weekday as a decimal number [0(Sunday),6].
%W         Week number of the year (Monday as the first day of    (4)
           the week) as a decimal number [00,53]. All days in a
           new year preceding the first Monday are considered
           to be in week 0.
%x         Locale’s appropriate date representation.
%X         Locale’s appropriate time representation.
%y         Year without century as a decimal number [00,99].
%Y         Year with century as a decimal number [0001,9999]      (5)
           (strptime), [1000,9999] (strftime).
%Z         Time zone name (empty string if the object is naive).
%z         UTC offset in the form +HHMM or -HHMM (empty string    (6)
           if the object is naive).
%%         A literal '%' character.
Notes:
(1) When used with the strptime() method, the %f directive
accepts from one to six digits and zero pads on the right. %f is
an extension to the set of format characters in the C standard (but
implemented separately in datetime objects, and therefore always
available).
(2) When used with the strptime() method, the %p directive only affects
the output hour field if the %I directive is used to parse the hour.
(3) Unlike the time module, the datetime module does not support
leap seconds.
(4) When used with the strptime() method, %U and %W are only used in
calculations when the day of the week and the year are specified.
(5) For technical reasons, the strftime() method does not support
dates before year 1000: t.strftime(format) will raise a
ValueError when t.year < 1000 even if format does
not contain a %Y directive. The strptime() method can
parse years in the full [1, 9999] range, but years < 1000 must be
zero-filled to 4-digit width.
Changed in version 3.2: In previous versions, the strftime() method was restricted to
years >= 1900.
(6) For example, if utcoffset() returns timedelta(hours=-3, minutes=-30),
%z is replaced with the string '-0330'.
Changed in version 3.2: When the %z directive is provided to the strptime() method, an
aware datetime object will be produced. The tzinfo of the
result will be set to a timezone instance.
This module allows you to output calendars like the Unix cal program,
and provides additional useful functions related to the calendar. By default,
these calendars have Monday as the first day of the week, and Sunday as the last
(the European convention). Use setfirstweekday() to set the first day of
the week to Sunday (6) or to any other weekday. Parameters that specify dates
are given as integers. For related
functionality, see also the datetime and time modules.
Most of these functions and classes rely on the datetime module which
uses an idealized calendar, the current Gregorian calendar extended
in both directions. This matches the definition of the “proleptic Gregorian”
calendar in Dershowitz and Reingold’s book “Calendrical Calculations”, where
it’s the base calendar for all computations.
Creates a Calendar object. firstweekday is an integer specifying the
first day of the week. 0 is Monday (the default), 6 is Sunday.
A Calendar object provides several methods that can be used for
preparing the calendar data for formatting. This class doesn’t do any formatting
itself. This is the job of subclasses.
Return an iterator for the week day numbers that will be used for one
week. The first value from the iterator will be the same as the value of
the firstweekday property.
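For example, with a Sunday start:

>>> import calendar
>>> c = calendar.Calendar(firstweekday=calendar.SUNDAY)
>>> list(c.iterweekdays())
[6, 0, 1, 2, 3, 4, 5]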
Return an iterator for the month month (1-12) in the year year. This
iterator will return all days (as datetime.date objects) for the
month and all days before the start of the month or after the end of the
month that are required to get a complete week.
Return an iterator for the month month in the year year similar to
itermonthdates(). Days returned will be tuples consisting of a day
number and a week day number.
Return the data for the specified year ready for formatting. The return
value is a list of month rows. Each month row contains up to width
months (defaulting to 3). Each month contains between 4 and 6 weeks and
each week contains 1–7 days. Days are datetime.date objects.
Return the data for the specified year ready for formatting (similar to
yeardatescalendar()). Entries in the week lists are tuples of day
numbers and weekday numbers. Day numbers outside this month are zero.
Return the data for the specified year ready for formatting (similar to
yeardatescalendar()). Entries in the week lists are day numbers. Day
numbers outside this month are zero.
Return a month’s calendar in a multi-line string. If w is provided, it
specifies the width of the date columns, which are centered. If l is
given, it specifies the number of lines that each week will use. Depends
on the first weekday as specified in the constructor or set by the
setfirstweekday() method.
Return an m-column calendar for an entire year as a multi-line string.
Optional parameters w, l, and c are for date column width, lines per
week, and number of spaces between month columns, respectively. Depends on
the first weekday as specified in the constructor or set by the
setfirstweekday() method. The earliest year for which a calendar
can be generated is platform-dependent.
Return a year’s calendar as a complete HTML page. width (defaulting to
3) specifies the number of months per row. css is the name for the
cascading style sheet to be used. None can be passed if no style
sheet should be used. encoding specifies the encoding to be used for the
output (defaulting to the system default encoding).
class calendar.LocaleTextCalendar(firstweekday=0, locale=None)
This subclass of TextCalendar can be passed a locale name in the
constructor and will return month and weekday names in the specified locale.
If this locale includes an encoding all strings containing month and weekday
names will be returned as unicode.
class calendar.LocaleHTMLCalendar(firstweekday=0, locale=None)
This subclass of HTMLCalendar can be passed a locale name in the
constructor and will return month and weekday names in the specified
locale. If this locale includes an encoding all strings containing month and
weekday names will be returned as unicode.
Note
The formatweekday() and formatmonthname() methods of these two
classes temporarily change the current locale to the given locale. Because
the current locale is a process-wide setting, they are not thread-safe.
For simple text calendars this module provides the following functions.
Sets the weekday (0 is Monday, 6 is Sunday) to start each week. The
values MONDAY, TUESDAY, WEDNESDAY, THURSDAY,
FRIDAY, SATURDAY, and SUNDAY are provided for
convenience. For example, to set the first weekday to Sunday:
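import calendar
calendar.setfirstweekday(calendar.SUNDAY)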
Returns a matrix representing a month’s calendar. Each row represents a week;
days outside of the month are represented by zeros. Each week begins with Monday
unless set by setfirstweekday().
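For example (August 2011, assuming the default Monday start):

>>> import calendar
>>> calendar.monthcalendar(2011, 8)
[[1, 2, 3, 4, 5, 6, 7], [8, 9, 10, 11, 12, 13, 14], [15, 16, 17, 18, 19, 20, 21], [22, 23, 24, 25, 26, 27, 28], [29, 30, 31, 0, 0, 0, 0]]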
An unrelated but handy function that takes a time tuple such as returned by the
gmtime() function in the time module, and returns the corresponding
Unix timestamp value, assuming an epoch of 1970, and the POSIX encoding. In
fact, time.gmtime() and timegm() are each others’ inverse.
The calendar module exports the following data attributes:
An array that represents the months of the year in the current locale. This
follows normal convention of January being month number 1, so it has a length of
13 and month_name[0] is the empty string.
An array that represents the abbreviated months of the year in the current
locale. This follows normal convention of January being month number 1, so it
has a length of 13 and month_abbr[0] is the empty string.
This module implements specialized container datatypes providing alternatives to
Python’s general purpose built-in containers, dict, list,
set, and tuple.
namedtuple()   factory function for creating tuple subclasses with named fields
deque          list-like container with fast appends and pops on either end
Counter        dict subclass for counting hashable objects
OrderedDict    dict subclass that remembers the order entries were added
defaultdict    dict subclass that calls a factory function to supply missing values
UserDict       wrapper around dictionary objects for easier dict subclassing
UserList       wrapper around list objects for easier list subclassing
UserString     wrapper around string objects for easier string subclassing
In addition to the concrete container classes, the collections module provides
abstract base classes that can be
used to test whether a class provides a particular interface, for example,
whether it is hashable or a mapping.
A counter tool is provided to support convenient and rapid tallies.
For example:
>>> # Tally occurrences of words in a list
>>> cnt = Counter()
>>> for word in ['red', 'blue', 'red', 'green', 'blue', 'blue']:
... cnt[word] += 1
>>> cnt
Counter({'blue': 3, 'red': 2, 'green': 1})
>>> # Find the ten most common words in Hamlet
>>> import re
>>> words = re.findall(r'\w+', open('hamlet.txt').read().lower())
>>> Counter(words).most_common(10)
[('the', 1143), ('and', 966), ('to', 762), ('of', 669), ('i', 631),
('you', 554), ('a', 546), ('my', 514), ('hamlet', 471), ('in', 451)]
A Counter is a dict subclass for counting hashable objects.
It is an unordered collection where elements are stored as dictionary keys
and their counts are stored as dictionary values. Counts are allowed to be
any integer value including zero or negative counts. The Counter
class is similar to bags or multisets in other languages.
Elements are counted from an iterable or initialized from another
mapping (or counter):
>>> c = Counter() # a new, empty counter
>>> c = Counter('gallahad') # a new counter from an iterable
>>> c = Counter({'red': 4, 'blue': 2}) # a new counter from a mapping
>>> c = Counter(cats=4, dogs=8) # a new counter from keyword args
Counter objects have a dictionary interface except that they return a zero
count for missing items instead of raising a KeyError:
>>> c = Counter(['eggs', 'ham'])
>>> c['bacon'] # count of a missing element is zero
0
Setting a count to zero does not remove an element from a counter.
Use del to remove it entirely:
>>> c['sausage'] = 0 # counter entry with a zero count
>>> del c['sausage'] # del actually removes the entry
New in version 3.1.
Counter objects support three methods beyond those available for all
dictionaries:
Return an iterator over elements repeating each as many times as its
count. Elements are returned in arbitrary order. If an element’s count
is less than one, elements() will ignore it.
Return a list of the n most common elements and their counts from the
most common to the least. If n is not specified, most_common()
returns all elements in the counter. Elements with equal counts are
ordered arbitrarily:
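>>> Counter('abracadabra').most_common(3)
[('a', 5), ('r', 2), ('b', 2)]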
Elements are subtracted from an iterable or from another mapping
(or counter). Like dict.update() but subtracts counts instead
of replacing them. Both inputs and outputs may be zero or negative.
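For example, subtracting one counter from another in place:

>>> c = Counter(a=4, b=2, c=0, d=-2)
>>> d = Counter(a=1, b=2, c=3, d=4)
>>> c.subtract(d)
>>> c
Counter({'a': 3, 'b': 0, 'c': -3, 'd': -6})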
Elements are counted from an iterable or added-in from another
mapping (or counter). Like dict.update() but adds counts
instead of replacing them. Also, the iterable is expected to be a
sequence of elements, not a sequence of (key,value) pairs.
sum(c.values()) # total of all counts
c.clear() # reset all counts
list(c) # list unique elements
set(c) # convert to a set
dict(c) # convert to a regular dictionary
c.items() # convert to a list of (elem, cnt) pairs
Counter(dict(list_of_pairs)) # convert from a list of (elem, cnt) pairs
c.most_common()[:-n:-1] # n least common elements
c += Counter() # remove zero and negative counts
Several mathematical operations are provided for combining Counter
objects to produce multisets (counters that have counts greater than zero).
Addition and subtraction combine counters by adding or subtracting the counts
of corresponding elements. Intersection and union return the minimum and
maximum of corresponding counts. Each operation can accept inputs with signed
counts, but the output will exclude results with counts of zero or less.
>>> c = Counter(a=3, b=1)
>>> d = Counter(a=1, b=2)
>>> c + d # add two counters together: c[x] + d[x]
Counter({'a': 4, 'b': 3})
>>> c - d # subtract (keeping only positive counts)
Counter({'a': 2})
>>> c & d # intersection: min(c[x], d[x])
Counter({'a': 1, 'b': 1})
>>> c | d # union: max(c[x], d[x])
Counter({'a': 3, 'b': 2})
Note
Counters were primarily designed to work with positive integers to represent
running counts; however, care was taken to not unnecessarily preclude use
cases needing other types or negative values. To help with those use cases,
this section documents the minimum range and type restrictions.
The Counter class itself is a dictionary subclass with no
restrictions on its keys and values. The values are intended to be numbers
representing counts, but you could store anything in the value field.
The most_common() method requires only that the values be orderable.
For in-place operations such as c[key]+=1, the value type need only
support addition and subtraction. So fractions, floats, and decimals would
work and negative values are supported. The same is also true for
update() and subtract() which allow negative and zero values
for both inputs and outputs.
The multiset methods are designed only for use cases with positive values.
The inputs may be negative or zero, but only outputs with positive values
are created. There are no type restrictions, but the value type needs to
support addition, subtraction, and comparison.
The elements() method requires integer counts. It ignores zero and
negative counts.
For mathematical operations on multisets and their use cases, see
Knuth, Donald. The Art of Computer Programming Volume II,
Section 4.6.3, Exercise 19.
Returns a new deque object initialized left-to-right (using append()) with
data from iterable. If iterable is not specified, the new deque is empty.
Deques are a generalization of stacks and queues (the name is pronounced “deck”
and is short for “double-ended queue”). Deques support thread-safe, memory
efficient appends and pops from either side of the deque with approximately the
same O(1) performance in either direction.
Though list objects support similar operations, they are optimized for
fast fixed-length operations and incur O(n) memory movement costs for
pop(0) and insert(0,v) operations which change both the size and
position of the underlying data representation.
If maxlen is not specified or is None, deques may grow to an
arbitrary length. Otherwise, the deque is bounded to the specified maximum
length. Once a bounded length deque is full, when new items are added, a
corresponding number of items are discarded from the opposite end. Bounded
length deques provide functionality similar to the tail filter in
Unix. They are also useful for tracking transactions and other pools of data
where only the most recent activity is of interest.
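For example, a bounded deque silently discards the oldest entries as new ones arrive:

>>> from collections import deque
>>> d = deque(maxlen=3)
>>> for i in range(5):
...     d.append(i)
>>> d
deque([2, 3, 4], maxlen=3)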
Extend the left side of the deque by appending elements from iterable.
Note, the series of left appends results in reversing the order of
elements in the iterable argument.
In addition to the above, deques support iteration, pickling, len(d),
reversed(d), copy.copy(d), copy.deepcopy(d), membership testing with
the in operator, and subscript references such as d[-1]. Indexed
access is O(1) at both ends but slows to O(n) in the middle. For fast random
access, use lists instead.
Example:
>>> from collections import deque
>>> d = deque('ghi') # make a new deque with three items
>>> for elem in d: # iterate over the deque's elements
... print(elem.upper())
G
H
I
>>> d.append('j') # add a new entry to the right side
>>> d.appendleft('f') # add a new entry to the left side
>>> d # show the representation of the deque
deque(['f', 'g', 'h', 'i', 'j'])
>>> d.pop() # return and remove the rightmost item
'j'
>>> d.popleft() # return and remove the leftmost item
'f'
>>> list(d) # list the contents of the deque
['g', 'h', 'i']
>>> d[0] # peek at leftmost item
'g'
>>> d[-1] # peek at rightmost item
'i'
>>> list(reversed(d)) # list the contents of a deque in reverse
['i', 'h', 'g']
>>> 'h' in d # search the deque
True
>>> d.extend('jkl') # add multiple elements at once
>>> d
deque(['g', 'h', 'i', 'j', 'k', 'l'])
>>> d.rotate(1) # right rotation
>>> d
deque(['l', 'g', 'h', 'i', 'j', 'k'])
>>> d.rotate(-1) # left rotation
>>> d
deque(['g', 'h', 'i', 'j', 'k', 'l'])
>>> deque(reversed(d)) # make a new deque in reverse order
deque(['l', 'k', 'j', 'i', 'h', 'g'])
>>> d.clear() # empty the deque
>>> d.pop() # cannot pop from an empty deque
Traceback (most recent call last):
File "<pyshell#6>", line 1, in -toplevel-
d.pop()
IndexError: pop from an empty deque
>>> d.extendleft('abc') # extendleft() reverses the input order
>>> d
deque(['c', 'b', 'a'])
This section shows various approaches to working with deques.
Bounded length deques provide functionality similar to the tail filter
in Unix:
from collections import deque

def tail(filename, n=10):
'Return the last n lines of a file'
return deque(open(filename), n)
Another approach to using deques is to maintain a sequence of recently
added elements by appending to the right and popping to the left:
from collections import deque
import itertools

def moving_average(iterable, n=3):
# moving_average([40, 30, 50, 46, 39, 44]) --> 40.0 42.0 45.0 43.0
# http://en.wikipedia.org/wiki/Moving_average
it = iter(iterable)
d = deque(itertools.islice(it, n-1))
d.appendleft(0)
s = sum(d)
for elem in it:
s += elem - d.popleft()
d.append(elem)
yield s / n
The rotate() method provides a way to implement deque slicing and
deletion. For example, a pure Python implementation of del d[n] relies on
the rotate() method to position elements to be popped:
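def delete_nth(d, n):
    d.rotate(-n)    # bring the nth element to the left end
    d.popleft()     # remove it
    d.rotate(n)     # restore the original ordering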
To implement deque slicing, use a similar approach applying
rotate() to bring a target element to the left side of the deque. Remove
old entries with popleft(), add new entries with extend(), and then
reverse the rotation.
With minor variations on that approach, it is easy to implement Forth style
stack manipulations such as dup, drop, swap, over, pick,
rot, and roll.
class collections.defaultdict([default_factory[, ...]])
Returns a new dictionary-like object. defaultdict is a subclass of the
built-in dict class. It overrides one method and adds one writable
instance variable. The remaining functionality is the same as for the
dict class and is not documented here.
The first argument provides the initial value for the default_factory
attribute; it defaults to None. All remaining arguments are treated the same
as if they were passed to the dict constructor, including keyword
arguments.
defaultdict objects support the following method in addition to the
standard dict operations:
__missing__(key)
If the default_factory attribute is None, this raises a
KeyError exception with the key as argument.
If default_factory is not None, it is called without arguments
to provide a default value for the given key, this value is inserted in
the dictionary for the key, and returned.
If calling default_factory raises an exception this exception is
propagated unchanged.
This method is called by the __getitem__() method of the
dict class when the requested key is not found; whatever it
returns or raises is then returned or raised by __getitem__().
defaultdict objects support the following instance variable:
default_factory
This attribute is used by the __missing__() method; it is initialized
from the first argument to the constructor, if present, or to None.
Using list as the default_factory, it is easy to group a
sequence of key-value pairs into a dictionary of lists:
>>> s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
>>> d = defaultdict(list)
>>> for k, v in s:
... d[k].append(v)
...
>>> list(d.items())
[('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]
When each key is encountered for the first time, it is not already in the
mapping; so an entry is automatically created using the default_factory
function which returns an empty list. The list.append()
operation then attaches the value to the new list. When keys are encountered
again, the look-up proceeds normally (returning the list for that key) and the
list.append() operation adds another value to the list. This technique is
simpler and faster than an equivalent technique using dict.setdefault():
>>> d = {}
>>> for k, v in s:
... d.setdefault(k, []).append(v)
...
>>> list(d.items())
[('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]
Setting the default_factory to int makes the
defaultdict useful for counting (like a bag or multiset in other
languages):
>>> s = 'mississippi'
>>> d = defaultdict(int)
>>> for k in s:
... d[k] += 1
...
>>> list(d.items())
[('i', 4), ('p', 2), ('s', 4), ('m', 1)]
When a letter is first encountered, it is missing from the mapping, so the
default_factory function calls int() to supply a default count of
zero. The increment operation then builds up the count for each letter.
The function int() which always returns zero is just a special case of
constant functions. A faster and more flexible way to create constant functions
is to use a lambda function which can supply any constant value (not just
zero):
>>> def constant_factory(value):
... return lambda: value
>>> d = defaultdict(constant_factory('<missing>'))
>>> d.update(name='John', action='ran')
>>> '%(name)s %(action)s to %(object)s' % d
'John ran to <missing>'
Setting the default_factory to set makes the
defaultdict useful for building a dictionary of sets:
>>> s = [('red', 1), ('blue', 2), ('red', 3), ('blue', 4), ('red', 1), ('blue', 4)]
>>> d = defaultdict(set)
>>> for k, v in s:
... d[k].add(v)
...
>>> list(d.items())
[('blue', {2, 4}), ('red', {1, 3})]
namedtuple() Factory Function for Tuples with Named Fields
Named tuples assign meaning to each position in a tuple and allow for more readable,
self-documenting code. They can be used wherever regular tuples are used, and
they add the ability to access fields by name instead of position index.
Returns a new tuple subclass named typename. The new subclass is used to
create tuple-like objects that have fields accessible by attribute lookup as
well as being indexable and iterable. Instances of the subclass also have a
helpful docstring (with typename and field_names) and a helpful __repr__()
method which lists the tuple contents in a name=value format.
The field_names are a single string with each fieldname separated by whitespace
and/or commas, for example 'x y' or 'x, y'. Alternatively, field_names
can be a sequence of strings such as ['x', 'y'].
Any valid Python identifier may be used for a fieldname except for names
starting with an underscore. Valid identifiers consist of letters, digits,
and underscores but do not start with a digit or underscore and cannot be
a keyword such as class, for, return, global, pass,
or raise.
If rename is true, invalid fieldnames are automatically replaced
with positional names. For example, ['abc', 'def', 'ghi', 'abc'] is
converted to ['abc', '_1', 'ghi', '_3'], eliminating the keyword
def and the duplicate fieldname abc.
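A quick check of that behavior:

>>> from collections import namedtuple
>>> Point = namedtuple('Point', ['abc', 'def', 'ghi', 'abc'], rename=True)
>>> Point._fields
('abc', '_1', 'ghi', '_3')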
If verbose is true, the class definition is printed just before being built.
Named tuple instances do not have per-instance dictionaries, so they are
lightweight and require no more memory than regular tuples.
Changed in version 3.1:
Changed in version 3.1: Added support for rename.
>>> # Basic example
>>> Point = namedtuple('Point', ['x', 'y'])
>>> p = Point(x=10, y=11)
>>> # Example using the verbose option to print the class definition
>>> Point = namedtuple('Point', 'x y', verbose=True)
class Point(tuple):
'Point(x, y)'
__slots__ = ()
_fields = ('x', 'y')
def __new__(_cls, x, y):
'Create a new instance of Point(x, y)'
return _tuple.__new__(_cls, (x, y))
@classmethod
def _make(cls, iterable, new=tuple.__new__, len=len):
'Make a new Point object from a sequence or iterable'
result = new(cls, iterable)
if len(result) != 2:
raise TypeError('Expected 2 arguments, got %d' % len(result))
return result
def __repr__(self):
'Return a nicely formatted representation string'
return self.__class__.__name__ + '(x=%r, y=%r)' % self
def _asdict(self):
'Return a new OrderedDict which maps field names to their values'
return OrderedDict(zip(self._fields, self))
__dict__ = property(_asdict)
def _replace(_self, **kwds):
'Return a new Point object replacing specified fields with new values'
result = _self._make(map(kwds.pop, ('x', 'y'), _self))
if kwds:
raise ValueError('Got unexpected field names: %r' % list(kwds.keys()))
return result
def __getnewargs__(self):
'Return self as a plain tuple. Used by copy and pickle.'
return tuple(self)
x = _property(_itemgetter(0), doc='Alias for field number 0')
y = _property(_itemgetter(1), doc='Alias for field number 1')
>>> p = Point(11, y=22) # instantiate with positional or keyword arguments
>>> p[0] + p[1] # indexable like the plain tuple (11, 22)
33
>>> x, y = p # unpack like a regular tuple
>>> x, y
(11, 22)
>>> p.x + p.y # fields also accessible by name
33
>>> p # readable __repr__ with a name=value style
Point(x=11, y=22)
Named tuples are especially useful for assigning field names to result tuples returned
by the csv or sqlite3 modules:
EmployeeRecord = namedtuple('EmployeeRecord', 'name, age, title, department, paygrade')

import csv
for emp in map(EmployeeRecord._make, csv.reader(open("employees.csv"))):
    print(emp.name, emp.title)

import sqlite3
conn = sqlite3.connect('/companydata')
cursor = conn.cursor()
cursor.execute('SELECT name, age, title, department, paygrade FROM employees')
for emp in map(EmployeeRecord._make, cursor.fetchall()):
    print(emp.name, emp.title)
In addition to the methods inherited from tuples, named tuples support
three additional methods and one attribute. To prevent conflicts with
field names, the method and attribute names start with an underscore.
Return a new instance of the named tuple replacing specified fields with new
values:
>>> p = Point(x=11, y=22)
>>> p._replace(x=33)
Point(x=33, y=22)
>>> for partnum, record in inventory.items():
...     inventory[partnum] = record._replace(price=newprices[partnum], timestamp=time.time())
Since a named tuple is a regular Python class, it is easy to add or change
functionality with a subclass. Here is how to add a calculated field and
a fixed-width print format:
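A sketch of such a subclass, reconstructed to match the output shown below (the hypot property and the __str__ format string are the assumptions here):

class Point(namedtuple('Point', 'x y')):
    __slots__ = ()

    @property
    def hypot(self):
        return (self.x ** 2 + self.y ** 2) ** 0.5

    def __str__(self):
        return 'Point: x=%6.3f y=%6.3f hypot=%6.3f' % (self.x, self.y, self.hypot)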
>>> for p in Point(3, 4), Point(14, 5/7):
...     print(p)
Point: x= 3.000 y= 4.000 hypot= 5.000
Point: x=14.000 y= 0.714 hypot=14.018
The subclass shown above sets __slots__ to an empty tuple. This helps
keep memory requirements low by preventing the creation of instance dictionaries.
Subclassing is not useful for adding new, stored fields. Instead, simply
create a new named tuple type from the _fields attribute:
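For example, a sketch that extends the Point type above with a z field:
>>> Point3D = namedtuple('Point3D', Point._fields + ('z',))
>>> Point3D(1, 2, 3)
Point3D(x=1, y=2, z=3)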
Ordered dictionaries are just like regular dictionaries but they remember the
order that items were inserted. When iterating over an ordered dictionary,
the items are returned in the order their keys were first added.
Return an instance of a dict subclass, supporting the usual dict
methods. An OrderedDict is a dict that remembers the order that keys
were first inserted. If a new entry overwrites an existing entry, the
original insertion position is left unchanged. Deleting an entry and
reinserting it will move it to the end.
The popitem() method for ordered dictionaries returns and removes a
(key, value) pair. The pairs are returned in LIFO order if last is true
or FIFO order if false.
Move an existing key to either end of an ordered dictionary. The item
is moved to the right end if last is true (the default) or to the
beginning if last is false. Raises KeyError if the key does
not exist:
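A quick sketch of the behavior just described:
>>> d = OrderedDict.fromkeys('abcde')
>>> d.move_to_end('b')
>>> ''.join(d.keys())
'acdeb'
>>> d.move_to_end('b', last=False)
>>> ''.join(d.keys())
'bacde'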
In addition to the usual mapping methods, ordered dictionaries also support
reverse iteration using reversed().
Equality tests between OrderedDict objects are order-sensitive
and are implemented as list(od1.items())==list(od2.items()).
Equality tests between OrderedDict objects and other
Mapping objects are order-insensitive like regular dictionaries.
This allows OrderedDict objects to be substituted anywhere a
regular dictionary is used.
The OrderedDict constructor and update() method both accept
keyword arguments, but their order is lost because Python’s function call
semantics pass in keyword arguments using a regular unordered dictionary.
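Since an ordered dictionary remembers its insertion order, it can be combined with sorting to make a sorted dictionary; a sketch:
>>> d = {'banana': 3, 'apple': 4, 'pear': 1, 'orange': 2}
>>> # dictionary sorted by key
>>> OrderedDict(sorted(d.items(), key=lambda t: t[0]))
OrderedDict([('apple', 4), ('banana', 3), ('orange', 2), ('pear', 1)])
>>> # dictionary sorted by value
>>> OrderedDict(sorted(d.items(), key=lambda t: t[1]))
OrderedDict([('pear', 1), ('orange', 2), ('banana', 3), ('apple', 4)])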
The new sorted dictionaries maintain their sort order when entries
are deleted. But when new keys are added, the keys are appended
to the end and the sort is not maintained.
It is also straightforward to create an ordered dictionary variant
that remembers the order the keys were last inserted.
If a new entry overwrites an existing entry, the
original insertion position is changed and moved to the end:
class LastUpdatedOrderedDict(OrderedDict):
    'Store items in the order the keys were last added'

    def __setitem__(self, key, value):
        if key in self:
            del self[key]
        OrderedDict.__setitem__(self, key, value)
An ordered dictionary can be combined with the Counter class
so that the counter remembers the order elements are first encountered:
class OrderedCounter(Counter, OrderedDict):
    'Counter that remembers the order elements are first encountered'

    def __repr__(self):
        return '%s(%r)' % (self.__class__.__name__, OrderedDict(self))

    def __reduce__(self):
        return self.__class__, (OrderedDict(self),)
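Assuming the class above, a quick check of the behavior:
>>> OrderedCounter('abracadabra')
OrderedCounter(OrderedDict([('a', 5), ('b', 2), ('r', 2), ('c', 1), ('d', 1)]))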
The UserDict class acts as a wrapper around dictionary objects.
The need for this class has been partially supplanted by the ability to
subclass directly from dict; however, this class can be easier
to work with because the underlying dictionary is accessible as an
attribute.
Class that simulates a dictionary. The instance’s contents are kept in a
regular dictionary, which is accessible via the data attribute of
UserDict instances. If initialdata is provided, data is
initialized with its contents; note that a reference to initialdata will not
be kept, allowing it to be used for other purposes.
In addition to supporting the methods and operations of mappings,
UserDict instances provide the following attribute:
This class acts as a wrapper around list objects. It is a useful base class
for your own list-like classes which can inherit from it and override
existing methods or add new ones. In this way, one can add new behaviors to
lists.
The need for this class has been partially supplanted by the ability to
subclass directly from list; however, this class can be easier
to work with because the underlying list is accessible as an attribute.
Class that simulates a list. The instance’s contents are kept in a regular
list, which is accessible via the data attribute of UserList
instances. The instance’s contents are initially set to a copy of list,
defaulting to the empty list []. list can be any iterable, for
example a real Python list or a UserList object.
In addition to supporting the methods and operations of mutable sequences,
UserList instances provide the following attribute:
A real list object used to store the contents of the
UserList class.
Subclassing requirements: Subclasses of UserList are expected to
offer a constructor which can be called with either no arguments or one
argument. List operations which return a new sequence attempt to create an
instance of the actual implementation class. To do so, it assumes that the
constructor can be called with a single parameter, which is a sequence object
used as a data source.
If a derived class does not wish to comply with this requirement, all of the
special methods supported by this class will need to be overridden; please
consult the sources for information about the methods which need to be provided
in that case.
The UserString class acts as a wrapper around string objects.
The need for this class has been partially supplanted by the ability to
subclass directly from str; however, this class can be easier
to work with because the underlying string is accessible as an
attribute.
Class that simulates a string or a Unicode string object. The instance’s
content is kept in a regular string object, which is accessible via the
data attribute of UserString instances. The instance’s
contents are initially set to a copy of sequence. The sequence can
be an instance of bytes, str, UserString (or a
subclass) or an arbitrary sequence which can be converted into a string using
the built-in str() function.
These ABCs allow us to ask classes or instances if they provide
particular functionality, for example:
size = None
if isinstance(myvar, collections.Sized):
    size = len(myvar)
Several of the ABCs are also useful as mixins that make it easier to develop
classes supporting container APIs. For example, to write a class supporting
the full Set API, it is only necessary to supply the three underlying
abstract methods: __contains__(), __iter__(), and __len__().
The ABC supplies the remaining methods such as __and__() and
isdisjoint():
class ListBasedSet(collections.Set):
    ''' Alternate set implementation favoring space over speed
        and not requiring the set elements to be hashable. '''
    def __init__(self, iterable):
        self.elements = lst = []
        for value in iterable:
            if value not in lst:
                lst.append(value)
    def __iter__(self):
        return iter(self.elements)
    def __contains__(self, value):
        return value in self.elements
    def __len__(self):
        return len(self.elements)

s1 = ListBasedSet('abcdef')
s2 = ListBasedSet('defghi')
overlap = s1 & s2            # The __and__() method is supported automatically
Since some set operations create new sets, the default mixin methods need
a way to create new instances from an iterable. The class constructor is
assumed to have a signature in the form ClassName(iterable).
That assumption is factored-out to an internal classmethod called
_from_iterable() which calls cls(iterable) to produce a new set.
If the Set mixin is being used in a class with a different
constructor signature, you will need to override _from_iterable()
with a classmethod that can construct new instances from
an iterable argument.
To override the comparisons (presumably for speed, as the
semantics are fixed), redefine __le__() and
then the other operations will automatically follow suit.
The Set mixin provides a _hash() method to compute a hash value
for the set; however, __hash__() is not defined because not all sets
are hashable or immutable. To add set hashability using mixins,
inherit from both Set() and Hashable(), then define
__hash__ = Set._hash.
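A minimal sketch, reusing the ListBasedSet example above:

class HashableListBasedSet(ListBasedSet, collections.Hashable):
    # the elements themselves must be hashable for Set._hash() to work
    __hash__ = collections.Set._hash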
This module provides an implementation of the heap queue algorithm, also known
as the priority queue algorithm.
Heaps are binary trees for which every parent node has a value less than or
equal to any of its children. This implementation uses arrays for which
heap[k] <= heap[2*k+1] and heap[k] <= heap[2*k+2] for all k, counting
elements from zero. For the sake of comparison, non-existing elements are
considered to be infinite. The interesting property of a heap is that its
smallest element is always the root, heap[0].
The API below differs from textbook heap algorithms in two aspects: (a) We use
zero-based indexing. This makes the relationship between the index for a node
and the indexes for its children slightly less obvious, but is more suitable
since Python uses zero-based indexing. (b) Our pop method returns the smallest
item, not the largest (called a “min heap” in textbooks; a “max heap” is more
common in texts because of its suitability for in-place sorting).
These two make it possible to view the heap as a regular Python list without
surprises: heap[0] is the smallest item, and heap.sort() maintains the
heap invariant!
To create a heap, use a list initialized to [], or you can transform a
populated list into a heap via function heapify().
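For example:
>>> from heapq import heappush, heappop, heapify
>>> heap = []
>>> for value in [5, 1, 3]:
...     heappush(heap, value)
>>> heappop(heap)
1
>>> data = [9, 7, 8]
>>> heapify(data)          # transform a populated list, in-place, in linear time
>>> data[0]
7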
Push item on the heap, then pop and return the smallest item from the
heap. The combined action runs more efficiently than heappush()
followed by a separate call to heappop().
Pop and return the smallest item from the heap, and also push the new item.
The heap size doesn’t change. If the heap is empty, IndexError is raised.
This one step operation is more efficient than a heappop() followed by
heappush() and can be more appropriate when using a fixed-size heap.
The pop/push combination always returns an element from the heap and replaces
it with item.
The value returned may be larger than the item added. If that isn’t
desired, consider using heappushpop() instead. Its push/pop
combination returns the smaller of the two values, leaving the larger value
on the heap.
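A short contrast of the two operations (a sketch):
>>> import heapq
>>> h = [1, 3, 5]
>>> heapq.heappushpop(h, 0)    # returns the smaller of 0 and h[0]; heap unchanged
0
>>> heapq.heapreplace(h, 0)    # always pops h[0] first, even if 0 is smaller
1
>>> h
[0, 3, 5]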
The module also offers three general purpose functions based on heaps.
Merge multiple sorted inputs into a single sorted output (for example, merge
timestamped entries from multiple log files). Returns an iterator
over the sorted values.
Similar to sorted(itertools.chain(*iterables)) but returns an iterable, does
not pull the data into memory all at once, and assumes that each of the input
streams is already sorted (smallest to largest).
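For instance:
>>> from heapq import merge
>>> list(merge([1, 3, 5], [2, 4, 6]))
[1, 2, 3, 4, 5, 6]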
Return a list with the n largest elements from the dataset defined by
iterable. key, if provided, specifies a function of one argument that is
used to extract a comparison key from each element in the iterable (for
example, key=str.lower). Equivalent to: sorted(iterable, key=key, reverse=True)[:n]
Return a list with the n smallest elements from the dataset defined by
iterable. key, if provided, specifies a function of one argument that is
used to extract a comparison key from each element in the iterable (for
example, key=str.lower). Equivalent to: sorted(iterable, key=key)[:n]
The latter two functions perform best for smaller values of n. For larger
values, it is more efficient to use the sorted() function. Also, when
n==1, it is more efficient to use the built-in min() and max()
functions.
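A quick sketch:
>>> from heapq import nlargest, nsmallest
>>> grades = [82, 99, 54, 21, 95, 77]
>>> nlargest(2, grades)
[99, 95]
>>> nsmallest(2, grades)
[21, 54]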
A priority queue is a common use
for a heap, and it presents several implementation challenges:
Sort stability: how do you get two tasks with equal priorities to be returned
in the order they were originally added?
Tuple comparison breaks for (priority, task) pairs if the priorities are equal
and the tasks do not have a default comparison order.
If the priority of a task changes, how do you move it to a new position in
the heap?
Or if a pending task needs to be deleted, how do you find it and remove it
from the queue?
A solution to the first two challenges is to store entries as a 3-element list
including the priority, an entry count, and the task. The entry count serves as
a tie-breaker so that two tasks with the same priority are returned in the order
they were added. And since no two entry counts are the same, the tuple
comparison will never attempt to directly compare two tasks.
The remaining challenges revolve around finding a pending task and making
changes to its priority or removing it entirely. Finding a task can be done
with a dictionary pointing to an entry in the queue.
Removing the entry or changing its priority is more difficult because it would
break the heap structure invariants. So, a possible solution is to mark an
entry as invalid and optionally add a new entry with the revised priority:
import itertools
from heapq import heappush, heappop

pq = []                        # the priority queue list
counter = itertools.count(1)   # unique sequence count
task_finder = {}               # mapping of tasks to entries
INVALID = 0                    # mark an entry as deleted

def add_task(priority, task, count=None):
    if count is None:
        count = next(counter)
    entry = [priority, count, task]
    task_finder[task] = entry
    heappush(pq, entry)

def get_top_priority():
    while True:
        priority, count, task = heappop(pq)
        del task_finder[task]
        if count != INVALID:   # compare by value; identity tests on small ints are unreliable
            return task

def delete_task(task):
    entry = task_finder[task]
    entry[1] = INVALID

def reprioritize(priority, task):
    entry = task_finder[task]
    add_task(priority, task, entry[1])
    entry[1] = INVALID
Heaps are arrays for which a[k] <= a[2*k+1] and a[k] <= a[2*k+2] for all
k, counting elements from 0. For the sake of comparison, non-existing
elements are considered to be infinite. The interesting property of a heap is
that a[0] is always its smallest element.
The strange invariant above is meant to be an efficient memory representation
for a tournament. The numbers below are k, not a[k]:
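(The diagram below reconstructs the index tree the text refers to; each cell k sits above cells 2*k+1 and 2*k+2.)

                               0

              1                                 2

      3               4                5                6

  7       8       9       10      11      12      13      14

15 16   17 18   19 20   21 22   23 24   25 26   27 28   29 30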
In the tree above, each cell k is topping 2*k+1 and 2*k+2. In a usual
binary tournament we see in sports, each cell is the winner over the two cells
it tops, and we can trace the winner down the tree to see all opponents s/he
had. However, in many computer applications of such tournaments, we do not need
to trace the history of a winner. To be more memory efficient, when a winner is
promoted, we try to replace it by something else at a lower level, and the rule
becomes that a cell and the two cells it tops contain three different items, but
the top cell “wins” over the two topped cells.
If this heap invariant is protected at all times, index 0 is clearly the overall
winner. The simplest algorithmic way to remove it and find the “next” winner is
to move some loser (let’s say cell 30 in the diagram above) into the 0 position,
and then percolate this new 0 down the tree, exchanging values, until the
invariant is re-established. This is clearly logarithmic on the total number of
items in the tree. By iterating over all items, you get an O(n log n) sort.
A nice feature of this sort is that you can efficiently insert new items while
the sort is going on, provided that the inserted items are not “better” than the
last 0’th element you extracted. This is especially useful in simulation
contexts, where the tree holds all incoming events, and the “win” condition
means the smallest scheduled time. When an event schedules other events for
execution, they are scheduled into the future, so they can easily go into the
heap. So, a heap is a good structure for implementing schedulers (this is what
I used for my MIDI sequencer :-).
Various structures for implementing schedulers have been extensively studied,
and heaps are good for this, as they are reasonably speedy, the speed is almost
constant, and the worst case is not much different than the average case.
However, there are other representations which are more efficient overall, yet
the worst cases might be terrible.
Heaps are also very useful in big disk sorts. You most probably all know that a
big sort implies producing “runs” (which are pre-sorted sequences, whose size is
usually related to the amount of CPU memory), followed by merging passes for
these runs, which merging is often very cleverly organised [1]. It is very
important that the initial sort produces the longest runs possible. Tournaments
are a good way to achieve that. If, using all the memory available to hold a
tournament, you replace and percolate items that happen to fit the current run,
you’ll produce runs which are twice the size of the memory for random input, and
much better for input fuzzily ordered.
Moreover, if you output the 0’th item on disk and get an input which may not fit
in the current tournament (because the value “wins” over the last output value),
it cannot fit in the heap, so the size of the heap decreases. The freed memory
could be cleverly reused immediately for progressively building a second heap,
which grows at exactly the same rate the first heap is melting. When the first
heap completely vanishes, you switch heaps and start a new run. Clever and
quite effective!
In a word, heaps are useful memory structures to know. I use them in a few
applications, and I think it is good to keep a ‘heap’ module around. :-)
The disk balancing algorithms which are current, nowadays, are more annoying
than clever, and this is a consequence of the seeking capabilities of the disks.
On devices which cannot seek, like big tape drives, the story was quite
different, and one had to be very clever to ensure (far in advance) that each
tape movement will be the most effective possible (that is, will best
participate at “progressing” the merge). Some tapes were even able to read
backwards, and this was also used to avoid the rewinding time. Believe me, real
good tape sorts were quite spectacular to watch! From all times, sorting has
always been a Great Art! :-)
This module provides support for maintaining a list in sorted order without
having to sort the list after each insertion. For long lists of items with
expensive comparison operations, this can be an improvement over the more common
approach. The module is called bisect because it uses a basic bisection
algorithm to do its work. The source code may be most useful as a working
example of the algorithm (the boundary conditions are already right!).
Locate the insertion point for x in a to maintain sorted order.
The parameters lo and hi may be used to specify a subset of the list
which should be considered; by default the entire list is used. If x is
already present in a, the insertion point will be before (to the left of)
any existing entries. The return value is suitable for use as the first
parameter to list.insert() assuming that a is already sorted.
The returned insertion point i partitions the array a into two halves so
that all(val < x for val in a[lo:i]) for the left side and
all(val >= x for val in a[i:hi]) for the right side.
Similar to bisect_left(), but returns an insertion point which comes
after (to the right of) any existing entries of x in a.
The returned insertion point i partitions the array a into two halves so
that all(val <= x for val in a[lo:i]) for the left side and
all(val > x for val in a[i:hi]) for the right side.
Insert x in a in sorted order. This is equivalent to
a.insert(bisect.bisect_left(a, x, lo, hi), x) assuming that a is
already sorted. Keep in mind that the O(log n) search is dominated by
the slow O(n) insertion step.
Similar to insort_left(), but inserting x in a after any existing
entries of x.
See also
SortedCollection recipe that uses
bisect to build a full-featured collection class with straightforward search
methods and support for a key-function. The keys are precomputed to save
unnecessary calls to the key function during searches.
The above bisect() functions are useful for finding insertion points but
can be tricky or awkward to use for common searching tasks. The following five
functions show how to transform them into the standard lookups for sorted
lists:
def index(a, x):
    'Locate the leftmost value exactly equal to x'
    i = bisect_left(a, x)
    if i != len(a) and a[i] == x:
        return i
    raise ValueError

def find_lt(a, x):
    'Find rightmost value less than x'
    i = bisect_left(a, x)
    if i:
        return a[i-1]
    raise ValueError

def find_le(a, x):
    'Find rightmost value less than or equal to x'
    i = bisect_right(a, x)
    if i:
        return a[i-1]
    raise ValueError

def find_gt(a, x):
    'Find leftmost value greater than x'
    i = bisect_right(a, x)
    if i != len(a):
        return a[i]
    raise ValueError

def find_ge(a, x):
    'Find leftmost item greater than or equal to x'
    i = bisect_left(a, x)
    if i != len(a):
        return a[i]
    raise ValueError
The bisect() function can be useful for numeric table lookups. This
example uses bisect() to look up a letter grade for an exam score (say)
based on a set of ordered numeric breakpoints: 90 and up is an ‘A’, 80 to 89 is
a ‘B’, and so on:
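A sketch matching that description:
>>> from bisect import bisect
>>> def grade(score, breakpoints=[60, 70, 80, 90], grades='FDCBA'):
...     i = bisect(breakpoints, score)
...     return grades[i]
...
>>> [grade(score) for score in [33, 99, 77, 70, 89, 90, 100]]
['F', 'A', 'C', 'C', 'B', 'A', 'A']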
Unlike the sorted() function, it does not make sense for the bisect()
functions to have key or reversed arguments because that would lead to an
inefficient design (successive calls to bisect functions would not “remember”
all of the previous key lookups).
Instead, it is better to search a list of precomputed keys to find the index
of the record in question:
>>> data = [('red', 5), ('blue', 1), ('yellow', 8), ('black', 0)]
>>> data.sort(key=lambda r: r[1])
>>> keys = [r[1] for r in data] # precomputed list of keys
>>> data[bisect_left(keys, 0)]
('black', 0)
>>> data[bisect_left(keys, 1)]
('blue', 1)
>>> data[bisect_left(keys, 5)]
('red', 5)
>>> data[bisect_left(keys, 8)]
('yellow', 8)
This module defines an object type which can compactly represent an array of
basic values: characters, integers, floating point numbers. Arrays are sequence
types and behave very much like lists, except that the type of objects stored in
them is constrained. The type is specified at object creation time by using a
type code, which is a single character. The following type codes are
defined:
Type code   C Type           Python Type         Minimum size in bytes
'b'         signed char      int                 1
'B'         unsigned char    int                 1
'u'         Py_UNICODE       Unicode character   2 (see note)
'h'         signed short     int                 2
'H'         unsigned short   int                 2
'i'         signed int       int                 2
'I'         unsigned int     int                 2
'l'         signed long      int                 4
'L'         unsigned long    int                 4
'f'         float            float               4
'd'         double           float               8
Note
The 'u' typecode corresponds to Python’s unicode character. On narrow
Unicode builds this is 2 bytes; on wide builds it is 4 bytes.
The actual representation of values is determined by the machine architecture
(strictly speaking, by the C implementation). The actual size can be accessed
through the itemsize attribute.
A new array whose items are restricted by typecode, and initialized
from the optional initializer value, which must be a list, object
supporting the buffer interface, or iterable over elements of the
appropriate type.
If given a list or string, the initializer is passed to the new array’s
fromlist(), frombytes(), or fromunicode() method (see below)
to add initial items to the array. Otherwise, the iterable initializer is
passed to the extend() method.
Array objects support the ordinary sequence operations of indexing, slicing,
concatenation, and multiplication. When using slice assignment, the assigned
value must be an array object with the same type code; in all other cases,
TypeError is raised. Array objects also implement the buffer interface,
and may be used wherever buffer objects are supported.
The following data items and methods are also supported:
Return a tuple (address,length) giving the current memory address and the
length in elements of the buffer used to hold array’s contents. The size of the
memory buffer in bytes can be computed as array.buffer_info()[1] * array.itemsize. This is occasionally useful when working with low-level (and
inherently unsafe) I/O interfaces that require memory addresses, such as certain
ioctl() operations. The returned numbers are valid as long as the array
exists and no length-changing operations are applied to it.
Note
When using array objects from code written in C or C++ (the only way to
effectively make use of this information), it makes more sense to use the buffer
interface supported by array objects. This method is maintained for backward
compatibility and should be avoided in new code. The buffer interface is
documented in Buffer Protocol.
“Byteswap” all items of the array. This is only supported for values which are
1, 2, 4, or 8 bytes in size; for other types of values, RuntimeError is
raised. It is useful when reading data from a file written on a machine with a
different byte order.
Append items from iterable to the end of the array. If iterable is another
array, it must have exactly the same type code; if not, TypeError will
be raised. If iterable is not an array, it must be iterable and its elements
must be the right type to be appended to the array.
Read n items (as machine values) from the file object f and append
them to the end of the array. If fewer than n items are available,
EOFError is raised, but the items that were available are still
inserted into the array. f must be a real built-in file object; something
else with a read() method won’t do.
Extends this array with data from the given unicode string. The array must
be a type 'u' array; otherwise a ValueError is raised. Use
array.frombytes(unicodestring.encode(enc)) to append Unicode data to an
array of some other type.
Removes the item with the index i from the array and returns it. The optional
argument defaults to -1, so that by default the last item is removed and
returned.
Convert the array to an array of machine values and return the bytes
representation (the same sequence of bytes that would be written to a file by
the tofile() method.)
Convert the array to a unicode string. The array must be a type 'u' array;
otherwise a ValueError is raised. Use array.tobytes().decode(enc) to
obtain a unicode string from an array of some other type.
When an array object is printed or converted to a string, it is represented as
array(typecode, initializer). The initializer is omitted if the array is
empty, otherwise it is a string if the typecode is 'u', otherwise it is a
list of numbers. The string is guaranteed to be able to be converted back to an
array with the same type and value using eval(), so long as the
array() function has been imported using from array import array.
Examples:
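A few typical constructor calls:

array('l')
array('u', 'hello \u2641')
array('l', [1, 2, 3, 4, 5])
array('d', [1.0, 2.0, 3.14])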
The scheduler class defines a generic interface to scheduling events.
It needs two functions to actually deal with the “outside world” — timefunc
should be callable without arguments, and return a number (the “time”, in any
units whatsoever). The delayfunc function should be callable with one
argument, compatible with the output of timefunc, and should delay that many
time units. delayfunc will also be called with the argument 0 after each
event is run to allow other threads an opportunity to run in multi-threaded
applications.
In multi-threaded environments, the scheduler class has limitations
with respect to thread-safety: it cannot insert a new task before
the one currently pending in a running scheduler, and it holds up the main
thread until the event queue is empty. The preferred approach
is to use the threading.Timer class instead.
Example:
>>> import time
>>> from threading import Timer
>>> def print_time():
... print("From print_time", time.time())
...
>>> def print_some_times():
... print(time.time())
... Timer(5, print_time, ()).start()
... Timer(10, print_time, ()).start()
... time.sleep(11) # sleep while time-delay events execute
... print(time.time())
...
>>> print_some_times()
930343690.257
From print_time 930343695.274
From print_time 930343700.273
930343701.301
Schedule a new event. The time argument should be a numeric type compatible
with the return value of the timefunc function passed to the constructor.
Events scheduled for the same time will be executed in the order of their
priority.
Executing the event means executing action(*argument). argument must be a
sequence holding the parameters for action.
Return value is an event which may be used for later cancellation of the event
(see cancel()).
Schedule an event for delay more time units. Other than the relative time, the
other arguments, the effect and the return value are the same as those for
enterabs().
Run all scheduled events. This function will wait (using the delayfunc()
function passed to the constructor) for the next event, then execute it and so
on until there are no more scheduled events.
Either action or delayfunc can raise an exception. In either case, the
scheduler will maintain a consistent state and propagate the exception. If an
exception is raised by action, the event will not be attempted in future calls
to run().
If a sequence of events takes longer to run than the time available before the
next event, the scheduler will simply fall behind. No events will be dropped;
the calling code is responsible for canceling events which are no longer
pertinent.
Read-only attribute returning a list of upcoming events in the order they
will be run. Each event is shown as a named tuple with the
following fields: time, priority, action, argument.
The queue module implements multi-producer, multi-consumer queues.
It is especially useful in threaded programming when information must be
exchanged safely between multiple threads. The Queue class in this
module implements all the required locking semantics. It depends on the
availability of thread support in Python; see the threading
module.
Implements three types of queue whose only difference is the order that
the entries are retrieved. In a FIFO queue, the first tasks added are
the first retrieved. In a LIFO queue, the most recently added entry is
the first retrieved (operating like a stack). With a priority queue,
the entries are kept sorted (using the heapq module) and the
lowest valued entry is retrieved first.
The queue module defines the following classes and exceptions:
Constructor for a FIFO queue. maxsize is an integer that sets the upper bound
limit on the number of items that can be placed in the queue. Insertion will
block once this size has been reached, until queue items are consumed. If
maxsize is less than or equal to zero, the queue size is infinite.
Constructor for a LIFO queue. maxsize is an integer that sets the upper bound
limit on the number of items that can be placed in the queue. Insertion will
block once this size has been reached, until queue items are consumed. If
maxsize is less than or equal to zero, the queue size is infinite.
Constructor for a priority queue. maxsize is an integer that sets the upper bound
limit on the number of items that can be placed in the queue. Insertion will
block once this size has been reached, until queue items are consumed. If
maxsize is less than or equal to zero, the queue size is infinite.
The lowest valued entries are retrieved first (the lowest valued entry is the
one returned by sorted(list(entries))[0]). A typical pattern for entries
is a tuple in the form: (priority_number, data).
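For example, a sketch using the priority queue:
>>> from queue import PriorityQueue
>>> q = PriorityQueue()
>>> q.put((2, 'code'))
>>> q.put((1, 'eat'))
>>> q.put((3, 'sleep'))
>>> q.get()
(1, 'eat')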
Return the approximate size of the queue. Note, qsize() > 0 doesn’t
guarantee that a subsequent get() will not block, nor will qsize() < maxsize
guarantee that put() will not block.
Return True if the queue is empty, False otherwise. If empty()
returns True it doesn’t guarantee that a subsequent call to put()
will not block. Similarly, if empty() returns False it doesn’t
guarantee that a subsequent call to get() will not block.
Return True if the queue is full, False otherwise. If full()
returns True it doesn’t guarantee that a subsequent call to get()
will not block. Similarly, if full() returns False it doesn’t
guarantee that a subsequent call to put() will not block.
Put item into the queue. If optional args block is true and timeout is
None (the default), block if necessary until a free slot is available. If
timeout is a positive number, it blocks at most timeout seconds and raises
the Full exception if no free slot was available within that time.
Otherwise (block is false), put an item on the queue if a free slot is
immediately available, else raise the Full exception (timeout is
ignored in that case).
Remove and return an item from the queue. If optional args block is true and
timeout is None (the default), block if necessary until an item is available.
If timeout is a positive number, it blocks at most timeout seconds and
raises the Empty exception if no item was available within that time.
Otherwise (block is false), return an item if one is immediately available,
else raise the Empty exception (timeout is ignored in that case).
Indicate that a formerly enqueued task is complete. Used by queue consumer
threads. For each get() used to fetch a task, a subsequent call to
task_done() tells the queue that the processing on the task is complete.
If a join() is currently blocking, it will resume when all items have been
processed (meaning that a task_done() call was received for every item
that had been put() into the queue).
Raises a ValueError if called more times than there were items placed in
the queue.
Blocks until all items in the queue have been gotten and processed.
The count of unfinished tasks goes up whenever an item is added to the queue.
The count goes down whenever a consumer thread calls task_done() to
indicate that the item was retrieved and all work on it is complete. When the
count of unfinished tasks drops to zero, join() unblocks.
Example of how to wait for enqueued tasks to be completed:
from queue import Queue
from threading import Thread

# do_work(), source() and num_worker_threads are supplied by the caller
def worker():
    while True:
        item = q.get()
        do_work(item)
        q.task_done()

q = Queue()
for i in range(num_worker_threads):
    t = Thread(target=worker)
    t.daemon = True
    t.start()

for item in source():
    q.put(item)

q.join()       # block until all tasks are done
The weakref module allows the Python programmer to create weak
references to objects.
In the following, the term referent means the object which is referred to
by a weak reference.
A weak reference to an object is not enough to keep the object alive: when the
only remaining references to a referent are weak references,
garbage collection is free to destroy the referent and reuse its memory
for something else. A primary use for weak references is to implement caches or
mappings holding large objects, where it’s desired that a large object not be
kept alive solely because it appears in a cache or mapping.
For example, if you have a number of large binary image objects, you may wish to
associate a name with each. If you used a Python dictionary to map names to
images, or images to names, the image objects would remain alive just because
they appeared as values or keys in the dictionaries. The
WeakKeyDictionary and WeakValueDictionary classes supplied by
the weakref module are an alternative, using weak references to construct
mappings that don’t keep objects alive solely because they appear in the mapping
objects. If, for example, an image object is a value in a
WeakValueDictionary, then when the last remaining references to that
image object are the weak references held by weak mappings, garbage collection
can reclaim the object, and its corresponding entries in weak mappings are
simply deleted.
WeakKeyDictionary and WeakValueDictionary use weak references
in their implementation, setting up callback functions on the weak references
that notify the weak dictionaries when a key or value has been reclaimed by
garbage collection. WeakSet implements the set interface,
but keeps weak references to its elements, just like a
WeakKeyDictionary does.
Most programs should find that using one of these weak container types is all
they need – it’s not usually necessary to create your own weak references
directly. The low-level machinery used by the weak dictionary implementations
is exposed by the weakref module for the benefit of advanced uses.
Note
Weak references to an object are cleared before the object’s __del__()
is called, to ensure that the weak reference callback (if any) finds the
object still alive.
Not all objects can be weakly referenced; those objects which can include class
instances, functions written in Python (but not in C), instance methods, sets,
frozensets, some file objects, generators, type
objects, sockets, arrays, deques, regular expression pattern objects, and code
objects.
Changed in version 3.2: Added support for thread.lock, threading.Lock, and code objects.
Several built-in types such as list and dict do not directly
support weak references but can add support through subclassing:
class Dict(dict):
    pass

obj = Dict(red=1, green=2, blue=3)   # this object is weak referenceable
Other built-in types such as tuple and int do not support weak
references even when subclassed (this is an implementation detail and may
differ across Python implementations).
Extension types can easily be made to support weak references; see
Weak Reference Support.
Return a weak reference to object. The original object can be retrieved by
calling the reference object if the referent is still alive; if the referent is
no longer alive, calling the reference object will cause None to be
returned. If callback is provided and not None, and the returned
weakref object is still alive, the callback will be called when the object is
about to be finalized; the weak reference object will be passed as the only
parameter to the callback; the referent will no longer be available.
It is allowable for many weak references to be constructed for the same object.
Callbacks registered for each weak reference will be called from the most
recently registered callback to the oldest registered callback.
Exceptions raised by the callback will be noted on the standard error output,
but cannot be propagated; they are handled in exactly the same way as exceptions
raised from an object’s __del__() method.
Weak references are hashable if the object is hashable. They will
maintain their hash value even after the object was deleted. If
hash() is first called only after the object was deleted,
the call will raise TypeError.
Weak references support tests for equality, but not ordering. If the referents
are still alive, two references have the same equality relationship as their
referents (regardless of the callback). If either referent has been deleted,
the references are equal only if the reference objects are the same object.
This is a subclassable type rather than a factory function.
Return a proxy to object which uses a weak reference. This supports use of
the proxy in most contexts instead of requiring the explicit dereferencing used
with weak reference objects. The returned object will have a type of either
ProxyType or CallableProxyType, depending on whether object is
callable. Proxy objects are not hashable regardless of the referent; this
avoids a number of problems related to their fundamentally mutable nature, and
prevents their use as dictionary keys. callback is the same as the parameter
of the same name to the ref() function.
Mapping class that references keys weakly. Entries in the dictionary will be
discarded when there is no longer a strong reference to the key. This can be
used to associate additional data with an object owned by other parts of an
application without adding attributes to those objects. This can be especially
useful with objects that override attribute accesses.
Note
Caution: Because a WeakKeyDictionary is built on top of a Python
dictionary, it must not change size when iterating over it. This can be
difficult to ensure for a WeakKeyDictionary because actions
performed by the program during iteration may cause items in the
dictionary to vanish “by magic” (as a side effect of garbage collection).
WeakKeyDictionary objects have the following additional methods. These
expose the internal references directly. The references are not guaranteed to
be “live” at the time they are used, so the result of calling the references
needs to be checked before being used. This can be used to avoid creating
references that will cause the garbage collector to keep the keys around longer
than needed.
Mapping class that references values weakly. Entries in the dictionary will be
discarded when no strong reference to the value exists any more.
Note
Caution: Because a WeakValueDictionary is built on top of a Python
dictionary, it must not change size when iterating over it. This can be
difficult to ensure for a WeakValueDictionary because actions performed
by the program during iteration may cause items in the dictionary to vanish “by
magic” (as a side effect of garbage collection).
WeakValueDictionary objects have the following additional methods.
These methods have the same issues as the keyrefs() method of
WeakKeyDictionary objects.
Sequence containing all the type objects for proxies. This can make it simpler
to test if an object is a proxy without being dependent on naming both proxy
types.
Weak reference objects have no attributes or methods, but do allow the referent
to be obtained, if it still exists, by calling it:
>>> import weakref
>>> class Object:
... pass
...
>>> o = Object()
>>> r = weakref.ref(o)
>>> o2 = r()
>>> o is o2
True
If the referent no longer exists, calling the reference object returns
None:
>>> del o, o2
>>> print(r())
None
Testing that a weak reference object is still live should be done using the
expression ref() is not None. Normally, application code that needs to use
a reference object should follow this pattern:
# r is a weak reference object
o = r()
if o is None:
    # referent has been garbage collected
    print("Object has been deallocated; can't frobnicate.")
else:
    print("Object is still live!")
    o.do_something_useful()
Using a separate test for “liveness” creates race conditions in threaded
applications; another thread can cause a weak reference to become invalidated
before the weak reference is called; the idiom shown above is safe in threaded
applications as well as single-threaded applications.
Specialized versions of ref objects can be created through subclassing.
This is used in the implementation of the WeakValueDictionary to reduce
the memory overhead for each entry in the mapping. This may be most useful to
associate additional information with a reference, but could also be used to
insert additional processing on calls to retrieve the referent.
This example shows how a subclass of ref can be used to store
additional information about an object and affect the value that’s returned when
the referent is accessed:
import weakref

class ExtendedRef(weakref.ref):
    def __init__(self, ob, callback=None, **annotations):
        super(ExtendedRef, self).__init__(ob, callback)
        self.__counter = 0
        for k, v in annotations.items():
            setattr(self, k, v)

    def __call__(self):
        """Return a pair containing the referent and the number of
        times the reference has been called.
        """
        ob = super(ExtendedRef, self).__call__()
        if ob is not None:
            self.__counter += 1
            ob = (ob, self.__counter)
        return ob
This simple example shows how an application can use object IDs to retrieve
objects that it has seen before. The IDs of the objects can then be used in
other data structures without forcing the objects to remain alive, but the
objects can still be retrieved by ID if they do.
import weakref

_id2obj_dict = weakref.WeakValueDictionary()

def remember(obj):
    oid = id(obj)
    _id2obj_dict[oid] = obj
    return oid

def id2obj(oid):
    return _id2obj_dict[oid]
This module defines names for some object types that are used by the standard
Python interpreter, but not exposed as builtins like int or
str are. Also, it does not include some of the types that arise
transparently during processing such as the listiterator type.
The type of objects defined in extension modules with PyGetSetDef, such
as FrameType.f_locals or array.array.typecode. This type is used as
descriptor for object attributes; it has the same purpose as the
property type, but for classes defined in extension modules.
The type of objects defined in extension modules with PyMemberDef, such
as datetime.timedelta.days. This type is used as descriptor for simple C
data members which use standard conversion functions; it has the same purpose
as the property type, but for classes defined in extension modules.
CPython implementation detail: In other implementations of Python, this type may be identical to
GetSetDescriptorType.
The difference between shallow and deep copying is only relevant for compound
objects (objects that contain other objects, like lists or class instances):
A shallow copy constructs a new compound object and then (to the extent
possible) inserts references into it to the objects found in the original.
A deep copy constructs a new compound object and then, recursively, inserts
copies into it of the objects found in the original.
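A quick illustration of the difference (a sketch):
>>> import copy
>>> a = [[1, 2], [3, 4]]
>>> shallow = copy.copy(a)
>>> deep = copy.deepcopy(a)
>>> a[0].append(99)          # mutate an inner object
>>> shallow[0]               # the shallow copy shares the inner lists
[1, 2, 99]
>>> deep[0]                  # the deep copy does not
[1, 2]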
Two problems often exist with deep copy operations that don’t exist with shallow
copy operations:
Recursive objects (compound objects that, directly or indirectly, contain a
reference to themselves) may cause a recursive loop.
Because deep copy copies everything it may copy too much, e.g.,
administrative data structures that should be shared even between copies.
The deepcopy() function avoids these problems by:
keeping a “memo” dictionary of objects already copied during the current
copying pass; and
letting user-defined classes override the copying operation or the set of
components copied.
This module does not copy types like module, method, stack trace, stack frame,
file, socket, window, array, or any similar types. It does “copy” functions and
classes (shallow and deeply), by returning the original object unchanged; this
is compatible with the way these are treated by the pickle module.
Shallow copies of dictionaries can be made using dict.copy(), and
of lists by assigning a slice of the entire list, for example,
copied_list = original_list[:].
Classes can use the same interfaces to control copying that they use to control
pickling. See the description of module pickle for information on these
methods. The copy module does not use the copyreg registration
module.
In order for a class to define its own copy implementation, it can define
special methods __copy__() and __deepcopy__(). The former is called
to implement the shallow copy operation; no additional arguments are passed.
The latter is called to implement the deep copy operation; it is passed one
argument, the memo dictionary. If the __deepcopy__() implementation needs
to make a deep copy of a component, it should call the deepcopy() function
with the component as first argument and the memo dictionary as second argument.
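A minimal sketch of a class customizing its deep copy (the Node class and its fields are hypothetical):

import copy

class Node:
    def __init__(self, payload, shared_registry):
        self.payload = payload
        self.shared_registry = shared_registry   # administrative data, deliberately shared

    def __deepcopy__(self, memo):
        new = Node.__new__(Node)
        memo[id(self)] = new                     # register early to handle reference cycles
        new.payload = copy.deepcopy(self.payload, memo)   # deep-copy the component via deepcopy()
        new.shared_registry = self.shared_registry        # keep the shared structure as-is
        return new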
The pprint module provides a capability to “pretty-print” arbitrary
Python data structures in a form which can be used as input to the interpreter.
If the formatted structures include objects which are not fundamental Python
types, the representation may not be loadable. This may be the case if objects
such as files, sockets, classes, or instances are included, as well as many
other built-in objects which are not representable as Python constants.
The formatted representation keeps objects on a single line if it can, and
breaks them onto multiple lines if they don’t fit within the allowed width.
Construct PrettyPrinter objects explicitly if you need to adjust the
width constraint.
Dictionaries are sorted by key before the display is computed.
class pprint.PrettyPrinter(indent=1, width=80, depth=None, stream=None)
Construct a PrettyPrinter instance. This constructor understands
several keyword parameters. An output stream may be set using the stream
keyword; the only method used on the stream object is the file protocol’s
write() method. If not specified, the PrettyPrinter adopts
sys.stdout. Three additional parameters may be used to control the
formatted representation. The keywords are indent, depth, and width. The
amount of indentation added for each recursive level is specified by indent;
the default is one. Other values can cause output to look a little odd, but can
make nesting easier to spot. The number of levels which may be printed is
controlled by depth; if the data structure being printed is too deep, the next
contained level is replaced by .... By default, there is no constraint on
the depth of the objects being formatted. The desired output width is
constrained using the width parameter; the default is 80 characters. If a
structure cannot be formatted within the constrained width, a best effort will
be made.
Return the formatted representation of object as a string. indent, width
and depth will be passed to the PrettyPrinter constructor as
formatting parameters.
Prints the formatted representation of object on stream, followed by a
newline. If stream is None, sys.stdout is used. This may be used
in the interactive interpreter instead of the print() function for
inspecting values (you can even reassign print = pprint.pprint for use
within a scope). indent, width and depth will be passed to the
PrettyPrinter constructor as formatting parameters.
>>> import pprint
>>> stuff = ['spam', 'eggs', 'lumberjack', 'knights', 'ni']
>>> stuff.insert(0, stuff)
>>> pprint.pprint(stuff)
[<Recursion on list with id=...>,
'spam',
'eggs',
'lumberjack',
'knights',
'ni']
Determine if the formatted representation of object is “readable,” or can be
used to reconstruct the value using eval(). This always returns False
for recursive objects.
Return a string representation of object, protected against recursive data
structures. If the representation of object exposes a recursive entry, the
recursive reference will be represented as <Recursion on typename with id=number>. The representation is not otherwise formatted.
>>> pprint.saferepr(stuff)
"[<Recursion on list with id=...>, 'spam', 'eggs', 'lumberjack', 'knights', 'ni']"
Print the formatted representation of object on the configured stream,
followed by a newline.
The following methods provide the implementations for the corresponding
functions of the same names. Using these methods on an instance is slightly
more efficient since new PrettyPrinter objects don’t need to be
created.
Determine if the formatted representation of the object is “readable,” or can be
used to reconstruct the value using eval(). Note that this returns
False for recursive objects. If the depth parameter of the
PrettyPrinter is set and the object is deeper than allowed, this
returns False.
Determine if the object requires a recursive representation.
This method is provided as a hook to allow subclasses to modify the way objects
are converted to strings. The default implementation uses the internals of the
saferepr() implementation.
Returns three values: the formatted version of object as a string, a flag
indicating whether the result is readable, and a flag indicating whether
recursion was detected. The first argument is the object to be presented. The
second is a dictionary which contains the id() of objects that are part of
the current presentation context (direct and indirect containers for object
that are affecting the presentation) as the keys; if an object needs to be
presented which is already represented in context, the third return value
should be True. Recursive calls to the format() method should add
additional entries for containers to this dictionary. The third argument,
maxlevels, gives the requested limit to recursion; this will be 0 if there
is no requested limit. This argument should be passed unmodified to recursive
calls. The fourth argument, level, gives the current level; recursive calls
should be passed a value less than that of the current call.
The reprlib module provides a means for producing object representations
with limits on the size of the resulting strings. This is used in the Python
debugger and may be useful in other contexts as well.
This module provides a class, an instance, and a function:
Class which provides formatting services useful in implementing functions
similar to the built-in repr(); size limits for different object types
are added to avoid the generation of representations which are excessively long.
This is an instance of Repr which is used to provide the
repr() function described below. Changing the attributes of this
object will affect the size limits used by repr() and the Python
debugger.
This is the repr() method of aRepr. It returns a string
similar to that returned by the built-in function of the same name, but with
limits on most sizes.
In addition to size-limiting tools, the module also provides a decorator for
detecting recursive calls to __repr__() and substituting a placeholder
string instead.
Decorator for __repr__() methods to detect recursive calls within the
same thread. If a recursive call is made, the fillvalue is returned,
otherwise, the usual __repr__() call is made. For example:
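A sketch of its use:
>>> from reprlib import recursive_repr
>>> class MyList(list):
...     @recursive_repr()
...     def __repr__(self):
...         return '<' + '|'.join(map(repr, self)) + '>'
...
>>> m = MyList('abc')
>>> m.append(m)
>>> m.append('x')
>>> print(m)
<'a'|'b'|'c'|...|'x'>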
Repr instances provide several attributes which can be used to provide
size limits for the representations of different object types, and methods
which format specific object types.
Limit on the number of characters in the representation of the string. Note
that the “normal” representation of the string is used as the character source:
if escape sequences are needed in the representation, these may be mangled when
the representation is shortened. The default is 30.
This limit is used to control the size of object types for which no specific
formatting method is available on the Repr object. It is applied in a
similar manner as maxstring. The default is 20.
Recursive implementation used by repr(). This uses the type of obj to
determine which formatting method to call, passing it obj and level. The
type-specific methods should call repr1() to perform recursive formatting,
with level-1 for the value of level in the recursive call.
Repr.repr_TYPE(obj, level)
Formatting methods for specific types are implemented as methods with a name
based on the type name. In the method name, TYPE is replaced by
'_'.join(type(obj).__name__.split()). Dispatch to these
methods is handled by repr1(). Type-specific methods which need to
recursively format a value should call self.repr1(subobj,level-1).
The use of dynamic dispatching by Repr.repr1() allows subclasses of
Repr to add support for additional built-in object types or to modify
the handling of types already supported. This example shows how special support
for file objects could be added:
import reprlib
import sys

class MyRepr(reprlib.Repr):
    def repr_file(self, obj, level):
        if obj.name in ['<stdin>', '<stdout>', '<stderr>']:
            return obj.name
        else:
            return repr(obj)

aRepr = MyRepr()
print(aRepr.repr(sys.stdin))          # prints '<stdin>'
The modules described in this chapter provide numeric and math-related functions
and data types. The numbers module defines an abstract hierarchy of
numeric types. The math and cmath modules contain various
mathematical functions for floating-point and complex numbers. For users more
interested in decimal accuracy than in speed, the decimal module supports
exact representations of decimal numbers.
The following modules are documented in this chapter:
The numbers module (PEP 3141) defines a hierarchy of numeric
abstract base classes which progressively define
more operations. None of the types defined in this module can be instantiated.
Subclasses of this type describe complex numbers and include the operations
that work on the built-in complex type. These are: conversions to
complex and bool, real, imag, +,
-, *, /, abs(), conjugate(), ==, and !=. All
except - and != are abstract.
Subtypes Rational and adds a conversion to int.
Provides defaults for float(), numerator, and
denominator, and bit-string operations: <<,
>>, &, ^, |, ~.
Implementors should be careful to make equal numbers equal and hash
them to the same values. This may be subtle if there are two different
extensions of the real numbers. For example, fractions.Fraction
implements hash() as follows:
def __hash__(self):
    if self.denominator == 1:
        # Get integers right.
        return hash(self.numerator)
    # Expensive check, but definitely correct.
    if self == float(self):
        return hash(float(self))
    else:
        # Use tuple's hash to avoid a high collision rate on
        # simple fractions.
        return hash((self.numerator, self.denominator))
There are, of course, more possible ABCs for numbers, and this would
be a poor hierarchy if it precluded the possibility of adding
those. You can add MyFoo between Complex and
Real with:
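class MyFoo(Complex): ...
MyFoo.register(Real)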
We want to implement the arithmetic operations so that mixed-mode
operations either call an implementation whose author knew about the
types of both arguments, or convert both to the nearest built-in type
and do the operation there. For subtypes of Integral, this
means that __add__() and __radd__() should be defined as:
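# A sketch following the numbers module documentation; do_my_adding_stuff()
# and do_my_other_adding_stuff() stand in for real implementations.
class MyIntegral(Integral):

    def __add__(self, other):
        if isinstance(other, MyIntegral):
            return do_my_adding_stuff(self, other)
        elif isinstance(other, OtherTypeIKnowAbout):
            return do_my_other_adding_stuff(self, other)
        else:
            return NotImplemented

    def __radd__(self, other):
        if isinstance(other, MyIntegral):
            return do_my_adding_stuff(other, self)
        elif isinstance(other, OtherTypeIKnowAbout):
            return do_my_other_adding_stuff(other, self)
        elif isinstance(other, Integral):
            return int(other) + int(self)
        elif isinstance(other, Real):
            return float(other) + float(self)
        elif isinstance(other, Complex):
            return complex(other) + complex(self)
        else:
            return NotImplemented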
There are 5 different cases for a mixed-type operation on subclasses
of Complex. I’ll refer to all of the above code that doesn’t
refer to MyIntegral and OtherTypeIKnowAbout as
“boilerplate”. a will be an instance of A, which is a subtype
of Complex (a:A<:Complex), and b:B<:Complex. I’ll consider a+b:
If A defines an __add__() which accepts b, all is
well.
If A falls back to the boilerplate code, and it were to
return a value from __add__(), we’d miss the possibility
that B defines a more intelligent __radd__(), so the
boilerplate should return NotImplemented from
__add__(). (Or A may not implement __add__() at
all.)
Then B's __radd__() gets a chance. If it accepts
a, all is well.
If it falls back to the boilerplate, there are no more possible
methods to try, so this is where the default implementation
should live.
If B<:A, Python tries B.__radd__ before
A.__add__. This is ok, because it was implemented with
knowledge of A, so it can handle those instances before
delegating to Complex.
If A<:Complex and B<:Real without sharing any other knowledge,
then the appropriate shared operation is the one involving the built-in
complex, and both __radd__() s land there, so a+b==b+a.
Because most of the operations on any given type will be very similar,
it can be useful to define a helper function which generates the
forward and reverse instances of any given operator. For example,
fractions.Fraction uses:
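# Abridged from the Fraction implementation: the generator builds a
# forward/reverse method pair for a given monomorphic operator.
def _operator_fallbacks(monomorphic_operator, fallback_operator):
    def forward(a, b):
        if isinstance(b, (int, Fraction)):
            return monomorphic_operator(a, b)
        elif isinstance(b, float):
            return fallback_operator(float(a), b)
        elif isinstance(b, complex):
            return fallback_operator(complex(a), b)
        else:
            return NotImplemented
    forward.__name__ = '__' + fallback_operator.__name__ + '__'
    forward.__doc__ = monomorphic_operator.__doc__

    def reverse(b, a):
        if isinstance(a, Rational):
            # Includes ints.
            return monomorphic_operator(a, b)
        elif isinstance(a, Real):
            return fallback_operator(float(a), float(b))
        elif isinstance(a, Complex):
            return fallback_operator(complex(a), complex(b))
        else:
            return NotImplemented
    reverse.__name__ = '__r' + fallback_operator.__name__ + '__'
    reverse.__doc__ = monomorphic_operator.__doc__

    return forward, reverse

def _add(a, b):
    """a + b"""
    return Fraction(a.numerator * b.denominator +
                    b.numerator * a.denominator,
                    a.denominator * b.denominator)

__add__, __radd__ = _operator_fallbacks(_add, operator.add)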
This module is always available. It provides access to the mathematical
functions defined by the C standard.
These functions cannot be used with complex numbers; use the functions of the
same name from the cmath module if you require support for complex
numbers. The distinction between functions which support complex numbers and
those which don’t is made since most users do not want to learn quite as much
mathematics as required to understand complex numbers. Receiving an exception
instead of a complex result allows earlier detection of the unexpected complex
number used as a parameter, so that the programmer can determine how and why it
was generated in the first place.
The following functions are provided by this module. Except when explicitly
noted otherwise, all return values are floats.
Return the ceiling of x, the smallest integer greater than or equal to x.
If x is not a float, delegates to x.__ceil__(), which should return an
Integral value.
Return the floor of x, the largest integer less than or equal to x.
If x is not a float, delegates to x.__floor__(), which should return an
Integral value.
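For example:
>>> import math
>>> math.ceil(-0.5)
0
>>> math.floor(-0.5)
-1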
Return fmod(x,y), as defined by the platform C library. Note that the
Python expression x%y may not return the same result. The intent of the C
standard is that fmod(x,y) be exactly (mathematically; to infinite
precision) equal to x-n*y for some integer n such that the result has
the same sign as x and magnitude less than abs(y). Python’s x%y
returns a result with the sign of y instead, and may not be exactly computable
for float arguments. For example, fmod(-1e-100,1e100) is -1e-100, but
the result of Python’s -1e-100%1e100 is 1e100-1e-100, which cannot be
represented exactly as a float, and rounds to the surprising 1e100. For
this reason, function fmod() is generally preferred when working with
floats, while Python’s x%y is preferred when working with integers.
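For example, the case described above:
>>> import math
>>> math.fmod(-1e-100, 1e100)
-1e-100
>>> -1e-100 % 1e100
1e+100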
Return the mantissa and exponent of x as the pair (m,e). m is a float
and e is an integer such that x==m*2**e exactly. If x is zero,
returns (0.0,0), otherwise 0.5<=abs(m)<1. This is used to “pick
apart” the internal representation of a float in a portable way.
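For example:
>>> import math
>>> math.frexp(8.0)     # 8.0 == 0.5 * 2**4
(0.5, 4)
>>> math.frexp(0.0)
(0.0, 0)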
fsum(iterable) returns an accurate floating point sum of the values in
the iterable, avoiding loss of precision by tracking multiple intermediate
partial sums. The algorithm's accuracy depends on IEEE-754 arithmetic
guarantees and the typical case where the rounding mode is half-even. On some
non-Windows builds, the underlying C library uses extended precision addition
and may occasionally double-round an intermediate sum causing it to be off in
its least significant bit.
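For example:
>>> from math import fsum
>>> sum([.1] * 10)
0.9999999999999999
>>> fsum([.1] * 10)
1.0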
Return the Real value x truncated to an Integral (usually
an integer). Delegates to x.__trunc__().
Note that frexp() and modf() have a different call/return pattern
than their C equivalents: they take a single argument and return a pair of
values, rather than returning their second return value through an ‘output
parameter’ (there is no such thing in Python).
For the ceil(), floor(), and modf() functions, note that all
floating-point numbers of sufficiently large magnitude are exact integers.
Python floats typically carry no more than 53 bits of precision (the same as the
platform C double type), in which case any float x with abs(x)>=2**52
necessarily has no fractional bits.
Return e**x-1. For small floats x, the subtraction in exp(x)-1
can result in a significant loss of precision; the expm1()
function provides a way to compute this quantity to full precision:
>>> from math import exp, expm1
>>> exp(1e-5) - 1 # gives result accurate to 11 places
1.0000050000069649e-05
>>> expm1(1e-5) # result accurate to full precision
1.0000050000166668e-05
Return x raised to the power y. Exceptional cases follow
Annex ‘F’ of the C99 standard as far as possible. In particular,
pow(1.0,x) and pow(x,0.0) always return 1.0, even
when x is a zero or a NaN. If both x and y are finite,
x is negative, and y is not an integer then pow(x,y)
is undefined, and raises ValueError.
Return atan(y/x), in radians. The result is between -pi and pi.
The vector in the plane from the origin to point (x,y) makes this angle
with the positive X axis. The point of atan2() is that the signs of both
inputs are known to it, so it can compute the correct quadrant for the angle.
For example, atan(1) and atan2(1,1) are both pi/4, but atan2(-1,-1) is -3*pi/4.
Return the complementary error function at x. The complementary error
function is defined as
1.0-erf(x). It is used for large values of x where a subtraction
from one would cause a loss of significance.
The mathematical constant e = 2.718281..., to available precision.
CPython implementation detail: The math module consists mostly of thin wrappers around the platform C
math library functions. Behavior in exceptional cases follows Annex F of
the C99 standard where appropriate. The current implementation will raise
ValueError for invalid operations like sqrt(-1.0) or log(0.0)
(where C99 Annex F recommends signaling invalid operation or divide-by-zero),
and OverflowError for results that overflow (for example,
exp(1000.0)). A NaN will not be returned from any of the functions
above unless one or more of the input arguments was a NaN; in that case,
most functions will return a NaN, but (again following C99 Annex F) there
are some exceptions to this rule, for example pow(float('nan'),0.0) or
hypot(float('nan'),float('inf')).
Note that Python makes no effort to distinguish signaling NaNs from
quiet NaNs, and behavior for signaling NaNs remains unspecified.
Typical behavior is to treat all NaNs as though they were quiet.
Complex number versions of many of these functions.
cmath — Mathematical functions for complex numbers
This module is always available. It provides access to mathematical functions
for complex numbers. The functions in this module accept integers,
floating-point numbers or complex numbers as arguments. They will also accept
any Python object that has either a __complex__() or a __float__()
method: these methods are used to convert the object to a complex or
floating-point number, respectively, and the function is then applied to the
result of the conversion.
Note
On platforms with hardware and system-level support for signed
zeros, functions involving branch cuts are continuous on both
sides of the branch cut: the sign of the zero distinguishes one
side of the branch cut from the other. On platforms that do not
support signed zeros the continuity is as specified below.
A Python complex number z is stored internally using rectangular
or Cartesian coordinates. It is completely determined by its real
part z.real and its imaginary part z.imag. In other
words:
z == z.real + z.imag*1j
Polar coordinates give an alternative way to represent a complex
number. In polar coordinates, a complex number z is defined by the
modulus r and the phase angle phi. The modulus r is the distance
from z to the origin, while the phase phi is the counterclockwise
angle, measured in radians, from the positive x-axis to the line
segment that joins the origin to z.
The following functions can be used to convert from the native
rectangular coordinates to polar coordinates and back.
Return the phase of x (also known as the argument of x), as a
float. phase(x) is equivalent to math.atan2(x.imag,x.real). The result lies in the range [-π, π], and the branch
cut for this operation lies along the negative real axis,
continuous from above. On systems with support for signed zeros
(which includes most systems in current use), this means that the
sign of the result is the same as the sign of x.imag, even when
x.imag is zero:
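>>> from cmath import phase
>>> phase(complex(-1.0, 0.0))
3.141592653589793
>>> phase(complex(-1.0, -0.0))
-3.141592653589793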
The modulus (absolute value) of a complex number x can be
computed using the built-in abs() function. There is no
separate cmath module function for this operation.
Return the representation of x in polar coordinates. Returns a
pair (r,phi) where r is the modulus of x and phi is the
phase of x. polar(x) is equivalent to (abs(x),phase(x)).
Returns the logarithm of x to the given base. If the base is not
specified, returns the natural logarithm of x. There is one branch cut, from 0
along the negative real axis to -∞, continuous from above.
Return the arc cosine of x. There are two branch cuts: One extends right from
1 along the real axis to ∞, continuous from below. The other extends left from
-1 along the real axis to -∞, continuous from above.
Return the arc tangent of x. There are two branch cuts: One extends from
1j along the imaginary axis to ∞j, continuous from the right. The
other extends from -1j along the imaginary axis to -∞j, continuous
from the left.
Return the hyperbolic arc sine of x. There are two branch cuts:
One extends from 1j along the imaginary axis to ∞j,
continuous from the right. The other extends from -1j along
the imaginary axis to -∞j, continuous from the left.
Return the hyperbolic arc tangent of x. There are two branch cuts: One
extends from 1 along the real axis to ∞, continuous from below. The
other extends from -1 along the real axis to -∞, continuous from
above.
Note that the selection of functions is similar, but not identical, to that in
module math. The reason for having two modules is that some users aren’t
interested in complex numbers, and perhaps don’t even know what they are. They
would rather have math.sqrt(-1) raise an exception than return a complex
number. Also note that the functions defined in cmath always return a
complex number, even if the answer can be expressed as a real number (in which
case the complex number has an imaginary part of zero).
A note on branch cuts: They are curves along which the given function fails to
be continuous. They are a necessary feature of many complex functions. It is
assumed that if you need to compute with complex functions, you will understand
about branch cuts. Consult almost any (not too elementary) book on complex
variables for enlightenment. For information on the proper choice of branch
cuts for numerical purposes, a good reference should be the following:
See also
Kahan, W: Branch cuts for complex elementary functions; or, Much ado about
nothing’s sign bit. In Iserles, A., and Powell, M. (eds.), The state of the art
in numerical analysis. Clarendon Press (1987) pp165-211.
decimal — Decimal fixed point and floating point arithmetic
The decimal module provides support for decimal floating point
arithmetic. It offers several advantages over the float datatype:
Decimal “is based on a floating-point model which was designed with people
in mind, and necessarily has a paramount guiding principle – computers must
provide an arithmetic that works in the same way as the arithmetic that
people learn at school.” – excerpt from the decimal arithmetic specification.
Decimal numbers can be represented exactly. In contrast, numbers like
1.1 and 2.2 do not have exact representations in binary
floating point. End users typically would not expect 1.1+2.2 to display
as 3.3000000000000003 as it does with binary floating point.
The exactness carries over into arithmetic. In decimal floating point, 0.1+0.1+0.1-0.3 is exactly equal to zero. In binary floating point, the result
is 5.5511151231257827e-17. While near to zero, the differences
prevent reliable equality testing and differences can accumulate. For this
reason, decimal is preferred in accounting applications which have strict
equality invariants.
The decimal module incorporates a notion of significant places so that 1.30+1.20 is 2.50. The trailing zero is kept to indicate significance.
This is the customary presentation for monetary applications. For
multiplication, the “schoolbook” approach uses all the figures in the
multiplicands. For instance, 1.3*1.2 gives 1.56 while 1.30*1.20 gives 1.5600.
Unlike hardware based binary floating point, the decimal module has a user
alterable precision (defaulting to 28 places) which can be as large as needed for
a given problem:
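>>> from decimal import *
>>> getcontext().prec = 6
>>> Decimal(1) / Decimal(7)
Decimal('0.142857')
>>> getcontext().prec = 28
>>> Decimal(1) / Decimal(7)
Decimal('0.1428571428571428571428571429')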
Both binary and decimal floating point are implemented in terms of published
standards. While the built-in float type exposes only a modest portion of its
capabilities, the decimal module exposes all required parts of the standard.
When needed, the programmer has full control over rounding and signal handling.
This includes an option to enforce exact arithmetic by using exceptions
to block any inexact operations.
The decimal module was designed to support “without prejudice, both exact
unrounded decimal arithmetic (sometimes called fixed-point arithmetic)
and rounded floating-point arithmetic.” – excerpt from the decimal
arithmetic specification.
The module design is centered around three concepts: the decimal number, the
context for arithmetic, and signals.
A decimal number is immutable. It has a sign, coefficient digits, and an
exponent. To preserve significance, the coefficient digits do not truncate
trailing zeros. Decimals also include special values such as
Infinity, -Infinity, and NaN. The standard also
differentiates -0 from +0.
The context for arithmetic is an environment specifying precision, rounding
rules, limits on exponents, flags indicating the results of operations, and trap
enablers which determine whether signals are treated as exceptions. Rounding
options include ROUND_CEILING, ROUND_DOWN,
ROUND_FLOOR, ROUND_HALF_DOWN, ROUND_HALF_EVEN,
ROUND_HALF_UP, ROUND_UP, and ROUND_05UP.
Signals are groups of exceptional conditions arising during the course of
computation. Depending on the needs of the application, signals may be ignored,
considered as informational, or treated as exceptions. The signals in the
decimal module are: Clamped, InvalidOperation,
DivisionByZero, Inexact, Rounded, Subnormal,
Overflow, and Underflow.
For each signal there is a flag and a trap enabler. When a signal is
encountered, its flag is set to one, then, if the trap enabler is
set to one, an exception is raised. Flags are sticky, so the user needs to
reset them before monitoring a calculation.
The usual start to using decimals is importing the module, viewing the current
context with getcontext() and, if necessary, setting new values for
precision, rounding, or enabled traps:
>>> from decimal import *
>>> getcontext()
Context(prec=28, rounding=ROUND_HALF_EVEN, Emin=-999999999, Emax=999999999,
capitals=1, clamp=0, flags=[], traps=[Overflow, DivisionByZero,
InvalidOperation])
>>> getcontext().prec = 7 # Set a new precision
Decimal instances can be constructed from integers, strings, floats, or tuples.
Construction from an integer or a float performs an exact conversion of the
value of that integer or float. Decimal numbers include special values such as
NaN which stands for “Not a number”, positive and negative
Infinity, and -0.
The significance of a new Decimal is determined solely by the number of digits
input. Context precision and rounding only come into play during arithmetic
operations.
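For example:
>>> getcontext().prec = 28
>>> Decimal(10)
Decimal('10')
>>> Decimal('3.14')
Decimal('3.14')
>>> Decimal(3.14)
Decimal('3.14000000000000012490009027033061087131500244140625')
>>> Decimal((0, (3, 1, 4), -2))
Decimal('3.14')
>>> Decimal(str(2.0 ** 0.5))
Decimal('1.4142135623730951')
>>> Decimal('NaN')
Decimal('NaN')
>>> Decimal('-Infinity')
Decimal('-Infinity')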
The quantize() method rounds a number to a fixed exponent. This method is
useful for monetary applications that often round results to a fixed number of
places:
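>>> Decimal('7.325').quantize(Decimal('.01'), rounding=ROUND_DOWN)
Decimal('7.32')
>>> Decimal('7.325').quantize(Decimal('1.'), rounding=ROUND_UP)
Decimal('8')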
As shown above, the getcontext() function accesses the current context and
allows the settings to be changed. This approach meets the needs of most
applications.
For more advanced work, it may be useful to create alternate contexts using the
Context() constructor. To make an alternate active, use the setcontext()
function.
In accordance with the standard, the decimal module provides two ready-to-use
standard contexts, BasicContext and ExtendedContext. The
former is especially useful for debugging because many of the traps are
enabled:
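>>> setcontext(ExtendedContext)     # no traps enabled
>>> Decimal(42) / Decimal(0)
Decimal('Infinity')
>>> setcontext(BasicContext)        # traps DivisionByZero
>>> Decimal(42) / Decimal(0)
Traceback (most recent call last):
  ...
decimal.DivisionByZero: x / 0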
Contexts also have signal flags for monitoring exceptional conditions
encountered during computations. The flags remain set until explicitly cleared,
so it is best to clear the flags before each set of monitored computations by
using the clear_flags() method.
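For example:
>>> setcontext(ExtendedContext)
>>> getcontext().clear_flags()
>>> Decimal(355) / Decimal(113)
Decimal('3.14159292')
>>> getcontext()
Context(prec=9, rounding=ROUND_HALF_EVEN, Emin=-999999999, Emax=999999999,
        capitals=1, clamp=0, flags=[Inexact, Rounded], traps=[])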
The flags entry shows that the rational approximation to Pi was
rounded (digits beyond the context precision were thrown away) and that the
result is inexact (some of the discarded digits were non-zero).
Individual traps are set using the dictionary in the traps field of a
context:
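>>> setcontext(ExtendedContext)
>>> Decimal(1) / Decimal(0)
Decimal('Infinity')
>>> getcontext().traps[DivisionByZero] = 1
>>> Decimal(1) / Decimal(0)
Traceback (most recent call last):
  ...
decimal.DivisionByZero: x / 0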
Most programs adjust the current context only once, at the beginning of the
program. And, in many applications, data is converted to Decimal with
a single cast inside a loop. With context set and decimals created, the bulk of
the program manipulates the data no differently than with other Python numeric
types.
value can be an integer, string, tuple, float, or another Decimal
object. If no value is given, returns Decimal('0'). If value is a
string, it should conform to the decimal numeric string syntax after leading
and trailing whitespace characters are removed:
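sign           ::=  '+' | '-'
digit          ::=  '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
indicator      ::=  'e' | 'E'
digits         ::=  digit [digit]...
decimal-part   ::=  digits '.' [digits] | ['.'] digits
exponent-part  ::=  indicator [sign] digits
infinity       ::=  'Infinity' | 'Inf'
nan            ::=  'NaN' [digits] | 'sNaN' [digits]
numeric-value  ::=  decimal-part [exponent-part] | infinity
numeric-string ::=  [sign] numeric-value | [sign] nan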
Other Unicode decimal digits are also permitted where digit
appears above. These include decimal digits from various other
alphabets (for example, Arabic-Indic and Devanāgarī digits) along
with the fullwidth digits '\uff10' through '\uff19'.
If value is a tuple, it should have three components, a sign
(0 for positive or 1 for negative), a tuple of
digits, and an integer exponent. For example, Decimal((0,(1,4,1,4),-3))
returns Decimal('1.414').
If value is a float, the binary floating point value is losslessly
converted to its exact decimal equivalent. This conversion can often require
53 or more digits of precision. For example, Decimal(float('1.1'))
converts to
Decimal('1.100000000000000088817841970012523233890533447265625').
The context precision does not affect how many digits are stored. That is
determined exclusively by the number of digits in value. For example,
Decimal('3.00000') records all five zeros even if the context precision is
only three.
The purpose of the context argument is determining what to do if value is a
malformed string. If the context traps InvalidOperation, an exception
is raised; otherwise, the constructor returns a new Decimal with the value of
NaN.
Changed in version 3.2: The argument to the constructor is now permitted to be a float
instance.
Decimal floating point objects share many properties with the other built-in
numeric types such as float and int. All of the usual math
operations and special methods apply. Likewise, decimal objects can be
copied, pickled, printed, used as dictionary keys, used as set elements,
compared, sorted, and coerced to another type (such as float or
int).
Decimal objects cannot generally be combined with floats or
instances of fractions.Fraction in arithmetic operations:
an attempt to add a Decimal to a float, for
example, will raise a TypeError. However, it is possible to
use Python’s comparison operators to compare a Decimal
instance x with another number y. This avoids confusing results
when doing equality comparisons between numbers of different types.
Changed in version 3.2: Mixed-type comparisons between Decimal instances and other
numeric types are now fully supported.
In addition to the standard numeric properties, decimal floating point
objects also have a number of specialized methods:
Return the adjusted exponent after shifting out the coefficient’s
rightmost digits until only the lead digit remains:
Decimal('321e+5').adjusted() returns seven. Used for determining the
position of the most significant digit with respect to the decimal point.
Return the canonical encoding of the argument. Currently, the encoding of
a Decimal instance is always canonical, so this operation returns
its argument unchanged.
This operation is identical to the compare() method, except that all
NaNs signal. That is, if neither operand is a signaling NaN then any
quiet NaN operand is treated as though it were a signaling NaN.
Compare two operands using their abstract representation rather than their
numerical value. Similar to the compare() method, but the result
gives a total ordering on Decimal instances. Two
Decimal instances with the same numeric value but different
representations compare unequal in this ordering:
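>>> Decimal('12.0').compare_total(Decimal('12'))
Decimal('-1')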
Quiet and signaling NaNs are also included in the total ordering. The
result of this function is Decimal('0') if both operands have the same
representation, Decimal('-1') if the first operand is lower in the
total order than the second, and Decimal('1') if the first operand is
higher in the total order than the second operand. See the specification
for details of the total order.
Compare two operands using their abstract representation rather than their
value as in compare_total(), but ignoring the sign of each operand.
x.compare_total_mag(y) is equivalent to
x.copy_abs().compare_total(y.copy_abs()).
Return the absolute value of the argument. This operation is unaffected
by the context and is quiet: no flags are changed and no rounding is
performed.
Return the value of the (natural) exponential function e**x at the
given number. The result is correctly rounded using the
ROUND_HALF_EVEN rounding mode.
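For example, with the default context precision of 28:
>>> Decimal(1).exp()
Decimal('2.718281828459045235360287471')
>>> Decimal(321).exp()
Decimal('2.561702493119680037517373933E+139')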
Classmethod that converts a float to a decimal number, exactly.
Note that Decimal.from_float(0.1) is not the same as Decimal('0.1').
Since 0.1 is not exactly representable in binary floating point, the
value is stored as the nearest representable value which is
0x1.999999999999ap-4. That equivalent value in decimal is
0.1000000000000000055511151231257827021181583404541015625.
Note
From Python 3.2 onwards, a Decimal instance
can also be constructed directly from a float.
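For example:
>>> Decimal.from_float(0.1)
Decimal('0.1000000000000000055511151231257827021181583404541015625')
>>> Decimal.from_float(float('nan'))
Decimal('NaN')
>>> Decimal.from_float(float('inf'))
Decimal('Infinity')
>>> Decimal.from_float(float('-inf'))
Decimal('-Infinity')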
Return True if the argument is canonical and False
otherwise. Currently, a Decimal instance is always canonical, so
this operation always returns True.
For a nonzero number, return the adjusted exponent of its operand as a
Decimal instance. If the operand is a zero then
Decimal('-Infinity') is returned and the DivisionByZero flag
is raised. If the operand is an infinity then Decimal('Infinity') is
returned.
logical_xor() is a logical operation which takes two logical
operands (see Logical operands). The result is the
digit-wise exclusive or of the two operands.
Like max(self,other) except that the context rounding rule is applied
before returning and that NaN values are either signaled or
ignored (depending on the context and whether they are signaling or
quiet).
Like min(self,other) except that the context rounding rule is applied
before returning and that NaN values are either signaled or
ignored (depending on the context and whether they are signaling or
quiet).
Return the largest number representable in the given context (or in the
current thread’s context if no context is given) that is smaller than the
given operand.
Return the smallest number representable in the given context (or in the
current thread’s context if no context is given) that is larger than the
given operand.
If the two operands are unequal, return the number closest to the first
operand in the direction of the second operand. If both operands are
numerically equal, return a copy of the first operand with the sign set to
be the same as the sign of the second operand.
Normalize the number by stripping the rightmost trailing zeros and
converting any result equal to Decimal('0') to
Decimal('0e0'). Used for producing canonical values for attributes
of an equivalence class. For example, Decimal('32.100') and
Decimal('0.321000e+2') both normalize to the equivalent value
Decimal('32.1').
Unlike other operations, if the length of the coefficient after the
quantize operation would be greater than precision, then an
InvalidOperation is signaled. This guarantees that, unless there
is an error condition, the quantized exponent is always equal to that of
the right-hand operand.
Also unlike other operations, quantize never signals Underflow, even if
the result is subnormal and inexact.
If the exponent of the second operand is larger than that of the first
then rounding may be necessary. In this case, the rounding mode is
determined by the rounding argument if given, else by the given
context argument; if neither argument is given the rounding mode of
the current thread’s context is used.
If watchexp is set (default), then an error is returned whenever the
resulting exponent is greater than Emax or less than
Etiny.
Compute the modulo as either a positive or negative value depending on
which is closest to zero. For instance, Decimal(10).remainder_near(6)
returns Decimal('-2') which is closer to zero than Decimal('4').
If both are equally close, the one chosen will have the same sign as
self.
Return the result of rotating the digits of the first operand by an amount
specified by the second operand. The second operand must be an integer in
the range -precision through precision. The absolute value of the second
operand gives the number of places to rotate. If the second operand is
positive then rotation is to the left; otherwise rotation is to the right.
The coefficient of the first operand is padded on the left with zeros to
length precision if necessary. The sign and exponent of the first operand
are unchanged.
Return the first operand with exponent adjusted by the second.
Equivalently, return the first operand multiplied by 10**other. The
second operand must be an integer.
Return the result of shifting the digits of the first operand by an amount
specified by the second operand. The second operand must be an integer in
the range -precision through precision. The absolute value of the second
operand gives the number of places to shift. If the second operand is
positive then the shift is to the left; otherwise the shift is to the
right. Digits shifted into the coefficient are zeros. The sign and
exponent of the first operand are unchanged.
Engineering notation has an exponent which is a multiple of 3, so there
are up to 3 digits left of the decimal place. For example, it converts
Decimal('123E+1') to Decimal('1.23E+3').
Round to the nearest integer, signaling Inexact or
Rounded as appropriate if rounding occurs. The rounding mode is
determined by the rounding parameter if given, else by the given
context. If neither parameter is given then the rounding mode of the
current context is used.
Round to the nearest integer without signaling Inexact or
Rounded. If rounding is given, it is applied; otherwise the
rounding method of either the supplied context or the current context is used.
The logical_and(), logical_invert(), logical_or(),
and logical_xor() methods expect their arguments to be logical
operands. A logical operand is a Decimal instance whose
exponent and sign are both zero, and whose digits are all either
0 or 1.
Contexts are environments for arithmetic operations. They govern precision, set
rules for rounding, determine which signals are treated as exceptions, and limit
the range for exponents.
Each thread has its own current context which is accessed or changed using the
getcontext() and setcontext() functions.
Return a context manager that will set the current context for the active thread
to a copy of c on entry to the with-statement and restore the previous context
when exiting the with-statement. If no context is specified, a copy of the
current context is used.
For example, the following code sets the current decimal precision to 42 places,
performs a calculation, and then automatically restores the previous context:
from decimal import localcontext

with localcontext() as ctx:
    ctx.prec = 42              # Perform a high precision calculation
    s = calculate_something()
s = +s                         # Round the final result back to the default precision
New contexts can also be created using the Context constructor
described below. In addition, the module provides three pre-made contexts:
This is a standard context defined by the General Decimal Arithmetic
Specification. Precision is set to nine. Rounding is set to
ROUND_HALF_UP. All flags are cleared. All traps are enabled (treated
as exceptions) except Inexact, Rounded, and
Subnormal.
Because many of the traps are enabled, this context is useful for debugging.
This is a standard context defined by the General Decimal Arithmetic
Specification. Precision is set to nine. Rounding is set to
ROUND_HALF_EVEN. All flags are cleared. No traps are enabled (so that
exceptions are not raised during computations).
Because the traps are disabled, this context is useful for applications that
prefer to have result value of NaN or Infinity instead of
raising exceptions. This allows an application to complete a run in the
presence of conditions that would otherwise halt the program.
This context is used by the Context constructor as a prototype for new
contexts. Changing a field (such as precision) has the effect of changing the
default for new contexts created by the Context constructor.
This context is most useful in multi-threaded environments. Changing one of the
fields before threads are started has the effect of setting system-wide
defaults. Changing the fields after threads have started is not recommended as
it would require thread synchronization to prevent race conditions.
In single threaded environments, it is preferable to not use this context at
all. Instead, simply create contexts explicitly as described below.
The default values are precision=28, rounding=ROUND_HALF_EVEN, and enabled traps
for Overflow, InvalidOperation, and DivisionByZero.
In addition to the three supplied contexts, new contexts can be created with the
Context constructor.
class decimal.Context(prec=None, rounding=None, traps=None, flags=None, Emin=None, Emax=None, capitals=None, clamp=None)
Creates a new context. If a field is not specified or is None, the
default values are copied from the DefaultContext. If the flags
field is not specified or is None, all flags are cleared.
The prec field is a positive integer that sets the precision for arithmetic
operations in the context.
The rounding option is one of:
ROUND_CEILING (towards Infinity),
ROUND_DOWN (towards zero),
ROUND_FLOOR (towards -Infinity),
ROUND_HALF_DOWN (to nearest with ties going towards zero),
ROUND_HALF_EVEN (to nearest with ties going to nearest even integer),
ROUND_HALF_UP (to nearest with ties going away from zero),
ROUND_UP (away from zero), or
ROUND_05UP (away from zero if last digit after rounding towards zero
would have been 0 or 5; otherwise towards zero).
The traps and flags fields list any signals to be set. Generally, new
contexts should only set traps and leave the flags clear.
The Emin and Emax fields are integers specifying the outer limits allowable
for exponents.
The capitals field is either 0 or 1 (the default). If set to
1, exponents are printed with a capital E; otherwise, a
lowercase e is used: Decimal('6.02e+23').
The clamp field is either 0 (the default) or 1.
If set to 1, the exponent e of a Decimal
instance representable in this context is strictly limited to the
range Emin-prec+1<=e<=Emax-prec+1. If clamp is
0 then a weaker condition holds: the adjusted exponent of
the Decimal instance is at most Emax. When clamp is
1, a large normal number will, where possible, have its
exponent reduced and a corresponding number of zeros added to its
coefficient, in order to fit the exponent constraints; this
preserves the value of the number but loses information about
significant trailing zeros. For example:
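>>> Context(prec=6, Emax=999, clamp=1).create_decimal('1.23e999')
Decimal('1.23000E+999')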
A clamp value of 1 allows compatibility with the
fixed-width decimal interchange formats specified in IEEE 754.
The Context class defines several general purpose methods as well as
a large number of methods for doing arithmetic directly in a given context.
In addition, for each of the Decimal methods described above (with
the exception of the adjusted() and as_tuple() methods) there is
a corresponding Context method. For example, for a Context
instance C and Decimal instance x, C.exp(x) is
equivalent to x.exp(context=C). Each Context method accepts a
Python integer (an instance of int) anywhere that a
Decimal instance is accepted.
Creates a new Decimal instance from num but using self as
context. Unlike the Decimal constructor, the context precision,
rounding method, flags, and traps are applied to the conversion.
This is useful because constants are often given to a greater precision
than is needed by the application. Another benefit is that rounding
immediately eliminates unintended effects from digits beyond the current
precision. In the following example, using unrounded inputs means that
adding zero to a sum can change the result:
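>>> getcontext().prec = 3
>>> Decimal('3.4445') + Decimal('1.0023')
Decimal('4.45')
>>> Decimal('3.4445') + Decimal(0) + Decimal('1.0023')
Decimal('4.44')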
Creates a new Decimal instance from a float f but rounding using self
as the context. Unlike the Decimal.from_float() class method,
the context precision, rounding method, flags, and traps are applied to
the conversion.
The usual approach to working with decimals is to create Decimal
instances and then apply arithmetic operations which take place within the
current context for the active thread. An alternative approach is to use
context methods for calculating within a specific context. The methods are
similar to those for the Decimal class and are only briefly
recounted here.
Plus corresponds to the unary prefix plus operator in Python. This
operation applies the context precision and rounding, so it is not an
identity operation.
Return x to the power of y, reduced modulo modulo if given.
With two arguments, compute x**y. If x is negative then y
must be integral. The result will be inexact unless y is integral and
the result is finite and can be expressed exactly in ‘precision’ digits.
The result should always be correctly rounded, using the rounding mode of
the current thread’s context.
With three arguments, compute (x**y)%modulo. For the three argument
form, the following restrictions on the arguments hold:
all three arguments must be integral
y must be nonnegative
at least one of x or y must be nonzero
modulo must be nonzero and have at most ‘precision’ digits
The value resulting from Context.power(x,y,modulo) is
equal to the value that would be obtained by computing (x**y)%modulo with unbounded precision, but is computed more
efficiently. The exponent of the result is zero, regardless of
the exponents of x, y and modulo. The result is
always exact.
Signals represent conditions that arise during computation. Each corresponds to
one context flag and one context trap enabler.
The context flag is set whenever the condition is encountered. After the
computation, flags may be checked for informational purposes (for instance, to
determine whether a computation was exact). After checking the flags, be sure to
clear all flags before starting the next computation.
If the context’s trap enabler is set for the signal, then the condition causes a
Python exception to be raised. For example, if the DivisionByZero trap
is set, then a DivisionByZero exception is raised upon encountering the
condition.
Altered an exponent to fit representation constraints.
Typically, clamping occurs when an exponent falls outside the context’s
Emin and Emax limits. If possible, the exponent is reduced to
fit by adding zeros to the coefficient.
Signals the division of a non-infinite number by zero.
Can occur with division, modulo division, or when raising a number to a negative
power. If this signal is not trapped, returns Infinity or
-Infinity with the sign determined by the inputs to the calculation.
Indicates that rounding occurred and the result is not exact.
Signals when non-zero digits were discarded during rounding. The rounded result
is returned. The signal flag or trap is used to detect when results are
inexact.
Indicates the exponent is larger than Emax after rounding has
occurred. If not trapped, the result depends on the rounding mode, either
pulling inward to the largest representable finite number or rounding outward
to Infinity. In either case, Inexact and Rounded
are also signaled.
Rounding occurred though possibly no information was lost.
Signaled whenever rounding discards digits; even if those digits are zero
(such as rounding 5.00 to 5.0). If not trapped, returns
the result unchanged. This signal is used to detect loss of significant
digits.
Mitigating round-off error with increased precision
The use of decimal floating point eliminates decimal representation error
(making it possible to represent 0.1 exactly); however, some operations
can still incur round-off error when non-zero digits exceed the fixed precision.
The effects of round-off error can be amplified by the addition or subtraction
of nearly offsetting quantities resulting in loss of significance. Knuth
provides two instructive examples where rounded floating point arithmetic with
insufficient precision causes the breakdown of the associative and distributive
properties of addition:
# Examples from Seminumerical Algorithms, Section 4.2.2.
>>> from decimal import Decimal, getcontext
>>> getcontext().prec = 8
>>> u, v, w = Decimal(11111113), Decimal(-11111111), Decimal('7.51111111')
>>> (u + v) + w
Decimal('9.5111111')
>>> u + (v + w)
Decimal('10')
>>> u, v, w = Decimal(20000), Decimal(-6), Decimal('6.0000003')
>>> (u*v) + (u*w)
Decimal('0.01')
>>> u * (v+w)
Decimal('0.0060000')
The decimal module makes it possible to restore the identities by
expanding the precision sufficiently to avoid loss of significance:
>>> getcontext().prec = 20
>>> u, v, w = Decimal(11111113), Decimal(-11111111), Decimal('7.51111111')
>>> (u + v) + w
Decimal('9.51111111')
>>> u + (v + w)
Decimal('9.51111111')
>>>
>>> u, v, w = Decimal(20000), Decimal(-6), Decimal('6.0000003')
>>> (u*v) + (u*w)
Decimal('0.0060000')
>>> u * (v+w)
Decimal('0.0060000')
The number system for the decimal module provides special values
including NaN, sNaN, -Infinity, Infinity,
and two zeros, +0 and -0.
Infinities can be constructed directly with: Decimal('Infinity'). Also,
they can arise from dividing by zero when the DivisionByZero signal is
not trapped. Likewise, when the Overflow signal is not trapped, infinity
can result from rounding beyond the limits of the largest representable number.
The infinities are signed (affine) and can be used in arithmetic operations
where they get treated as very large, indeterminate numbers. For instance,
adding a constant to infinity gives another infinite result.
Some operations are indeterminate and return NaN, or if the
InvalidOperation signal is trapped, raise an exception. For example,
0/0 returns NaN which means “not a number”. This variety of
NaN is quiet and, once created, will flow through other computations
always resulting in another NaN. This behavior can be useful for a
series of computations that occasionally have missing inputs — it allows the
calculation to proceed while flagging specific results as invalid.
A variant is sNaN which signals rather than remaining quiet after every
operation. This is a useful return value when an invalid result needs to
interrupt a calculation for special handling.
The behavior of Python’s comparison operators can be a little surprising where a
NaN is involved. A test for equality where one of the operands is a
quiet or signaling NaN always returns False (even when doing
Decimal('NaN')==Decimal('NaN')), while a test for inequality always returns
True. An attempt to compare two Decimals using any of the <,
<=, > or >= operators will raise the InvalidOperation signal
if either operand is a NaN, and return False if this signal is
not trapped. Note that the General Decimal Arithmetic specification does not
specify the behavior of direct comparisons; these rules for comparisons
involving a NaN were taken from the IEEE 854 standard (see Table 3 in
section 5.7). To ensure strict standards-compliance, use the compare()
and compare_signal() methods instead.
The signed zeros can result from calculations that underflow. They keep the sign
that would have resulted if the calculation had been carried out to greater
precision. Since their magnitude is zero, both positive and negative zeros are
treated as equal and their sign is informational.
In addition to the two signed zeros which are distinct yet equal, there are
various representations of zero with differing precisions yet equivalent in
value. This takes a bit of getting used to. For an eye accustomed to
normalized floating point representations, it is not immediately obvious that
the following calculation returns a value equal to zero:
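>>> 1 / Decimal('Infinity')      # exponent is Etiny for the default context
Decimal('0E-1000000026')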
The getcontext() function accesses a different Context object for
each thread. Having separate thread contexts means that threads may make
changes (such as getcontext().prec=10) without interfering with other threads.
Likewise, the setcontext() function automatically assigns its target to
the current thread.
If setcontext() has not been called before getcontext(), then
getcontext() will automatically create a new context for use in the
current thread.
The new context is copied from a prototype context called DefaultContext. To
control the defaults so that each thread will use the same values throughout the
application, directly modify the DefaultContext object. This should be done
before any threads are started so that there won’t be a race condition between
threads calling getcontext(). For example:
# Set applicationwide defaults for all threads about to be launched
DefaultContext.prec = 12
DefaultContext.rounding = ROUND_DOWN
DefaultContext.traps = ExtendedContext.traps.copy()
DefaultContext.traps[InvalidOperation] = 1
setcontext(DefaultContext)
# Afterwards, the threads can be started
t1.start()
t2.start()
t3.start()
. . .
Here are a few recipes that serve as utility functions and that demonstrate ways
to work with the Decimal class:
def moneyfmt(value, places=2, curr='', sep=',', dp='.',
             pos='', neg='-', trailneg=''):
    """Convert Decimal to a money formatted string.

    places:  required number of places after the decimal point
    curr:    optional currency symbol before the sign (may be blank)
    sep:     optional grouping separator (comma, period, space, or blank)
    dp:      decimal point indicator (comma or period)
             only specify as blank when places is zero
    pos:     optional sign for positive numbers: '+', space or blank
    neg:     optional sign for negative numbers: '-', '(', space or blank
    trailneg:optional trailing minus indicator:  '-', ')', space or blank

    >>> d = Decimal('-1234567.8901')
    >>> moneyfmt(d, curr='$')
    '-$1,234,567.89'
    >>> moneyfmt(d, places=0, sep='.', dp='', neg='', trailneg='-')
    '1.234.568-'
    >>> moneyfmt(d, curr='$', neg='(', trailneg=')')
    '($1,234,567.89)'
    >>> moneyfmt(Decimal(123456789), sep=' ')
    '123 456 789.00'
    >>> moneyfmt(Decimal('-0.02'), neg='<', trailneg='>')
    '<0.02>'

    """
    q = Decimal(10) ** -places      # 2 places --> '0.01'
    sign, digits, exp = value.quantize(q).as_tuple()
    result = []
    digits = list(map(str, digits))
    build, next = result.append, digits.pop
    if sign:
        build(trailneg)
    for i in range(places):
        build(next() if digits else '0')
    if places:
        build(dp)
    if not digits:
        build('0')
    i = 0
    while digits:
        build(next())
        i += 1
        if i == 3 and digits:
            i = 0
            build(sep)
    build(curr)
    build(neg if sign else pos)
    return ''.join(reversed(result))
def pi():
    """Compute Pi to the current precision.

    >>> print(pi())
    3.141592653589793238462643383

    """
    getcontext().prec += 2  # extra digits for intermediate steps
    three = Decimal(3)      # substitute "three=3.0" for regular floats
    lasts, t, s, n, na, d, da = 0, three, 3, 1, 0, 0, 24
    while s != lasts:
        lasts = s
        n, na = n+na, na+8
        d, da = d+da, da+32
        t = (t * n) / d
        s += t
    getcontext().prec -= 2
    return +s               # unary plus applies the new precision
def exp(x):
    """Return e raised to the power of x.  Result type matches input type.

    >>> print(exp(Decimal(1)))
    2.718281828459045235360287471
    >>> print(exp(Decimal(2)))
    7.389056098930650227230427461
    >>> print(exp(2.0))
    7.38905609893
    >>> print(exp(2+0j))
    (7.38905609893+0j)

    """
    getcontext().prec += 2
    i, lasts, s, fact, num = 0, 0, 1, 1, 1
    while s != lasts:
        lasts = s
        i += 1
        fact *= i
        num *= x
        s += num / fact
    getcontext().prec -= 2
    return +s
def cos(x):
    """Return the cosine of x as measured in radians.

    The Taylor series approximation works best for a small value of x.
    For larger values, first compute x = x % (2 * pi).

    >>> print(cos(Decimal('0.5')))
    0.8775825618903727161162815826
    >>> print(cos(0.5))
    0.87758256189
    >>> print(cos(0.5+0j))
    (0.87758256189+0j)

    """
    getcontext().prec += 2
    i, lasts, s, fact, num, sign = 0, 0, 1, 1, 1, 1
    while s != lasts:
        lasts = s
        i += 2
        fact *= i * (i-1)
        num *= x * x
        sign *= -1
        s += num / fact * sign
    getcontext().prec -= 2
    return +s
def sin(x):
    """Return the sine of x as measured in radians.

    The Taylor series approximation works best for a small value of x.
    For larger values, first compute x = x % (2 * pi).

    >>> print(sin(Decimal('0.5')))
    0.4794255386042030002732879352
    >>> print(sin(0.5))
    0.479425538604
    >>> print(sin(0.5+0j))
    (0.479425538604+0j)

    """
    getcontext().prec += 2
    i, lasts, s, fact, num, sign = 1, 0, x, 1, x, 1
    while s != lasts:
        lasts = s
        i += 2
        fact *= i * (i-1)
        num *= x * x
        sign *= -1
        s += num / fact * sign
    getcontext().prec -= 2
    return +s
Q. It is cumbersome to type decimal.Decimal('1234.5'). Is there a way to
minimize typing when using the interactive interpreter?
A. Some users abbreviate the constructor to just a single letter:
>>> D = decimal.Decimal
>>> D('1.23') + D('3.45')
Decimal('4.68')
Q. In a fixed-point application with two decimal places, some inputs have many
places and need to be rounded. Others are not supposed to have excess digits
and need to be validated. What methods should be used?
A. The quantize() method rounds to a fixed number of decimal places. If
the Inexact trap is set, it is also useful for validation:
>>> TWOPLACES = Decimal(10) ** -2 # same as Decimal('0.01')
>>> # Round to two places
>>> Decimal('3.214').quantize(TWOPLACES)
Decimal('3.21')
>>> # Validate that a number does not exceed two places
>>> Decimal('3.21').quantize(TWOPLACES, context=Context(traps=[Inexact]))
Decimal('3.21')
Q. Once I have valid two place inputs, how do I maintain that invariant
throughout an application?
A. Some operations like addition, subtraction, and multiplication by an integer
will automatically preserve fixed point. Other operations, like division and
non-integer multiplication, will change the number of decimal places and need to
be followed up with a quantize() step:
>>> a = Decimal('102.72') # Initial fixed-point values
>>> b = Decimal('3.17')
>>> a + b # Addition preserves fixed-point
Decimal('105.89')
>>> a - b
Decimal('99.55')
>>> a * 42 # So does integer multiplication
Decimal('4314.24')
>>> (a * b).quantize(TWOPLACES) # Must quantize non-integer multiplication
Decimal('325.62')
>>> (b / a).quantize(TWOPLACES) # And quantize division
Decimal('0.03')
In developing fixed-point applications, it is convenient to define functions
to handle the quantize() step:
>>> def mul(x, y, fp=TWOPLACES):
...     return (x * y).quantize(fp)
>>> def div(x, y, fp=TWOPLACES):
...     return (x / y).quantize(fp)
>>> mul(a, b) # Automatically preserve fixed-point
Decimal('325.62')
>>> div(b, a)
Decimal('0.03')
Q. There are many ways to express the same value. The numbers 200,
200.000, 2E2, and .02E+4 all have the same value at
various precisions. Is there a way to transform them to a single recognizable
canonical value?
A. The normalize() method maps all equivalent values to a single
representative:
>>> values = map(Decimal, '200 200.000 2E2 .02E+4'.split())
>>> [v.normalize() for v in values]
[Decimal('2E+2'), Decimal('2E+2'), Decimal('2E+2'), Decimal('2E+2')]
Q. Some decimal values always print with exponential notation. Is there a way
to get a non-exponential representation?
A. For some values, exponential notation is the only way to express the number
of significant places in the coefficient. For example, expressing
5.0E+3 as 5000 keeps the value constant but cannot show the
original’s two-place significance.
If an application does not care about tracking significance, it is easy to
remove the exponent and trailing zeroes, losing significance, but keeping the
value unchanged:
>>> def remove_exponent(d):
...     return d.quantize(Decimal(1)) if d == d.to_integral() else d.normalize()
Q. Is there a way to convert a regular float to a Decimal?
A. Yes, any binary floating point number can be exactly expressed as a
Decimal though an exact conversion may take more precision than intuition would
suggest:
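>>> import math
>>> Decimal(math.pi)
Decimal('3.141592653589793115997963468544185161590576171875')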
Q. Within a complex calculation, how can I make sure that I haven’t gotten a
spurious result because of insufficient precision or rounding anomalies?
A. The decimal module makes it easy to test results. A best practice is to
re-run calculations using greater precision and with various rounding modes.
Widely differing results indicate insufficient precision, rounding mode issues,
ill-conditioned inputs, or a numerically unstable algorithm.
Q. I noticed that context precision is applied to the results of operations but
not to the inputs. Is there anything to watch out for when mixing values of
different precisions?
A. Yes. The principle is that all values are considered to be exact and so is
the arithmetic on those values. Only the results are rounded. The advantage
for inputs is that “what you type is what you get”. A disadvantage is that the
results can look odd if you forget that the inputs haven’t been rounded:
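>>> getcontext().prec = 3
>>> Decimal('3.104') + Decimal('2.104')
Decimal('5.21')
>>> Decimal('3.104') + Decimal('0.000') + Decimal('2.104')
Decimal('5.20')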
The fractions module provides support for rational number arithmetic.
A Fraction instance can be constructed from a pair of integers, from
another rational number, or from a string.
class fractions.Fraction(numerator=0, denominator=1)
class fractions.Fraction(other_fraction)
class fractions.Fraction(float)
class fractions.Fraction(decimal)
class fractions.Fraction(string)
The first version requires that numerator and denominator are instances
of numbers.Rational and returns a new Fraction instance
with value numerator/denominator. If denominator is 0, it
raises a ZeroDivisionError. The second version requires that
other_fraction is an instance of numbers.Rational and returns a
Fraction instance with the same value. The next two versions accept
either a float or a decimal.Decimal instance, and return a
Fraction instance with exactly the same value. Note that due to the
usual issues with binary floating-point (see Floating Point Arithmetic: Issues and Limitations), the
argument to Fraction(1.1) is not exactly equal to 11/10, and so
Fraction(1.1) does not return Fraction(11,10) as one might expect.
(But see the documentation for the limit_denominator() method below.)
The last version of the constructor expects a string or unicode instance.
The usual form for this instance is:
[sign] numerator ['/' denominator]
where the optional sign may be either '+' or '-' and
numerator and denominator (if present) are strings of
decimal digits. In addition, any string that represents a finite
value and is accepted by the float constructor is also
accepted by the Fraction constructor. In either form the
input string may also have leading and/or trailing whitespace.
Here are some examples:
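>>> from fractions import Fraction
>>> Fraction(16, -10)
Fraction(-8, 5)
>>> Fraction(123)
Fraction(123, 1)
>>> Fraction('3/7')
Fraction(3, 7)
>>> Fraction(' -3/7 ')
Fraction(-3, 7)
>>> Fraction('-.125')
Fraction(-1, 8)
>>> Fraction(2.25)
Fraction(9, 4)
>>> Fraction(1.1)
Fraction(2476979795053773, 2251799813685248)
>>> from decimal import Decimal
>>> Fraction(Decimal('1.1'))
Fraction(11, 10)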
The Fraction class inherits from the abstract base class
numbers.Rational, and implements all of the methods and
operations from that class. Fraction instances are hashable,
and should be treated as immutable. In addition,
Fraction has the following methods:
This class method constructs a Fraction representing the exact
value of flt, which must be a float. Beware that
Fraction.from_float(0.3) is not the same value as Fraction(3,10).
Note
From Python 3.2 onwards, you can also construct a
Fraction instance directly from a float.
Finds and returns the closest Fraction to self that has
denominator at most max_denominator. This method is useful for finding
rational approximations to a given floating-point number:
>>> from fractions import Fraction
>>> Fraction('3.1415926535897932').limit_denominator(1000)
Fraction(355, 113)
or for recovering a rational number that’s represented as a float:
>>> from math import pi, cos
>>> Fraction(cos(pi/3))
Fraction(4503599627370497, 9007199254740992)
>>> Fraction(cos(pi/3)).limit_denominator()
Fraction(1, 2)
>>> Fraction(1.1).limit_denominator()
Fraction(11, 10)
The first version returns the nearest int to self,
rounding half to even. The second version rounds self to the
nearest multiple of Fraction(1,10**ndigits) (logically, if
ndigits is negative), again rounding half toward even. This
method can also be accessed through the round() function.
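For example (a sketch of the half-even behavior described above):
>>> from fractions import Fraction
>>> round(Fraction(3, 2))       # 1.5 rounds half to even
2
>>> round(Fraction(7, 3), 2)    # nearest multiple of Fraction(1, 100)
Fraction(233, 100)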
Return the greatest common divisor of the integers a and b. If either
a or b is nonzero, then the absolute value of gcd(a,b) is the
largest integer that divides both a and b. gcd(a,b) has the same
sign as b if b is nonzero; otherwise it takes the sign of a. gcd(0,0) returns 0.
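For example:
>>> from fractions import gcd
>>> gcd(12, 8)
4
>>> gcd(12, -8)      # result takes the sign of b
-4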
This module implements pseudo-random number generators for various
distributions.
For integers, there is uniform selection from a range. For sequences, there is
uniform selection of a random element, a function to generate a random
permutation of a list in-place, and a function for random sampling without
replacement.
On the real line, there are functions to compute uniform, normal (Gaussian),
lognormal, negative exponential, gamma, and beta distributions. For generating
distributions of angles, the von Mises distribution is available.
Almost all module functions depend on the basic function random(), which
generates a random float uniformly in the semi-open range [0.0, 1.0). Python
uses the Mersenne Twister as the core generator. It produces 53-bit precision
floats and has a period of 2**19937-1. The underlying implementation in C is
both fast and threadsafe. The Mersenne Twister is one of the most extensively
tested random number generators in existence. However, being completely
deterministic, it is not suitable for all purposes, and is completely unsuitable
for cryptographic purposes.
The functions supplied by this module are actually bound methods of a hidden
instance of the random.Random class. You can instantiate your own
instances of Random to get generators that don’t share state.
Class Random can also be subclassed if you want to use a different
basic generator of your own devising: in that case, override the random(),
seed(), getstate(), and setstate() methods.
Optionally, a new generator can supply a getrandbits() method — this
allows randrange() to produce selections over an arbitrarily large range.
The random module also provides the SystemRandom class which
uses the system function os.urandom() to generate random numbers
from sources provided by the operating system.
If x is omitted or None, the current system time is used. If
randomness sources are provided by the operating system, they are used
instead of the system time (see the os.urandom() function for details
on availability).
If x is an int, it is used directly.
With version 2 (the default), a str, bytes, or bytearray
object gets converted to an int and all of its bits are used. With version 1,
the hash() of x is used instead.
Changed in version 3.2: Moved to the version 2 scheme which uses all of the bits in a string seed.
state should have been obtained from a previous call to getstate(), and
setstate() restores the internal state of the generator to what it was at
the time getstate() was called.
Returns a Python integer with k random bits. This method is supplied with
the MersenneTwister generator and some other generators may also provide it
as an optional part of the API. When available, getrandbits() enables
randrange() to handle arbitrarily large ranges.
Return a randomly selected element from range(start, stop, step). This is
equivalent to choice(range(start, stop, step)), but doesn’t actually build a
range object.
The positional argument pattern matches that of range(). Keyword arguments
should not be used because the function may use them in unexpected ways.
Changed in version 3.2: randrange() is more sophisticated about producing equally distributed
values. Formerly it used a style like int(random()*n) which could produce
slightly uneven distributions.
Shuffle the sequence x in place. The optional argument random is a
0-argument function returning a random float in [0.0, 1.0); by default, this is
the function random().
Note that for even rather small len(x), the total number of permutations of
x is larger than the period of most random number generators; this implies
that most permutations of a long sequence can never be generated.
Return a k length list of unique elements chosen from the population sequence
or set. Used for random sampling without replacement.
Returns a new list containing elements from the population while leaving the
original population unchanged. The resulting list is in selection order so that
all sub-slices will also be valid random samples. This allows raffle winners
(the sample) to be partitioned into grand prize and second place winners (the
subslices).
Members of the population need not be hashable or unique. If the population
contains repeats, then each occurrence is a possible selection in the sample.
To choose a sample from a range of integers, use a range() object as an
argument. This is especially fast and space efficient for sampling from a large
population: sample(range(10000000), 60).
The following functions generate specific real-valued distributions. Function
parameters are named after the corresponding variables in the distribution’s
equation, as used in common mathematical practice; most of these equations can
be found in any statistics text.
Return a random floating point number N such that low <= N <= high and
with the specified mode between those bounds. The low and high bounds
default to zero and one. The mode argument defaults to the midpoint
between the bounds, giving a symmetric distribution.
Exponential distribution. lambd is 1.0 divided by the desired
mean. It should be nonzero. (The parameter would be called
“lambda”, but that is a reserved word in Python.) Returned values
range from 0 to positive infinity if lambd is positive, and from
negative infinity to 0 if lambd is negative.
Log normal distribution. If you take the natural logarithm of this
distribution, you’ll get a normal distribution with mean mu and standard
deviation sigma. mu can have any value, and sigma must be greater than
zero.
mu is the mean angle, expressed in radians between 0 and 2*pi, and kappa
is the concentration parameter, which must be greater than or equal to zero. If
kappa is equal to zero, this distribution reduces to a uniform random angle
over the range 0 to 2*pi.
Class that uses the os.urandom() function for generating random numbers
from sources provided by the operating system. Not available on all systems.
Does not rely on software state, and sequences are not reproducible. Accordingly,
the seed() method has no effect and is ignored.
The getstate() and setstate() methods raise
NotImplementedError if called.
See also
M. Matsumoto and T. Nishimura, “Mersenne Twister: A 623-dimensionally
equidistributed uniform pseudorandom number generator”, ACM Transactions on
Modeling and Computer Simulation, Vol. 8, No. 1, January 1998, pp. 3-30.
Sometimes it is useful to be able to reproduce the sequences given by a pseudo
random number generator. By re-using a seed value, the same sequence should be
reproducible from run to run as long as multiple threads are not running.
Most of the random module’s algorithms and seeding functions are subject to
change across Python versions, but two aspects are guaranteed not to change:
If a new seeding method is added, then a backward compatible seeder will be
offered.
The generator’s random() method will continue to produce the same
sequence when the compatible seeder is given the same seed.
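A minimal demonstration of that guarantee:
>>> import random
>>> random.seed(1234)
>>> first = [random.random() for i in range(3)]
>>> random.seed(1234)
>>> second = [random.random() for i in range(3)]
>>> first == second
True
Basic examples: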
>>> random.random() # Random float x, 0.0 <= x < 1.0
0.37444887175646646
>>> random.uniform(1, 10) # Random float x, 1.0 <= x < 10.0
1.1800146073117523
>>> random.randrange(10) # Integer from 0 to 9
7
>>> random.randrange(0, 101, 2) # Even integer from 0 to 100
26
>>> random.choice('abcdefghij') # Single random element
'c'
>>> items = [1, 2, 3, 4, 5, 6, 7]
>>> random.shuffle(items)
>>> items
[7, 3, 2, 5, 6, 4, 1]
>>> random.sample([1, 2, 3, 4, 5], 3) # Three samples without replacement
[4, 1, 5]
A common task is to make a random.choice() with weighted probabilities.
If the weights are small integer ratios, a simple technique is to build a sample
population with repeats:
>>> weighted_choices = [('Red', 3), ('Blue', 2), ('Yellow', 1), ('Green', 4)]
>>> population = [val for val, cnt in weighted_choices for i in range(cnt)]
>>> random.choice(population)
'Green'
A more general approach is to arrange the weights in a cumulative distribution
with itertools.accumulate(), and then locate the random value with
bisect.bisect():
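>>> import bisect, random
>>> from itertools import accumulate
>>> choices, weights = zip(*weighted_choices)
>>> cumdist = list(accumulate(weights))
>>> x = random.random() * cumdist[-1]
>>> choices[bisect.bisect(cumdist, x)]
'Blue'
(The selected color varies from run to run; 'Blue' is just one possible outcome.)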
The modules described in this chapter provide functions and classes that support
a functional programming style, and general operations on callables.
The following modules are documented in this chapter:
itertools — Functions creating iterators for efficient looping
This module implements a number of iterator building blocks inspired
by constructs from APL, Haskell, and SML. Each has been recast in a form
suitable for Python.
The module standardizes a core set of fast, memory efficient tools that are
useful by themselves or in combination. Together, they form an “iterator
algebra” making it possible to construct specialized tools succinctly and
efficiently in pure Python.
For instance, SML provides a tabulation tool: tabulate(f) which produces a
sequence f(0), f(1), .... The same effect can be achieved in Python
by combining map() and count() to form map(f, count()).
These tools and their built-in counterparts also work well with the high-speed
functions in the operator module. For example, the multiplication
operator can be mapped across two vectors to form an efficient dot-product:
sum(map(operator.mul, vector1, vector2)).
The following module functions all construct and return iterators. Some provide
streams of infinite length, so they should only be accessed by functions or
loops that truncate the stream.
Make an iterator that returns accumulated sums. Elements may be any addable
type including Decimal or Fraction. Equivalent to:
def accumulate(iterable):
'Return running totals'
# accumulate([1,2,3,4,5]) --> 1 3 6 10 15
it = iter(iterable)
total = next(it)
yield total
for element in it:
total = total + element
yield total
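For example (accumulate() is new in 3.2, so the real function can be used
directly):
>>> from itertools import accumulate
>>> list(accumulate([1, 2, 3, 4, 5]))
[1, 3, 6, 10, 15]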
Make an iterator that returns elements from the first iterable until it is
exhausted, then proceeds to the next iterable, until all of the iterables are
exhausted. Used for treating consecutive sequences as a single sequence.
Equivalent to:
def chain(*iterables):
# chain('ABC', 'DEF') --> A B C D E F
for it in iterables:
for element in it:
yield element
Return r length subsequences of elements from the input iterable.
Combinations are emitted in lexicographic sort order. So, if the
input iterable is sorted, the combination tuples will be produced
in sorted order.
Elements are treated as unique based on their position, not on their
value. So if the input elements are unique, there will be no repeat
values in each combination.
Equivalent to:
def combinations(iterable, r):
# combinations('ABCD', 2) --> AB AC AD BC BD CD
# combinations(range(4), 3) --> 012 013 023 123
pool = tuple(iterable)
n = len(pool)
if r > n:
return
indices = list(range(r))
yield tuple(pool[i] for i in indices)
while True:
for i in reversed(range(r)):
if indices[i] != i + n - r:
break
else:
return
indices[i] += 1
for j in range(i+1, r):
indices[j] = indices[j-1] + 1
yield tuple(pool[i] for i in indices)
The code for combinations() can be also expressed as a subsequence
of permutations() after filtering entries where the elements are not
in sorted order (according to their position in the input pool):
def combinations(iterable, r):
pool = tuple(iterable)
n = len(pool)
for indices in permutations(range(n), r):
if sorted(indices) == list(indices):
yield tuple(pool[i] for i in indices)
The number of items returned is n!/r!/(n-r)! when 0<=r<=n
or zero when r>n.
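For example, choosing 2 of 4 items yields 4!/2!/2! == 6 combinations:
>>> from itertools import combinations
>>> list(combinations('ABCD', 2))
[('A', 'B'), ('A', 'C'), ('A', 'D'), ('B', 'C'), ('B', 'D'), ('C', 'D')]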
Return r length subsequences of elements from the input iterable
allowing individual elements to be repeated more than once.
Combinations are emitted in lexicographic sort order. So, if the
input iterable is sorted, the combination tuples will be produced
in sorted order.
Elements are treated as unique based on their position, not on their
value. So if the input elements are unique, the generated combinations
will also be unique.
Equivalent to:
def combinations_with_replacement(iterable, r):
# combinations_with_replacement('ABC', 2) --> AA AB AC BB BC CC
pool = tuple(iterable)
n = len(pool)
if not n and r:
return
indices = [0] * r
yield tuple(pool[i] for i in indices)
while True:
for i in reversed(range(r)):
if indices[i] != n - 1:
break
else:
return
indices[i:] = [indices[i] + 1] * (r - i)
yield tuple(pool[i] for i in indices)
The code for combinations_with_replacement() can be also expressed as
a subsequence of product() after filtering entries where the elements
are not in sorted order (according to their position in the input pool):
def combinations_with_replacement(iterable, r):
pool = tuple(iterable)
n = len(pool)
for indices in product(range(n), repeat=r):
if sorted(indices) == list(indices):
yield tuple(pool[i] for i in indices)
The number of items returned is (n+r-1)!/r!/(n-1)! when n>0.
Make an iterator that filters elements from data returning only those that
have a corresponding element in selectors that evaluates to True.
Stops when either the data or selectors iterables has been exhausted.
Equivalent to:
def compress(data, selectors):
# compress('ABCDEF', [1,0,1,0,1,1]) --> A C E F
return (d for d, s in zip(data, selectors) if s)
Make an iterator that returns evenly spaced values starting with n. Often
used as an argument to map() to generate consecutive data points.
Also, used with zip() to add sequence numbers. Equivalent to:
def count(start=0, step=1):
# count(10) --> 10 11 12 13 14 ...
# count(2.5, 0.5) -> 2.5 3.0 3.5 ...
n = start
while True:
yield n
n += step
When counting with floating point numbers, better accuracy can sometimes be
achieved by substituting multiplicative code such as: (start + step * i for i in count()).
Changed in version 3.1: Added step argument and allowed non-integer arguments.
Make an iterator returning elements from the iterable and saving a copy of each.
When the iterable is exhausted, return elements from the saved copy. Repeats
indefinitely. Equivalent to:
def cycle(iterable):
# cycle('ABCD') --> A B C D A B C D A B C D ...
saved = []
for element in iterable:
yield element
saved.append(element)
while saved:
for element in saved:
yield element
Note, this member of the toolkit may require significant auxiliary storage
(depending on the length of the iterable).
Make an iterator that drops elements from the iterable as long as the predicate
is true; afterwards, returns every element. Note, the iterator does not produce
any output until the predicate first becomes false, so it may have a lengthy
start-up time. Equivalent to:
def dropwhile(predicate, iterable):
# dropwhile(lambda x: x<5, [1,4,6,4,1]) --> 6 4 1
iterable = iter(iterable)
for x in iterable:
if not predicate(x):
yield x
break
for x in iterable:
yield x
Make an iterator that filters elements from iterable returning only those for
which the predicate is False. If predicate is None, return the items
that are false. Equivalent to:
def filterfalse(predicate, iterable):
# filterfalse(lambda x: x%2, range(10)) --> 0 2 4 6 8
if predicate is None:
predicate = bool
for x in iterable:
if not predicate(x):
yield x
Make an iterator that returns consecutive keys and groups from the iterable.
The key is a function computing a key value for each element. If not
specified or is None, key defaults to an identity function and returns
the element unchanged. Generally, the iterable needs to already be sorted on
the same key function.
The operation of groupby() is similar to the uniq filter in Unix. It
generates a break or new group every time the value of the key function changes
(which is why it is usually necessary to have sorted the data using the same key
function). That behavior differs from SQL’s GROUP BY which aggregates common
elements regardless of their input order.
The returned group is itself an iterator that shares the underlying iterable
with groupby(). Because the source is shared, when the groupby()
object is advanced, the previous group is no longer visible. So, if that data
is needed later, it should be stored as a list:
groups = []
uniquekeys = []
data = sorted(data, key=keyfunc)
for k, g in groupby(data, keyfunc):
groups.append(list(g)) # Store group iterator as a list
uniquekeys.append(k)
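groupby() is roughly equivalent to: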
class groupby:
# [k for k, g in groupby('AAAABBBCCDAABBB')] --> A B C D A B
# [list(g) for k, g in groupby('AAAABBBCCD')] --> AAAA BBB CC D
def __init__(self, iterable, key=None):
if key is None:
key = lambda x: x
self.keyfunc = key
self.it = iter(iterable)
self.tgtkey = self.currkey = self.currvalue = object()
def __iter__(self):
return self
def __next__(self):
while self.currkey == self.tgtkey:
self.currvalue = next(self.it) # Exit on StopIteration
self.currkey = self.keyfunc(self.currvalue)
self.tgtkey = self.currkey
return (self.currkey, self._grouper(self.tgtkey))
def _grouper(self, tgtkey):
while self.currkey == tgtkey:
yield self.currvalue
self.currvalue = next(self.it) # Exit on StopIteration
self.currkey = self.keyfunc(self.currvalue)
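Example calls, matching the comments in the code above:
>>> from itertools import groupby
>>> [k for k, g in groupby('AAAABBBCCDAABBB')]
['A', 'B', 'C', 'D', 'A', 'B']
>>> [list(g) for k, g in groupby('AAAABBBCCD')]
[['A', 'A', 'A', 'A'], ['B', 'B', 'B'], ['C', 'C'], ['D']]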
Make an iterator that returns selected elements from the iterable. If start is
non-zero, then elements from the iterable are skipped until start is reached.
Afterward, elements are returned consecutively unless step is set higher than
one which results in items being skipped. If stop is None, then iteration
continues until the iterator is exhausted, if at all; otherwise, it stops at the
specified position. Unlike regular slicing, islice() does not support
negative values for start, stop, or step. Can be used to extract related
fields from data where the internal structure has been flattened (for example, a
multi-line report may list a name field on every third line). Equivalent to:
def islice(iterable, *args):
# islice('ABCDEFG', 2) --> A B
# islice('ABCDEFG', 2, 4) --> C D
# islice('ABCDEFG', 2, None) --> C D E F G
# islice('ABCDEFG', 0, None, 2) --> A C E G
s = slice(*args)
it = iter(range(s.start or 0, s.stop or sys.maxsize, s.step or 1))
nexti = next(it)
for i, element in enumerate(iterable):
if i == nexti:
yield element
nexti = next(it)
If start is None, then iteration starts at zero. If step is None,
then the step defaults to one.
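A couple of calls matching the comments above:
>>> from itertools import islice
>>> list(islice('ABCDEFG', 2, 4))
['C', 'D']
>>> list(islice('ABCDEFG', 0, None, 2))
['A', 'C', 'E', 'G']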
Return successive r length permutations of elements in the iterable.
If r is not specified or is None, then r defaults to the length
of the iterable and all possible full-length permutations
are generated.
Permutations are emitted in lexicographic sort order. So, if the
input iterable is sorted, the permutation tuples will be produced
in sorted order.
Elements are treated as unique based on their position, not on their
value. So if the input elements are unique, there will be no repeat
values in each permutation.
Equivalent to:
def permutations(iterable, r=None):
# permutations('ABCD', 2) --> AB AC AD BA BC BD CA CB CD DA DB DC
# permutations(range(3)) --> 012 021 102 120 201 210
pool = tuple(iterable)
n = len(pool)
r = n if r is None else r
if r > n:
return
indices = list(range(n))
    cycles = list(range(n, n-r, -1))
yield tuple(pool[i] for i in indices[:r])
while n:
for i in reversed(range(r)):
cycles[i] -= 1
if cycles[i] == 0:
indices[i:] = indices[i+1:] + indices[i:i+1]
cycles[i] = n - i
else:
j = cycles[i]
indices[i], indices[-j] = indices[-j], indices[i]
yield tuple(pool[i] for i in indices[:r])
break
else:
return
The code for permutations() can be also expressed as a subsequence of
product(), filtered to exclude entries with repeated elements (those
from the same position in the input pool):
def permutations(iterable, r=None):
pool = tuple(iterable)
n = len(pool)
r = n if r is None else r
for indices in product(range(n), repeat=r):
if len(set(indices)) == r:
yield tuple(pool[i] for i in indices)
The number of items returned is n!/(n-r)! when 0<=r<=n
or zero when r>n.
Equivalent to nested for-loops in a generator expression. For example,
product(A, B) returns the same as ((x, y) for x in A for y in B).
The nested loops cycle like an odometer with the rightmost element advancing
on every iteration. This pattern creates a lexicographic ordering so that if
the input’s iterables are sorted, the product tuples are emitted in sorted
order.
To compute the product of an iterable with itself, specify the number of
repetitions with the optional repeat keyword argument. For example,
product(A,repeat=4) means the same as product(A,A,A,A).
This function is equivalent to the following code, except that the
actual implementation does not build up intermediate results in memory:
def product(*args, repeat=1):
# product('ABCD', 'xy') --> Ax Ay Bx By Cx Cy Dx Dy
# product(range(2), repeat=3) --> 000 001 010 011 100 101 110 111
pools = [tuple(pool) for pool in args] * repeat
result = [[]]
for pool in pools:
result = [x+[y] for x in result for y in pool]
for prod in result:
yield tuple(prod)
Make an iterator that returns object over and over again. Runs indefinitely
unless the times argument is specified. Used as argument to map() for
invariant parameters to the called function. Also used with zip() to
create an invariant part of a tuple record. Equivalent to:
def repeat(object, times=None):
# repeat(10, 3) --> 10 10 10
if times is None:
while True:
yield object
else:
for i in range(times):
yield object
Make an iterator that computes the function using arguments obtained from
the iterable. Used instead of map() when argument parameters are already
grouped in tuples from a single iterable (the data has been “pre-zipped”). The
difference between map() and starmap() parallels the distinction
between function(a,b) and function(*c). Equivalent to:
def starmap(function, iterable):
# starmap(pow, [(2,5), (3,2), (10,3)]) --> 32 9 1000
for args in iterable:
yield function(*args)
Return n independent iterators from a single iterable. Equivalent to:
def tee(iterable, n=2):
it = iter(iterable)
deques = [collections.deque() for i in range(n)]
def gen(mydeque):
while True:
if not mydeque: # when the local deque is empty
newval = next(it) # fetch a new value and
for d in deques: # load it to all the deques
d.append(newval)
yield mydeque.popleft()
return tuple(gen(d) for d in deques)
Once tee() has made a split, the original iterable should not be
used anywhere else; otherwise, the iterable could get advanced without
the tee objects being informed.
This itertool may require significant auxiliary storage (depending on how
much temporary data needs to be stored). In general, if one iterator uses
most or all of the data before another iterator starts, it is faster to use
list() instead of tee().
Make an iterator that aggregates elements from each of the iterables. If the
iterables are of uneven length, missing values are filled-in with fillvalue.
Iteration continues until the longest iterable is exhausted. Equivalent to:
def zip_longest(*args, fillvalue=None):
# zip_longest('ABCD', 'xy', fillvalue='-') --> Ax By C- D-
def sentinel(counter = ([fillvalue]*(len(args)-1)).pop):
yield counter() # yields the fillvalue, or raises IndexError
fillers = repeat(fillvalue)
iters = [chain(it, sentinel(), fillers) for it in args]
try:
for tup in zip(*iters):
yield tup
except IndexError:
pass
If one of the iterables is potentially infinite, then the zip_longest()
function should be wrapped with something that limits the number of calls
(for example islice() or takewhile()). If not specified,
fillvalue defaults to None.
This section shows recipes for creating an extended toolset using the existing
itertools as building blocks.
The extended tools offer the same high performance as the underlying toolset.
The superior memory performance is kept by processing elements one at a time
rather than bringing the whole iterable into memory all at once. Code volume is
kept small by linking the tools together in a functional style which helps
eliminate temporary variables. High speed is retained by preferring
“vectorized” building blocks over the use of for-loops and generators
which incur interpreter overhead.
def take(n, iterable):
"Return first n items of the iterable as a list"
return list(islice(iterable, n))
def tabulate(function, start=0):
"Return function(0), function(1), ..."
return map(function, count(start))
def consume(iterator, n):
"Advance the iterator n-steps ahead. If n is none, consume entirely."
# Use functions that consume iterators at C speed.
if n is None:
# feed the entire iterator into a zero-length deque
collections.deque(iterator, maxlen=0)
else:
# advance to the empty slice starting at position n
next(islice(iterator, n, n), None)
def nth(iterable, n, default=None):
"Returns the nth item or a default value"
return next(islice(iterable, n, None), default)
def quantify(iterable, pred=bool):
"Count how many times the predicate is true"
return sum(map(pred, iterable))
def padnone(iterable):
"""Returns the sequence elements and then returns None indefinitely.
Useful for emulating the behavior of the built-in map() function.
"""
return chain(iterable, repeat(None))
def ncycles(iterable, n):
"Returns the sequence elements n times"
return chain.from_iterable(repeat(tuple(iterable), n))
def dotproduct(vec1, vec2):
return sum(map(operator.mul, vec1, vec2))
def flatten(listOfLists):
"Flatten one level of nesting"
return chain.from_iterable(listOfLists)
def repeatfunc(func, times=None, *args):
"""Repeat calls to func with specified arguments.
Example: repeatfunc(random.random)
"""
if times is None:
return starmap(func, repeat(args))
return starmap(func, repeat(args, times))
def pairwise(iterable):
"s -> (s0,s1), (s1,s2), (s2, s3), ..."
a, b = tee(iterable)
next(b, None)
return zip(a, b)
def grouper(n, iterable, fillvalue=None):
"grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
args = [iter(iterable)] * n
return zip_longest(*args, fillvalue=fillvalue)
def roundrobin(*iterables):
"roundrobin('ABC', 'D', 'EF') --> A D E B F C"
# Recipe credited to George Sakkis
pending = len(iterables)
nexts = cycle(iter(it).__next__ for it in iterables)
while pending:
try:
for next in nexts:
yield next()
except StopIteration:
pending -= 1
nexts = cycle(islice(nexts, pending))
def partition(pred, iterable):
'Use a predicate to partition entries into false entries and true entries'
# partition(is_odd, range(10)) --> 0 2 4 6 8 and 1 3 5 7 9
t1, t2 = tee(iterable)
return filterfalse(pred, t1), filter(pred, t2)
def powerset(iterable):
"powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"
s = list(iterable)
return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))
def unique_everseen(iterable, key=None):
"List unique elements, preserving order. Remember all elements ever seen."
# unique_everseen('AAAABBBCCDAABBB') --> A B C D
# unique_everseen('ABBCcAD', str.lower) --> A B C D
seen = set()
seen_add = seen.add
if key is None:
for element in filterfalse(seen.__contains__, iterable):
seen_add(element)
yield element
else:
for element in iterable:
k = key(element)
if k not in seen:
seen_add(k)
yield element
def unique_justseen(iterable, key=None):
"List unique elements, preserving order. Remember only the element just seen."
# unique_justseen('AAAABBBCCDAABBB') --> A B C D A B
# unique_justseen('ABBCcAD', str.lower) --> A B C A D
return map(next, map(itemgetter(1), groupby(iterable, key)))
def iter_except(func, exception, first=None):
""" Call a function repeatedly until an exception is raised.
Converts a call-until-exception interface to an iterator interface.
Like __builtin__.iter(func, sentinel) but uses an exception instead
of a sentinel to end the loop.
Examples:
iter_except(functools.partial(heappop, h), IndexError) # priority queue iterator
iter_except(d.popitem, KeyError) # non-blocking dict iterator
iter_except(d.popleft, IndexError) # non-blocking deque iterator
iter_except(q.get_nowait, Queue.Empty) # loop over a producer Queue
iter_except(s.pop, KeyError) # non-blocking set iterator
"""
try:
if first is not None:
yield first() # For database APIs needing an initial cast to db.first()
while 1:
yield func()
except exception:
pass
def random_product(*args, repeat=1):
"Random selection from itertools.product(*args, **kwds)"
pools = [tuple(pool) for pool in args] * repeat
return tuple(random.choice(pool) for pool in pools)
def random_permutation(iterable, r=None):
"Random selection from itertools.permutations(iterable, r)"
pool = tuple(iterable)
r = len(pool) if r is None else r
return tuple(random.sample(pool, r))
def random_combination(iterable, r):
"Random selection from itertools.combinations(iterable, r)"
pool = tuple(iterable)
n = len(pool)
indices = sorted(random.sample(range(n), r))
return tuple(pool[i] for i in indices)
def random_combination_with_replacement(iterable, r):
"Random selection from itertools.combinations_with_replacement(iterable, r)"
pool = tuple(iterable)
n = len(pool)
indices = sorted(random.randrange(n) for i in range(r))
return tuple(pool[i] for i in indices)
Note, many of the above recipes can be optimized by replacing global lookups
with local variables defined as default values. For example, the
dotproduct recipe can be written as:
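def dotproduct(vec1, vec2, sum=sum, map=map, mul=operator.mul):
    # Binding the globals as default argument values makes them locals,
    # avoiding a dictionary lookup on each call (a CPython micro-optimization).
    return sum(map(mul, vec1, vec2))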
The functools module is for higher-order functions: functions that act on
or return other functions. In general, any callable object can be treated as a
function for the purposes of this module.
The functools module defines the following functions:
Transform an old-style comparison function to a key-function. Used with
tools that accept key functions (such as sorted(), min(),
max(), heapq.nlargest(), heapq.nsmallest(),
itertools.groupby()). This function is primarily used as a transition
tool for programs being converted from Py2.x which supported the use of
comparison functions.
A compare function is any callable that accepts two arguments, compares them,
and returns a negative number for less-than, zero for equality, or a positive
number for greater-than. A key function is a callable that accepts one
argument and returns another value indicating the position in the desired
collation sequence.
Example:
sorted(iterable, key=cmp_to_key(locale.strcoll)) # locale-aware sort order
Decorator to wrap a function with a memoizing callable that saves up to the
maxsize most recent calls. It can save time when an expensive or I/O bound
function is periodically called with the same arguments.
Since a dictionary is used to cache results, the positional and keyword
arguments to the function must be hashable.
If maxsize is set to None, the LRU feature is disabled and the cache
can grow without bound.
To help measure the effectiveness of the cache and tune the maxsize
parameter, the wrapped function is instrumented with a cache_info()
function that returns a named tuple showing hits, misses,
maxsize and currsize. In a multi-threaded environment, the hits
and misses are approximate.
The decorator also provides a cache_clear() function for clearing or
invalidating the cache.
The original underlying function is accessible through the
__wrapped__ attribute. This is useful for introspection, for
bypassing the cache, or for rewrapping the function with a different cache.
An LRU (least recently used) cache works
best when more recent calls are the best predictors of upcoming calls (for
example, the most popular articles on a news server tend to change daily).
The cache’s size limit assures that the cache does not grow without bound on
long-running processes such as web servers.
Example of an LRU cache for static web content:
@lru_cache(maxsize=20)
def get_pep(num):
'Retrieve text of a Python Enhancement Proposal'
resource = 'http://www.python.org/dev/peps/pep-%04d/' % num
try:
with urllib.request.urlopen(resource) as s:
return s.read()
except urllib.error.HTTPError:
return 'Not Found'
>>> for n in 8, 290, 308, 320, 8, 218, 320, 279, 289, 320, 9991:
... pep = get_pep(n)
... print(n, len(pep))
>>> print(get_pep.cache_info())
CacheInfo(hits=3, misses=8, maxsize=20, currsize=8)
Given a class defining one or more rich comparison ordering methods, this
class decorator supplies the rest. This simplifies the effort involved
in specifying all of the possible rich comparison operations:
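The class must define at least one ordering method and should also supply
__eq__(); a minimal sketch (the Student class here is illustrative, not from
the surrounding text):
from functools import total_ordering

@total_ordering
class Student:
    def __init__(self, firstname, lastname):
        self.firstname = firstname
        self.lastname = lastname
    def __eq__(self, other):
        return ((self.lastname.lower(), self.firstname.lower()) ==
                (other.lastname.lower(), other.firstname.lower()))
    def __lt__(self, other):
        return ((self.lastname.lower(), self.firstname.lower()) <
                (other.lastname.lower(), other.firstname.lower()))

# __le__, __gt__, and __ge__ are now supplied automatically:
assert Student('Ada', 'Lovelace') <= Student('Alan', 'Turing')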
Return a new partial object which when called will behave like func
called with the positional arguments args and keyword arguments keywords. If
more arguments are supplied to the call, they are appended to args. If
additional keyword arguments are supplied, they extend and override keywords.
Roughly equivalent to:
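def partial(func, *args, **keywords):
    # Reconstructed sketch; the real partial is implemented as a C type.
    def newfunc(*fargs, **fkeywords):
        newkeywords = keywords.copy()
        newkeywords.update(fkeywords)
        return func(*(args + fargs), **newkeywords)
    newfunc.func = func
    newfunc.args = args
    newfunc.keywords = keywords
    return newfunc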
partial() is used for partial function application which “freezes”
some portion of a function’s arguments and/or keywords resulting in a new object
with a simplified signature. For example, partial() can be used to create
a callable that behaves like the int() function where the base argument
defaults to two:
>>> from functools import partial
>>> basetwo = partial(int, base=2)
>>> basetwo.__doc__ = 'Convert base 2 string to an int.'
>>> basetwo('10010')
18
Apply function of two arguments cumulatively to the items of sequence, from
left to right, so as to reduce the sequence to a single value. For example,
reduce(lambda x, y: x+y, [1, 2, 3, 4, 5]) calculates ((((1+2)+3)+4)+5).
The left argument, x, is the accumulated value and the right argument, y, is
the update value from the sequence. If the optional initializer is present,
it is placed before the items of the sequence in the calculation, and serves as
a default when the sequence is empty. If initializer is not given and
sequence contains only one item, the first item is returned.
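Roughly, reduce() behaves like this pure-Python sketch (which, as a
simplification, cannot accept None as an explicit initializer):
def reduce(function, iterable, initializer=None):
    it = iter(iterable)
    if initializer is None:
        value = next(it)    # the first item seeds the accumulator
    else:
        value = initializer
    for element in it:
        value = function(value, element)
    return value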
Update a wrapper function to look like the wrapped function. The optional
arguments are tuples to specify which attributes of the original function are
assigned directly to the matching attributes on the wrapper function and which
attributes of the wrapper function are updated with the corresponding attributes
from the original function. The default values for these arguments are the
module level constants WRAPPER_ASSIGNMENTS (which assigns to the wrapper
function’s __name__, __module__, __annotations__ and __doc__, the
documentation string) and WRAPPER_UPDATES (which updates the wrapper
function’s __dict__, i.e. the instance dictionary).
To allow access to the original function for introspection and other purposes
(e.g. bypassing a caching decorator such as lru_cache()), this function
automatically adds a __wrapped__ attribute to the wrapper that refers to
the original function.
The main intended use for this function is in decorator functions which
wrap the decorated function and return the wrapper. If the wrapper function is
not updated, the metadata of the returned function will reflect the wrapper
definition rather than the original function definition, which is typically less
than helpful.
update_wrapper() may be used with callables other than functions. Any
attributes named in assigned or updated that are missing from the object
being wrapped are ignored (i.e. this function will not attempt to set them
on the wrapper function). AttributeError is still raised if the
wrapper function itself is missing any attributes named in updated.
New in version 3.2: Automatic addition of the __wrapped__ attribute.
New in version 3.2: Copying of the __annotations__ attribute by default.
Changed in version 3.2: Missing attributes no longer trigger an AttributeError.
This is a convenience function for invoking partial(update_wrapper,
wrapped=wrapped, assigned=assigned, updated=updated) as a function decorator
when defining a wrapper function. For example:
>>> from functools import wraps
>>> def my_decorator(f):
... @wraps(f)
... def wrapper(*args, **kwds):
... print('Calling decorated function')
... return f(*args, **kwds)
... return wrapper
...
>>> @my_decorator
... def example():
... """Docstring"""
... print('Called example function')
...
>>> example()
Calling decorated function
Called example function
>>> example.__name__
'example'
>>> example.__doc__
'Docstring'
Without the use of this decorator factory, the name of the example function
would have been 'wrapper', and the docstring of the original example()
would have been lost.
The keyword arguments that will be supplied when the partial object is
called.
partial objects are like function objects in that they are
callable, weak referencable, and can have attributes. There are some important
differences. For instance, the __name__ and __doc__ attributes
are not created automatically. Also, partial objects defined in
classes behave like static methods and do not transform into bound methods
during instance attribute look-up.
The operator module exports a set of functions implemented in C
corresponding to the intrinsic operators of Python. For example,
operator.add(x,y) is equivalent to the expression x+y. The function
names are those used for special class methods; variants without leading and
trailing __ are also provided for convenience.
The functions fall into categories that perform object comparisons, logical
operations, mathematical operations and sequence operations.
The object comparison functions are useful for all objects, and are named after
the rich comparison operators they support:
Perform “rich comparisons” between a and b. Specifically, lt(a,b) is
equivalent to a<b, le(a,b) is equivalent to a<=b, eq(a,b) is equivalent to a==b, ne(a,b) is equivalent to a!=b,
gt(a,b) is equivalent to a>b and ge(a,b) is equivalent to a>=b. Note that these functions can return any value, which may
or may not be interpretable as a Boolean value. See
Comparisons for more information about rich comparisons.
The logical operations are also generally applicable to all objects, and support
truth tests, identity tests, and boolean operations:
Return the outcome of not obj. (Note that there is no
__not__() method for object instances; only the interpreter core defines
this operation. The result is affected by the __bool__() and
__len__() methods.)
The operator module also defines tools for generalized attribute and item
lookups. These are useful for making fast field extractors as arguments for
map(), sorted(), itertools.groupby(), or other functions that
expect a function argument.
Return a callable object that fetches attr from its operand. If more than one
attribute is requested, returns a tuple of attributes. After
f = attrgetter('name'), the call f(b) returns b.name. After
f = attrgetter('name', 'date'), the call f(b) returns (b.name, b.date). Equivalent to:
def attrgetter(*items):
if any(not isinstance(item, str) for item in items):
raise TypeError('attribute name must be a string')
if len(items) == 1:
attr = items[0]
def g(obj):
return resolve_attr(obj, attr)
else:
def g(obj):
            return tuple(resolve_attr(obj, attr) for attr in items)
return g
def resolve_attr(obj, attr):
for name in attr.split("."):
obj = getattr(obj, name)
return obj
The attribute names can also contain dots; after f = attrgetter('date.month'),
the call f(b) returns b.date.month.
Return a callable object that fetches item from its operand using the
operand’s __getitem__() method. If multiple items are specified,
returns a tuple of lookup values. Equivalent to:
def itemgetter(*items):
if len(items) == 1:
item = items[0]
def g(obj):
return obj[item]
else:
def g(obj):
return tuple(obj[item] for item in items)
return g
The items can be any type accepted by the operand’s __getitem__()
method. Dictionaries accept any hashable value. Lists, tuples, and
strings accept an index or a slice:
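>>> from operator import itemgetter
>>> itemgetter(1)('ABCDEFG')
'B'
>>> itemgetter(1, 3, 5)('ABCDEFG')
('B', 'D', 'F')
>>> itemgetter(slice(2, None))('ABCDEFG')
'CDEFG'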
Return a callable object that calls the method name on its operand. If
additional arguments and/or keyword arguments are given, they will be given
to the method as well. After f = methodcaller('name'), the call f(b)
returns b.name(). After f = methodcaller('name', 'foo', bar=1), the
call f(b) returns b.name('foo', bar=1). Equivalent to:
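def methodcaller(name, *args, **kwargs):
    # Reconstructed sketch of the documented equivalence.
    def caller(obj):
        return getattr(obj, name)(*args, **kwargs)
    return caller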
Many operations have an “in-place” version. Listed below are functions
providing a more primitive access to in-place operators than the usual syntax
does; for example, the statement x += y is equivalent to
x = operator.iadd(x, y). Another way to put it is to say that
z = operator.iadd(x, y) is equivalent to the compound statement
z = x; z += y.
In those examples, note that when an in-place method is called, the computation
and assignment are performed in two separate steps. The in-place functions
listed below only do the first step, calling the in-place method. The second
step, assignment, is not handled.
For immutable targets such as strings, numbers, and tuples, the updated
value is computed, but not assigned back to the input variable:
>>> a = 'hello'
>>> iadd(a, ' world')
'hello world'
>>> a
'hello'
For mutable targets such as lists and dictionaries, the in-place method
will perform the update, so no subsequent assignment is necessary:
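>>> from operator import iadd
>>> s = ['h', 'e', 'l', 'l', 'o']
>>> iadd(s, [' world'])    # the list is extended in place and returned
['h', 'e', 'l', 'l', 'o', ' world']
>>> s
['h', 'e', 'l', 'l', 'o', ' world']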
The modules described in this chapter deal with disk files and directories. For
example, there are modules for reading the properties of files, manipulating
paths in a portable way, and creating temporary files. The full list of modules
in this chapter is:
This module implements some useful functions on pathnames. To read or
write files see open(), and for accessing the filesystem see the
os module. The path parameters can be passed as either strings,
or bytes. Applications are encouraged to represent file names as
(Unicode) character strings. Unfortunately, some file names may not be
representable as strings on Unix, so applications that need to support
arbitrary file names on Unix should use bytes objects to represent
path names. Conversely, bytes objects cannot represent all file
names on Windows (in the standard mbcs encoding), hence Windows
applications should use string objects to access all files.
Note
All of these functions accept either only bytes or only string objects as
their parameters. The result is an object of the same type, if a path or
file name is returned.
Note
Since different operating systems have different path name conventions, there
are several versions of this module in the standard library. The
os.path module is always the path module suitable for the operating
system Python is running on, and therefore usable for local paths. However,
you can also import and use the individual modules if you want to manipulate
a path that is always in one of the different formats. They all have the
same interface:
Return the base name of pathname path. This is the second half of the pair
returned by split(path). Note that the result of this function is different
from the Unix basename program; where basename for
'/foo/bar/' returns 'bar', the basename() function returns an
empty string ('').
Return the longest path prefix (taken character-by-character) that is a prefix
of all paths in list. If list is empty, return the empty string ('').
Note that this may return invalid paths because it works a character at a time.
Return True if path refers to an existing path. Returns False for
broken symbolic links. On some platforms, this function may return False if
permission is not granted to execute os.stat() on the requested file, even
if the path physically exists.
On Unix and Windows, return the argument with an initial component of ~ or
~user replaced by that user's home directory.
On Unix, an initial ~ is replaced by the environment variable HOME
if it is set; otherwise the current user’s home directory is looked up in the
password directory through the built-in module pwd. An initial ~user
is looked up directly in the password directory.
On Windows, HOME and USERPROFILE will be used if set,
otherwise a combination of HOMEPATH and HOMEDRIVE will be
used. An initial ~user is handled by stripping the last directory component
from the created user path derived above.
If the expansion fails or if the path does not begin with a tilde, the path is
returned unchanged.
Return the argument with environment variables expanded. Substrings of the form
$name or ${name} are replaced by the value of environment variable
name. Malformed variable names and references to non-existing variables are
left unchanged.
On Windows, %name% expansions are supported in addition to $name and
${name}.
Return the time of last access of path. The return value is a number giving
the number of seconds since the epoch (see the time module). Raise
os.error if the file does not exist or is inaccessible.
Return the time of last modification of path. The return value is a number
giving the number of seconds since the epoch (see the time module).
Raise os.error if the file does not exist or is inaccessible.
Return the system’s ctime which, on some systems (like Unix) is the time of the
last change, and, on others (like Windows), is the creation time for path.
The return value is a number giving the number of seconds since the epoch (see
the time module). Raise os.error if the file does not exist or
is inaccessible.
Return True if path is an absolute pathname. On Unix, that means it
begins with a slash, on Windows that it begins with a (back)slash after chopping
off a potential drive letter.
Return True if pathname path is a mount point: a point in a file
system where a different file system has been mounted. The function checks
whether path's parent, path/.., is on a different device than path,
or whether path/.. and path point to the same i-node on the same
device — this should detect mount points for all Unix and POSIX variants.
Join one or more path components intelligently. If any component is an absolute
path, all previous components (on Windows, including the previous drive letter,
if there was one) are thrown away, and joining continues. The return value is
the concatenation of path1, and optionally path2, etc., with exactly one
directory separator (os.sep) following each non-empty part except the last.
(This means that an empty last part will result in a path that ends with a
separator.) Note that on Windows, since there is a current directory for
each drive, os.path.join("c:","foo") represents a path relative to the
current directory on drive C: (c:foo), not c:\foo.
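For instance, on Unix:
>>> import os.path
>>> os.path.join('/usr', 'lib', 'python3.2')
'/usr/lib/python3.2'
>>> os.path.join('/usr', '/lib')    # the absolute component resets the join
'/lib'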
Normalize the case of a pathname. On Unix and Mac OS X, this returns the
path unchanged; on case-insensitive filesystems, it converts the path to
lowercase. On Windows, it also converts forward slashes to backward slashes.
Raise a TypeError if the type of path is not str or bytes.
Normalize a pathname. This collapses redundant separators and up-level
references so that A//B, A/B/, A/./B and A/foo/../B all become
A/B.
It does not normalize the case (use normcase() for that). On Windows, it
converts forward slashes to backward slashes. It should be understood that this
may change the meaning of the path if it contains symbolic links!
Return the canonical path of the specified filename, eliminating any symbolic
links encountered in the path (if they are supported by the operating system).
Return True if both pathname arguments refer to the same file or directory.
On Unix, this is determined by the device number and i-node number and raises an
exception if a os.stat() call on either pathname fails.
On Windows, two files are the same if they resolve to the same final path
name using the Windows API call GetFinalPathNameByHandle. This function
raises an exception if handles cannot be obtained to either file.
Return True if the stat tuples stat1 and stat2 refer to the same file.
These structures may have been returned by fstat(), lstat(), or
stat(). This function implements the underlying comparison used by
samefile() and sameopenfile().
Split the pathname path into a pair, (head,tail) where tail is the
last pathname component and head is everything leading up to that. The
tail part will never contain a slash; if path ends in a slash, tail
will be empty. If there is no slash in path, head will be empty. If
path is empty, both head and tail are empty. Trailing slashes are
stripped from head unless it is the root (one or more slashes only). In
all cases, join(head,tail) returns a path to the same location as path
(but the strings may differ).
Split the pathname path into a pair (drive,tail) where drive is either
a mount point or the empty string. On systems which do not use drive
specifications, drive will always be the empty string. In all cases, drive+tail will be the same as path.
On Windows, splits a pathname into drive/UNC sharepoint and relative path.
If the path contains a drive letter, drive will contain everything
up to and including the colon.
e.g. splitdrive("c:/dir") returns ("c:","/dir")
If the path contains a UNC path, drive will contain the host name
and share, up to but not including the fourth separator.
e.g. splitdrive("//host/computer/dir") returns ("//host/computer","/dir")
Split the pathname path into a pair (root, ext) such that root + ext ==
path, and ext is empty or begins with a period and contains at most one
period. Leading periods on the basename are ignored; splitext('.cshrc')
returns ('.cshrc', '').
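For example:
>>> import os.path
>>> os.path.splitext('/path/archive.tar.gz')    # ext keeps at most one period
('/path/archive.tar', '.gz')
>>> os.path.splitext('.cshrc')    # a leading period is not an extension
('.cshrc', '')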
Deprecated since version 3.1: Use splitdrive instead.
Split the pathname path into a pair (unc,rest) so that unc is the UNC
mount point (such as r'\\host\mount'), if present, and rest the rest of
the path (such as r'\path\file.ext'). For paths containing drive letters,
unc will always be the empty string.
This module implements a helper class and functions to quickly write a
loop over standard input or a list of files. If you just want to read or
write one file see open().
The typical use is:
import fileinput
for line in fileinput.input():
process(line)
This iterates over the lines of all files listed in sys.argv[1:], defaulting
to sys.stdin if the list is empty. If a filename is '-', it is also
replaced by sys.stdin. To specify an alternative list of filenames, pass it
as the first argument to input(). A single file name is also allowed.
All files are opened in text mode by default, but you can override this by
specifying the mode parameter in the call to input() or
FileInput. If an I/O error occurs during opening or reading a file,
IOError is raised.
If sys.stdin is used more than once, the second and further use will return
no lines, except perhaps for interactive use, or if it has been explicitly reset
(e.g. using sys.stdin.seek(0)).
Empty files are opened and immediately closed; the only time their presence in
the list of filenames is noticeable at all is when the last file opened is
empty.
Lines are returned with any newlines intact, which means that the last line in
a file may not have one.
You can control how files are opened by providing an opening hook via the
openhook parameter to fileinput.input() or FileInput(). The
hook must be a function that takes two arguments, filename and mode, and
returns an accordingly opened file-like object. Two useful hooks are already
provided by this module.
The following function is the primary interface of this module:
Create an instance of the FileInput class. The instance will be used
as global state for the functions of this module, and is also returned to use
during iteration. The parameters to this function will be passed along to the
constructor of the FileInput class.
The FileInput instance can be used as a context manager in the
with statement. In this example, input is closed after the
with statement is exited, even if an exception occurs:
with fileinput.input(files=('spam.txt', 'eggs.txt')) as f:
for line in f:
process(line)
Changed in version 3.2: Can be used as a context manager.
The following functions use the global state created by fileinput.input();
if there is no active state, RuntimeError is raised.
Return the cumulative line number of the line that has just been read. Before
the first line has been read, returns 0. After the last line of the last
file has been read, returns the line number of that line.
Return the line number in the current file. Before the first line has been
read, returns 0. After the last line of the last file has been read,
returns the line number of that line within the file.
Close the current file so that the next iteration will read the first line from
the next file (if any); lines not read from the file will not count towards the
cumulative line count. The filename is not changed until after the first line
of the next file has been read. Before the first line has been read, this
function has no effect; it cannot be used to skip the first file. After the
last line of the last file has been read, this function has no effect.
With mode you can specify which file mode will be passed to open(). It
must be one of 'r', 'rU', 'U' and 'rb'.
The openhook, when given, must be a function that takes two arguments,
filename and mode, and returns an accordingly opened file-like object. You
cannot use inplace and openhook together.
A FileInput instance can be used as a context manager in the
with statement. In this example, input is closed after the
with statement is exited, even if an exception occurs:
with FileInput(files=('spam.txt', 'eggs.txt')) as input:
process(input)
Changed in version 3.2: Can be used as a context manager.
Optional in-place filtering: if the keyword argument inplace=True is
passed to fileinput.input() or to the FileInput constructor, the
file is moved to a backup file and standard output is directed to the input file
(if a file of the same name as the backup file already exists, it will be
replaced silently). This makes it possible to write a filter that rewrites its
input file in place. If the backup parameter is given (typically as
backup='.<someextension>'), it specifies the extension for the backup file,
and the backup file remains around; by default, the extension is '.bak' and
it is deleted when the output file is closed. In-place filtering is disabled
when standard input is read.
Note
The current implementation does not work for MS-DOS 8+3 filesystems.
The two following opening hooks are provided by this module:
Transparently opens files compressed with gzip and bzip2 (recognized by the
extensions '.gz' and '.bz2') using the gzip and bz2
modules. If the filename extension is not '.gz' or '.bz2', the file is
opened normally (i.e., using open() without any decompression).
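For example, combining the hook with the typical loop shown earlier
(process() remains a stand-in for real work):
import fileinput

for line in fileinput.input(openhook=fileinput.hook_compressed):
    process(line)    # lines from .gz/.bz2 files are decompressed transparently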
The stat module defines constants and functions for interpreting the
results of os.stat(), os.fstat() and os.lstat() (if they
exist). For complete details about the stat(), fstat() and
lstat() calls, consult the documentation for your system.
The stat module defines the following functions to test for specific file
types:
Return the portion of the file’s mode that can be set by os.chmod()—that is, the file’s permission bits, plus the sticky bit, set-group-id, and
set-user-id bits (on systems that support them).
Return the portion of the file’s mode that describes the file type (used by the
S_IS*() functions above).
Normally, you would use the os.path.is*() functions for testing the type
of a file; the functions here are useful when you are doing multiple tests of
the same file and wish to avoid the overhead of the stat() system call
for each test. These are also useful when checking for information about a file
that isn’t handled by os.path, like the tests for block and character
devices.
Example:
import os, sys
from stat import *
def walktree(top, callback):
'''recursively descend the directory tree rooted at top,
calling the callback function for each regular file'''
for f in os.listdir(top):
pathname = os.path.join(top, f)
mode = os.stat(pathname).st_mode
if S_ISDIR(mode):
# It's a directory, recurse into it
walktree(pathname, callback)
elif S_ISREG(mode):
# It's a file, call the callback function
callback(pathname)
else:
# Unknown file type, print a message
print('Skipping %s' % pathname)
def visitfile(file):
print('visiting', file)
if __name__ == '__main__':
walktree(sys.argv[1], visitfile)
All the variables below are simply symbolic indexes into the 10-tuple returned
by os.stat(), os.fstat() or os.lstat().
The “ctime” as reported by the operating system. On some systems (like Unix)
it is the time of the last metadata change, and, on others (like Windows),
it is the creation time (see platform documentation for details).
The interpretation of “file size” changes according to the file type. For plain
files this is the size of the file in bytes. For FIFOs and sockets under most
flavors of Unix (including Linux in particular), the “size” is the number of
bytes waiting to be read at the time of the call to os.stat(),
os.fstat(), or os.lstat(); this can sometimes be useful, especially
for polling one of these special files after a non-blocking open. The meaning
of the size field for other character and block devices varies more, depending
on the implementation of the underlying system call.
The variables below define the flags used in the ST_MODE field.
Use of the functions above is more portable than use of the first set of flags:
Set-group-ID bit. This bit has several special uses. For a directory
it indicates that BSD semantics is to be used for that directory:
files created there inherit their group ID from the directory, not
from the effective group ID of the creating process, and directories
created there will also get the S_ISGID bit set. For a
file that does not have the group execution bit (S_IXGRP)
set, the set-group-ID bit indicates mandatory file/record locking
(see also S_ENFMT).
Sticky bit. When this bit is set on a directory it means that a file
in that directory can be renamed or deleted only by the owner of the
file, by the owner of the directory, or by a privileged process.
System V file locking enforcement. This flag is shared with S_ISGID:
file/record locking is enforced on files that do not have the group
execution bit (S_IXGRP) set.
The filecmp module defines functions to compare files and directories,
with various optional time/correctness trade-offs. For comparing files,
see also the difflib module.
The filecmp module defines the following functions:
Compare the files in the two directories dir1 and dir2 whose names are
given by common.
Returns three lists of file names: match, mismatch,
errors. match contains the list of files that match, mismatch contains
the names of those that don’t, and errors lists the names of files which
could not be compared. Files are listed in errors if they don’t exist in
one of the directories, the user lacks permission to read them or if the
comparison could not be done for some other reason.
The shallow parameter has the same meaning and default value as for
filecmp.cmp().
For example, cmpfiles('a','b',['c','d/e']) will compare a/c with
b/c and a/d/e with b/d/e. 'c' and 'd/e' will each be in
one of the three returned lists.
dircmp instances are built using this constructor:
class filecmp.dircmp(a, b, ignore=None, hide=None)
Construct a new directory comparison object, to compare the directories a and
b. ignore is a list of names to ignore, and defaults to ['RCS', 'CVS', 'tags']. hide is a list of names to hide, and defaults to [os.curdir, os.pardir].
Print a comparison between a and b and common subdirectories
(recursively).
The dircmp class offers a number of interesting attributes that may be
used to get various bits of information about the directory trees being
compared.
Note that via __getattr__() hooks, all attributes are computed lazily,
so there is no speed penalty if only those attributes which are lightweight
to compute are used.
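As a sketch, here is a recursive report of differing files built on the
diff_files, subdirs, left, and right attributes ('dir1' and 'dir2' are
placeholder directory names):
from filecmp import dircmp

def print_diff_files(dcmp):
    # Report files that differ, then recurse into common subdirectories.
    for name in dcmp.diff_files:
        print('diff_file %s found in %s and %s' % (name, dcmp.left, dcmp.right))
    for sub_dcmp in dcmp.subdirs.values():
        print_diff_files(sub_dcmp)

print_diff_files(dircmp('dir1', 'dir2'))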
This module generates temporary files and directories. It works on all
supported platforms. It provides three new functions,
NamedTemporaryFile(), mkstemp(), and mkdtemp(), which should
eliminate all remaining need to use the insecure mktemp() function.
Temporary file names created by this module no longer contain the process ID;
instead a string of six random characters is used.
Also, all the user-callable functions now take additional arguments which
allow direct control over the location and name of temporary files. It is
no longer necessary to use the global tempdir and template variables.
To maintain backward compatibility, the argument order is somewhat odd; it
is recommended to use keyword arguments for clarity.
The module defines the following user-callable items:
Return a file-like object that can be used as a temporary storage area.
The file is created using mkstemp(). It will be destroyed as soon
as it is closed (including an implicit close when the object is garbage
collected). Under Unix, the directory entry for the file is removed
immediately after the file is created. Other platforms do not support
this; your code should not rely on a temporary file created using this
function having or not having a visible name in the file system.
The mode parameter defaults to 'w+b' so that the file created can
be read and written without being closed. Binary mode is used so that it
behaves consistently on all platforms without regard for the data that is
stored. buffering, encoding and newline are interpreted as for
open().
The dir, prefix and suffix parameters are passed to mkstemp().
The returned object is a true file object on POSIX platforms. On other
platforms, it is a file-like object whose file attribute is the
underlying true file object. This file-like object can be used in a
with statement, just like a normal file.
This function operates exactly as TemporaryFile() does, except that
the file is guaranteed to have a visible name in the file system (on
Unix, the directory entry is not unlinked). That name can be retrieved
from the name attribute of the file object. Whether the name can be
used to open the file a second time, while the named temporary file is
still open, varies across platforms (it can be so used on Unix; it cannot
on Windows NT or later). If delete is true (the default), the file is
deleted as soon as it is closed.
The returned object is always a file-like object whose file
attribute is the underlying true file object. This file-like object can
be used in a with statement, just like a normal file.
This function operates exactly as TemporaryFile() does, except that
data is spooled in memory until the file size exceeds max_size, or
until the file’s fileno() method is called, at which point the
contents are written to disk and operation proceeds as with
TemporaryFile().
The resulting file has one additional method, rollover(), which
causes the file to roll over to an on-disk file regardless of its size.
The returned object is a file-like object whose _file attribute
is either a StringIO object or a true file object, depending on
whether rollover() has been called. This file-like object can be
used in a with statement, just like a normal file.
This function creates a temporary directory using mkdtemp()
(the supplied arguments are passed directly to the underlying function).
The resulting object can be used as a context manager (see
With Statement Context Managers). On completion of the context (or destruction
of the temporary directory object), the newly created temporary directory
and all its contents are removed from the filesystem.
The directory name can be retrieved from the name attribute
of the returned object.
The directory can be explicitly cleaned up by calling the
cleanup() method.
Creates a temporary file in the most secure manner possible. There are
no race conditions in the file’s creation, assuming that the platform
properly implements the os.O_EXCL flag for os.open(). The
file is readable and writable only by the creating user ID. If the
platform uses permission bits to indicate whether a file is executable,
the file is executable by no one. The file descriptor is not inherited
by child processes.
Unlike TemporaryFile(), the user of mkstemp() is responsible
for deleting the temporary file when done with it.
If suffix is specified, the file name will end with that suffix,
otherwise there will be no suffix. mkstemp() does not put a dot
between the file name and the suffix; if you need one, put it at the
beginning of suffix.
If prefix is specified, the file name will begin with that prefix;
otherwise, a default prefix is used.
If dir is specified, the file will be created in that directory;
otherwise, a default directory is used. The default directory is chosen
from a platform-dependent list, but the user of the application can
control the directory location by setting the TMPDIR, TEMP or TMP
environment variables. There is thus no guarantee that the generated
filename will have any nice properties, such as not requiring quoting
when passed to external commands via os.popen().
If text is specified, it indicates whether to open the file in binary
mode (the default) or text mode. On some platforms, this makes no
difference.
mkstemp() returns a tuple containing an OS-level handle to an open
file (as would be returned by os.open()) and the absolute pathname
of that file, in that order.
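As an illustration, a sketch of the usual mkstemp() pattern, wrapping the
OS-level handle and cleaning up afterwards (the suffix and contents are
arbitrary):

import os
import tempfile

fd, path = tempfile.mkstemp(suffix='.txt')
try:
    with os.fdopen(fd, 'w') as tmp:   # wrap the OS-level handle
        tmp.write('scratch data')
finally:
    os.unlink(path)                   # the caller must delete the file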
Creates a temporary directory in the most secure manner possible. There
are no race conditions in the directory’s creation. The directory is
readable, writable, and searchable only by the creating user ID.
The user of mkdtemp() is responsible for deleting the temporary
directory and its contents when done with it.
The prefix, suffix, and dir arguments are the same as for
mkstemp().
mkdtemp() returns the absolute pathname of the new directory.
Deprecated since version 2.3: Use mkstemp() instead.
Return an absolute pathname of a file that did not exist at the time the
call is made. The prefix, suffix, and dir arguments are the same
as for mkstemp().
Warning
Use of this function may introduce a security hole in your program. By
the time you get around to doing anything with the file name it returns,
someone else may have beaten you to the punch. mktemp() usage can
be replaced easily with NamedTemporaryFile(), passing it the
delete=False parameter:
>>> f = NamedTemporaryFile(delete=False)
>>> f
<open file '<fdopen>', mode 'w+b' at 0x384698>
>>> f.name
'/var/folders/5q/5qTPn6xq2RaWqk+1Ytw3-U+++TI/-Tmp-/tmpG7V1Y0'
>>> f.write(b"Hello World!\n")
13
>>> f.close()
>>> os.unlink(f.name)
>>> os.path.exists(f.name)
False
The module uses two global variables that tell it how to construct a
temporary name. They are initialized at the first call to any of the
functions above. The caller may change them, but this is discouraged; use
the appropriate function arguments, instead.
When set to a value other than None, this variable defines the
default value for the dir argument to all the functions defined in this
module.
If tempdir is unset or None at any call to any of the above
functions, Python searches a standard list of directories and sets
tempdir to the first one which the calling user can create files in.
The list is:
The directory named by the TMPDIR environment variable.
The directory named by the TEMP environment variable.
The directory named by the TMP environment variable.
A platform-specific location:
On Windows, the directories C:\TEMP, C:\TMP,
\TEMP, and \TMP, in that order.
On all other platforms, the directories /tmp, /var/tmp, and
/usr/tmp, in that order.
Return the directory currently selected to create temporary files in. If
tempdir is not None, this simply returns its contents; otherwise,
the search described above is performed, and the result returned.
Here are some examples of typical usage of the tempfile module:
>>> import tempfile
# create a temporary file and write some data to it
>>> fp = tempfile.TemporaryFile()
>>> fp.write(b'Hello world!')
# read data from file
>>> fp.seek(0)
>>> fp.read()
b'Hello world!'
# close the file, it will be removed
>>> fp.close()
# create a temporary file using a context manager
>>> with tempfile.TemporaryFile() as fp:
... fp.write(b'Hello world!')
... fp.seek(0)
... fp.read()
b'Hello world!'
>>>
# file is now closed and removed
# create a temporary directory using the context manager
>>> with tempfile.TemporaryDirectory() as tmpdirname:
... print('created temporary directory', tmpdirname)
>>>
# directory and contents have been removed
The glob module finds all the pathnames matching a specified pattern
according to the rules used by the Unix shell. No tilde expansion is done, but
*, ?, and character ranges expressed with [] will be correctly
matched. This is done by using the os.listdir() and
fnmatch.fnmatch() functions in concert, and not by actually invoking a
subshell. (For tilde and shell variable expansion, use
os.path.expanduser() and os.path.expandvars().)
Return a possibly-empty list of path names that match pathname, which must be
a string containing a path specification. pathname can be either absolute
(like /usr/src/Python-1.5/Makefile) or relative (like
../../Tools/*/*.gif), and can contain shell-style wildcards. Broken
symlinks are included in the results (as in the shell).
Return an iterator which yields the same values as glob()
without actually storing them all simultaneously.
For example, consider a directory containing only the following files:
1.gif, 2.txt, and card.gif. glob() will produce
the following results. Notice how any leading components of the path are
preserved.
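The results would look roughly like this (a reconstructed transcript
consistent with that directory):

>>> import glob
>>> glob.glob('./[0-9].*')
['./1.gif', './2.txt']
>>> glob.glob('*.gif')
['1.gif', 'card.gif']
>>> glob.glob('?.gif')
['1.gif']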
This module provides support for Unix shell-style wildcards, which are not the
same as regular expressions (which are documented in the re module). The
special characters used in shell-style wildcards are:
Pattern   Meaning
*         matches everything
?         matches any single character
[seq]     matches any character in seq
[!seq]    matches any character not in seq
Note that the filename separator ('/' on Unix) is not special to this
module. See module glob for pathname expansion (glob uses
fnmatch() to match pathname segments). Similarly, filenames starting with
a period are not special for this module, and are matched by the * and ?
patterns.
Test whether the filename string matches the pattern string, returning
True or False. If the operating system is case-insensitive,
then both parameters will be normalized to all lower- or upper-case before
the comparison is performed. fnmatchcase() can be used to perform a
case-sensitive comparison, regardless of whether that’s standard for the
operating system.
This example will print all file names in the current directory with the
extension .txt:
import fnmatch
import os

for file in os.listdir('.'):
    if fnmatch.fnmatch(file, '*.txt'):
        print(file)
The linecache module allows one to get any line from any file, while
attempting to optimize internally, using a cache, the common case where many
lines are read from a single file. This is used by the traceback module
to retrieve source lines for inclusion in the formatted traceback.
The linecache module defines the following functions:
Get line lineno from file named filename. This function will never raise an
exception — it will return '' on errors (the terminating newline character
will be included for lines that are found).
If a file named filename is not found, the function will look for it in the
module search path, sys.path, after first checking for a PEP 302
__loader__ in module_globals, in case the module was imported from a
zipfile or other non-filesystem import source.
Check the cache for validity. Use this function if files in the cache may have
changed on disk, and you require the updated version. If filename is omitted,
it will check all the entries in the cache.
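For instance (the line returned depends entirely on the system, so the
output shown is illustrative):

>>> import linecache
>>> linecache.getline('/etc/passwd', 4)
'sys:x:3:3:sys:/dev:/bin/sh\n'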
The shutil module offers a number of high-level operations on files and
collections of files. In particular, functions are provided which support file
copying and removal. For operations on individual files, see also the
os module.
Warning
Even the higher-level file copying functions (copy(), copy2())
cannot copy all file metadata.
On POSIX platforms, this means that file owner and group are lost as well
as ACLs. On Mac OS, the resource fork and other metadata are not used.
This means that resources will be lost and file type and creator codes will
not be correct. On Windows, file owners, ACLs and alternate data streams
are not copied.
Copy the contents of the file-like object fsrc to the file-like object fdst.
The integer length, if given, is the buffer size. In particular, a negative
length value means to copy the data without looping over the source data in
chunks; by default the data is read in chunks to avoid uncontrolled memory
consumption. Note that if the current file position of the fsrc object is not
0, only the contents from the current file position to the end of the file will
be copied.
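A minimal sketch using in-memory buffers to show the copy:

import io
import shutil

src = io.BytesIO(b'example payload')
dst = io.BytesIO()
shutil.copyfileobj(src, dst)    # copies from src's current position onward
assert dst.getvalue() == b'example payload'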
Copy the contents (no metadata) of the file named src to a file named dst.
dst must be the complete target file name; look at copy() for a copy that
accepts a target directory path. If src and dst are the same files,
Error is raised.
The destination location must be writable; otherwise, an IOError exception
will be raised. If dst already exists, it will be replaced. Special files
such as character or block devices and pipes cannot be copied with this
function. src and dst are path names given as strings.
Copy the permission bits, last access time, last modification time, and flags
from src to dst. The file contents, owner, and group are unaffected. src
and dst are path names given as strings.
Copy the file src to the file or directory dst. If dst is a directory, a
file with the same basename as src is created (or overwritten) in the
directory specified. Permission bits are copied. src and dst are path
names given as strings.
This factory function creates a function that can be used as a callable for
copytree()'s ignore argument, ignoring files and directories that
match one of the glob-style patterns provided. See the example below.
Recursively copy an entire directory tree rooted at src. The destination
directory, named by dst, must not already exist; it will be created as well
as missing parent directories. Permissions and times of directories are
copied with copystat(), individual files are copied using
copy2().
If symlinks is true, symbolic links in the source tree are represented as
symbolic links in the new tree, but the metadata of the original links is NOT
copied; if false or omitted, the contents and metadata of the linked files
are copied to the new tree.
When symlinks is false, if the file pointed to by the symlink doesn’t
exist, an exception will be added to the list of errors raised in
an Error exception at the end of the copy process.
You can set the optional ignore_dangling_symlinks flag to true if you
want to silence this exception. Notice that this option has no effect
on platforms that don’t support os.symlink().
If ignore is given, it must be a callable that will receive as its
arguments the directory being visited by copytree(), and a list of its
contents, as returned by os.listdir(). Since copytree() is
called recursively, the ignore callable will be called once for each
directory that is copied. The callable must return a sequence of directory
and file names relative to the current directory (i.e. a subset of the items
in its second argument); these names will then be ignored in the copy
process. ignore_patterns() can be used to create such a callable that
ignores names based on glob-style patterns.
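For example, a sketch using ignore_patterns() with copytree() (the source
and destination paths are hypothetical):

from shutil import copytree, ignore_patterns

# Skip compiled files and anything starting with 'tmp' during the copy.
copytree('source', 'destination', ignore=ignore_patterns('*.pyc', 'tmp*'))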
If exception(s) occur, an Error is raised with a list of reasons.
If copy_function is given, it must be a callable that will be used
to copy each file. It will be called with the source path and the
destination path as arguments. By default, copy2() is used, but any
function that supports the same signature (like copy()) can be used.
Changed in version 3.2: Added the copy_function argument to be able to provide a custom copy
function.
Changed in version 3.2: Added the ignore_dangling_symlinks argument to silence errors on
dangling symlinks when symlinks is false.
Delete an entire directory tree; path must point to a directory (but not a
symbolic link to a directory). If ignore_errors is true, errors resulting
from failed removals will be ignored; if false or omitted, such errors are
handled by calling a handler specified by onerror or, if that is omitted,
they raise an exception.
If onerror is provided, it must be a callable that accepts three
parameters: function, path, and excinfo. The first parameter,
function, is the function which raised the exception; it will be
os.path.islink(), os.listdir(), os.remove() or
os.rmdir(). The second parameter, path, will be the path name passed
to function. The third parameter, excinfo, will be the exception
information returned by sys.exc_info(). Exceptions raised by onerror
will not be caught.
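As an illustration, a sketch of an onerror handler that clears the
read-only bit and retries the removal (the directory name is a placeholder):

import os
import shutil
import stat

def remove_readonly(func, path, _):
    # Clear the read-only bit and reattempt the removal.
    os.chmod(path, stat.S_IWRITE)
    func(path)

shutil.rmtree('build', onerror=remove_readonly)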
Recursively move a file or directory (src) to another location (dst).
If the destination is a directory or a symlink to a directory, then src is
moved inside that directory.
The destination directory must not already exist. If the destination already
exists but is not a directory, it may be overwritten depending on
os.rename() semantics.
If the destination is on the current filesystem, then os.rename() is
used. Otherwise, src is copied (using copy2()) to dst and then
removed.
This exception collects exceptions that are raised during a multi-file
operation. For copytree(), the exception argument is a list of 3-tuples
(srcname, dstname, exception).
This example is the implementation of the copytree() function, described
above, with the docstring omitted. It demonstrates many of the other functions
provided by this module.
def copytree(src, dst, symlinks=False):
    names = os.listdir(src)
    os.makedirs(dst)
    errors = []
    for name in names:
        srcname = os.path.join(src, name)
        dstname = os.path.join(dst, name)
        try:
            if symlinks and os.path.islink(srcname):
                linkto = os.readlink(srcname)
                os.symlink(linkto, dstname)
            elif os.path.isdir(srcname):
                copytree(srcname, dstname, symlinks)
            else:
                copy2(srcname, dstname)
            # XXX What about devices, sockets etc.?
        except (IOError, os.error) as why:
            errors.append((srcname, dstname, str(why)))
        # catch the Error from the recursive copytree so that we can
        # continue with other files
        except Error as err:
            errors.extend(err.args[0])
    try:
        copystat(src, dst)
    except WindowsError:
        # can't copy file access times on Windows
        pass
    except OSError as why:
        errors.append((src, dst, str(why)))
    if errors:
        raise Error(errors)
Create an archive file (such as zip or tar) and return its name.
base_name is the name of the file to create, including the path, minus
any format-specific extension. format is the archive format: one of
“zip”, “tar”, “bztar” (if the bz2 module is available) or “gztar”.
root_dir is a directory that will be the root directory of the
archive; for example, we typically chdir into root_dir before creating the
archive.
base_dir is the directory where we start archiving from;
i.e. base_dir will be the common prefix of all files and
directories in the archive.
root_dir and base_dir both default to the current directory.
owner and group are used when creating a tar archive. By default,
uses the current owner and group.
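For instance, a sketch that archives a user's .ssh directory into
~/myarchive.tar.gz (the paths are chosen purely for illustration):

import os
from shutil import make_archive

archive_name = os.path.expanduser(os.path.join('~', 'myarchive'))
root_dir = os.path.expanduser(os.path.join('~', '.ssh'))
make_archive(archive_name, 'gztar', root_dir)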
Unpack an archive. filename is the full path of the archive.
extract_dir is the name of the target directory where the archive is
unpacked. If not provided, the current working directory is used.
format is the archive format: one of “zip”, “tar”, or “gztar”. Or any
other format registered with register_unpack_format(). If not
provided, unpack_archive() will use the archive file name extension
and see if an unpacker was registered for that extension. In case none is
found, a ValueError is raised.
Registers an unpack format. name is the name of the format and
extensions is a list of extensions corresponding to the format, like
.zip for Zip files.
function is the callable that will be used to unpack archives. The
callable will receive the path of the archive, followed by the directory
the archive must be extracted to.
When provided, extra_args is a sequence of (name, value) tuples that
will be passed as keyword arguments to the callable.
description can be provided to describe the format, and will be returned
by the get_unpack_formats() function.
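A minimal sketch of registering a hypothetical format (the format name,
extension, and unpacker are made up for illustration):

import shutil

def sample_unpacker(path, extract_dir):
    # Would receive the archive path and the target directory.
    ...

shutil.register_unpack_format('sample', ['.smpl'], sample_unpacker,
                              description='Sample archives')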
This module is the Mac OS 9 (and earlier) implementation of the os.path
module. It can be used to manipulate old-style Macintosh pathnames on Mac OS X
(or any other platform).
The following functions are available in this module: normcase(),
normpath(), isabs(), join(), split(), isdir(),
isfile(), walk(), and exists(). For the other functions available in
os.path, dummy counterparts are available.
The modules described in this chapter support storing Python data in a
persistent form on disk. The pickle and marshal modules can turn
many Python data types into a stream of bytes and then recreate the objects from
the bytes. The various DBM-related modules support a family of hash-based file
formats that store a mapping of strings to other strings.
The pickle module implements a fundamental, but powerful algorithm for
serializing and de-serializing a Python object structure. “Pickling” is the
process whereby a Python object hierarchy is converted into a byte stream, and
“unpickling” is the inverse operation, whereby a byte stream is converted back
into an object hierarchy. Pickling (and unpickling) is alternatively known as
“serialization”, “marshalling,” [1] or “flattening”; however, to avoid
confusion, the terms used here are “pickling” and “unpickling”.
Warning
The pickle module is not intended to be secure against erroneous or
maliciously constructed data. Never unpickle data received from an untrusted
or unauthenticated source.
The pickle module has a transparent optimizer (_pickle) written
in C. It is used whenever available. Otherwise the pure Python implementation is
used.
Python has a more primitive serialization module called marshal, but in
general pickle should always be the preferred way to serialize Python
objects. marshal exists primarily to support Python’s .pyc
files.
The pickle module differs from marshal in several significant ways:
The pickle module keeps track of the objects it has already serialized,
so that later references to the same object won’t be serialized again.
marshal doesn’t do this.
This has implications both for recursive objects and object sharing. Recursive
objects are objects that contain references to themselves. These are not
handled by marshal, and in fact, attempting to marshal recursive objects will
crash your Python interpreter. Object sharing happens when there are multiple
references to the same object in different places in the object hierarchy being
serialized. pickle stores such objects only once, and ensures that all
other references point to the master copy. Shared objects remain shared, which
can be very important for mutable objects.
marshal cannot be used to serialize user-defined classes and their
instances. pickle can save and restore class instances transparently,
however the class definition must be importable and live in the same module as
when the object was stored.
The marshal serialization format is not guaranteed to be portable
across Python versions. Because its primary job in life is to support
.pyc files, the Python implementers reserve the right to change the
serialization format in non-backwards compatible ways should the need arise.
The pickle serialization format is guaranteed to be backwards compatible
across Python releases.
Note that serialization is a more primitive notion than persistence; although
pickle reads and writes file objects, it does not handle the issue of
naming persistent objects, nor the (even more complicated) issue of concurrent
access to persistent objects. The pickle module can transform a complex
object into a byte stream and it can transform the byte stream into an object
with the same internal structure. Perhaps the most obvious thing to do with
these byte streams is to write them onto a file, but it is also conceivable to
send them across a network or store them in a database. The module
shelve provides a simple interface to pickle and unpickle objects on
DBM-style database files.
The data format used by pickle is Python-specific. This has the
advantage that there are no restrictions imposed by external standards such as
XDR (which can’t represent pointer sharing); however it means that non-Python
programs may not be able to reconstruct pickled Python objects.
By default, the pickle data format uses a compact binary representation.
The module pickletools contains tools for analyzing data streams
generated by pickle.
There are currently 4 different protocols which can be used for pickling.
Protocol version 0 is the original human-readable protocol and is
backwards compatible with earlier versions of Python.
Protocol version 1 is the old binary format which is also compatible with
earlier versions of Python.
Protocol version 2 was introduced in Python 2.3. It provides much more
efficient pickling of new-style classes.
Protocol version 3 was added in Python 3.0. It has explicit support for
bytes and cannot be unpickled by Python 2.x pickle modules. This is
the current recommended protocol; use it whenever possible.
Refer to PEP 307 for information about improvements brought by
protocol 2. See pickletools's source code for extensive
comments about opcodes used by pickle protocols.
To serialize an object hierarchy, you first create a pickler, then you call the
pickler’s dump() method. To de-serialize a data stream, you first create
an unpickler, then you call the unpickler’s load() method. The
pickle module provides the following constant:
The default protocol used for pickling. May be less than HIGHEST_PROTOCOL.
Currently the default protocol is 3; a backward-incompatible protocol
designed for Python 3.0.
The pickle module provides the following functions to make the pickling
process more convenient:
Write a pickled representation of obj to the open file object file.
This is equivalent to Pickler(file, protocol).dump(obj).
The optional protocol argument tells the pickler to use the given protocol;
supported protocols are 0, 1, 2, 3. The default protocol is 3; a
backward-incompatible protocol designed for Python 3.0.
Specifying a negative protocol version selects the highest protocol version
supported. The higher the protocol used, the more recent the version of
Python needed to read the pickle produced.
The file argument must have a write() method that accepts a single bytes
argument. It can thus be an on-disk file opened for binary writing, a
io.BytesIO instance, or any other custom object that meets this
interface.
If fix_imports is True and protocol is less than 3, pickle will try to
map the new Python 3.x names to the old module names used in Python 2.x,
so that the pickle data stream is readable with Python 2.x.
Return the pickled representation of the object as a bytes
object, instead of writing it to a file.
The optional protocol argument tells the pickler to use the given protocol;
supported protocols are 0, 1, 2, 3. The default protocol is 3; a
backward-incompatible protocol designed for Python 3.0.
Specifying a negative protocol version selects the highest protocol version
supported. The higher the protocol used, the more recent the version of
Python needed to read the pickle produced.
If fix_imports is True and protocol is less than 3, pickle will try to
map the new Python 3.x names to the old module names used in Python 2.x,
so that the pickle data stream is readable with Python 2.x.
Read a pickled object representation from the open file object file
and return the reconstituted object hierarchy specified therein. This is
equivalent to Unpickler(file).load().
The protocol version of the pickle is detected automatically, so no protocol
argument is needed. Bytes past the pickled object’s representation are
ignored.
The argument file must have two methods, a read() method that takes an
integer argument, and a readline() method that requires no arguments. Both
methods should return bytes. Thus file can be an on-disk file opened
for binary reading, a io.BytesIO object, or any other custom object
that meets this interface.
Optional keyword arguments are fix_imports, encoding and errors,
which are used to control compatibility support for pickle stream generated
by Python 2.x. If fix_imports is True, pickle will try to map the old
Python 2.x names to the new names used in Python 3.x. The encoding and
errors tell pickle how to decode 8-bit string instances pickled by Python
2.x; these default to ‘ASCII’ and ‘strict’, respectively.
Read a pickled object hierarchy from a bytes object and return the
reconstituted object hierarchy specified therein.
The protocol version of the pickle is detected automatically, so no protocol
argument is needed. Bytes past the pickled object’s representation are
ignored.
Optional keyword arguments are fix_imports, encoding and errors,
which are used to control compatibility support for pickle stream generated
by Python 2.x. If fix_imports is True, pickle will try to map the old
Python 2.x names to the new names used in Python 3.x. The encoding and
errors tell pickle how to decode 8-bit string instances pickled by Python
2.x; these default to ‘ASCII’ and ‘strict’, respectively.
Error raised when there is a problem unpickling an object, such as data
corruption or a security violation. It inherits PickleError.
Note that other exceptions may also be raised during unpickling, including
(but not necessarily limited to) AttributeError, EOFError, ImportError, and
IndexError.
class pickle.Pickler(file, protocol=None, *, fix_imports=True)¶
This takes a binary file for writing a pickle data stream.
The optional protocol argument tells the pickler to use the given protocol;
supported protocols are 0, 1, 2, 3. The default protocol is 3; a
backward-incompatible protocol designed for Python 3.0.
Specifying a negative protocol version selects the highest protocol version
supported. The higher the protocol used, the more recent the version of
Python needed to read the pickle produced.
The file argument must have a write() method that accepts a single bytes
argument. It can thus be an on-disk file opened for binary writing, a
io.BytesIO instance, or any other custom object that meets this interface.
If fix_imports is True and protocol is less than 3, pickle will try to
map the new Python 3.x names to the old module names used in Python 2.x,
so that the pickle data stream is readable with Python 2.x.
Do nothing by default. This exists so a subclass can override it.
If persistent_id() returns None, obj is pickled as usual. Any
other value causes Pickler to emit the returned value as a
persistent ID for obj. The meaning of this persistent ID should be
defined by Unpickler.persistent_load(). Note that the value
returned by persistent_id() cannot itself have a persistent ID.
Deprecated. Enable fast mode if set to a true value. The fast mode
disables the usage of memo, therefore speeding the pickling process by not
generating superfluous PUT opcodes. It should not be used with
self-referential objects, doing otherwise will cause Pickler to
recurse infinitely.
class pickle.Unpickler(file, *, fix_imports=True, encoding="ASCII", errors="strict")¶
This takes a binary file for reading a pickle data stream.
The protocol version of the pickle is detected automatically, so no
protocol argument is needed.
The argument file must have two methods, a read() method that takes an
integer argument, and a readline() method that requires no arguments. Both
methods should return bytes. Thus file can be an on-disk file object opened
for binary reading, a io.BytesIO object, or any other custom object
that meets this interface.
Optional keyword arguments are fix_imports, encoding and errors,
which are used to control compatibility support for pickle stream generated
by Python 2.x. If fix_imports is True, pickle will try to map the old
Python 2.x names to the new names used in Python 3.x. The encoding and
errors tell pickle how to decode 8-bit string instances pickled by Python
2.x; these default to ‘ASCII’ and ‘strict’, respectively.
Read a pickled object representation from the open file object given in
the constructor, and return the reconstituted object hierarchy specified
therein. Bytes past the pickled object’s representation are ignored.
If defined, persistent_load() should return the object specified by
the persistent ID pid. If an invalid persistent ID is encountered, an
UnpicklingError should be raised.
Import module if necessary and return the object called name from it,
where the module and name arguments are str objects. Note that,
despite its name, find_class() is also used for finding
functions.
Subclasses may override this to gain control over what type of objects and
how they can be loaded, potentially reducing security risks. Refer to
Restricting Globals for details.
The following types can be pickled:
None, True, and False
integers, floating point numbers, complex numbers
strings, bytes, bytearrays
tuples, lists, sets, and dictionaries containing only picklable objects
functions defined at the top level of a module
built-in functions defined at the top level of a module
classes that are defined at the top level of a module
instances of such classes whose __dict__ or __setstate__() is
picklable (see section Pickling Class Instances for details)
Attempts to pickle unpicklable objects will raise the PicklingError
exception; when this happens, an unspecified number of bytes may already
have been written to the underlying file. Trying to pickle a highly recursive
data structure may exceed the maximum recursion depth; a RuntimeError
will be raised in this case. You can carefully raise this limit with
sys.setrecursionlimit().
Note that functions (built-in and user-defined) are pickled by “fully qualified”
name reference, not by value. This means that only the function name is
pickled, along with the name of module the function is defined in. Neither the
function’s code, nor any of its function attributes are pickled. Thus the
defining module must be importable in the unpickling environment, and the module
must contain the named object, otherwise an exception will be raised. [2]
Similarly, classes are pickled by named reference, so the same restrictions in
the unpickling environment apply. Note that none of the class’s code or data is
pickled, so in the following example the class attribute attr is not
restored in the unpickling environment:
class Foo:
    attr = 'A class attribute'

picklestring = pickle.dumps(Foo)
These restrictions are why picklable functions and classes must be defined in
the top level of a module.
Similarly, when class instances are pickled, their class’s code and data are not
pickled along with them. Only the instance data are pickled. This is done on
purpose, so you can fix bugs in a class or add methods to the class and still
load objects that were created with an earlier version of the class. If you
plan to have long-lived objects that will see many versions of a class, it may
be worthwhile to put a version number in the objects so that suitable
conversions can be made by the class’s __setstate__() method.
In this section, we describe the general mechanisms available to you to define,
customize, and control how class instances are pickled and unpickled.
In most cases, no additional code is needed to make instances picklable. By
default, pickle will retrieve the class and the attributes of an instance via
introspection. When a class instance is unpickled, its __init__() method
is usually not invoked. The default behaviour first creates an uninitialized
instance and then restores the saved attributes. The following code is a
rough sketch of that behaviour (save and load here are illustrative
helpers, not actual pickle functions):
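def save(obj):
    # What pickling records by default: the class and the instance dict.
    return (obj.__class__, obj.__dict__)

def load(cls, attributes):
    # What unpickling does by default: create a bare instance with
    # __new__() (bypassing __init__()), then restore the saved state.
    obj = cls.__new__(cls)
    obj.__dict__.update(attributes)
    return obj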
In protocol 2 and newer, classes that implement the __getnewargs__()
method can dictate the values passed to the __new__() method upon
unpickling. This is often needed for classes whose __new__() method
requires arguments.
Classes can further influence how their instances are pickled; if the class
defines the method __getstate__(), it is called and the returned object
is pickled as the contents for the instance, instead of the contents of the
instance’s dictionary. If the __getstate__() method is absent, the
instance’s __dict__ is pickled as usual.
Upon unpickling, if the class defines __setstate__(), it is called with
the unpickled state. In that case, there is no requirement for the state
object to be a dictionary. Otherwise, the pickled state must be a dictionary
and its items are assigned to the new instance’s dictionary.
Refer to the section Handling Stateful Objects for more information about how to use
the methods __getstate__() and __setstate__().
Note
At unpickling time, some methods like __getattr__(),
__getattribute__(), or __setattr__() may be called upon the
instance. In case those methods rely on some internal invariant being true,
the type should implement __getnewargs__() to establish such an
invariant; otherwise, neither __new__() nor __init__() will be
called.
As we shall see, pickle does not directly use the methods described above. In
fact, these methods are part of the copy protocol which implements the
__reduce__() special method. The copy protocol provides a unified
interface for retrieving the data necessary for pickling and copying
objects. [3]
Although powerful, implementing __reduce__() directly in your classes is
error prone. For this reason, class designers should use the high-level
interface (i.e., __getnewargs__(), __getstate__() and
__setstate__()) whenever possible. We will show, however, cases where
using __reduce__() is the only option or leads to more efficient pickling
or both.
The interface is currently defined as follows. The __reduce__() method
takes no argument and shall return either a string or preferably a tuple (the
returned object is often referred to as the “reduce value”).
If a string is returned, the string should be interpreted as the name of a
global variable. It should be the object’s local name relative to its
module; the pickle module searches the module namespace to determine the
object’s module. This behaviour is typically useful for singletons.
When a tuple is returned, it must be between two and five items long.
Optional items can either be omitted, or None can be provided as their
value. The semantics of each item are in order:
A callable object that will be called to create the initial version of the
object.
A tuple of arguments for the callable object. An empty tuple must be given
if the callable does not accept any argument.
Optionally, the object’s state, which will be passed to the object’s
__setstate__() method as previously described. If the object has no
such method then the value must be a dictionary and it will be added to
the object’s __dict__ attribute.
Optionally, an iterator (and not a sequence) yielding successive items.
These items will be appended to the object either using
obj.append(item) or, in batch, using obj.extend(list_of_items).
This is primarily used for list subclasses, but may be used by other
classes as long as they have append() and extend() methods with
the appropriate signature. (Whether append() or extend() is
used depends on which pickle protocol version is used as well as the number
of items to append, so both must be supported.)
Optionally, an iterator (not a sequence) yielding successive key-value
pairs. These items will be stored to the object using
obj[key] = value. This is primarily used for dictionary subclasses, but may be used
by other classes as long as they implement __setitem__().
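To make the common two-item case concrete, here is a minimal sketch
(Pair is a hypothetical class):

import pickle

class Pair:
    def __init__(self, left, right):
        self.left, self.right = left, right

    def __reduce__(self):
        # A callable and the argument tuple it will be called with.
        return (Pair, (self.left, self.right))

p = pickle.loads(pickle.dumps(Pair(1, 2)))
assert (p.left, p.right) == (1, 2)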
Alternatively, a __reduce_ex__() method may be defined. The only
difference is this method should take a single integer argument, the protocol
version. When defined, pickle will prefer it over the __reduce__()
method. In addition, __reduce__() automatically becomes a synonym for
the extended version. The main use for this method is to provide
backwards-compatible reduce values for older Python releases.
For the benefit of object persistence, the pickle module supports the
notion of a reference to an object outside the pickled data stream. Such
objects are referenced by a persistent ID, which should be either a string of
alphanumeric characters (for protocol 0) [4] or just an arbitrary object (for
any newer protocol).
The resolution of such persistent IDs is not defined by the pickle
module; it will delegate this resolution to the user defined methods on the
pickler and unpickler, persistent_id() and persistent_load()
respectively.
To pickle objects that have an external persistent id, the pickler must have a
custom persistent_id() method that takes an object as an argument and
returns either None or the persistent id for that object. When None is
returned, the pickler simply pickles the object as normal. When a persistent ID
string is returned, the pickler will pickle that object, along with a marker so
that the unpickler will recognize it as a persistent ID.
To unpickle external objects, the unpickler must have a custom
persistent_load() method that takes a persistent ID object and returns the
referenced object.
Here is a comprehensive example presenting how persistent ID can be used to
pickle external objects by reference.
# Simple example presenting how persistent ID can be used to pickle
# external objects by reference.

import pickle
import sqlite3
from collections import namedtuple

# Simple class representing a record in our database.
MemoRecord = namedtuple("MemoRecord", "key, task")

class DBPickler(pickle.Pickler):

    def persistent_id(self, obj):
        # Instead of pickling MemoRecord as a regular class instance, we emit a
        # persistent ID.
        if isinstance(obj, MemoRecord):
            # Here, our persistent ID is simply a tuple, containing a tag and a
            # key, which refers to a specific record in the database.
            return ("MemoRecord", obj.key)
        else:
            # If obj does not have a persistent ID, return None. This means obj
            # needs to be pickled as usual.
            return None

class DBUnpickler(pickle.Unpickler):

    def __init__(self, file, connection):
        super().__init__(file)
        self.connection = connection

    def persistent_load(self, pid):
        # This method is invoked whenever a persistent ID is encountered.
        # Here, pid is the tuple returned by DBPickler.
        cursor = self.connection.cursor()
        type_tag, key_id = pid
        if type_tag == "MemoRecord":
            # Fetch the referenced record from the database and return it.
            cursor.execute("SELECT * FROM memos WHERE key=?", (str(key_id),))
            key, task = cursor.fetchone()
            return MemoRecord(key, task)
        else:
            # Always raises an error if you cannot return the correct object.
            # Otherwise, the unpickler will think None is the object referenced
            # by the persistent ID.
            raise pickle.UnpicklingError("unsupported persistent object")

def main():
    import io
    import pprint

    # Initialize and populate our database.
    conn = sqlite3.connect(":memory:")
    cursor = conn.cursor()
    cursor.execute("CREATE TABLE memos(key INTEGER PRIMARY KEY, task TEXT)")
    tasks = (
        'give food to fish',
        'prepare group meeting',
        'fight with a zebra',
        )
    for task in tasks:
        cursor.execute("INSERT INTO memos VALUES(NULL, ?)", (task,))

    # Fetch the records to be pickled.
    cursor.execute("SELECT * FROM memos")
    memos = [MemoRecord(key, task) for key, task in cursor]

    # Save the records using our custom DBPickler.
    file = io.BytesIO()
    DBPickler(file).dump(memos)

    print("Pickled records:")
    pprint.pprint(memos)

    # Update a record, just for good measure.
    cursor.execute("UPDATE memos SET task='learn italian' WHERE key=1")

    # Load the records from the pickle data stream.
    file.seek(0)
    memos = DBUnpickler(file, conn).load()

    print("Unpickled records:")
    pprint.pprint(memos)

if __name__ == '__main__':
    main()
Here’s an example that shows how to modify pickling behavior for a class.
The TextReader class opens a text file, and returns the line number and
line contents each time its readline() method is called. If a
TextReader instance is pickled, all attributes except the file object
member are saved. When the instance is unpickled, the file is reopened, and
reading resumes from the last location. The __setstate__() and
__getstate__() methods are used to implement this behavior.
class TextReader:
    """Print and number lines in a text file."""

    def __init__(self, filename):
        self.filename = filename
        self.file = open(filename)
        self.lineno = 0

    def readline(self):
        self.lineno += 1
        line = self.file.readline()
        if not line:
            return None
        if line.endswith('\n'):
            line = line[:-1]
        return "%i: %s" % (self.lineno, line)

    def __getstate__(self):
        # Copy the object's state from self.__dict__ which contains
        # all our instance attributes. Always use the dict.copy()
        # method to avoid modifying the original state.
        state = self.__dict__.copy()
        # Remove the unpicklable entries.
        del state['file']
        return state

    def __setstate__(self, state):
        # Restore instance attributes (i.e., filename and lineno).
        self.__dict__.update(state)
        # Restore the previously opened file's state. To do so, we need to
        # reopen it and read from it until the line count is restored.
        file = open(self.filename)
        for _ in range(self.lineno):
            file.readline()
        # Finally, save the file.
        self.file = file
A sample usage might be something like this:
>>> reader = TextReader("hello.txt")
>>> reader.readline()
'1: Hello world!'
>>> reader.readline()
'2: I am line number two.'
>>> new_reader = pickle.loads(pickle.dumps(reader))
>>> new_reader.readline()
'3: Goodbye!'
By default, unpickling will import any class or function that it finds in the
pickle data. For many applications, this behaviour is unacceptable as it
permits the unpickler to import and invoke arbitrary code. Just consider what
this hand-crafted pickle data stream does when loaded:
>>> import pickle
>>> pickle.loads(b"cos\nsystem\n(S'echo hello world'\ntR.")
hello world
0
In this example, the unpickler imports the os.system() function and then
applies the string argument “echo hello world”. Although this example is
inoffensive, it is not difficult to imagine one that could damage your system.
For this reason, you may want to control what gets unpickled by customizing
Unpickler.find_class(). Unlike its name suggests, find_class() is
called whenever a global (i.e., a class or a function) is requested. Thus it is
possible either to forbid globals completely or to restrict them to a safe subset.
Here is an example of an unpickler allowing only few safe classes from the
builtins module to be loaded:
import builtins
import io
import pickle

safe_builtins = {
    'range',
    'complex',
    'set',
    'frozenset',
    'slice',
}

class RestrictedUnpickler(pickle.Unpickler):

    def find_class(self, module, name):
        # Only allow safe classes from builtins.
        if module == "builtins" and name in safe_builtins:
            return getattr(builtins, name)
        # Forbid everything else.
        raise pickle.UnpicklingError("global '%s.%s' is forbidden" %
                                     (module, name))

def restricted_loads(s):
    """Helper function analogous to pickle.loads()."""
    return RestrictedUnpickler(io.BytesIO(s)).load()
A sample usage of our unpickler, working as intended (a reconstructed transcript):
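>>> restricted_loads(pickle.dumps([1, 2, range(15)]))
[1, 2, range(0, 15)]
>>> restricted_loads(b"cos\nsystem\n(S'echo hello world'\ntR.")
Traceback (most recent call last):
  ...
pickle.UnpicklingError: global 'os.system' is forbidden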
As our examples show, you have to be careful with what you allow to be
unpickled. Therefore, if security is a concern, you may want to consider
alternatives such as the marshalling API in xmlrpc.client or
third-party solutions.
For the simplest code, use the dump() and load() functions.
import pickle
# An arbitrary collection of objects supported by pickle.
data = {
    'a': [1, 2.0, 3, 4+6j],
    'b': ("character string", b"byte string"),
    'c': set([None, True, False])
}

with open('data.pickle', 'wb') as f:
    # Pickle the 'data' dictionary using the highest protocol available.
    pickle.dump(data, f, pickle.HIGHEST_PROTOCOL)
The following example reads the resulting pickled data.
import pickle
with open('data.pickle', 'rb') as f:
    # The protocol version used is detected automatically, so we do not
    # have to specify it.
    data = pickle.load(f)
The limitation to alphanumeric characters is due to the fact that
persistent IDs in protocol 0 are delimited by the newline
character. Therefore, if any kind of newline character occurs in
persistent IDs, the resulting pickle will become unreadable.
The copyreg module provides support for the pickle module. The
copy module is likely to use this in the future as well. It provides
configuration information about object constructors which are not classes.
Such constructors may be factory functions or class instances.
Declares that function should be used as a “reduction” function for objects
of type type. function should return either a string or a tuple
containing two or three elements.
The optional constructor parameter, if provided, is a callable object which
can be used to reconstruct the object when called with the tuple of arguments
returned by function at pickling time. TypeError will be raised if
object is a class or constructor is not callable.
See the pickle module for more details on the interface expected of
function and constructor.
A “shelf” is a persistent, dictionary-like object. The difference with “dbm”
databases is that the values (not the keys!) in a shelf can be essentially
arbitrary Python objects — anything that the pickle module can handle.
This includes most class instances, recursive data types, and objects containing
lots of shared sub-objects. The keys are ordinary strings.
Open a persistent dictionary. The filename specified is the base filename for
the underlying database. As a side-effect, an extension may be added to the
filename and more than one file may be created. By default, the underlying
database file is opened for reading and writing. The optional flag parameter
has the same interpretation as the flag parameter of dbm.open().
By default, version 3 pickles are used to serialize values. The version of the
pickle protocol can be specified with the protocol parameter.
Because of Python semantics, a shelf cannot know when a mutable
persistent-dictionary entry is modified. By default modified objects are
written only when assigned to the shelf (see Example). If the
optional writeback parameter is set to True, all entries accessed are also
cached in memory, and written back on sync() and
close(); this can make it handier to mutate mutable entries in
the persistent dictionary, but, if many entries are accessed, it can consume
vast amounts of memory for the cache, and it can make the close operation
very slow since all accessed entries are written back (there is no way to
determine which accessed entries are mutable, nor which ones were actually
mutated).
Note
Do not rely on the shelf being closed automatically; always call
close() explicitly when you don’t need it any more, or use a
with statement with contextlib.closing().
Warning
Because the shelve module is backed by pickle, it is insecure
to load a shelf from an untrusted source. Like with pickle, loading a shelf
can execute arbitrary code.
Shelf objects support all methods supported by dictionaries. This eases the
transition from dictionary based scripts to those requiring persistent storage.
Write back all entries in the cache if the shelf was opened with writeback
set to True. Also empty the cache and synchronize the persistent
dictionary on disk, if feasible. This is called automatically when the shelf
is closed with close().
The choice of which database package will be used (such as dbm.ndbm or
dbm.gnu) depends on which interface is available. Therefore it is not
safe to open the database directly using dbm. The database is also
(unfortunately) subject to the limitations of dbm, if it is used —
this means that (the pickled representation of) the objects stored in the
database should be fairly small, and in rare cases key collisions may cause
the database to refuse updates.
The shelve module does not support concurrent read/write access to
shelved objects. (Multiple simultaneous read accesses are safe.) When a
program has a shelf open for writing, no other program should have it open for
reading or writing. Unix file locking can be used to solve this, but this
differs across Unix versions and requires knowledge about the database
implementation used.
class shelve.Shelf(dict, protocol=None, writeback=False, keyencoding='utf-8')¶
By default, version 0 pickles are used to serialize values. The version of the
pickle protocol can be specified with the protocol parameter. See the
pickle documentation for a discussion of the pickle protocols.
If the writeback parameter is True, the object will hold a cache of all
entries accessed and write them back to the dict at sync and close times.
This allows natural operations on mutable entries, but can consume much more
memory and make sync and close take a long time.
The keyencoding parameter is the encoding used to encode keys before they
are used with the underlying dict.
New in version 3.2: The keyencoding parameter; previously, keys were always encoded in
UTF-8.
class shelve.BsdDbShelf(dict, protocol=None, writeback=False, keyencoding='utf-8')¶
A subclass of Shelf which exposes first(), next(),
previous(), last() and set_location() which are available
in the third-party bsddb module from pybsddb but not in other database
modules. The dict object passed to the constructor must support those
methods. This is generally accomplished by calling one of
bsddb.hashopen(), bsddb.btopen() or bsddb.rnopen(). The
optional protocol, writeback, and keyencoding parameters have the same
interpretation as for the Shelf class.
class shelve.DbfilenameShelf(filename, flag='c', protocol=None, writeback=False)¶
A subclass of Shelf which accepts a filename instead of a dict-like
object. The underlying file will be opened using dbm.open(). By
default, the file will be created and opened for both read and write. The
optional flag parameter has the same interpretation as for the open()
function. The optional protocol and writeback parameters have the same
interpretation as for the Shelf class.
To summarize the interface (key is a string, data is an arbitrary
object):
import shelve
d = shelve.open(filename) # open -- file may get suffix added by low-level
# library
d[key] = data # store data at key (overwrites old data if
# using an existing key)
data = d[key] # retrieve a COPY of data at key (raise KeyError if no
# such key)
del d[key] # delete data stored at key (raises KeyError
# if no such key)
flag = key in d # true if the key exists
klist = list(d.keys()) # a list of all existing keys (slow!)
# as d was opened WITHOUT writeback=True, beware:
d['xx'] = [0, 1, 2] # this works as expected, but...
d['xx'].append(3) # *this doesn't!* -- d['xx'] is STILL [0, 1, 2]!
# having opened d without writeback=True, you need to code carefully:
temp = d['xx'] # extracts the copy
temp.append(5) # mutates the copy
d['xx'] = temp # stores the copy right back, to persist it
# or, d=shelve.open(filename,writeback=True) would let you just code
# d['xx'].append(5) and have it work as expected, BUT it would also
# consume more memory and make the d.close() operation slower.
d.close() # close it
This module contains functions that can read and write Python values in a binary
format. The format is specific to Python, but independent of machine
architecture issues (e.g., you can write a Python value to a file on a PC,
transport the file to a Sun, and read it back there). Details of the format are
undocumented on purpose; it may change between Python versions (although it
rarely does). [1]
This is not a general “persistence” module. For general persistence and
transfer of Python objects through RPC calls, see the modules pickle and
shelve. The marshal module exists mainly to support reading and
writing the “pseudo-compiled” code for Python modules of .pyc files.
Therefore, the Python maintainers reserve the right to modify the marshal format
in backward incompatible ways should the need arise. If you’re serializing and
de-serializing Python objects, use the pickle module instead – the
performance is comparable, version independence is guaranteed, and pickle
supports a substantially wider range of objects than marshal.
Warning
The marshal module is not intended to be secure against erroneous or
maliciously constructed data. Never unmarshal data received from an
untrusted or unauthenticated source.
Not all Python object types are supported; in general, only objects whose value
is independent from a particular invocation of Python can be written and read by
this module. The following types are supported: booleans, integers, floating
point numbers, complex numbers, strings, bytes, bytearrays, tuples, lists, sets,
frozensets, dictionaries, and code objects, where it should be understood that
tuples, lists, sets, frozensets and dictionaries are only supported as long as
the values contained therein are themselves supported; and recursive lists, sets
and dictionaries should not be written (they will cause infinite loops). The
singletons None, Ellipsis and StopIteration can also be
marshalled and unmarshalled.
There are functions that read/write files as well as functions operating on
strings.
Write the value on the open file. The value must be a supported type. The
file must be an open file object such as sys.stdout or returned by
open() or os.popen(). It must be opened in binary mode ('wb'
or 'w+b').
If the value has (or contains an object that has) an unsupported type, a
ValueError exception is raised — but garbage data will also be written
to the file. The object will not be properly read back by load().
The version argument indicates the data format that dump should use
(see below).
Read one value from the open file and return it. If no valid value is read
(e.g. because the data has a different Python version’s incompatible marshal
format), raise EOFError, ValueError or TypeError. The
file must be an open file object opened in binary mode ('rb' or
'r+b').
Note
If an object containing an unsupported type was marshalled with dump(),
load() will substitute None for the unmarshallable type.
Return the string that would be written to a file by dump(value,file). The
value must be a supported type. Raise a ValueError exception if value
has (or contains an object that has) an unsupported type.
The version argument indicates the data format that dumps should use
(see below).
Indicates the format that the module uses. Version 0 is the historical
format, version 1 shares interned strings and version 2 uses a binary format
for floating point numbers. The current version is 2.
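As a minimal round-trip sketch (the file name is hypothetical):
import marshal
data = {'answer': 42, 'values': [1.5, 2+3j, b'bytes']}
# Write in the current (version 2) format and read it back.
with open('cache.dat', 'wb') as f:
    marshal.dump(data, f)
with open('cache.dat', 'rb') as f:
    assert marshal.load(f) == data
# dumps()/loads() do the same without a file object.
assert marshal.loads(marshal.dumps(data)) == data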
The name of this module stems from a bit of terminology used by the designers of
Modula-3 (amongst others), who use the term “marshalling” for shipping of data
around in a self-contained form. Strictly speaking, “to marshal” means to
convert some data from internal to external form (in an RPC buffer for instance)
and “to unmarshal” means the reverse process.
dbm is a generic interface to variants of the DBM database —
dbm.gnu or dbm.ndbm. If none of these modules is installed, the
slow-but-simple implementation in module dbm.dumb will be used. There
is a third party interface to
the Oracle Berkeley DB.
A tuple containing the exceptions that can be raised by each of the supported
modules, with a unique exception also named dbm.error as the first
item — the latter is used when dbm.error is raised.
This function attempts to guess which of the several simple database modules
available — dbm.gnu, dbm.ndbm or dbm.dumb — should
be used to open a given file.
Returns one of the following values: None if the file can’t be opened
because it’s unreadable or doesn’t exist; the empty string ('') if the
file’s format can’t be guessed; or a string containing the required module
name, such as 'dbm.ndbm' or 'dbm.gnu'.
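A quick sketch (the file name is hypothetical):
import dbm
# Prints e.g. 'dbm.gnu', '' (unknown format) or None (unreadable/missing).
print(dbm.whichdb('cache'))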
Open the database file file and return a corresponding object.
If the database file already exists, the whichdb() function is used to
determine its type and the appropriate module is used; if it does not exist,
the first module listed above that can be imported is used.
The optional flag argument can be:
Value   Meaning
'r'     Open existing database for reading only (default)
'w'     Open existing database for reading and writing
'c'     Open database for reading and writing, creating it if it doesn’t exist
'n'     Always create a new, empty database, open for reading and writing
The optional mode argument is the Unix mode of the file, used only when the
database has to be created. It defaults to octal 0o666 (and will be
modified by the prevailing umask).
The object returned by open() supports the same basic functionality as
dictionaries; keys and their corresponding values can be stored, retrieved, and
deleted, and the in operator and the keys() method are
available, as well as get() and setdefault().
Changed in version 3.2: get() and setdefault() are now available in all database modules.
Keys and values are always stored as bytes. This means that when
strings are used they are implicitly converted to the default encoding before
being stored.
The following example records some hostnames and a corresponding title, and
then prints out the contents of the database:
import dbm
# Open database, creating it if necessary.
db = dbm.open('cache', 'c')
# Record some values
db[b'hello'] = b'there'
db['www.python.org'] = 'Python Website'
db['www.cnn.com'] = 'Cable News Network'
# Note that the keys are considered bytes now.
assert db[b'www.python.org'] == b'Python Website'
# Notice how the value is now in bytes.
assert db['www.cnn.com'] == b'Cable News Network'
# Often-used methods of the dict interface work too.
print(db.get('python.org', b'not present'))
# Storing a non-string key or value will raise an exception (most
# likely a TypeError), so guard it to keep the rest of the script running.
try:
    db['www.yahoo.com'] = 4
except TypeError as err:
    print('Expected failure:', err)
# Close when done.
db.close()
This module is quite similar to the dbm module, but uses the GNU library
gdbm instead to provide some additional functionality. Please note that the
file formats created by dbm.gnu and dbm.ndbm are incompatible.
The dbm.gnu module provides an interface to the GNU DBM library.
dbm.gnu.gdbm objects behave like mappings (dictionaries), except that keys and
values are always converted to bytes before storing. Printing a gdbm
object doesn’t print the
keys and values, and the items() and values() methods are not
supported.
Open a gdbm database and return a gdbm object. The filename
argument is the name of the database file.
The optional flag argument can be:
Value   Meaning
'r'     Open existing database for reading only (default)
'w'     Open existing database for reading and writing
'c'     Open database for reading and writing, creating it if it doesn’t exist
'n'     Always create a new, empty database, open for reading and writing
The following additional characters may be appended to the flag to control
how the database is opened:
Value   Meaning
'f'     Open the database in fast mode. Writes to the database will not be synchronized.
's'     Synchronized mode. This will cause changes to the database to be immediately written to the file.
'u'     Do not lock database.
Not all flags are valid for all versions of gdbm. The module constant
open_flags is a string of supported flag characters. The exception
error is raised if an invalid flag is specified.
The optional mode argument is the Unix mode of the file, used only when the
database has to be created. It defaults to octal 0o666.
In addition to the dictionary-like methods, gdbm objects have the
following methods:
It’s possible to loop over every key in the database using this method and the
nextkey() method. The traversal is ordered by gdbm‘s internal
hash values, and won’t be sorted by the key values. This method returns
the starting key.
Returns the key that follows key in the traversal. The following code prints
every key in the database db, without having to create a list in memory that
contains them all:
k = db.firstkey()
while k is not None:
    print(k)
    k = db.nextkey(k)
If you have carried out a lot of deletions and would like to shrink the space
used by the gdbm file, this routine will reorganize the database. gdbm
objects will not shorten the length of a database file except by using this
reorganization; otherwise, deleted file space will be kept and reused as new
(key, value) pairs are added.
The dbm.ndbm module provides an interface to the Unix “(n)dbm” library.
Dbm objects behave like mappings (dictionaries), except that keys and values are
always stored as bytes. Printing a dbm object doesn’t print the keys and
values, and the items() and values() methods are not supported.
This module can be used with the “classic” ndbm interface or the GNU GDBM
compatibility interface. On Unix, the configure script will attempt
to locate the appropriate header file to simplify building this module.
Open a dbm database and return a dbm object. The filename argument is the
name of the database file (without the .dir or .pag extensions).
The optional flag argument must be one of these values:
Value   Meaning
'r'     Open existing database for reading only (default)
'w'     Open existing database for reading and writing
'c'     Open database for reading and writing, creating it if it doesn’t exist
'n'     Always create a new, empty database, open for reading and writing
The optional mode argument is the Unix mode of the file, used only when the
database has to be created. It defaults to octal 0o666 (and will be
modified by the prevailing umask).
The dbm.dumb module is intended as a last resort fallback for the
dbm module when a more robust module is not available. The dbm.dumb
module is not written for speed and is not nearly as heavily used as the other
database modules.
The dbm.dumb module provides a persistent dictionary-like interface which
is written entirely in Python. Unlike other modules such as dbm.gnu no
external library is required. As with other persistent mappings, the keys and
values are always stored as bytes.
Open a dumbdbm database and return a dumbdbm object. The filename argument is
the basename of the database file (without any specific extensions). When a
dumbdbm database is created, files with .dat and .dir extensions
are created.
The optional flag argument is currently ignored; the database is always opened
for update, and will be created if it does not exist.
The optional mode argument is the Unix mode of the file, used only when the
database has to be created. It defaults to octal 0o666 (and will be modified
by the prevailing umask).
In addition to the methods provided by the collections.MutableMapping class,
dumbdbm objects provide the following method:
Synchronize the on-disk directory and data files. This method is called
by the shelve.Shelf.sync() method.
sqlite3 — DB-API 2.0 interface for SQLite databases
SQLite is a C library that provides a lightweight disk-based database that
doesn’t require a separate server process and allows accessing the database
using a nonstandard variant of the SQL query language. Some applications can use
SQLite for internal data storage. It’s also possible to prototype an
application using SQLite and then port the code to a larger database such as
PostgreSQL or Oracle.
sqlite3 was written by Gerhard Häring and provides a SQL interface compliant
with the DB-API 2.0 specification described by PEP 249.
To use the module, you must first create a Connection object that
represents the database. Here the data will be stored in the
/tmp/example file:
import sqlite3
conn = sqlite3.connect('/tmp/example')
You can also supply the special name :memory: to create a database in RAM.
Once you have a Connection, you can create a Cursor object
and call its execute() method to perform SQL commands:
c = conn.cursor()
# Create table
c.execute('''create table stocks
(date text, trans text, symbol text,
qty real, price real)''')
# Insert a row of data
c.execute("""insert into stocks
values ('2006-01-05','BUY','RHAT',100,35.14)""")
# Save (commit) the changes
conn.commit()
# We can also close the cursor if we are done with it
c.close()
Usually your SQL operations will need to use values from Python variables. You
shouldn’t assemble your query using Python’s string operations because doing so
is insecure; it makes your program vulnerable to an SQL injection attack.
Instead, use the DB-API’s parameter substitution. Put ? as a placeholder
wherever you want to use a value, and then provide a tuple of values as the
second argument to the cursor’s execute() method. (Other database
modules may use a different placeholder, such as %s or :1.) For
example:
# Never do this -- insecure!
symbol = 'IBM'
c.execute("... where symbol = '%s'" % symbol)
# Do this instead
t = (symbol,)
c.execute('select * from stocks where symbol=?', t)
# Larger example
for t in [('2006-03-28', 'BUY', 'IBM', 1000, 45.00),
('2006-04-05', 'BUY', 'MSOFT', 1000, 72.00),
('2006-04-06', 'SELL', 'IBM', 500, 53.00),
]:
c.execute('insert into stocks values (?,?,?,?,?)', t)
To retrieve data after executing a SELECT statement, you can either treat the
cursor as an iterator, call the cursor’s fetchone() method to
retrieve a single matching row, or call fetchall() to get a list of the
matching rows.
This example uses the iterator form:
>>> c = conn.cursor()
>>> c.execute('select * from stocks order by price')
>>> for row in c:
... print(row)
...
('2006-01-05', 'BUY', 'RHAT', 100, 35.14)
('2006-03-28', 'BUY', 'IBM', 1000, 45.0)
('2006-04-06', 'SELL', 'IBM', 500, 53.0)
('2006-04-05', 'BUY', 'MSOFT', 1000, 72.0)
>>>
This constant is meant to be used with the detect_types parameter of the
connect() function.
Setting it makes the sqlite3 module parse the declared type for each
column it returns. It will parse out the first word of the declared type,
i. e. for “integer primary key”, it will parse out “integer”, or for
“number(10)” it will parse out “number”. Then for that column, it will look
into the converters dictionary and use the converter function registered for
that type there.
This constant is meant to be used with the detect_types parameter of the
connect() function.
Setting this makes the SQLite interface parse the column name for each column it
returns. It will look for a string formed as [mytype] in there, and then decide
that ‘mytype’ is the type of the column. It will try to find an entry of
‘mytype’ in the converters dictionary and then use the converter function found
there to return the value. The column name found in Cursor.description
is only the first word of the column name, i. e. if you use something like
'as "x [datetime]"' in your SQL, then we will parse out everything until the
first blank for the column name: the column name would simply be “x”.
Opens a connection to the SQLite database file database. You can use
":memory:" to open a database connection to a database that resides in RAM
instead of on disk.
When a database is accessed by multiple connections, and one of the processes
modifies the database, the SQLite database is locked until that transaction is
committed. The timeout parameter specifies how long the connection should wait
for the lock to go away until raising an exception. The default for the timeout
parameter is 5.0 (five seconds).
SQLite natively supports only the types TEXT, INTEGER, FLOAT, BLOB and NULL. If
you want to use other types you must add support for them yourself. The
detect_types parameter and using custom converters registered with the
module-level register_converter() function allow you to easily do that.
detect_types defaults to 0 (i. e. off, no type detection); you can set it to
any combination of PARSE_DECLTYPES and PARSE_COLNAMES to turn
type detection on.
By default, the sqlite3 module uses its Connection class for the
connect call. You can, however, subclass the Connection class and make
connect() use your class instead by providing your class for the factory
parameter.
The sqlite3 module internally uses a statement cache to avoid SQL parsing
overhead. If you want to explicitly set the number of statements that are cached
for the connection, you can set the cached_statements parameter. The currently
implemented default is to cache 100 statements.
Registers a callable to convert a bytestring from the database into a custom
Python type. The callable will be invoked for all database values that are of
the type typename. See the detect_types parameter of the connect()
function for how the type detection works. Note that the case of typename and
the name of the type in your query must match!
Registers a callable to convert the custom Python type type into one of
SQLite’s supported types. The callable callable accepts as single parameter
the Python value, and must return a value of the following types: int,
float, str or bytes.
Returns True if the string sql contains one or more complete SQL
statements terminated by semicolons. It does not verify that the SQL is
syntactically correct, only that there are no unclosed string literals and the
statement is terminated by a semicolon.
This can be used to build a shell for SQLite, as in the following example:
# A minimal SQLite shell for experiments
import sqlite3
con = sqlite3.connect(":memory:")
con.isolation_level = None
cur = con.cursor()
buffer = ""
print("Enter your SQL commands to execute in sqlite3.")
print("Enter a blank line to exit.")
while True:
    line = input()
    if line == "":
        break
    buffer += line
    if sqlite3.complete_statement(buffer):
        try:
            buffer = buffer.strip()
            cur.execute(buffer)
            if buffer.lstrip().upper().startswith("SELECT"):
                print(cur.fetchall())
        except sqlite3.Error as e:
            print("An error occurred:", e.args[0])
        buffer = ""
con.close()
By default you will not get any tracebacks in user-defined functions,
aggregates, converters, authorizer callbacks etc. If you want to debug them, you
can call this function with flag set to True. Afterwards, you will get tracebacks
from callbacks on sys.stderr. Use False to disable the feature
again.
Get or set the current isolation level. None for autocommit mode or
one of “DEFERRED”, “IMMEDIATE” or “EXCLUSIVE”. See section
Controlling Transactions for a more detailed explanation.
This method commits the current transaction. If you don’t call this method,
anything you did since the last call to commit() is not visible from
other database connections. If you wonder why you don’t see the data you’ve
written to the database, please check you didn’t forget to call this method.
This closes the database connection. Note that this does not automatically
call commit(). If you just close your database connection without
calling commit() first, your changes will be lost!
This is a nonstandard shortcut that creates an intermediate cursor object by
calling the cursor method, then calls the cursor’s execute method with the parameters given.
This is a nonstandard shortcut that creates an intermediate cursor object by
calling the cursor method, then calls the cursor’s executemany method with the parameters given.
This is a nonstandard shortcut that creates an intermediate cursor object by
calling the cursor method, then calls the cursor’s executescript method with the parameters given.
Creates a user-defined function that you can later use from within SQL
statements under the function name name. num_params is the number of
parameters the function accepts, and func is a Python callable that is called
as the SQL function.
The function can return any of the types supported by SQLite: bytes, str, int,
float and None.
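Example (a small sketch using hashlib; the SQL function name md5 is our choice):
import sqlite3
import hashlib
def md5sum(t):
    return hashlib.md5(t).hexdigest()
con = sqlite3.connect(":memory:")
con.create_function("md5", 1, md5sum)
cur = con.cursor()
cur.execute("select md5(?)", (b"foo",))
print(cur.fetchone()[0])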
The aggregate class must implement a step method, which accepts the number
of parameters num_params, and a finalize method which will return the
final result of the aggregate.
The finalize method can return any of the types supported by SQLite:
bytes, str, int, float and None.
Example:
import sqlite3
class MySum:
    def __init__(self):
        self.count = 0

    def step(self, value):
        self.count += value

    def finalize(self):
        return self.count
con = sqlite3.connect(":memory:")
con.create_aggregate("mysum", 1, MySum)
cur = con.cursor()
cur.execute("create table test(i)")
cur.execute("insert into test(i) values (1)")
cur.execute("insert into test(i) values (2)")
cur.execute("select mysum(i) from test")
print(cur.fetchone()[0])
Creates a collation with the specified name and callable. The callable will
be passed two string arguments. It should return -1 if the first is ordered
lower than the second, 0 if they are ordered equal and 1 if the first is ordered
higher than the second. Note that this controls sorting (ORDER BY in SQL) so
your comparisons don’t affect other SQL operations.
Note that the callable will get its parameters as Python bytestrings, which will
normally be encoded in UTF-8.
The following example shows a custom collation that sorts “the wrong way”:
import sqlite3
def collate_reverse(string1, string2):
    if string1 == string2:
        return 0
    elif string1 < string2:
        return 1
    else:
        return -1
con = sqlite3.connect(":memory:")
con.create_collation("reverse", collate_reverse)
cur = con.cursor()
cur.execute("create table test(x)")
cur.executemany("insert into test(x) values (?)", [("a",), ("b",)])
cur.execute("select x from test order by x collate reverse")
for row in cur:
print(row)
con.close()
To remove a collation, call create_collation with None as callable:
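con.create_collation("reverse", None)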
You can call this method from a different thread to abort any queries that might
be executing on the connection. The query will then abort and the caller will
get an exception.
This routine registers a callback. The callback is invoked for each attempt to
access a column of a table in the database. The callback should return
SQLITE_OK if access is allowed, SQLITE_DENY if the entire SQL
statement should be aborted with an error and SQLITE_IGNORE if the
column should be treated as a NULL value. These constants are available in the
sqlite3 module.
The first argument to the callback signifies what kind of operation is to be
authorized. The second and third argument will be arguments or None
depending on the first argument. The 4th argument is the name of the database
(“main”, “temp”, etc.) if applicable. The 5th argument is the name of the
inner-most trigger or view that is responsible for the access attempt or
None if this access attempt is directly from input SQL code.
Please consult the SQLite documentation about the possible values for the first
argument and the meaning of the second and third argument depending on the first
one. All necessary constants are available in the sqlite3 module.
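As a sketch (the "salary" column name is hypothetical), an authorizer that hides one column from all queries might look like this:
import sqlite3
def authorizer(action, arg1, arg2, db_name, trigger_or_view):
    # For SQLITE_READ, arg1 is the table name and arg2 the column name.
    if action == sqlite3.SQLITE_READ and arg2 == "salary":
        return sqlite3.SQLITE_IGNORE    # return NULL for this column
    return sqlite3.SQLITE_OK            # allow everything else
con = sqlite3.connect(":memory:")
con.set_authorizer(authorizer)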
This routine registers a callback. The callback is invoked for every n
instructions of the SQLite virtual machine. This is useful if you want to
get called from SQLite during long-running operations, for example to update
a GUI.
If you want to clear any previously installed progress handler, call the
method with None for handler.
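A minimal sketch; the handler name and the 10000-instruction interval are arbitrary choices:
import sqlite3
def progress():
    print("query still running...")
    return 0  # a non-zero return value asks SQLite to abort the query
con = sqlite3.connect(":memory:")
con.set_progress_handler(progress, 10000)  # invoke every 10000 instructions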
This routine allows/disallows the SQLite engine to load SQLite extensions
from shared libraries. SQLite extensions can define new functions,
aggregates or whole new virtual table implementations. One well-known
extension is the fulltext-search extension distributed with SQLite.
New in version 3.2.
import sqlite3
con = sqlite3.connect(":memory:")
# enable extension loading
con.enable_load_extension(True)
# Load the fulltext search extension
con.execute("select load_extension('./fts3.so')")
# alternatively you can load the extension using an API call:
# con.load_extension("./fts3.so")
# disable extension loading again
con.enable_load_extension(False)
# example from SQLite wiki
con.execute("create virtual table recipe using fts3(name, ingredients)")
con.executescript("""
insert into recipe (name, ingredients) values ('broccoli stew', 'broccoli peppers cheese tomatoes');
insert into recipe (name, ingredients) values ('pumpkin stew', 'pumpkin onions garlic celery');
insert into recipe (name, ingredients) values ('broccoli pie', 'broccoli cheese onions flour');
insert into recipe (name, ingredients) values ('pumpkin pie', 'pumpkin sugar flour butter');
""")
for row in con.execute("select rowid, name, ingredients from recipe where name match 'pie'"):
    print(row)
Loadable extensions are disabled by default. See [1].
This routine loads a SQLite extension from a shared library. You have to
enable extension loading with enable_load_extension() before you can
use this routine.
New in version 3.2.
Loadable extensions are disabled by default. See [1].
You can change this attribute to a callable that accepts the cursor and the
original row as a tuple and will return the real result row. This way, you can
implement more advanced ways of returning results, such as returning an object
that can also access columns by name.
Example:
import sqlite3
def dict_factory(cursor, row):
    d = {}
    for idx, col in enumerate(cursor.description):
        d[col[0]] = row[idx]
    return d
con = sqlite3.connect(":memory:")
con.row_factory = dict_factory
cur = con.cursor()
cur.execute("select 1 as a")
print(cur.fetchone()["a"])
If returning a tuple doesn’t suffice and you want name-based access to
columns, you should consider setting row_factory to the
highly-optimized sqlite3.Row type. Row provides both
index-based and case-insensitive name-based access to columns with almost no
memory overhead. It will probably be better than your own custom
dictionary-based approach or even a db_row based solution.
Using this attribute you can control what objects are returned for the TEXT
data type. By default, this attribute is set to str and the
sqlite3 module will return Unicode objects for TEXT. If you want to
return bytestrings instead, you can set it to bytes.
For efficiency reasons, there’s also a way to return str objects
only for non-ASCII data, and bytes otherwise. To activate it, set
this attribute to sqlite3.OptimizedUnicode.
You can also set it to any other callable that accepts a single bytestring
parameter and returns the resulting object.
See the following example code for illustration:
import sqlite3
con = sqlite3.connect(":memory:")
cur = con.cursor()
# Create the table
con.execute("create table person(lastname, firstname)")
AUSTRIA = "\xd6sterreich"
# by default, rows are returned as Unicode
cur.execute("select ?", (AUSTRIA,))
row = cur.fetchone()
assert row[0] == AUSTRIA
# but we can make sqlite3 always return bytestrings ...
con.text_factory = bytes
cur.execute("select ?", (AUSTRIA,))
row = cur.fetchone()
assert type(row[0]) == bytes
# the bytestrings will be encoded in UTF-8, unless you stored garbage in the
# database ...
assert row[0] == AUSTRIA.encode("utf-8")
# we can also implement a custom text_factory ...
# here we implement one that will ignore Unicode characters that cannot be
# decoded from UTF-8
con.text_factory = lambda x: str(x, "utf-8", "ignore")
cur.execute("select ?", ("this is latin1 and would normally create errors" +
"\xe4\xf6\xfc".encode("latin1"),))
row = cur.fetchone()
assert type(row[0]) == str
# sqlite3 offers a built-in optimized text_factory that will return bytestring
# objects, if the data is in ASCII only, and otherwise return unicode objects
con.text_factory = sqlite3.OptimizedUnicode
cur.execute("select ?", (AUSTRIA,))
row = cur.fetchone()
assert type(row[0]) == str
cur.execute("select ?", ("Germany",))
row = cur.fetchone()
assert type(row[0]) == str
Returns an iterator to dump the database in an SQL text format. Useful when
saving an in-memory database for later restoration. This function provides
the same capabilities as the .dump command in the sqlite3
shell.
Example:
# Convert file existing_db.db to SQL dump file dump.sql
import sqlite3, os
con = sqlite3.connect('existing_db.db')
with open('dump.sql', 'w') as f:
    for line in con.iterdump():
        f.write('%s\n' % line)
Executes an SQL statement. The SQL statement may be parametrized (i. e.
placeholders instead of SQL literals). The sqlite3 module supports two
kinds of placeholders: question marks (qmark style) and named placeholders
(named style).
This example shows how to use parameters with qmark style:
import sqlite3
con = sqlite3.connect("mydb")
cur = con.cursor()
who = "Yeltsin"
age = 72
cur.execute("select name_last, age from people where name_last=? and age=?", (who, age))
print(cur.fetchone())
This example shows how to use the named style:
import sqlite3
con = sqlite3.connect("mydb")
cur = con.cursor()
who = "Yeltsin"
age = 72
cur.execute("select name_last, age from people where name_last=:who and age=:age",
{"who": who, "age": age})
print(cur.fetchone())
execute() will only execute a single SQL statement. If you try to execute
more than one statement with it, it will raise a Warning. Use
executescript() if you want to execute multiple SQL statements with one
call.
Executes an SQL command against all parameter sequences or mappings found in
the sequence seq_of_parameters. The sqlite3 module also allows using an
iterator yielding parameters instead of a sequence.
import sqlite3
class IterChars:
    def __init__(self):
        self.count = ord('a')

    def __iter__(self):
        return self

    def __next__(self):
        if self.count > ord('z'):
            raise StopIteration
        self.count += 1
        return (chr(self.count - 1),)  # this is a 1-tuple
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("create table characters(c)")
theIter = IterChars()
cur.executemany("insert into characters(c) values (?)", theIter)
cur.execute("select c from characters")
print(cur.fetchall())
import sqlite3
def char_generator():
    import string
    for c in string.ascii_lowercase:
        yield (c,)
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("create table characters(c)")
cur.executemany("insert into characters(c) values (?)", char_generator())
cur.execute("select c from characters")
print(cur.fetchall())
This is a nonstandard convenience method for executing multiple SQL statements
at once. It issues a COMMIT statement first, then executes the SQL script it
gets as a parameter.
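Example (the tables and data are made up for illustration):
import sqlite3
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
    create table person(
        firstname,
        lastname,
        age
    );

    create table book(
        title,
        author,
        published
    );

    insert into book(title, author, published)
    values (
        'Dirk Gently''s Holistic Detective Agency',
        'Douglas Adams',
        1987
    );
    """)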
Fetches the next set of rows of a query result, returning a list. An empty
list is returned when no more rows are available.
The number of rows to fetch per call is specified by the size parameter.
If it is not given, the cursor’s arraysize determines the number of rows
to be fetched. The method should try to fetch as many rows as indicated by
the size parameter. If this is not possible due to the specified number of
rows not being available, fewer rows may be returned.
Note there are performance considerations involved with the size parameter.
For optimal performance, it is usually best to use the arraysize attribute.
If the size parameter is used, then it is best for it to retain the same
value from one fetchmany() call to the next.
Fetches all (remaining) rows of a query result, returning a list. Note that
the cursor’s arraysize attribute can affect the performance of this operation.
An empty list is returned when no rows are available.
Although the Cursor class of the sqlite3 module implements this
attribute, the database engine’s own support for the determination of “rows
affected”/”rows selected” is quirky.
For DELETE statements, SQLite reports rowcount as 0 if you make a
DELETE FROM table without any condition.
For executemany() statements, the number of modifications are summed up
into rowcount.
As required by the Python DB API Spec, the rowcount attribute “is -1 in
case no executeXX() has been performed on the cursor or the rowcount of the
last operation is not determinable by the interface”.
This includes SELECT statements because we cannot determine the number of
rows a query produced until all rows were fetched.
This read-only attribute provides the rowid of the last modified row. It is
only set if you issued an INSERT statement using the execute()
method. For operations other than INSERT or when executemany() is
called, lastrowid is set to None.
This read-only attribute provides the column names of the last query. To
remain compatible with the Python DB API, it returns a 7-tuple for each
column where the last six items of each tuple are None.
It is set for SELECT statements without any matching rows as well.
The type system of the sqlite3 module is extensible in two ways: you can
store additional Python types in a SQLite database via object adaptation, and
you can let the sqlite3 module convert SQLite types to different Python
types via converters.
Using adapters to store additional Python types in SQLite databases
As described before, SQLite supports only a limited set of types natively. To
use other Python types with SQLite, you must adapt them to one of the
sqlite3 module’s supported types for SQLite: one of NoneType, int, float,
str, bytes.
The sqlite3 module uses Python object adaptation, as described in
PEP 246 for this. The protocol to use is PrepareProtocol.
There are two ways to enable the sqlite3 module to adapt a custom Python
type to one of the supported ones.
This is a good approach if you write the class yourself. Let’s suppose you have
a class like this:
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y
Now you want to store the point in a single SQLite column. First you’ll have to
choose one of the supported types to be used for representing the point.
Let’s just use str and separate the coordinates using a semicolon. Then you need
to give your class a method __conform__(self, protocol) which must return
the converted value. The parameter protocol will be PrepareProtocol.
import sqlite3
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __conform__(self, protocol):
        if protocol is sqlite3.PrepareProtocol:
            return "%f;%f" % (self.x, self.y)
con = sqlite3.connect(":memory:")
cur = con.cursor()
p = Point(4.0, -3.2)
cur.execute("select ?", (p,))
print(cur.fetchone()[0])
The other possibility is to create a function that converts the type to the
string representation and register the function with register_adapter().
import sqlite3
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

def adapt_point(point):
    return "%f;%f" % (point.x, point.y)
sqlite3.register_adapter(Point, adapt_point)
con = sqlite3.connect(":memory:")
cur = con.cursor()
p = Point(4.0, -3.2)
cur.execute("select ?", (p,))
print(cur.fetchone()[0])
The sqlite3 module has two default adapters for Python’s built-in
datetime.date and datetime.datetime types. Now let’s suppose
we want to store datetime.datetime objects not in ISO representation,
but as a Unix timestamp.
import sqlite3
import datetime
import time
def adapt_datetime(ts):
    return time.mktime(ts.timetuple())
sqlite3.register_adapter(datetime.datetime, adapt_datetime)
con = sqlite3.connect(":memory:")
cur = con.cursor()
now = datetime.datetime.now()
cur.execute("select ?", (now,))
print(cur.fetchone()[0])
Writing an adapter lets you send custom Python types to SQLite. But to make it
really useful we need to make the Python to SQLite to Python roundtrip work.
Enter converters.
Let’s go back to the Point class. We stored the x and y coordinates
separated via semicolons as strings in SQLite.
First, we’ll define a converter function that accepts the string as a parameter
and constructs a Point object from it.
Note
Converter functions are always passed a bytes object, no matter under which
data type you sent the value to SQLite.
def convert_point(s):
    x, y = map(float, s.split(b";"))
    return Point(x, y)
Now you need to make the sqlite3 module know that what you select from
the database is actually a point. There are two ways of doing this:
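Both ways are sketched below, reusing the Point class, adapter and converter from above: the first uses declared column types (PARSE_DECLTYPES), the second column names (PARSE_COLNAMES):
import sqlite3

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y
    def __repr__(self):
        return "Point(%f;%f)" % (self.x, self.y)

def adapt_point(point):
    return "%f;%f" % (point.x, point.y)

def convert_point(s):
    x, y = map(float, s.split(b";"))
    return Point(x, y)

sqlite3.register_adapter(Point, adapt_point)
sqlite3.register_converter("point", convert_point)

p = Point(4.0, -3.2)

# 1) Implicitly, via the declared type of the column:
con = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
cur = con.cursor()
cur.execute("create table test(p point)")
cur.execute("insert into test(p) values (?)", (p,))
cur.execute("select p from test")
print("with declared types:", cur.fetchone()[0])
con.close()

# 2) Explicitly, via the column name:
con = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_COLNAMES)
cur = con.cursor()
cur.execute("create table test(p)")
cur.execute("insert into test(p) values (?)", (p,))
cur.execute('select p as "p [point]" from test')
print("with column names:", cur.fetchone()[0])
con.close()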
There are default adapters for the date and datetime types in the datetime
module. They will be sent as ISO dates/ISO timestamps to SQLite.
The default converters are registered under the name “date” for
datetime.date and under the name “timestamp” for
datetime.datetime.
This way, you can use date/timestamps from Python without any additional
fiddling in most cases. The format of the adapters is also compatible with the
experimental SQLite date/time functions.
The following example demonstrates this.
import sqlite3
import datetime
con = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES|sqlite3.PARSE_COLNAMES)
cur = con.cursor()
cur.execute("create table test(d date, ts timestamp)")
today = datetime.date.today()
now = datetime.datetime.now()
cur.execute("insert into test(d, ts) values (?, ?)", (today, now))
cur.execute("select d, ts from test")
row = cur.fetchone()
print(today, "=>", row[0], type(row[0]))
print(now, "=>", row[1], type(row[1]))
cur.execute('select current_date as "d [date]", current_timestamp as "ts [timestamp]"')
row = cur.fetchone()
print("current_date", row[0], type(row[0]))
print("current_timestamp", row[1], type(row[1]))
By default, the sqlite3 module opens transactions implicitly before a
Data Modification Language (DML) statement (i.e.
INSERT/UPDATE/DELETE/REPLACE), and commits transactions
implicitly before a non-DML, non-query statement (i. e.
anything other than SELECT or the aforementioned).
So if you are within a transaction and issue a command like CREATE TABLE ..., VACUUM or PRAGMA, the sqlite3 module will commit implicitly
before executing that command. There are two reasons for doing that. The first
is that some of these commands don’t work within transactions. The other reason
is that sqlite3 needs to keep track of the transaction state (if a transaction
is active or not). The current transaction state is exposed through the
Connection.in_transaction attribute of the connection object.
You can control which kind of BEGIN statements sqlite3 implicitly executes
(or none at all) via the isolation_level parameter to the connect()
call, or via the isolation_level property of connections.
If you want autocommit mode, then set isolation_level to None.
Otherwise leave it at its default, which will result in a plain “BEGIN”
statement, or set it to one of SQLite’s supported isolation levels: “DEFERRED”,
“IMMEDIATE” or “EXCLUSIVE”.
Using the nonstandard execute(), executemany() and
executescript() methods of the Connection object, your code can
be written more concisely because you don’t have to create the (often
superfluous) Cursor objects explicitly. Instead, the Cursor
objects are created implicitly and these shortcut methods return the cursor
objects. This way, you can execute a SELECT statement and iterate over it
directly using only a single call on the Connection object.
import sqlite3
persons = [
("Hugo", "Boss"),
("Calvin", "Klein")
]
con = sqlite3.connect(":memory:")
# Create the table
con.execute("create table person(firstname, lastname)")
# Fill the table
con.executemany("insert into person(firstname, lastname) values (?, ?)", persons)
# Print the table contents
for row in con.execute("select firstname, lastname from person"):
    print(row)
# Using a dummy WHERE clause to not let SQLite take the shortcut table deletes.
print("I just deleted", con.execute("delete from person where 1=1").rowcount, "rows")
Connection objects can be used as context managers
that automatically commit or rollback transactions. In the event of an
exception, the transaction is rolled back; otherwise, the transaction is
committed:
import sqlite3
con = sqlite3.connect(":memory:")
con.execute("create table person (id integer primary key, firstname varchar unique)")
# Successful, con.commit() is called automatically afterwards
with con:
    con.execute("insert into person(firstname) values (?)", ("Joe",))
# con.rollback() is called after the with block finishes with an exception, the
# exception is still raised and must be caught
try:
    with con:
        con.execute("insert into person(firstname) values (?)", ("Joe",))
except sqlite3.IntegrityError:
    print("couldn't add Joe twice")
Older SQLite versions had issues with sharing connections between threads.
That’s why the Python module disallows sharing connections and cursors between
threads. If you still try to do so, you will get an exception at runtime.
The only exception is calling the interrupt() method, which
only makes sense to call from a different thread.
Footnotes
[1] The sqlite3 module is not built with loadable extension support by
default, because some platforms (notably Mac OS X) have SQLite
libraries which are compiled without this feature. To get loadable
extension support, you must pass --enable-loadable-sqlite-extensions to
configure.
The modules described in this chapter support data compression with the zlib,
gzip, and bzip2 algorithms, and the creation of ZIP- and tar-format archives.
For applications that require data compression, the functions in this module
allow compression and decompression, using the zlib library. The zlib library
has its own home page at http://www.zlib.net. There are known
incompatibilities between the Python module and versions of the zlib library
earlier than 1.1.3; 1.1.3 has a security vulnerability, so we recommend using
1.1.4 or later.
zlib’s functions have many options and often need to be used in a particular
order. This documentation doesn’t attempt to cover all of the permutations;
consult the zlib manual at http://www.zlib.net/manual.html for authoritative
information.
For reading and writing .gz files see the gzip module. For
other archive formats, see the bz2, zipfile, and
tarfile modules.
The available exception and functions in this module are:
Computes an Adler-32 checksum of data. (An Adler-32 checksum is almost as
reliable as a CRC32 but can be computed much more quickly.) If value is
present, it is used as the starting value of the checksum; otherwise, a fixed
default value is used. This allows computing a running checksum over the
concatenation of several inputs. The algorithm is not cryptographically
strong, and should not be used for authentication or digital signatures. Since
the algorithm is designed for use as a checksum algorithm, it is not suitable
for use as a general hash algorithm.
Always returns an unsigned 32-bit integer.
Note
To generate the same numeric value across all Python versions and
platforms use adler32(data) & 0xffffffff. If you are only using
the checksum in packed binary format this is not necessary as the
return value is the correct 32bit binary representation
regardless of sign.
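For example, a running checksum over two chunks matches the checksum of the concatenation:
import zlib
part = zlib.adler32(b"hello, ")
part = zlib.adler32(b"world", part)       # continue from the previous value
assert part == zlib.adler32(b"hello, world")
print(part & 0xffffffff)                  # portable unsigned form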
Compresses the bytes in data, returning a bytes object containing compressed data.
level is an integer from 1 to 9 controlling the level of compression;
1 is fastest and produces the least compression, 9 is slowest and
produces the most. The default value is 6. Raises the error
exception if any error occurs.
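A one-shot round trip might look like this (the sample data is arbitrary):
import zlib
data = b"witch which has which witches wrist watch" * 10
packed = zlib.compress(data, 9)           # level 9: best compression
assert zlib.decompress(packed) == data
print(len(data), "->", len(packed))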
Returns a compression object, to be used for compressing data streams that won’t
fit into memory at once. level is an integer from 1 to 9 controlling
the level of compression; 1 is fastest and produces the least compression,
9 is slowest and produces the most. The default value is 6.
Computes a CRC (Cyclic Redundancy Check) checksum of data. If value is
present, it is used as the starting value of the checksum; otherwise, a fixed
default value is used. This allows computing a running checksum over the
concatenation of several inputs. The algorithm is not cryptographically
strong, and should not be used for authentication or digital signatures. Since
the algorithm is designed for use as a checksum algorithm, it is not suitable
for use as a general hash algorithm.
Always returns an unsigned 32-bit integer.
Note
To generate the same numeric value across all Python versions and
platforms use crc32(data) & 0xffffffff. If you are only using
the checksum in packed binary format this is not necessary as the
return value is the correct 32bit binary representation
regardless of sign.
Decompresses the bytes in data, returning a bytes object containing the
uncompressed data. The wbits parameter controls the size of the window
buffer, and is discussed further below.
If bufsize is given, it is used as the initial size of the output
buffer. Raises the error exception if any error occurs.
The absolute value of wbits is the base two logarithm of the size of the
history buffer (the “window size”) used when compressing data. Its absolute
value should be between 8 and 15 for the most recent versions of the zlib
library, larger values resulting in better compression at the expense of greater
memory usage. When decompressing a stream, wbits must not be smaller
than the size originally used to compress the stream; using a too-small
value will result in an exception. The default value is therefore the
highest value, 15. When wbits is negative, the standard
gzip header is suppressed.
bufsize is the initial size of the buffer used to hold decompressed data. If
more space is required, the buffer size will be increased as needed, so you
don’t have to get this value exactly right; tuning it will only save a few calls
to malloc(). The default size is 16384.
Returns a decompression object, to be used for decompressing data streams that
won’t fit into memory at once. The wbits parameter controls the size of the
window buffer.
Compression objects support the following methods:
Compress data, returning a bytes object containing compressed data for at least
part of the data in data. This data should be concatenated to the output
produced by any preceding calls to the compress() method. Some input may
be kept in internal buffers for later processing.
All pending input is processed, and a bytes object containing the remaining compressed
output is returned. mode can be selected from the constants
Z_SYNC_FLUSH, Z_FULL_FLUSH, or Z_FINISH,
defaulting to Z_FINISH. Z_SYNC_FLUSH and
Z_FULL_FLUSH allow compressing further bytestrings of data, while
Z_FINISH finishes the compressed stream and prevents compressing any
more data. After calling flush() with mode set to Z_FINISH,
the compress() method cannot be called again; the only realistic action is
to delete the object.
A bytes object which contains any bytes past the end of the compressed data. That is,
this remains b"" until the last byte that contains compression data is
available. If the whole bytestring turned out to contain compressed data, this is
b"", an empty bytes object.
The only way to determine where a bytestring of compressed data ends is by actually
decompressing it. This means that when compressed data is contained in part of a
larger file, you can only find the end of it by reading data and feeding it,
followed by some non-empty bytestring, into a decompression object’s
decompress() method until the unused_data attribute is no longer
empty.
A bytes object that contains any data that was not consumed by the last
decompress() call because it exceeded the limit for the uncompressed data
buffer. This data has not yet been seen by the zlib machinery, so you must feed
it (possibly with further data concatenated to it) back to a subsequent
decompress() method call in order to get correct output.
Decompress data, returning a bytes object containing the uncompressed data
corresponding to at least part of the data in data. This data should be
concatenated to the output produced by any preceding calls to the
decompress() method. Some of the input data may be preserved in internal
buffers for later processing.
If the optional parameter max_length is supplied then the return value will be
no longer than max_length. This may mean that not all of the compressed input
can be processed; and unconsumed data will be stored in the attribute
unconsumed_tail. This bytestring must be passed to a subsequent call to
decompress() if decompression is to continue. If max_length is not
supplied then the whole input is decompressed, and unconsumed_tail is
empty.
All pending input is processed, and a bytes object containing the remaining
uncompressed output is returned. After calling flush(), the
decompress() method cannot be called again; the only realistic action is
to delete the object.
The optional parameter length sets the initial size of the output buffer.
Returns a copy of the decompression object. This can be used to save the state
of the decompressor midway through the data stream in order to speed up random
seeks into the stream at a future point.
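A sketch of incremental (de)compression with these objects (the chunk contents are arbitrary):
import zlib
comp = zlib.compressobj(6)
chunks = [b"first chunk ", b"second chunk ", b"third chunk"]
compressed = b"".join(comp.compress(c) for c in chunks) + comp.flush()
decomp = zlib.decompressobj()
restored = decomp.decompress(compressed) + decomp.flush()
assert restored == b"".join(chunks)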
This module provides a simple interface to compress and decompress files just
like the GNU programs gzip and gunzip would.
The data compression is provided by the zlib module.
The gzip module provides the GzipFile class. The GzipFile
class reads and writes gzip-format files, automatically compressing
or decompressing the data so that it looks like an ordinary file object.
Note that additional file formats which can be decompressed by the
gzip and gunzip programs, such as those produced by
compress and pack, are not supported by this module.
For other archive formats, see the bz2, zipfile, and
tarfile modules.
The module defines the following items:
class gzip.GzipFile(filename=None, mode=None, compresslevel=9, fileobj=None, mtime=None)
Constructor for the GzipFile class, which simulates most of the
methods of a file object, with the exception of the truncate()
method. At least one of fileobj and filename must be given a non-trivial
value.
The new class instance is based on fileobj, which can be a regular file, an
io.BytesIO object, or any other object which simulates a file. It
defaults to None, in which case filename is opened to provide a file
object.
When fileobj is not None, the filename argument is only used to be
included in the gzip file header, which may include the original
filename of the uncompressed file. It defaults to the filename of fileobj, if
discernible; otherwise, it defaults to the empty string, and in this case the
original filename is not included in the header.
The mode argument can be any of 'r', 'rb', 'a', 'ab', 'w',
or 'wb', depending on whether the file will be read or written. The default
is the mode of fileobj if discernible; otherwise, the default is 'rb'. If
not given, the ‘b’ flag will be added to the mode to ensure the file is opened
in binary mode for cross-platform portability.
The compresslevel argument is an integer from 1 to 9 controlling the
level of compression; 1 is fastest and produces the least compression, and
9 is slowest and produces the most compression. The default is 9.
The mtime argument is an optional numeric timestamp to be written to
the stream when compressing. All gzip compressed streams are
required to contain a timestamp. If omitted or None, the current
time is used. This module ignores the timestamp when decompressing;
however, some programs, such as gunzip, make use of it.
The format of the timestamp is the same as that of the return value of
time.time() and of the st_mtime attribute of the object returned
by os.stat().
Calling a GzipFile object’s close() method does not close
fileobj, since you might wish to append more material after the compressed
data. This also allows you to pass a io.BytesIO object opened for
writing as fileobj, and retrieve the resulting memory buffer using the
io.BytesIO object’s getvalue() method.
GzipFile supports the io.BufferedIOBase interface,
including iteration and the with statement. Only the
read1() and truncate() methods aren’t implemented.
Read n uncompressed bytes without advancing the file position.
At most one single read on the compressed stream is done to satisfy
the call. The number of bytes returned may be more or less than
requested.
New in version 3.2.
Changed in version 3.1: Support for the with statement was added.
Changed in version 3.2: Support for zero-padded files was added.
Changed in version 3.2: Support for unseekable files was added.
This is a shorthand for GzipFile(filename, mode, compresslevel).
The filename argument is required; mode defaults to 'rb' and
compresslevel defaults to 9.
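For example (the file name is hypothetical):
import gzip
# Write a compressed file, then read it back.
with gzip.open('example.txt.gz', 'wb') as f:
    f.write(b'Lots of content here\n')
with gzip.open('example.txt.gz', 'rb') as f:
    assert f.read() == b'Lots of content here\n'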
This module provides a comprehensive interface for the bz2 compression library.
It implements a complete file interface, one-shot (de)compression functions, and
types for sequential (de)compression.
Handling of compressed files is offered by the BZ2File class.
class bz2.BZ2File(filename, mode='r', buffering=0, compresslevel=9)
Open a bz2 file. Mode can be either 'r' or 'w', for reading (default)
or writing. When opened for writing, the file will be created if it doesn’t
exist, and truncated otherwise. If buffering is given, 0 means
unbuffered, and larger numbers specify the buffer size; the default is
0. If compresslevel is given, it must be a number between 1 and
9; the default is 9. Add a 'U' to mode to open the file for input
with universal newline support. Any line ending in the input file will be
seen as a '\n' in Python. Also, a file so opened gains the attribute
newlines; the value for this attribute is one of None (no newline
read yet), '\r', '\n', '\r\n' or a tuple containing all the
newline types seen. Universal newlines are available only when
reading. Instances support iteration in the same way as normal file
instances.
Close the file. Sets data attribute closed to true. A closed file
cannot be used for further I/O operations. close() may be called
more than once without error.
Return the next line from the file, as a byte string, retaining newline.
A non-negative size argument limits the maximum number of bytes to
return (an incomplete line may be returned then). Return an empty byte
string at EOF.
Move to new file position. Argument offset is a byte count. Optional
argument whence defaults to os.SEEK_SET or 0 (offset from start
of file; offset should be >=0); other values are os.SEEK_CUR or
1 (move relative to current position; offset can be positive or
negative), and os.SEEK_END or 2 (move relative to end of file;
offset is usually negative, although many platforms allow seeking beyond
the end of a file).
Note that seeking of bz2 files is emulated, and depending on the
parameters the operation may be extremely slow.
Write the sequence of byte strings to the file. Note that newlines are not
added. The sequence can be any iterable object producing byte strings.
This is equivalent to calling write() for each byte string.
Create a new compressor object. This object may be used to compress data
sequentially. If you want to compress data in one shot, use the
compress() function instead. The compresslevel parameter, if given,
must be a number between 1 and 9; the default is 9.
Provide more data to the compressor object. It will return chunks of
compressed data whenever possible. When you’ve finished providing data to
compress, call the flush() method to finish the compression process,
and return what is left in internal buffers.
Create a new decompressor object. This object may be used to decompress data
sequentially. If you want to decompress data in one shot, use the
decompress() function instead.
Provide more data to the decompressor object. It will return chunks of
decompressed data whenever possible. If you try to decompress data after
the end of stream is found, EOFError will be raised. If any data
was found after the end of stream, it’ll be ignored and saved in
unused_data attribute.
Compress data in one shot. If you want to compress data sequentially, use
an instance of BZ2Compressor instead. The compresslevel parameter,
if given, must be a number between 1 and 9; the default is 9.
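A one-shot round trip, mirroring the zlib sketch above (the sample data is arbitrary):
import bz2
data = b"repetitive data " * 100
packed = bz2.compress(data, 9)
assert bz2.decompress(packed) == data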
The ZIP file format is a common archive and compression standard. This module
provides tools to create, read, write, append, and list a ZIP file. Any
advanced use of this module will require an understanding of the format, as
defined in PKZIP Application Note.
This module does not currently handle multi-disk ZIP files.
It can handle ZIP files that use the ZIP64 extensions
(that is ZIP files that are more than 4 GByte in size). It supports
decryption of encrypted files in ZIP archives, but it currently cannot
create an encrypted file. Decryption is extremely slow as it is
implemented in native Python rather than C.
For other archive formats, see the bz2, gzip, and
tarfile modules.
The error raised when a ZIP file would require ZIP64 functionality but that has
not been enabled.
class zipfile.ZipFile
The class for reading and writing ZIP files. See section
ZipFile Objects for constructor details.
class zipfile.PyZipFile
Class for creating ZIP archives containing Python libraries.
class zipfile.ZipInfo(filename='NoName', date_time=(1980, 1, 1, 0, 0, 0))
Class used to represent information about a member of an archive. Instances
of this class are returned by the getinfo() and infolist()
methods of ZipFile objects. Most users of the zipfile module
will not need to create these, but only use those created by this
module. filename should be the full name of the archive member, and
date_time should be a tuple containing six fields which describe the time
of the last modification to the file; the fields are described in section
ZipInfo Objects.
class zipfile.ZipFile(file, mode='r', compression=ZIP_STORED, allowZip64=False)
Open a ZIP file, where file can be either a path to a file (a string) or a
file-like object. The mode parameter should be 'r' to read an existing
file, 'w' to truncate and write a new file, or 'a' to append to an
existing file. If mode is 'a' and file refers to an existing ZIP
file, then additional files are added to it. If file does not refer to a
ZIP file, then a new ZIP archive is appended to the file. This is meant for
adding a ZIP archive to another file (such as python.exe). If
mode is 'a' and the file does not exist at all, it is created.
compression is the ZIP compression method to use when writing the archive,
and should be ZIP_STORED or ZIP_DEFLATED; unrecognized
values will cause RuntimeError to be raised. If ZIP_DEFLATED
is specified but the zlib module is not available, RuntimeError
is also raised. The default is ZIP_STORED. If allowZip64 is
True zipfile will create ZIP files that use the ZIP64 extensions when
the zipfile is larger than 2 GB. If it is false (the default) zipfile
will raise an exception when the ZIP file would require ZIP64 extensions.
ZIP64 extensions are disabled by default because the default zip
and unzip commands on Unix (the InfoZIP utilities) don’t support
these extensions.
If the file is created with mode 'a' or 'w' and then
close()d without adding any files to the archive, the appropriate
ZIP structures for an empty archive will be written to the file.
ZipFile is also a context manager and therefore supports the
with statement. In the example, myzip is closed after the
with statement’s suite is finished—even if an exception occurs:
with ZipFile('spam.zip', 'w') as myzip:
myzip.write('eggs.txt')
New in version 3.2:
New in version 3.2: Added the ability to use ZipFile as a context manager.
Return a ZipInfo object with information about the archive member
name. Calling getinfo() for a name not currently contained in the
archive will raise a KeyError.
Return a list containing a ZipInfo object for each member of the
archive. The objects are in the same order as their entries in the actual ZIP
file on disk if an existing archive was opened.
Extract a member from the archive as a file-like object (ZipExtFile). name is
the name of the file in the archive, or a ZipInfo object. The mode
parameter, if included, must be one of the following: 'r' (the default),
'U', or 'rU'. Choosing 'U' or 'rU' will enable universal newline
support in the read-only object. pwd is the password used for encrypted files.
Calling open() on a closed ZipFile will raise a RuntimeError.
Note
The file-like object is read-only and provides the following methods:
read(), readline(), readlines(), __iter__(),
__next__().
Note
If the ZipFile was created by passing in a file-like object as the first
argument to the constructor, then the object returned by open() shares the
ZipFile’s file pointer. Under these circumstances, the object returned by
open() should not be used after any additional operations are performed
on the ZipFile object. If the ZipFile was created by passing in a string (the
filename) as the first argument to the constructor, then open() will
create a new file object that will be held by the ZipExtFile, allowing it to
operate independently of the ZipFile.
Note
The open(), read() and extract() methods can take a filename
or a ZipInfo object. You will appreciate this when trying to read a
ZIP file that contains members with duplicate names.
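As a sketch, reading one member line by line (archive and member names are
illustrative):
import zipfile

with zipfile.ZipFile('spam.zip') as zf:
    member = zf.open('eggs.txt')
    for line in member:
        print(line)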
Extract a member from the archive to the current working directory; member
must be its full name or a ZipInfo object. Its file information is
extracted as accurately as possible. path specifies a different directory
to extract to.
pwd is the password used for encrypted files.
Extract all members from the archive to the current working directory. path
specifies a different directory to extract to. members is optional and must
be a subset of the list returned by namelist(). pwd is the password
used for encrypted files.
Warning
Never extract archives from untrusted sources without prior inspection.
It is possible that files are created outside of path, e.g. members
that have absolute filenames starting with "/" or filenames with two
dots "..".
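A cautious sketch of such an inspection (archive name and target directory
are illustrative):
import zipfile

with zipfile.ZipFile('spam.zip') as zf:
    for name in zf.namelist():
        if name.startswith('/') or '..' in name:
            raise ValueError('unsafe member name: ' + name)
    zf.extractall('target_dir')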
Return the bytes of the file name in the archive. name is the name of the
file in the archive, or a ZipInfo object. The archive must be open for
read or append. pwd is the password used for encrypted files and, if specified,
it will override the default password set with setpassword(). Calling
read() on a closed ZipFile will raise a RuntimeError.
Read all the files in the archive and check their CRCs and file headers.
Return the name of the first bad file, or else return None. Calling
testzip() on a closed ZipFile will raise a RuntimeError.
Write the file named filename to the archive, giving it the archive name
arcname (by default, this will be the same as filename, but without a drive
letter and with leading path separators removed). If given, compress_type
overrides the value given for the compression parameter to the constructor for
the new entry. The archive must be open with mode 'w' or 'a' – calling
write() on a ZipFile created with mode 'r' will raise a
RuntimeError. Calling write() on a closed ZipFile will raise a
RuntimeError.
Note
There is no official file name encoding for ZIP files. If you have unicode file
names, you must convert them to byte strings in your desired encoding before
passing them to write(). WinZip interprets all file names as encoded in
CP437, also known as DOS Latin.
Note
Archive names should be relative to the archive root, that is, they should not
start with a path separator.
Note
If arcname (or filename, if arcname is not given) contains a null
byte, the name of the file in the archive will be truncated at the null byte.
Write the string bytes to the archive; zinfo_or_arcname is either the file
name it will be given in the archive, or a ZipInfo instance. If it’s
an instance, at least the filename, date, and time must be given. If it’s a
name, the date and time is set to the current date and time. The archive must be
opened with mode 'w' or 'a' – calling writestr() on a ZipFile
created with mode 'r' will raise a RuntimeError. Calling
writestr() on a closed ZipFile will raise a RuntimeError.
If given, compress_type overrides the value given for the compression
parameter to the constructor for the new entry, or in the zinfo_or_arcname
(if that is a ZipInfo instance).
Note
When passing a ZipInfo instance as the zinfo_or_arcname parameter,
the compression method used will be that specified in the compress_type
member of the given ZipInfo instance. By default, the
ZipInfo constructor sets this member to ZIP_STORED.
Changed in version 3.2: The compress_type argument.
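To illustrate both methods, a brief sketch (file and archive names are
illustrative):
import zipfile

with zipfile.ZipFile('spam.zip', 'w') as zf:
    # Store eggs.txt under a different name inside the archive.
    zf.write('eggs.txt', arcname='data/eggs.txt')
    # Add a member directly from a string, without a file on disk.
    zf.writestr('notes.txt', 'created on the fly')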
The level of debug output to use. This may be set from 0 (the default, no
output) to 3 (the most output). Debugging information is written to
sys.stdout.
The comment text associated with the ZIP file. If assigning a comment to a
ZipFile instance created with mode 'a' or 'w', this should be a
string no longer than 65535 bytes. Comments longer than this will be
truncated in the written archive when ZipFile.close() is called.
Search for files *.py and add the corresponding file to the
archive.
If the optimize parameter to PyZipFile was not given or -1,
the corresponding file is a *.pyo file if available, else a
*.pyc file, compiling if necessary.
If the optimize parameter to PyZipFile was 0, 1 or
2, only files with that optimization level (see compile()) are
added to the archive, compiling if necessary.
If the pathname is a file, the filename must end with .py, and
just the (corresponding *.py[co]) file is added at the top level
(no path information). If the pathname is a file that does not end with
.py, a RuntimeError will be raised. If it is a directory,
and the directory is not a package directory, then all the files
*.py[co] are added at the top level. If the directory is a
package directory, then all *.py[co] are added under the package
name as a file path, and if any subdirectories are package directories,
all of these are added recursively. basename is intended for internal
use only. The writepy() method makes archives with file names like
this:
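For instance, along these lines (module names are illustrative):
string.pyc                   # Top level name
test/__init__.pyc            # Package directory
test/testall.pyc             # Module test.testall
test/bogus/__init__.pyc      # Subpackage directory
test/bogus/myfile.pyc        # Submodule test.bogus.myfile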
Instances of the ZipInfo class are returned by the getinfo() and
infolist() methods of ZipFile objects. Each object stores
information about a single member of the ZIP archive.
The tarfile module makes it possible to read and write tar
archives, including those using gzip or bz2 compression.
(.zip files can be read and written using the zipfile module.)
Some facts and figures:
reads and writes gzip and bz2 compressed archives.
read/write support for the POSIX.1-1988 (ustar) format.
read/write support for the GNU tar format including longname and longlink
extensions, read-only support for all variants of the sparse extension
including restoration of sparse files.
read/write support for the POSIX.1-2001 (pax) format.
handles directories, regular files, hardlinks, symbolic links, fifos,
character devices and block devices and is able to acquire and restore file
information like timestamp, access permissions and owner.
Return a TarFile object for the pathname name. For detailed
information on TarFile objects and the keyword arguments that are
allowed, see TarFile Objects.
mode has to be a string of the form 'filemode[:compression]', it defaults
to 'r'. Here is a full list of mode combinations:
mode            action
'r' or 'r:*'    Open for reading with transparent compression (recommended).
'r:'            Open for reading exclusively without compression.
'r:gz'          Open for reading with gzip compression.
'r:bz2'         Open for reading with bzip2 compression.
'a' or 'a:'     Open for appending with no compression. The file is created if it does not exist.
'w' or 'w:'     Open for uncompressed writing.
'w:gz'          Open for gzip compressed writing.
'w:bz2'         Open for bzip2 compressed writing.
Note that 'a:gz' or 'a:bz2' is not possible. If mode is not suitable
to open a certain (compressed) file for reading, ReadError is raised. Use
mode 'r' to avoid this. If a compression method is not supported,
CompressionError is raised.
If fileobj is specified, it is used as an alternative to a file object
opened in binary mode for name. It is supposed to be at position 0.
For special purposes, there is a second format for mode:
'filemode|[compression]'. tarfile.open() will return a TarFile
object that processes its data as a stream of blocks. No random seeking will
be done on the file. If given, fileobj may be any object that has a
read() or write() method (depending on the mode). bufsize
specifies the blocksize and defaults to 20*512 bytes. Use this variant
in combination with e.g. sys.stdin, a socket file object or a tape
device. However, such a TarFile object is limited in that it does
not allow random access; see Examples. The currently
possible modes:
Mode    Action
'r|*'   Open a stream of tar blocks for reading with transparent compression.
'r|'    Open a stream of uncompressed tar blocks for reading.
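A stream-mode sketch that reads a tar stream from standard input without
seeking (it assumes the data arrives on sys.stdin.buffer):
import sys
import tarfile

tar = tarfile.open(fileobj=sys.stdin.buffer, mode='r|*')
for member in tar:            # iterating yields TarInfo objects
    print(member.name)
tar.close()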
The TarFile object provides an interface to a tar archive. A tar
archive is a sequence of blocks. An archive member (a stored file) is made up of
a header block followed by data blocks. It is possible to store a file in a tar
archive several times. Each archive member is represented by a TarInfo
object, see TarInfo Objects for details.
A TarFile object can be used as a context manager in a with
statement. It will automatically be closed when the block is completed. Please
note that in the event of an exception an archive opened for writing will not
be finalized; only the internally used file object will be closed. See the
Examples section for a use case.
New in version 3.2: Added support for the context manager protocol.
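A minimal sketch of the context manager protocol (it assumes a file foo
exists to be added):
import tarfile

with tarfile.open('sample.tar', 'w') as tar:
    tar.add('foo')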
All following arguments are optional and can be accessed as instance attributes
as well.
name is the pathname of the archive. It can be omitted if fileobj is given.
In this case, the file object’s name attribute is used if it exists.
mode is either 'r' to read from an existing archive, 'a' to append
data to an existing file or 'w' to create a new file overwriting an existing
one.
If fileobj is given, it is used for reading or writing data. If it can be
determined, mode is overridden by fileobj's mode. fileobj will be used
from position 0.
format controls the archive format. It must be one of the constants
USTAR_FORMAT, GNU_FORMAT or PAX_FORMAT that are
defined at module level.
The tarinfo argument can be used to replace the default TarInfo class
with a different one.
If dereference is False, add symbolic and hard links to the archive. If it
is True, add the content of the target files to the archive. This has no
effect on systems that do not support symbolic links.
If ignore_zeros is False, treat an empty block as the end of the archive.
If it is True, skip empty (and invalid) blocks and try to get as many members
as possible. This is only useful for reading concatenated or damaged archives.
debug can be set from 0 (no debug messages) up to 3 (all debug
messages). The messages are written to sys.stderr.
If errorlevel is 0, all errors are ignored when using TarFile.extract().
Nevertheless, they appear as error messages in the debug output, when debugging
is enabled. If 1, all fatal errors are raised as OSError or
IOError exceptions. If 2, all non-fatal errors are raised as
TarError exceptions as well.
The encoding and errors arguments define the character encoding to be
used for reading or writing the archive and how conversion errors are going
to be handled. The default settings will work for most users.
See section Unicode issues for in-depth information.
Changed in version 3.2: Use 'surrogateescape' as the default for the errors argument.
The pax_headers argument is an optional dictionary of strings which
will be added as a pax global header if format is PAX_FORMAT.
Print a table of contents to sys.stdout. If verbose is False,
only the names of the members are printed. If it is True, output
similar to that of ls -l is produced.
Extract all members from the archive to the current working directory or
directory path. If optional members is given, it must be a subset of the
list returned by getmembers(). Directory information like owner,
modification time and permissions are set after all members have been extracted.
This is done to work around two problems: A directory’s modification time is
reset each time a file is created in it. And, if a directory’s permissions do
not allow writing, extracting files to it will fail.
Warning
Never extract archives from untrusted sources without prior inspection.
It is possible that files are created outside of path, e.g. members
that have absolute filenames starting with "/" or filenames with two
dots "..".
Extract a member from the archive to the current working directory, using its
full name. Its file information is extracted as accurately as possible. member
may be a filename or a TarInfo object. You can specify a different
directory using path. File attributes (owner, mtime, mode) are set unless
set_attrs is False.
Note
The extract() method does not take care of several extraction issues.
In most cases you should consider using the extractall() method.
Extract a member from the archive as a file object. member may be a filename
or a TarInfo object. If member is a regular file, a file-like
object is returned. If member is a link, a file-like object is constructed from
the link’s target. If member is none of the above, None is returned.
Note
The file-like object is read-only. It provides the methods
read(), readline(), readlines(), seek(), tell(),
and close(), and also supports iteration over its lines.
Add the file name to the archive. name may be any type of file
(directory, fifo, symbolic link, etc.). If given, arcname specifies an
alternative name for the file in the archive. Directories are added
recursively by default. This can be avoided by setting recursive to
False. If exclude is given, it must be a function that takes one
filename argument and returns a boolean value. Depending on this value the
respective file is either excluded (True) or added
(False). If filter is specified it must be a keyword argument. It
should be a function that takes a TarInfo object argument and
returns the changed TarInfo object. If it instead returns
None the TarInfo object will be excluded from the
archive. See Examples for an example.
Changed in version 3.2: Added the filter parameter.
Deprecated since version 3.2: The exclude parameter is deprecated, please use the filter parameter
instead.
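A sketch of the filter parameter that resets ownership metadata on each
member before it is stored (file names are illustrative):
import tarfile

def reset(tarinfo):
    tarinfo.uid = tarinfo.gid = 0
    tarinfo.uname = tarinfo.gname = 'root'
    return tarinfo

with tarfile.open('sample.tar', 'w') as tar:
    tar.add('foo', filter=reset)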
Add the TarInfo object tarinfo to the archive. If fileobj is given,
tarinfo.size bytes are read from it and added to the archive. You can
create TarInfo objects using gettarinfo().
Note
On Windows platforms, fileobj should always be opened with mode 'rb' to
avoid irritation about the file size.
Create a TarInfo object for either the file name or the file
object fileobj (using os.fstat() on its file descriptor). You can modify
some of the TarInfo's attributes before you add it using addfile().
If given, arcname specifies an alternative name for the file in the archive.
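A sketch combining the two methods (file names are illustrative):
import tarfile

with tarfile.open('sample.tar', 'w') as tar:
    with open('foo', 'rb') as f:
        info = tar.gettarinfo(fileobj=f, arcname='bar')
        tar.addfile(info, f)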
A TarInfo object represents one member in a TarFile. Aside
from storing all required attributes of a file (like file type, size, time,
permissions, owner etc.), it provides some useful methods to determine its type.
It does not contain the file’s data itself.
TarInfo objects are returned by TarFile's methods
getmember(), getmembers() and gettarinfo().
File type. type is usually one of these constants: REGTYPE,
AREGTYPE, LNKTYPE, SYMTYPE, DIRTYPE,
FIFOTYPE, CONTTYPE, CHRTYPE, BLKTYPE,
GNUTYPE_SPARSE. To determine the type of a TarInfo object
more conveniently, use the is_*() methods below.
How to extract an entire tar archive to the current working directory:
import tarfile
tar = tarfile.open("sample.tar.gz")
tar.extractall()
tar.close()
How to extract a subset of a tar archive with TarFile.extractall() using
a generator function instead of a list:
import os
import tarfile
def py_files(members):
    for tarinfo in members:
        if os.path.splitext(tarinfo.name)[1] == ".py":
            yield tarinfo
tar = tarfile.open("sample.tar.gz")
tar.extractall(members=py_files(tar))
tar.close()
How to create an uncompressed tar archive from a list of filenames:
import tarfile
tar = tarfile.open("sample.tar", "w")
for name in ["foo", "bar", "quux"]:
    tar.add(name)
tar.close()
There are three tar formats that can be created with the tarfile module:
The POSIX.1-1988 ustar format (USTAR_FORMAT). It supports filenames
up to a length of at best 256 characters and linknames up to 100 characters. The
maximum file size is 8 gigabytes. This is an old and limited but widely
supported format.
The GNU tar format (GNU_FORMAT). It supports long filenames and
linknames, files bigger than 8 gigabytes and sparse files. It is the de facto
standard on GNU/Linux systems. tarfile fully supports the GNU tar
extensions for long names; sparse file support is read-only.
The POSIX.1-2001 pax format (PAX_FORMAT). It is the most flexible
format with virtually no limits. It supports long filenames and linknames, large
files and stores pathnames in a portable way. However, not all tar
implementations today are able to handle pax archives properly.
The pax format is an extension to the existing ustar format. It uses extra
headers for information that cannot be stored otherwise. There are two flavours
of pax headers: Extended headers only affect the subsequent file header, global
headers are valid for the complete archive and affect all following files. All
the data in a pax header is encoded in UTF-8 for portability reasons.
There are some more variants of the tar format which can be read, but not
created:
The ancient V7 format. This is the first tar format from Unix Seventh Edition,
storing only regular files and directories. Names must not be longer than 100
characters, there is no user/group name information. Some archives have
miscalculated header checksums in case of fields with non-ASCII characters.
The SunOS tar extended format. This format is a variant of the POSIX.1-2001
pax format, but is not compatible.
The tar format was originally conceived to make backups on tape drives with the
main focus on preserving file system information. Nowadays tar archives are
commonly used for file distribution and exchanging archives over networks. One
problem of the original format (which is the basis of all other formats) is
that there is no concept of supporting different character encodings. For
example, an ordinary tar archive created on a UTF-8 system cannot be read
correctly on a Latin-1 system if it contains non-ASCII characters. Textual
metadata (like filenames, linknames, user/group names) will appear damaged.
Unfortunately, there is no way to autodetect the encoding of an archive. The
pax format was designed to solve this problem. It stores non-ASCII metadata
using the universal character encoding UTF-8.
The details of character conversion in tarfile are controlled by the
encoding and errors keyword arguments of the TarFile class.
encoding defines the character encoding to use for the metadata in the
archive. The default value is sys.getfilesystemencoding() or 'ascii'
as a fallback. Depending on whether the archive is read or written, the
metadata must be either decoded or encoded. If encoding is not set
appropriately, this conversion may fail.
In case of PAX_FORMAT archives, encoding is generally not needed
because all the metadata is stored using UTF-8. encoding is only used in
the rare cases when binary pax headers are decoded or when strings with
surrogate characters are stored.
The so-called CSV (Comma Separated Values) format is the most common import and
export format for spreadsheets and databases. There is no “CSV standard”, so
the format is operationally defined by the many applications which read and
write it. The lack of a standard means that subtle differences often exist in
the data produced and consumed by different applications. These differences can
make it annoying to process CSV files from multiple sources. Still, while the
delimiters and quoting characters vary, the overall format is similar enough
that it is possible to write a single module which can efficiently manipulate
such data, hiding the details of reading and writing the data from the
programmer.
The csv module implements classes to read and write tabular data in CSV
format. It allows programmers to say, “write this data in the format preferred
by Excel,” or “read data from this file which was generated by Excel,” without
knowing the precise details of the CSV format used by Excel. Programmers can
also describe the CSV formats understood by other applications or define their
own special-purpose CSV formats.
The csv module’s reader and writer objects read and
write sequences. Programmers can also read and write data in dictionary form
using the DictReader and DictWriter classes.
Return a reader object which will iterate over lines in the given csvfile.
csvfile can be any object which supports the iterator protocol and returns a
string each time its __next__() method is called — file objects and list objects are both suitable. If csvfile is a file object,
it should be opened with newline=''. [1] An optional
dialect parameter can be given which is used to define a set of parameters
specific to a particular CSV dialect. It may be an instance of a subclass of
the Dialect class or one of the strings returned by the
list_dialects() function. The other optional fmtparams keyword arguments
can be given to override individual formatting parameters in the current
dialect. For full details about the dialect and formatting parameters, see
section Dialects and Formatting Parameters.
Each row read from the csv file is returned as a list of strings. No
automatic data type conversion is performed unless the QUOTE_NONNUMERIC format
option is specified (in which case unquoted fields are transformed into floats).
Return a writer object responsible for converting the user’s data into delimited
strings on the given file-like object. csvfile can be any object with a
write() method. If csvfile is a file object, it should be opened with
newline=''. [1] An optional dialect
parameter can be given which is used to define a set of parameters specific to a
particular CSV dialect. It may be an instance of a subclass of the
Dialect class or one of the strings returned by the
list_dialects() function. The other optional fmtparams keyword arguments
can be given to override individual formatting parameters in the current
dialect. For full details about the dialect and formatting parameters, see
section Dialects and Formatting Parameters. To make it
as easy as possible to interface with modules which implement the DB API, the
value None is written as the empty string. While this isn’t a
reversible transformation, it makes it easier to dump SQL NULL data values to
CSV files without preprocessing the data returned from a cursor.fetch* call.
All other non-string data are stringified with str() before being written.
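For instance, a sketch in which the rows come from a DB API cursor and the
None values end up as empty fields (file name and data are illustrative):
import csv

rows = [(1, 'alice', None), (2, 'bob', 3.5)]   # e.g. from cursor.fetchall()
with open('out.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(rows)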
Associate dialect with name. name must be a string. The
dialect can be specified either by passing a sub-class of Dialect, or
by fmtparams keyword arguments, or both, with keyword arguments overriding
parameters of the dialect. For full details about the dialect and formatting
parameters, see section Dialects and Formatting Parameters.
class csv.DictReader(csvfile, fieldnames=None, restkey=None, restval=None, dialect='excel', *args, **kwds)
Create an object which operates like a regular reader but maps the information
read into a dict whose keys are given by the optional fieldnames parameter.
If the fieldnames parameter is omitted, the values in the first row of the
csvfile will be used as the fieldnames. If the row read has more fields
than the fieldnames sequence, the remaining data is added as a sequence
keyed by the value of restkey. If the row read has fewer fields than the
fieldnames sequence, the remaining keys take the value of the optional
restval parameter. Any other optional or keyword arguments are passed to
the underlying reader instance.
class csv.DictWriter(csvfile, fieldnames, restval='', extrasaction='raise', dialect='excel', *args, **kwds)
Create an object which operates like a regular writer but maps dictionaries onto
output rows. The fieldnames parameter identifies the order in which values in
the dictionary passed to the writerow() method are written to the
csvfile. The optional restval parameter specifies the value to be written
if the dictionary is missing a key in fieldnames. If the dictionary passed to
the writerow() method contains a key not found in fieldnames, the
optional extrasaction parameter indicates what action to take. If it is set
to 'raise' a ValueError is raised. If it is set to 'ignore',
extra values in the dictionary are ignored. Any other optional or keyword
arguments are passed to the underlying writer instance.
Note that unlike the DictReader class, the fieldnames parameter of
the DictWriter is not optional. Since Python’s dict objects
are not ordered, there is not enough information available to deduce the order
in which the row should be written to the csvfile.
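A short sketch pairing the two classes (field names and file name are
illustrative):
import csv

fieldnames = ['first', 'last']
with open('names.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writerow({'first': 'John', 'last': 'Cleese'})

with open('names.csv', newline='') as f:
    for row in csv.DictReader(f, fieldnames=fieldnames):
        print(row['first'], row['last'])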
The Dialect class is a container class relied on primarily for its
attributes, which are used to define the parameters for a specific
reader or writer instance.
The unix_dialect class defines the usual properties of a CSV file
generated on UNIX systems, i.e. using '\n' as line terminator and quoting
all fields. It is registered with the dialect name 'unix'.
Analyze the given sample and return a Dialect subclass
reflecting the parameters found. If the optional delimiters parameter
is given, it is interpreted as a string containing possible valid
delimiter characters.
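A sketch of sniffing a file's dialect before reading it (the file name is
illustrative):
import csv

with open('example.csv', newline='') as f:
    dialect = csv.Sniffer().sniff(f.read(1024))
    f.seek(0)
    reader = csv.reader(f, dialect)
    for row in reader:
        print(row)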
Instructs writer objects to only quote those fields which contain
special characters such as delimiter, quotechar or any of the characters in
lineterminator.
Instructs writer objects to never quote fields. When the current
delimiter occurs in output data it is preceded by the current escapechar
character. If escapechar is not set, the writer will raise Error if
any characters that require escaping are encountered.
Instructs reader to perform no special processing of quote characters.
To make it easier to specify the format of input and output records, specific
formatting parameters are grouped together into dialects. A dialect is a
subclass of the Dialect class having a set of specific methods and a
single validate() method. When creating reader or
writer objects, the programmer can specify a string or a subclass of
the Dialect class as the dialect parameter. In addition to, or instead
of, the dialect parameter, the programmer can also specify individual
formatting parameters, which have the same names as the attributes defined below
for the Dialect class.
Controls how instances of quotechar appearing inside a field should
themselves be quoted. When True, the character is doubled. When
False, the escapechar is used as a prefix to the quotechar. It
defaults to True.
On output, if doublequote is False and no escapechar is set,
Error is raised if a quotechar is found in a field.
A one-character string used by the writer to escape the delimiter if quoting
is set to QUOTE_NONE and the quotechar if doublequote is
False. On reading, the escapechar removes any special meaning from
the following character. It defaults to None, which disables escaping.
A one-character string used to quote fields containing special characters, such
as the delimiter or quotechar, or which contain new-line characters. It
defaults to '"'.
Controls when quotes should be generated by the writer and recognised by the
reader. It can take on any of the QUOTE_* constants (see section
Module Contents) and defaults to QUOTE_MINIMAL.
Writer objects (DictWriter instances and objects returned by
the writer() function) have the following public methods. A row must be
a sequence of strings or numbers for Writer objects and a dictionary
mapping fieldnames to strings or numbers (by passing them through str()
first) for DictWriter objects. Note that complex numbers are written
out surrounded by parens. This may cause some problems for other programs which
read CSV files (assuming they support complex numbers at all).
The simplest example of reading a CSV file:
import csv
with open('some.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)
Reading a file with an alternate format:
import csv
with open('passwd', newline='') as f:
    reader = csv.reader(f, delimiter=':', quoting=csv.QUOTE_NONE)
    for row in reader:
        print(row)
The corresponding simplest possible writing example is:
import csv
with open('some.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(someiterable)
Since open() is used to open a CSV file for reading, the file
will by default be decoded into unicode using the system default
encoding (see locale.getpreferredencoding()). To decode a file
using a different encoding, use the encoding argument of open:
import csv
with open('some.csv', newline='', encoding='utf-8') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)
The same applies to writing in something other than the system default
encoding: specify the encoding argument when opening the output file.
Registering a new dialect:
import csv
csv.register_dialect('unixpwd', delimiter=':', quoting=csv.QUOTE_NONE)
with open('passwd', newline='') as f:
    reader = csv.reader(f, 'unixpwd')
A slightly more advanced use of the reader — catching and reporting errors:
import csv, sys
filename = 'some.csv'
with open(filename, newline='') as f:
    reader = csv.reader(f)
    try:
        for row in reader:
            print(row)
    except csv.Error as e:
        sys.exit('file {}, line {}: {}'.format(filename, reader.line_num, e))
And while the module doesn’t directly support parsing strings, it can easily be
done:
import csv
for row in csv.reader(['one,two,three']):
    print(row)
Footnotes
[1]
If newline='' is not specified, newlines embedded inside quoted fields
will not be interpreted correctly, and on platforms that use \r\n line endings
on write an extra \r will be added. It should always be safe to specify
newline='', since the csv module does its own (universal) newline handling.
This module provides the ConfigParser class which implements a basic
configuration language which provides a structure similar to what’s found in
Microsoft Windows INI files. You can use this to write Python programs which
can be customized by end users easily.
Note
This library does not interpret or write the value-type prefixes used in
the Windows Registry extended version of INI syntax.
The structure of INI files is described in the following section. Essentially, the file
consists of sections, each of which contains keys with values.
configparser classes can read and write such files. Let's start by
creating a configuration file like the one shown below programmatically.
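A minimal example file in this format (all names and values here are
illustrative):

[DEFAULT]
ServerAliveInterval = 45
Compression = yes
CompressionLevel = 9
ForwardX11 = yes

[bitbucket.org]
User = hg

[topsecret.server.com]
Port = 50022
ForwardX11 = no

A sketch of building the same file with the dictionary-style API:

import configparser

config = configparser.ConfigParser()
config['DEFAULT'] = {'ServerAliveInterval': '45',
                     'Compression': 'yes',
                     'CompressionLevel': '9'}
config['bitbucket.org'] = {}
config['bitbucket.org']['User'] = 'hg'
config['topsecret.server.com'] = {}
topsecret = config['topsecret.server.com']
topsecret['Port'] = '50022'
topsecret['ForwardX11'] = 'no'
config['DEFAULT']['ForwardX11'] = 'yes'
with open('example.ini', 'w') as configfile:
    config.write(configfile)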
As you can see, we can treat a config parser much like a dictionary.
There are differences, outlined later, but
the behavior is very close to what you would expect from a dictionary.
Now that we have created and saved a configuration file, let’s read it
back and explore the data it holds.
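A sketch, continuing with the illustrative example.ini:
>>> import configparser
>>> config = configparser.ConfigParser()
>>> config.read('example.ini')
['example.ini']
>>> config.sections()
['bitbucket.org', 'topsecret.server.com']
>>> 'bitbucket.org' in config
True
>>> config['bitbucket.org']['User']
'hg'
>>> topsecret = config['topsecret.server.com']
>>> topsecret['ForwardX11']
'no'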
As we can see above, the API is pretty straightforward. The only bit of magic
involves the DEFAULT section which provides default values for all other
sections [1]. Note also that keys in sections are
case-insensitive and stored in lowercase [1].
Config parsers do not guess datatypes of values in configuration files, always
storing them internally as strings. This means that if you need other
datatypes, you should convert on your own:
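Continuing the illustrative example:
>>> int(topsecret['Port'])
50022
>>> float(topsecret['CompressionLevel'])
9.0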
Extracting Boolean values is not that simple, though. Passing the value
to bool() would do no good since bool('False') is still
True. This is why config parsers also provide getboolean().
This method is case-insensitive and recognizes Boolean values from
'yes'/'no', 'on'/'off' and '1'/'0' [1].
For example:
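Continuing the illustrative example:
>>> topsecret.getboolean('ForwardX11')
False
>>> config['bitbucket.org'].getboolean('ForwardX11')
True
>>> config.getboolean('bitbucket.org', 'Compression')
True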
Apart from getboolean(), config parsers also provide equivalent
getint() and getfloat() methods, but these are far less
useful since conversion using int() and float() is
sufficient for these types.
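Fallback values can be supplied through the section-level get() method,
much as with a regular dictionary; a sketch continuing the illustrative
example ('Cipher' is not present in the file, so the fallback is returned):
>>> topsecret.get('Port')
'50022'
>>> topsecret.get('Cipher', '3des-cbc')
'3des-cbc'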
Please note that default values have precedence over fallback values.
For instance, in our example the 'CompressionLevel' key was
specified only in the 'DEFAULT' section. If we try to get it from
the section 'topsecret.server.com', we will always get the default,
even if we specify a fallback:
>>> topsecret.get('CompressionLevel', '3')
'9'
One more thing to be aware of is that the parser-level get() method
provides a custom, more complex interface, maintained for backwards
compatibility. When using this method, a fallback value can be provided via
the fallback keyword-only argument:
>>> config.get('bitbucket.org', 'monster',
... fallback='No such things as monsters')
'No such things as monsters'
The same fallback argument can be used with the getint(),
getfloat() and getboolean() methods, for example:
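Continuing the illustrative example, where 'BatchMode' is absent from the
file, so the fallback is used:
>>> topsecret.getboolean('BatchMode', fallback=True)
True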
A configuration file consists of sections, each led by a [section] header,
followed by key/value entries separated by a specific string (= or : by
default [1]). By default, section names are case sensitive but keys are not
[1]. Leading and trailing whitespace is removed from keys and values.
Values can be omitted, in which case the key/value delimiter may also be left
out. Values can also span multiple lines, as long as they are indented deeper
than the first line of the value. Depending on the parser’s mode, blank lines
may be treated as parts of multiline values or ignored.
Configuration files may include comments, prefixed by specific
characters (# and ; by default [1]). Comments may appear on
their own on an otherwise empty line, possibly indented. [1]
For example:
[Simple Values]
key=value
spaces in keys=allowed
spaces in values=allowed as well
spaces around the delimiter = obviously
you can also use : to delimit keys from values
[All Values Are Strings]
values like this: 1000000
or this: 3.14159265359
are they treated as numbers? : no
integers, floats and booleans are held as: strings
can use the API to get converted values directly: true
[Multiline Values]
chorus: I'm a lumberjack, and I'm okay
I sleep all night and I work all day
[No Values]
key_without_value
empty string value here =
[You can use comments]
# like this
; or this
# By default only in an empty line.
# Inline comments can be harmful because they prevent users
# from using the delimiting characters as parts of values.
# That being said, this can be customized.
[Sections Can Be Indented]
can_values_be_as_well = True
does_that_mean_anything_special = False
purpose = formatting for readability
multiline_values = are
handled just fine as
long as they are indented
deeper than the first line
of a value
# Did I mention we can indent comments, too?
The default implementation used by ConfigParser. It enables
values to contain format strings which refer to other values in the same
section, or values in the special default section [1]. Additional default
values can be provided on initialization.
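For illustration, consider a file that uses basic interpolation (all paths
here are illustrative):

[Paths]
home_dir: /Users
my_dir: %(home_dir)s/lumberjack
my_pictures: %(my_dir)s/Pictures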
In the example above, ConfigParser with interpolation set to
BasicInterpolation() would resolve %(home_dir)s to the value of
home_dir (/Users in this case). %(my_dir)s in effect would
resolve to /Users/lumberjack. All interpolations are done on demand so
keys used in the chain of references do not have to be specified in any
specific order in the configuration file.
With interpolation set to None, the parser would simply return
%(my_dir)s/Pictures as the value of my_pictures and
%(home_dir)s/lumberjack as the value of my_dir.
An alternative handler for interpolation which implements a more advanced
syntax, used for instance in zc.buildout. Extended interpolation is
using ${section:option} to denote a value from a foreign section.
Interpolation can span multiple levels. For convenience, if the section:
part is omitted, interpolation defaults to the current section (and possibly
the default values from the special section).
For example, the configuration specified above with basic interpolation,
would look like this with extended interpolation:
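[Paths]
home_dir: /Users
my_dir: ${home_dir}/lumberjack
my_pictures: ${my_dir}/Pictures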
Mapping protocol access is a generic name for functionality that enables using
custom objects as if they were dictionaries. In case of configparser,
the mapping interface implementation is using the
parser['section']['option'] notation.
parser['section'] in particular returns a proxy for the section’s data in
the parser. This means that the values are not copied but they are taken from
the original parser on demand. What’s even more important is that when values
are changed on a section proxy, they are actually mutated in the original
parser.
configparser objects behave as close to actual dictionaries as possible.
The mapping interface is complete and adheres to the MutableMapping ABC.
However, there are a few differences that should be taken into account:
By default, all keys in sections are accessible in a case-insensitive manner
[1]. E.g. for option in parser["section"] yields only optionxform'ed
option key names. This means lowercased keys by default. At the same time,
for a section that holds the key 'a', both expressions return True:
"a" in parser["section"]
"A" in parser["section"]
All sections include DEFAULTSECT values as well which means that
.clear() on a section may not leave the section visibly empty. This is
because default values cannot be deleted from the section (because technically
they are not there). If they are overridden in the section, deleting causes
the default value to be visible again. Trying to delete a default value
causes a KeyError.
Trying to delete the DEFAULTSECT raises ValueError.
parser.get(section, option, **kwargs) - the second argument is not
a fallback value. Note however that the section-level get() methods are
compatible both with the mapping protocol and the classic configparser API.
parser.items() is compatible with the mapping protocol (returns a list of
section_name, section_proxy pairs including the DEFAULTSECT). However,
this method can also be invoked with arguments: parser.items(section, raw, vars).
The latter call returns a list of option, value pairs for
a specified section, with all interpolations expanded (unless
raw=True is provided).
The mapping protocol is implemented on top of the existing legacy API so that
subclasses overriding the original interface still should have mappings working
as expected.
There are nearly as many INI format variants as there are applications using it.
configparser goes a long way to provide support for the largest sensible
set of INI styles available. The default functionality is mainly dictated by
historical background and it’s very likely that you will want to customize some
of the features.
The most common way to change the way a specific config parser works is to use
the __init__() options:
defaults, default value: None
This option accepts a dictionary of key-value pairs which will be initially
put in the DEFAULT section. This makes for an elegant way to support
concise configuration files that don’t specify values which are the same as
the documented default.
Hint: if you want to specify default values for a specific section, use
read_dict() before you read the actual file.
dict_type, default value: collections.OrderedDict
This option has a major impact on how the mapping protocol will behave and how
the written configuration files look. With the default ordered
dictionary, every section is stored in the order they were added to the
parser. Same goes for options within sections.
An alternative dictionary type can be used for example to sort sections and
options on write-back. You can also use a regular dictionary for performance
reasons.
Please note: there are ways to add a set of key-value pairs in a single
operation. When you use a regular dictionary in those operations, the order
of the keys may be random. For example:
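The following sketch assumes an interpreter whose plain dictionaries happen
to preserve insertion order; on older Pythons the section and key order
below may differ, which is exactly the caveat being described:
>>> parser = configparser.ConfigParser()
>>> parser.read_dict({'section1': {'key1': 'value1',
...                                'key2': 'value2',
...                                'key3': 'value3'},
...                   'section2': {'keyA': 'valueA',
...                                'keyB': 'valueB',
...                                'keyC': 'valueC'}})
>>> parser.sections()
['section1', 'section2']
>>> [option for option in parser['section1']]
['key1', 'key2', 'key3']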
allow_no_value, default value: False
Some configuration files are known to include settings without values, but
which otherwise conform to the syntax supported by configparser. The
allow_no_value parameter to the constructor can be used to
indicate that such values should be accepted:
>>> import configparser
>>> sample_config = """
... [mysqld]
... user = mysql
... pid-file = /var/run/mysqld/mysqld.pid
... skip-external-locking
... old_passwords = 1
... skip-bdb
... # we don't need ACID today
... skip-innodb
... """
>>> config = configparser.ConfigParser(allow_no_value=True)
>>> config.read_string(sample_config)
>>> # Settings with values are treated as before:
>>> config["mysqld"]["user"]
'mysql'
>>> # Settings without values provide None:
>>> config["mysqld"]["skip-bdb"]
>>> # Settings which aren't specified still raise an error:
>>> config["mysqld"]["does-not-exist"]
Traceback (most recent call last):
...
KeyError: 'does-not-exist'
delimiters, default value: ('=',':')
Delimiters are substrings that delimit keys from values within a section. The
first occurrence of a delimiting substring on a line is considered a delimiter.
This means values (but not keys) can contain the delimiters.
comment_prefixes, default value: ('#', ';'); inline_comment_prefixes, default value: None
Comment prefixes are strings that indicate the start of a valid comment within
a config file. comment_prefixes are used only on otherwise empty lines
(optionally indented) whereas inline_comment_prefixes can be used after
every valid value (e.g. section names, options and empty lines as well). By
default inline comments are disabled and '#' and ';' are used as
prefixes for whole line comments.
Changed in version 3.2: In previous versions of configparser behaviour matched
comment_prefixes=('#',';') and inline_comment_prefixes=(';',).
Please note that config parsers don’t support escaping of comment prefixes so
using inline_comment_prefixes may prevent users from specifying option
values with characters used as comment prefixes. When in doubt, avoid setting
inline_comment_prefixes. In any circumstances, the only way of storing
comment prefix characters at the beginning of a line in multiline values is to
interpolate the prefix, for example:
>>> from configparser import ConfigParser, ExtendedInterpolation
>>> parser = ConfigParser(interpolation=ExtendedInterpolation())
>>> # the default BasicInterpolation could be used as well
>>> parser.read_string("""
... [DEFAULT]
... hash = #
...
... [hashes]
... shebang =
... ${hash}!/usr/bin/env python
... ${hash} -*- coding: utf-8 -*-
...
... extensions =
... enabled_extension
... another_extension
... #disabled_by_comment
... yet_another_extension
...
... interpolation not necessary = if # is not at line start
... even in multiline values = line #1
... line #2
... line #3
... """)
>>> print(parser['hashes']['shebang'])
#!/usr/bin/env python
# -*- coding: utf-8 -*-
>>> print(parser['hashes']['extensions'])
enabled_extension
another_extension
yet_another_extension
>>> print(parser['hashes']['interpolation not necessary'])
if # is not at line start
>>> print(parser['hashes']['even in multiline values'])
line #1
line #2
line #3
strict, default value: True
When set to True, the parser will not allow for any section or option
duplicates while reading from a single source (using read_file(),
read_string() or read_dict()). It is recommended to use strict
parsers in new applications.
Changed in version 3.2: In previous versions of configparser behaviour matched
strict=False.
empty_lines_in_values, default value: True
In config parsers, values can span multiple lines as long as they are
indented more than the key that holds them. By default parsers also allow
empty lines to be part of values. At the same time, keys can be arbitrarily
indented themselves to improve readability. In consequence, when
configuration files get big and complex, it is easy for the user to lose
track of the file structure. Take for instance:
[Section]
key = multiline
  value with a gotcha

 this = is still a part of the multiline value of 'key'
This can be especially problematic for the user to see if she’s using a
proportional font to edit the file. That is why when your application does
not need values with empty lines, you should consider disallowing them. This
will make empty lines split keys every time. In the example above, it would
produce two keys, key and this.
default_section, default value: configparser.DEFAULTSECT (that is:
"DEFAULT")
The convention of allowing a special section of default values for other
sections or interpolation purposes is a powerful concept of this library,
letting users create complex declarative configurations. This section is
normally called "DEFAULT" but this can be customized to point to any
other valid section name. Some typical values include: "general" or
"common". The name provided is used for recognizing default sections when
reading from any source and is used when writing configuration back to
a file. Its current value can be retrieved using the
parser_instance.default_section attribute and may be modified at runtime
(i.e. to convert files from one format to another).
interpolation, default value: configparser.BasicInterpolation
Interpolation behaviour may be customized by providing a custom handler
through the interpolation argument. None can be used to turn off
interpolation completely, ExtendedInterpolation() provides a more
advanced variant inspired by zc.buildout. More on the subject in the
dedicated documentation section.
RawConfigParser has a default value of None.
More advanced customization may be achieved by overriding default values of
these parser attributes. The defaults are defined on the classes, so they
may be overridden by subclasses or by attribute assignment.
By default when using getboolean(), config parsers consider the
following values True: '1', 'yes', 'true', 'on' and the
following values False: '0', 'no', 'false', 'off'. You
can override this by specifying a custom dictionary of strings and their
Boolean outcomes. For example:
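A minimal sketch (section and option names are illustrative):
>>> custom = configparser.ConfigParser()
>>> custom['section1'] = {'funky': 'nope'}
>>> custom['section1'].getboolean('funky')
Traceback (most recent call last):
...
ValueError: Not a boolean: nope
>>> custom.BOOLEAN_STATES = {'sure': True, 'nope': False}
>>> custom['section1'].getboolean('funky')
False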
This method transforms option names on every read, get, or set
operation. The default converts the name to lowercase. This also
means that when a configuration file gets written, all keys will be
lowercase. Override this method if that’s unsuitable.
For example:
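A minimal sketch (section and key names are illustrative):
>>> sample = """
... [Section1]
... Key = Value
... """
>>> typical = configparser.ConfigParser()
>>> typical.read_string(sample)
>>> list(typical['Section1'].keys())
['key']
>>> custom = configparser.RawConfigParser()
>>> custom.optionxform = lambda option: option
>>> custom.read_string(sample)
>>> list(custom['Section1'].keys())
['Key']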
A compiled regular expression used to parse section headers. The default
matches [section] to the name "section". Whitespace is considered part
of the section name, thus [larch] will be read as a section of name
"larch". Override this attribute if that’s unsuitable. For example:
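A minimal sketch that strips the whitespace instead (section names are
illustrative):
>>> import re
>>> sample = """
... [Section 1]
... option = value
...
... [  Section 2  ]
... another = val
... """
>>> typical = configparser.ConfigParser()
>>> typical.read_string(sample)
>>> typical.sections()
['Section 1', '  Section 2  ']
>>> custom = configparser.ConfigParser()
>>> custom.SECTCRE = re.compile(r"\[ *(?P<header>[^]]+?) *\]")
>>> custom.read_string(sample)
>>> custom.sections()
['Section 1', 'Section 2']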
While ConfigParser objects also use an OPTCRE attribute for recognizing
option lines, it’s not recommended to override it because that would
interfere with constructor options allow_no_value and delimiters.
Mainly because of backwards compatibility concerns, configparser
provides also a legacy API with explicit get/set methods. While there
are valid use cases for the methods outlined below, mapping protocol access is
preferred for new projects. The legacy API is at times more advanced,
low-level and downright counterintuitive.
An example of writing to a configuration file:
import configparser
config = configparser.RawConfigParser()
# Please note that using RawConfigParser's set functions, you can assign
# non-string values to keys internally, but will receive an error when
# attempting to write to a file or when you get it in non-raw mode. Setting
# values using the mapping protocol or ConfigParser's set() does not allow
# such assignments to take place.
config.add_section('Section1')
config.set('Section1', 'int', '15')
config.set('Section1', 'bool', 'true')
config.set('Section1', 'float', '3.1415')
config.set('Section1', 'baz', 'fun')
config.set('Section1', 'bar', 'Python')
config.set('Section1', 'foo', '%(bar)s is %(baz)s!')
# Writing our configuration file to 'example.cfg'
with open('example.cfg', 'w') as configfile:
    config.write(configfile)
An example of reading the configuration file again:
import configparser
config = configparser.RawConfigParser()
config.read('example.cfg')
# getfloat() raises an exception if the value is not a float
# getint() and getboolean() also do this for their respective types
float = config.getfloat('Section1', 'float')
int = config.getint('Section1', 'int')
print(float + int)
# Notice that the next output does not interpolate '%(bar)s' or '%(baz)s'.
# This is because we are using a RawConfigParser().
if config.getboolean('Section1', 'bool'):
    print(config.get('Section1', 'foo'))
To get interpolation, use ConfigParser:
import configparser
cfg = configparser.ConfigParser()
cfg.read('example.cfg')
# Set the optional `raw` argument of get() to True if you wish to disable
# interpolation in a single get operation.
print(cfg.get('Section1', 'foo', raw=False)) # -> "Python is fun!"
print(cfg.get('Section1', 'foo', raw=True)) # -> "%(bar)s is %(baz)s!"
# The optional `vars` argument is a dict with members that will take
# precedence in interpolation.
print(cfg.get('Section1', 'foo', vars={'bar': 'Documentation',
'baz': 'evil'}))
# The optional `fallback` argument can be used to provide a fallback value
print(cfg.get('Section1', 'foo'))
# -> "Python is fun!"
print(cfg.get('Section1', 'foo', fallback='Monty is not.'))
# -> "Python is fun!"
print(cfg.get('Section1', 'monster', fallback='No such things as monsters.'))
# -> "No such things as monsters."
# A bare print(cfg.get('Section1', 'monster')) would raise NoOptionError
# but we can also use:
print(cfg.get('Section1', 'monster', fallback=None))
# -> None
Default values are available in both types of ConfigParsers. They are used in
interpolation if an option used is not defined elsewhere.
import configparser
# New instance with 'bar' and 'baz' defaulting to 'Life' and 'hard', respectively
config = configparser.ConfigParser({'bar': 'Life', 'baz': 'hard'})
config.read('example.cfg')
print(config.get('Section1', 'foo')) # -> "Python is fun!"
config.remove_option('Section1', 'bar')
config.remove_option('Section1', 'baz')
print(config.get('Section1', 'foo')) # -> "Life is hard!"
The main configuration parser. When defaults is given, it is initialized
into the dictionary of intrinsic defaults. When dict_type is given, it
will be used to create the dictionary objects for the list of sections, for
the options within a section, and for the default values.
When delimiters is given, it is used as the set of substrings that
divide keys from values. When comment_prefixes is given, it will be used
as the set of substrings that prefix comments in otherwise empty lines.
Comments can be indented. When inline_comment_prefixes is given, it will be
used as the set of substrings that prefix comments in non-empty lines.
When strict is True (the default), the parser won’t allow for
any section or option duplicates while reading from a single source (file,
string or dictionary), raising DuplicateSectionError or
DuplicateOptionError. When empty_lines_in_values is False
(default: True), each empty line marks the end of an option. Otherwise,
internal empty lines of a multiline option are kept as part of the value.
When allow_no_value is True (default: False), options without
values are accepted; the value held for these is None and they are
serialized without the trailing delimiter.
When default_section is given, it specifies the name for the special
section holding default values for other sections and interpolation purposes
(normally named "DEFAULT"). This value can be retrieved and changed on
runtime using the default_section instance attribute.
Interpolation behaviour may be customized by providing a custom handler
through the interpolation argument. None can be used to turn off
interpolation completely, ExtendedInterpolation() provides a more
advanced variant inspired by zc.buildout. More on the subject in the
dedicated documentation section.
All option names used in interpolation will be passed through the
optionxform() method just like any other option name reference. For
example, using the default implementation of optionxform() (which
converts option names to lower case), the values foo%(bar)s and foo%(BAR)s are equivalent.
Add a section named section to the instance. If a section by the given
name already exists, DuplicateSectionError is raised. If the
default section name is passed, ValueError is raised. The name
of the section must be a string; if not, TypeError is raised.
Changed in version 3.2: Non-string section names raise TypeError.
If the given section exists, and contains the given option, return
True; otherwise return False. If the specified
section is None or an empty string, DEFAULT is assumed.
Attempt to read and parse a list of filenames, returning a list of
filenames which were successfully parsed. If filenames is a string, it
is treated as a single filename. If a file named in filenames cannot
be opened, that file will be ignored. This is designed so that you can
specify a list of potential configuration file locations (for example,
the current directory, the user’s home directory, and some system-wide
directory), and all existing configuration files in the list will be
read. If none of the named files exist, the ConfigParser
instance will contain an empty dataset. An application which requires
initial values to be loaded from a file should load the required file or
files using read_file() before calling read() for any
optional files:
import configparser, os
config = configparser.ConfigParser()
config.read_file(open('defaults.cfg'))
config.read(['site.cfg', os.path.expanduser('~/.myapp.cfg')],
encoding='cp1250')
New in version 3.2: The encoding parameter. Previously, all files were read using the
default encoding for open().
Read and parse configuration data from f which must be an iterable
yielding Unicode strings (for example files opened in text mode).
Optional argument source specifies the name of the file being read. If
not given and f has a name attribute, that is used for
source; the default is '<???>'.
Optional argument source specifies a context-specific name of the
string passed. If not given, '<string>' is used. This should
commonly be a filesystem path or a URL.
Load configuration from any object that provides a dict-like items()
method. Keys are section names, values are dictionaries with keys and
values that should be present in the section. If the used dictionary
type preserves order, sections and their keys will be added in order.
Values are automatically converted to strings.
Optional argument source specifies a context-specific name of the
dictionary passed. If not given, <dict> is used.
This method can be used to copy state between parsers.
Get an option value for the named section. If vars is provided, it
must be a dictionary. The option is looked up in vars (if provided),
section, and in DEFAULTSECT in that order. If the key is not found
and fallback is provided, it is used as a fallback value. None can
be provided as a fallback value.
All the '%' interpolations are expanded in the return values, unless
the raw argument is true. Values for interpolation keys are looked up
in the same manner as the option.
Changed in version 3.2: Arguments raw, vars and fallback are keyword only to protect
users from trying to use the third argument as the fallback value
(especially when using the mapping protocol).
A convenience method which coerces the option in the specified section
to a floating point number. See get() for explanation of raw,
vars and fallback.
A convenience method which coerces the option in the specified section
to a Boolean value. Note that the accepted values for the option are
'1', 'yes', 'true', and 'on', which cause this method to
return True, and '0', 'no', 'false', and 'off', which
cause it to return False. These string values are checked in a
case-insensitive manner. Any other value will cause it to raise
ValueError. See get() for explanation of raw, vars and
fallback.
When section is not given, return a list of section_name,
section_proxy pairs, including DEFAULTSECT.
Otherwise, return a list of name, value pairs for the options in the
given section. Optional arguments have the same meaning as for the
get() method.
Changed in version 3.2: Items present in vars no longer appear in the result. The previous
behaviour mixed actual parser options with variables provided for
interpolation.
If the given section exists, set the given option to the specified value;
otherwise raise NoSectionError. option and value must be
strings; if not, TypeError is raised.
Write a representation of the configuration to the specified file
object, which must be opened in text mode (accepting strings). This
representation can be parsed by a future read() call. If
space_around_delimiters is true, delimiters between
keys and values are surrounded by spaces.
Remove the specified option from the specified section. If the
section does not exist, raise NoSectionError. If the option
existed to be removed, return True; otherwise return
False.
Transforms the option name option as found in an input file or as passed
in by client code to the form that should be used in the internal
structures. The default implementation returns a lower-case version of
option; subclasses may override this or client code can set an attribute
of this name on instances to affect this behavior.
You don’t need to subclass the parser to use this method; you can also
set it on an instance, to a function that takes a string argument and
returns a string. Setting it to str, for example, would make option
names case sensitive:
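A minimal sketch of the identity transform described above:

import configparser

parser = configparser.ConfigParser()
parser.optionxform = str    # option names keep their case instead of being lower-cased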
Deprecated since version 3.2: Use read_file() instead.
Changed in version 3.2: readfp() now iterates on f instead of calling f.readline().
For existing code calling readfp() with arguments which don’t
support iteration, the following generator may be used as a wrapper
around the file-like object:
def readline_generator(f):
    line = f.readline()
    while line:
        yield line
        line = f.readline()
Instead of parser.readfp(f) use
parser.read_file(readline_generator(f)).
Legacy variant of the ConfigParser with interpolation disabled
by default and unsafe add_section and set methods.
Note
Consider using ConfigParser instead which checks types of
the values to be stored internally. If you don’t want interpolation, you
can use ConfigParser(interpolation=None).
Add a section named section to the instance. If a section by the given
name already exists, DuplicateSectionError is raised. If the
default section name is passed, ValueError is raised.
The type of section is not checked, which lets users create non-string
section names. This behaviour is unsupported and may cause internal errors.
If the given section exists, set the given option to the specified value;
otherwise raise NoSectionError. While it is possible to use
RawConfigParser (or ConfigParser with raw parameters
set to true) for internal storage of non-string values, full
functionality (including interpolation and output to files) can only be
achieved using string values.
This method lets users assign non-string values to keys internally. This
behaviour is unsupported and will cause errors when attempting to write
to a file or get it in non-raw mode. Use the mapping protocol API
which does not allow such assignments to take place.
Exception raised if add_section() is called with the name of a section
that is already present, or, in strict parsers, when a section is found more
than once in a single input file, string or dictionary.
New in version 3.2: Optional source and lineno attributes and arguments to
__init__() were added.
Exception raised by strict parsers if a single option appears twice during
reading from a single file, string or dictionary. This catches misspellings
and case sensitivity-related errors, e.g. a dictionary may have two keys
representing the same case-insensitive configuration key.
Exception raised when errors occur attempting to parse a file.
Changed in version 3.2: The filename attribute and __init__() argument were renamed to
source for consistency.
Footnotes
[1] Config parsers allow for heavy customization. If you are interested in
changing the behaviour outlined by the footnote reference, consult the
Customizing Parser Behaviour section.
A netrc instance or subclass instance encapsulates data from a netrc
file. The initialization argument, if present, specifies the file to parse. If
no argument is given, the file .netrc in the user’s home directory will
be read. Parse errors will raise NetrcParseError with diagnostic
information including the file name, line number, and terminating token.
Exception raised by the netrc class when syntactical errors are
encountered in source text. Instances of this exception provide three
interesting attributes: msg is a textual explanation of the error,
filename is the name of the source file, and lineno gives the
line number on which the error was found.
Return a 3-tuple (login, account, password) of authenticators for host.
If the netrc file did not contain an entry for the given host, return the tuple
associated with the ‘default’ entry. If neither matching host nor default entry
is available, return None.
Passwords are limited to a subset of the ASCII character set. All ASCII
punctuation is allowed in passwords, however, note that whitespace and
non-printable characters are not allowed in passwords. This is a limitation
of the way the .netrc file is parsed and may be removed in the future.
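As a minimal sketch, assuming your ~/.netrc contains an entry for the
(hypothetical) host example.com:

import netrc

auth = netrc.netrc().authenticators('example.com')   # hypothetical host name
if auth is not None:
    login, account, password = auth
    print('logging in as', login)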
The xdrlib module supports the External Data Representation Standard as
described in RFC 1014, written by Sun Microsystems, Inc. June 1987. It
supports most of the data types described in the RFC.
The xdrlib module defines two classes, one for packing variables into XDR
representation, and another for unpacking from XDR representation. There are
also two exception classes.
In general, you can pack any of the most common XDR data types by calling the
appropriate pack_type() method. Each method takes a single argument, the
value to pack. The following simple data type packing methods are supported:
pack_uint(), pack_int(), pack_enum(), pack_bool(),
pack_uhyper(), and pack_hyper().
Packs a fixed length string, s. n is the length of the string but it is
not packed into the data buffer. The string is padded with null bytes if
necessary to guarantee 4-byte alignment.
Packs a variable length string, s. The length of the string is first packed
as an unsigned integer, then the string data is packed with
pack_fstring().
Packs a list of homogeneous items. This method is useful for lists with an
indeterminate size; i.e. the size is not available until the entire list has
been walked. For each item in the list, an unsigned integer 1 is packed
first, followed by the data value from the list. pack_item is the function
that is called to pack the individual item. At the end of the list, an unsigned
integer 0 is packed.
For example, to pack a list of integers, the code might appear like this:
import xdrlib
p = xdrlib.Packer()
p.pack_list([1, 2, 3], p.pack_int)
Packs a fixed length list (array) of homogeneous items. n is the length of
the list; it is not packed into the buffer, but a ValueError exception
is raised if len(array) is not equal to n. As above, pack_item is the
function used to pack each element.
Packs a variable length list of homogeneous items. First, the length of the
list is packed as an unsigned integer, then each element is packed as in
pack_farray() above.
Indicates unpack completion. Raises an Error exception if all of the
data has not been unpacked.
In addition, every data type that can be packed with a Packer can be
unpacked with an Unpacker. Unpacking methods are of the form
unpack_type(), and take no arguments. They return the unpacked object.
Unpacks and returns a variable length string. The length of the string is first
unpacked as an unsigned integer, then the string data is unpacked with
unpack_fstring().
Unpacks and returns a list of homogeneous items. The list is unpacked one
element at a time by first unpacking an unsigned integer flag. If the flag is
1, then the item is unpacked and appended to the list. A flag of 0
indicates the end of the list. unpack_item is the function that is called to
unpack the items.
Unpacks and returns (as a list) a fixed length array of homogeneous items. n
is number of list elements to expect in the buffer. As above, unpack_item is
the function used to unpack each element.
Unpacks and returns a variable length list of homogeneous items. First, the
length of the list is unpacked as an unsigned integer, then each element is
unpacked as in unpack_farray() above.
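A minimal round-trip sketch tying the packing and unpacking methods together:

import xdrlib

p = xdrlib.Packer()
p.pack_list([1, 2, 3], p.pack_int)

u = xdrlib.Unpacker(p.get_buffer())
print(u.unpack_list(u.unpack_int))   # [1, 2, 3]
u.done()                             # raises xdrlib.Error if data remains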
This module provides an interface for reading and writing the “property list”
XML files used mainly by Mac OS X.
The property list (.plist) file format is a simple XML pickle supporting
basic object types, like dictionaries, lists, numbers and strings. Usually the
top level object is a dictionary.
Values can be strings, integers, floats, booleans, tuples, lists, dictionaries
(but only with string keys), Data or datetime.datetime
objects. String values (including dictionary keys) have to be unicode strings –
they will be written out as UTF-8.
The <data> plist type is supported through the Data class. This is
a thin wrapper around a Python bytes object. Use Data if your strings
contain control characters.
Read a plist file. pathOrFile may either be a file name or a (readable)
file object. Return the unpacked root object (which usually is a
dictionary).
The XML data is parsed using the Expat parser from xml.parsers.expat
– see its documentation for possible exceptions on ill-formed XML.
Unknown elements will simply be ignored by the plist parser.
Return a “data” wrapper object around the bytes object data. This is used
in functions converting from/to plists to represent the <data> type
available in plists.
It has one attribute, data, that can be used to retrieve the Python
bytes object stored in it.
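A minimal write/read sketch; the output file name is hypothetical:

import datetime
import plistlib

pl = {
    'aString': 'Doodah',
    'aNumber': 42,
    'aDate': datetime.datetime.now(),
    'someData': plistlib.Data(b'<binary gunk>'),
}
plistlib.writePlist(pl, 'example.plist')

pl2 = plistlib.readPlist('example.plist')
print(pl2['aString'])    # 'Doodah'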
The modules described in this chapter implement various algorithms of a
cryptographic nature. They are available at the discretion of the installation.
Here’s an overview:
This module implements a common interface to many different secure hash and
message digest algorithms. Included are the FIPS secure hash algorithms SHA1,
SHA224, SHA256, SHA384, and SHA512 (defined in FIPS 180-2) as well as RSA’s MD5
algorithm (defined in Internet RFC 1321). The terms “secure hash” and
“message digest” are interchangeable. Older algorithms were called message
digests. The modern term is secure hash.
Note
If you want the adler32 or crc32 hash functions, they are available in
the zlib module.
Warning
Some algorithms have known hash collision weaknesses, see the FAQ at the end.
There is one constructor method named for each type of hash. All return
a hash object with the same simple interface. For example: use sha1() to
create a SHA1 hash object. You can now feed this object with objects conforming
to the buffer interface (normally bytes objects) using the
update() method. At any point you can ask it for the digest of the
concatenation of the data fed to it so far using the digest() or
hexdigest() methods.
Note
For better multithreading performance, the Python GIL is released for
strings of more than 2047 bytes at object creation or on update.
Note
Feeding string objects into update() is not supported, as hashes work
on bytes, not on characters.
Constructors for hash algorithms that are always present in this module are
md5(), sha1(), sha224(), sha256(), sha384(), and
sha512(). Additional algorithms may also be available depending upon the
OpenSSL library that Python uses on your platform.
For example, to obtain the digest of the byte string b'Nobody inspects the spammish repetition':
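A minimal sketch; note that repeated update() calls are equivalent to a
single call with the concatenated data:

import hashlib

m = hashlib.md5()
m.update(b"Nobody inspects")
m.update(b" the spammish repetition")
print(m.hexdigest())    # 32 hex digits; same result as hashing the whole string at once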
Is a generic constructor that takes the string name of the desired
algorithm as its first parameter. It also exists to allow access to the
above listed hashes as well as any other algorithms that your OpenSSL
library may offer. The named constructors are much faster than new()
and should be preferred.
Using new() with an algorithm provided by OpenSSL:
>>> h = hashlib.new('ripemd160')
>>> h.update(b"Nobody inspects the spammish repetition")
>>> h.hexdigest()
'cc4a5ce1b3df48aec5d22d1f16b894a0b894eccc'
Hashlib provides the following constant attributes:
Contains the names of the hash algorithms that are available
in the running Python interpreter. These names will be recognized
when passed to new(). algorithms_guaranteed
will always be a subset. Duplicate algorithms with different
name formats may appear in this set (thanks to OpenSSL).
New in version 3.2.
The following values are provided as constant attributes of the hash objects
returned by the constructors:
Update the hash object with the object arg, which must be interpretable as
a buffer of bytes. Repeated calls are equivalent to a single call with the
concatenation of all the arguments: m.update(a);m.update(b) is
equivalent to m.update(a+b).
Changed in version 3.1: The Python GIL is released to allow other threads to run while hash
updates on data larger than 2048 bytes is taking place when using hash
algorithms supplied by OpenSSL.
Return the digest of the data passed to the update() method so far.
This is a bytes object of size digest_size which may contain bytes in
the whole range from 0 to 255.
Like digest() except the digest is returned as a string object of
double length, containing only hexadecimal digits. This may be used to
exchange the value safely in email or other non-binary environments.
Return a new hmac object. key is a bytes object giving the secret key. If
msg is present, the method call update(msg) is made. digestmod is
the digest constructor or module for the HMAC object to use. It defaults to
the hashlib.md5() constructor.
Update the hmac object with the bytes object msg. Repeated calls are
equivalent to a single call with the concatenation of all the arguments:
m.update(a);m.update(b) is equivalent to m.update(a+b).
Return the digest of the bytes passed to the update() method so far.
This bytes object will be the same length as the digest_size of the digest
given to the constructor. It may contain non-ASCII bytes, including NUL
bytes.
Like digest() except the digest is returned as a string twice the
length containing only hexadecimal digits. This may be used to exchange the
value safely in email or other non-binary environments.
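A minimal sketch of computing a keyed digest; the key and message are made up:

import hashlib
import hmac

h = hmac.new(b'secret-key', digestmod=hashlib.sha256)
h.update(b'message to authenticate')
print(h.hexdigest())    # 64 hex digits for SHA-256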
The Python module providing secure hash functions.
Hardcore cypherpunks will probably find the cryptographic modules written by
A.M. Kuchling of further interest; the package contains modules for various
encryption algorithms, most notably AES. These modules are not distributed with
Python but available separately. See the URL
http://www.pycrypto.org for more information.
The modules described in this chapter provide interfaces to operating system
features that are available on (almost) all operating systems, such as files and
a clock. The interfaces are generally modeled after the Unix or C interfaces,
but they are available on most other systems as well. Here’s an overview:
This module provides a portable way of using operating system dependent
functionality. If you just want to read or write a file see open(), if
you want to manipulate paths, see the os.path module, and if you want to
read all the lines in all the files on the command line see the fileinput
module. For creating temporary files and directories see the tempfile
module, and for high-level file and directory handling see the shutil
module.
Notes on the availability of these functions:
The design of all built-in operating system dependent modules of Python is
such that as long as the same functionality is available, it uses the same
interface; for example, the function os.stat(path) returns stat
information about path in the same format (which happens to have originated
with the POSIX interface).
Extensions peculiar to a particular operating system are also available
through the os module, but using them is of course a threat to
portability.
All functions accepting path or file names accept both bytes and string
objects, and result in an object of the same type, if a path or file name is
returned.
Note
If not separately noted, all functions that claim “Availability: Unix” are
supported on Mac OS X, which builds on a Unix core.
An “Availability: Unix” note means that this function is commonly found on
Unix systems. It does not make any claims about its existence on a specific
operating system.
Note
All functions in this module raise OSError in the case of invalid or
inaccessible file names and paths, or other arguments that have the correct
type, but are not accepted by the operating system.
The name of the operating system dependent module imported. The following
names have currently been registered: 'posix', 'nt', 'mac',
'os2', 'ce', 'java'.
See also
sys.platform has a finer granularity. os.uname() gives
system-dependent version information.
The platform module provides detailed checks for the
system’s identity.
File Names, Command Line Arguments, and Environment Variables¶
In Python, file names, command line arguments, and environment variables are
represented using the string type. On some systems, decoding these strings to
and from bytes is necessary before passing them to the operating system. Python
uses the file system encoding to perform this conversion (see
sys.getfilesystemencoding()).
Changed in version 3.1: On some systems, conversion using the file system encoding may fail. In this
case, Python uses the surrogateescape encoding error handler, which means
that undecodable bytes are replaced by a Unicode character U+DCxx on
decoding, and these are again translated to the original byte on encoding.
The file system encoding must guarantee to successfully decode all bytes
below 128. If the file system encoding fails to provide this guarantee, API
functions may raise UnicodeErrors.
A mapping object representing the string environment. For example,
environ['HOME'] is the pathname of your home directory (on some platforms),
and is equivalent to getenv("HOME") in C.
This mapping is captured the first time the os module is imported,
typically during Python startup as part of processing site.py. Changes
to the environment made after this time are not reflected in os.environ,
except for changes made by modifying os.environ directly.
If the platform supports the putenv() function, this mapping may be used
to modify the environment as well as query the environment. putenv() will
be called automatically when the mapping is modified.
On Unix, keys and values use sys.getfilesystemencoding() and
'surrogateescape' error handler. Use environb if you would like
to use a different encoding.
Note
Calling putenv() directly does not change os.environ, so it’s better
to modify os.environ.
Note
On some platforms, including FreeBSD and Mac OS X, setting environ may
cause memory leaks. Refer to the system documentation for
putenv().
If putenv() is not provided, a modified copy of this mapping may be
passed to the appropriate process-creation functions to cause child processes
to use a modified environment.
If the platform supports the unsetenv() function, you can delete items in
this mapping to unset environment variables. unsetenv() will be called
automatically when an item is deleted from os.environ, and when
one of the pop() or clear() methods is called.
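A minimal sketch of working through the mapping; MY_SETTING is a made-up
variable name:

import os

os.environ['MY_SETTING'] = 'enabled'    # calls putenv() where supported
print(os.environ.get('MY_SETTING'))     # 'enabled'
del os.environ['MY_SETTING']            # calls unsetenv() where supported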
Bytes version of environ: a mapping object representing the
environment as byte strings. environ and environb are
synchronized (modifying environb updates environ, and vice
versa).
Returns the list of directories that will be searched for a named
executable, similar to a shell, when launching a process.
env, when specified, should be an environment variable dictionary
in which to look up the PATH. By default, when env is None,
environ is used.
Call the system initgroups() to initialize the group access list with all of
the groups of which the specified username is a member, plus the specified
group id.
Return the name of the user logged in on the controlling terminal of the
process. For most purposes, it is more useful to use the environment variables
LOGNAME or USERNAME to find out who the user is, or
pwd.getpwuid(os.getuid())[0] to get the login name of the currently
effective user id.
Return the parent’s process id. When the parent process has exited, on Unix
the id returned is the one of the init process (1), on Windows it is still
the same id, which may be already reused by another process.
Availability: Unix, Windows
Changed in version 3.2: Added support for Windows.
Return the value of the environment variable key if it exists, or
default if it doesn’t. key, default and the result are str.
On Unix, keys and values are decoded with sys.getfilesystemencoding()
and 'surrogateescape' error handler. Use os.getenvb() if you
would like to use a different encoding.
Set the environment variable named key to the string value. Such
changes to the environment affect subprocesses started with os.system(),
popen() or fork() and execv().
Availability: most flavors of Unix, Windows.
Note
On some platforms, including FreeBSD and Mac OS X, setting environ may
cause memory leaks. Refer to the system documentation for putenv.
When putenv() is supported, assignments to items in os.environ are
automatically translated into corresponding calls to putenv(); however,
calls to putenv() don’t update os.environ, so it is actually
preferable to assign to items of os.environ.
Set the list of supplemental group ids associated with the current process to
groups. groups must be a sequence, and each element must be an integer
identifying a group. This operation is typically available only to the superuser.
Call the system call setpgid() to set the process group id of the
process with id pid to the process group with id pgrp. See the Unix manual
for the semantics.
Return the error message corresponding to the error code in code.
On platforms where strerror() returns NULL when given an unknown
error number, ValueError is raised.
Return a 5-tuple containing information identifying the current operating
system. The tuple contains 5 strings: (sysname, nodename, release,
version, machine). Some systems truncate the nodename to 8 characters or to the
leading component; a better way to get the hostname is
socket.gethostname() or even
socket.gethostbyaddr(socket.gethostname()).
Unset (delete) the environment variable named key. Such changes to the
environment affect subprocesses started with os.system(), popen() or
fork() and execv().
When unsetenv() is supported, deletion of items in os.environ is
automatically translated into a corresponding call to unsetenv(); however,
calls to unsetenv() don’t update os.environ, so it is actually
preferable to delete items of os.environ.
Return an open file object connected to the file descriptor fd. The mode
and bufsize arguments have the same meaning as the corresponding arguments to
the built-in open() function.
When specified, the mode argument must start with one of the letters
'r', 'w', or 'a', otherwise a ValueError is raised.
On Unix, when the mode argument starts with 'a', the O_APPEND flag is
set on the file descriptor (which the fdopen() implementation already
does on most platforms).
These functions operate on I/O streams referenced using file descriptors.
File descriptors are small integers corresponding to a file that has been opened
by the current process. For example, standard input is usually file descriptor
0, standard output is 1, and standard error is 2. Further files opened by a
process will then be assigned 3, 4, 5, and so forth. The name “file descriptor”
is slightly deceptive; on Unix platforms, sockets and pipes are also referenced
by file descriptors.
The fileno() method can be used to obtain the file descriptor
associated with a file object when required. Note that using the file
descriptor directly will bypass the file object methods, ignoring aspects such
as internal buffering of data.
This function is intended for low-level I/O and must be applied to a file
descriptor as returned by os.open() or pipe(). To close a “file
object” returned by the built-in function open() or by popen() or
fdopen(), use its close() method.
Return system configuration information relevant to an open file. name
specifies the configuration value to retrieve; it may be a string which is the
name of a defined system value; these names are specified in a number of
standards (POSIX.1, Unix 95, Unix 98, and others). Some platforms define
additional names as well. The names known to the host operating system are
given in the pathconf_names dictionary. For configuration variables not
included in that mapping, passing an integer for name is also accepted.
If name is a string and is not known, ValueError is raised. If a
specific value for name is not supported by the host system, even if it is
included in pathconf_names, an OSError is raised with
errno.EINVAL for the error number.
Force write of file with filedescriptor fd to disk. On Unix, this calls the
native fsync() function; on Windows, the MS _commit() function.
If you’re starting with a buffered Python file object f, first do
f.flush(), and then do os.fsync(f.fileno()), to ensure that all internal
buffers associated with f are written to disk.
Set the current position of file descriptor fd to position pos, modified
by how: SEEK_SET or 0 to set the position relative to the
beginning of the file; SEEK_CUR or 1 to set it relative to the
current position; os.SEEK_END or 2 to set it relative to the end of
the file.
Open the file file and set various flags according to flags and possibly
its mode according to mode. The default mode is 0o777 (octal), and
the current umask value is first masked out. Return the file descriptor for
the newly opened file.
For a description of the flag and mode values, see the C run-time documentation;
flag constants (like O_RDONLY and O_WRONLY) are defined in
this module too (see open() flag constants). In particular, on Windows adding
O_BINARY is needed to open files in binary mode.
Availability: Unix, Windows.
Note
This function is intended for low-level I/O. For normal usage, use the
built-in function open(), which returns a file object with
read() and write() methods (and many more). To
wrap a file descriptor in a file object, use fdopen().
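A minimal low-level I/O sketch; the file name is hypothetical:

import os

fd = os.open('example.txt', os.O_WRONLY | os.O_CREAT, 0o644)
try:
    os.write(fd, b'low-level I/O\n')
finally:
    os.close(fd)

fd = os.open('example.txt', os.O_RDONLY)
try:
    print(os.read(fd, 100))    # b'low-level I/O\n'
finally:
    os.close(fd)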
Open a new pseudo-terminal pair. Return a pair of file descriptors
(master, slave) for the pty and the tty, respectively. For a (slightly) more
portable approach, use the pty module.
Read at most n bytes from file descriptor fd. Return a bytestring containing the
bytes read. If the end of the file referred to by fd has been reached, an
empty bytes object is returned.
Availability: Unix, Windows.
Note
This function is intended for low-level I/O and must be applied to a file
descriptor as returned by os.open() or pipe(). To read a “file object”
returned by the built-in function open() or by popen() or
fdopen(), or sys.stdin, use its read() or
readline() methods.
Return a string which specifies the terminal device associated with
file descriptor fd. If fd is not associated with a terminal device, an
exception is raised.
Write the bytestring in str to file descriptor fd. Return the number of
bytes actually written.
Availability: Unix, Windows.
Note
This function is intended for low-level I/O and must be applied to a file
descriptor as returned by os.open() or pipe(). To write a “file
object” returned by the built-in function open() or by popen() or
fdopen(), or sys.stdout or sys.stderr, use its
write() method.
The following constants are options for the flags parameter to the
open() function. They can be combined using the bitwise OR operator
|. Some of them are not available on all platforms. For descriptions of
their availability and use, consult the open(2) manual page on Unix
or the MSDN on Windows.
Use the real uid/gid to test for access to path. Note that most operations
will use the effective uid/gid, therefore this routine can be used in a
suid/sgid environment to test if the invoking user has the specified access to
path. mode should be F_OK to test the existence of path, or it
can be the inclusive OR of one or more of R_OK, W_OK, and
X_OK to test permissions. Return True if access is allowed,
False if not. See the Unix man page access(2) for more
information.
Availability: Unix, Windows.
Note
Using access() to check if a user is authorized to e.g. open a file
before actually doing so using open() creates a security hole,
because the user might exploit the short time interval between checking
and opening the file to manipulate it. It’s preferable to use EAFP
techniques. For example:
if os.access("myfile", os.R_OK):
    with open("myfile") as fp:
        return fp.read()
return "some default data"
is better written as:
try:
    fp = open("myfile")
except IOError as e:
    if e.errno == errno.EACCES:   # requires the errno module
        return "some default data"
    # Not a permission error.
    raise
else:
    with fp:
        return fp.read()
Note
I/O operations may fail even when access() indicates that they would
succeed, particularly for operations on network filesystems which may have
permissions semantics beyond the usual POSIX permission-bit model.
Change the current working directory to the directory represented by the file
descriptor fd. The descriptor must refer to an opened directory, not an open
file.
Change the mode of path to the numeric mode. mode may take one of the
following values (as defined in the stat module) or bitwise ORed
combinations of them:
Although Windows supports chmod(), you can only set the file’s read-only
flag with it (via the stat.S_IWRITE and stat.S_IREAD
constants or a corresponding integer value). All other bits are
ignored.
Change the mode of path to the numeric mode. If path is a symlink, this
affects the symlink rather than the target. See the docs for chmod()
for possible values of mode.
Return a list containing the names of the entries in the directory given by
path (default: '.'). The list is in arbitrary order. It does not include the special
entries '.' and '..' even if they are present in the directory.
This function can be called with a bytes or string argument, and returns
filenames of the same datatype.
Availability: Unix, Windows.
Changed in version 3.2: The path parameter became optional.
Perform the equivalent of an lstat() system call on the given path.
Similar to stat(), but does not follow symbolic links. On
platforms that do not support symbolic links, this is an alias for
stat().
Changed in version 3.2: Added support for Windows 6.0 (Vista) symbolic links.
Create a FIFO (a named pipe) named path with numeric mode mode. The
default mode is 0o666 (octal). The current umask value is first masked
out from the mode.
FIFOs are pipes that can be accessed like regular files. FIFOs exist until they
are deleted (for example with os.unlink()). Generally, FIFOs are used as
rendezvous between “client” and “server” type processes: the server opens the
FIFO for reading, and the client opens it for writing. Note that mkfifo()
doesn’t open the FIFO — it just creates the rendezvous point.
Create a filesystem node (file, device special file or named pipe) named
filename. mode specifies both the permissions to use and the type of node
to be created, being combined (bitwise OR) with one of stat.S_IFREG,
stat.S_IFCHR, stat.S_IFBLK, and stat.S_IFIFO (those constants are
available in stat). For stat.S_IFCHR and stat.S_IFBLK,
device defines the newly created device special file (probably using
os.makedev()), otherwise it is ignored.
Create a directory named path with numeric mode mode. The default mode
is 0o777 (octal). On some systems, mode is ignored. Where it is used,
the current umask value is first masked out. If the directory already
exists, OSError is raised.
It is also possible to create temporary directories; see the
tempfile module’s tempfile.mkdtemp() function.
Recursive directory creation function. Like mkdir(), but makes all
intermediate-level directories needed to contain the leaf directory. If
the target directory already exists (with the same mode as specified), an
OSError exception is raised if exist_ok is False; otherwise no
exception is raised. If the directory cannot be created in other cases,
an OSError exception is raised. The default mode is 0o777 (octal).
On some systems, mode is ignored. Where it is used, the current umask
value is first masked out.
Note
makedirs() will become confused if the path elements to create
include pardir.
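A minimal sketch of the exist_ok parameter; the path is hypothetical:

import os

os.makedirs('build/output/logs', exist_ok=True)   # creates all three levels
os.makedirs('build/output/logs', exist_ok=True)   # now a no-op instead of OSError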
Return system configuration information relevant to a named file. name
specifies the configuration value to retrieve; it may be a string which is the
name of a defined system value; these names are specified in a number of
standards (POSIX.1, Unix 95, Unix 98, and others). Some platforms define
additional names as well. The names known to the host operating system are
given in the pathconf_names dictionary. For configuration variables not
included in that mapping, passing an integer for name is also accepted.
If name is a string and is not known, ValueError is raised. If a
specific value for name is not supported by the host system, even if it is
included in pathconf_names, an OSError is raised with
errno.EINVAL for the error number.
Dictionary mapping names accepted by pathconf() and fpathconf() to
the integer values defined for those names by the host operating system. This
can be used to determine the set of names known to the system. Availability:
Unix.
Return a string representing the path to which the symbolic link points. The
result may be either an absolute or relative pathname; if it is relative, it
may be converted to an absolute pathname using
os.path.join(os.path.dirname(path), result).
If the path is a string object, the result will also be a string object,
and the call may raise a UnicodeDecodeError. If the path is a bytes
object, the result will be a bytes object.
Availability: Unix, Windows
Changed in version 3.2: Added support for Windows 6.0 (Vista) symbolic links.
Remove (delete) the file path. If path is a directory, OSError is
raised; see rmdir() below to remove a directory. This is identical to
the unlink() function documented below. On Windows, attempting to
remove a file that is in use causes an exception to be raised; on Unix, the
directory entry is removed but the storage allocated to the file is not made
available until the original file is no longer in use.
Remove directories recursively. Works like rmdir() except that, if the
leaf directory is successfully removed, removedirs() tries to
successively remove every parent directory mentioned in path until an error
is raised (which is ignored, because it generally means that a parent directory
is not empty). For example, os.removedirs('foo/bar/baz') will first remove
the directory 'foo/bar/baz', and then remove 'foo/bar' and 'foo' if
they are empty. Raises OSError if the leaf directory could not be
successfully removed.
Rename the file or directory src to dst. If dst is a directory,
OSError will be raised. On Unix, if dst exists and is a file, it will
be replaced silently if the user has permission. The operation may fail on some
Unix flavors if src and dst are on different filesystems. If successful,
the renaming will be an atomic operation (this is a POSIX requirement). On
Windows, if dst already exists, OSError will be raised even if it is a
file; there may be no way to implement an atomic rename when dst names an
existing file.
Recursive directory or file renaming function. Works like rename(), except
creation of any intermediate directories needed to make the new pathname good is
attempted first. After the rename, directories corresponding to rightmost path
segments of the old name will be pruned away using removedirs().
Note
This function can fail with the new directory structure made if you lack
permissions needed to remove the leaf directory or file.
Remove (delete) the directory path. Only works when the directory is
empty, otherwise, OSError is raised. In order to remove whole
directory trees, shutil.rmtree() can be used.
Perform the equivalent of a stat() system call on the given path.
(This function follows symlinks; to stat a symlink use lstat().)
The return value is an object whose attributes correspond to the members
of the stat structure, namely:
st_mode - protection bits,
st_ino - inode number,
st_dev - device,
st_nlink - number of hard links,
st_uid - user id of owner,
st_gid - group id of owner,
st_size - size of file, in bytes,
st_atime - time of most recent access,
st_mtime - time of most recent content modification,
st_ctime - platform dependent; time of most recent metadata change on
Unix, or the time of creation on Windows.
On some Unix systems (such as Linux), the following attributes may also be
available:
st_blocks - number of blocks allocated for file
st_blksize - filesystem blocksize
st_rdev - type of device if an inode device
st_flags - user defined flags for file
On other Unix systems (such as FreeBSD), the following attributes may be
available (but may be only filled out if root tries to use them):
st_gen - file generation number
st_birthtime - time of file creation
On Mac OS systems, the following attributes may also be available:
st_rsize
st_creator
st_type
Note
The exact meaning and resolution of the st_atime,
st_mtime, and st_ctime attributes depend on the operating
system and the file system. For example, on Windows systems using the FAT
or FAT32 file systems, st_mtime has 2-second resolution, and
st_atime has only 1-day resolution. See your operating system
documentation for details.
For backward compatibility, the return value of stat() is also accessible
as a tuple of at least 10 integers giving the most important (and portable)
members of the stat structure, in the order st_mode,
st_ino, st_dev, st_nlink, st_uid,
st_gid, st_size, st_atime, st_mtime,
st_ctime. More items may be added at the end by some implementations.
The standard module stat defines functions and constants that are useful
for extracting information from a stat structure. (On Windows, some
items are filled with dummy values.)
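A minimal sketch of reading a few attributes from the result; the path is
hypothetical:

import os
import time

st = os.stat('setup.py')
print(st.st_size, 'bytes')
print('modified:', time.ctime(st.st_mtime))
print('mode: %o' % st.st_mode)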
Determine whether stat_result represents time stamps as float objects.
If newvalue is True, future calls to stat() return floats, if it is
False, future calls return ints. If newvalue is omitted, return the
current setting.
For compatibility with older Python versions, accessing stat_result as
a tuple always returns integers.
Python now returns float values by default. Applications which do not work
correctly with floating point time stamps can use this function to restore the
old behaviour.
The resolution of the timestamps (that is the smallest possible fraction)
depends on the system. Some systems only support second resolution; on these
systems, the fraction will always be zero.
It is recommended that this setting is only changed at program startup time in
the __main__ module; libraries should never change this setting. If an
application uses a library that works incorrectly if floating point time stamps
are processed, this application should turn the feature off until the library
has been corrected.
Perform a statvfs() system call on the given path. The return value is
an object whose attributes describe the filesystem on the given path, and
correspond to the members of the statvfs structure, namely:
f_bsize, f_frsize, f_blocks, f_bfree,
f_bavail, f_files, f_ffree, f_favail,
f_flag, f_namemax.
Two module-level constants are defined for the f_flag attribute’s
bit-flags: if ST_RDONLY is set, the filesystem is mounted
read-only, and if ST_NOSUID is set, the semantics of
setuid/setgid bits are disabled or not supported.
Changed in version 3.2: The ST_RDONLY and ST_NOSUID constants were added.
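A minimal sketch computing free space from the returned attributes:

import os

st = os.statvfs('/')
free = st.f_bavail * st.f_frsize     # bytes available to unprivileged users
total = st.f_blocks * st.f_frsize
print(free, 'bytes free of', total)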
Create a symbolic link pointing to source named link_name.
On Windows, symlink() takes an additional optional parameter,
target_is_directory, which defaults to False.
On Windows, a symlink represents a file or a directory, and does not morph to
the target dynamically. For this reason, when creating a symlink on Windows,
if the target is not already present, the symlink will default to being a
file symlink. If target_is_directory is set to True, the symlink will
be created as a directory symlink. This parameter is ignored if the target
exists (and the symlink is created with the same type as the target).
Symbolic link support was introduced in Windows 6.0 (Vista). symlink()
will raise a NotImplementedError on Windows versions earlier than 6.0.
Note
The SeCreateSymbolicLinkPrivilege is required in order to successfully
create symlinks. This privilege is not typically granted to regular
users but is available to accounts which can escalate privileges to the
administrator level. Either obtaining the privilege or running your
application as an administrator are ways to successfully create symlinks.
OSError is raised when the function is called by an unprivileged
user.
Availability: Unix, Windows.
Changed in version 3.2: Added support for Windows 6.0 (Vista) symbolic links.
Set the access and modified times of the file specified by path. If times
is None, then the file’s access and modified times are set to the current
time. (The effect is similar to running the Unix program touch on
the path.) Otherwise, times must be a 2-tuple of numbers, of the form
(atime, mtime) which is used to set the access and modified times,
respectively. Whether a directory can be given for path depends on whether
the operating system implements directories as files (for example, Windows
does not). Note that the exact times you set here may not be returned by a
subsequent stat() call, depending on the resolution with which your
operating system records access and modification times; see stat().
Generate the file names in a directory tree by walking the tree
either top-down or bottom-up. For each directory in the tree rooted at directory
top (including top itself), it yields a 3-tuple (dirpath, dirnames, filenames).
dirpath is a string, the path to the directory. dirnames is a list of the
names of the subdirectories in dirpath (excluding '.' and '..').
filenames is a list of the names of the non-directory files in dirpath.
Note that the names in the lists contain no path components. To get a full path
(which begins with top) to a file or directory in dirpath, do
os.path.join(dirpath,name).
If optional argument topdown is True or not specified, the triple for a
directory is generated before the triples for any of its subdirectories
(directories are generated top-down). If topdown is False, the triple for a
directory is generated after the triples for all of its subdirectories
(directories are generated bottom-up).
When topdown is True, the caller can modify the dirnames list in-place
(perhaps using del or slice assignment), and walk() will only
recurse into the subdirectories whose names remain in dirnames; this can be
used to prune the search, impose a specific order of visiting, or even to inform
walk() about directories the caller creates or renames before it resumes
walk() again. Modifying dirnames when topdown is False is
ineffective, because in bottom-up mode the directories in dirnames are
generated before dirpath itself is generated.
By default errors from the listdir() call are ignored. If optional
argument onerror is specified, it should be a function; it will be called with
one argument, an OSError instance. It can report the error to continue
with the walk, or raise the exception to abort the walk. Note that the filename
is available as the filename attribute of the exception object.
By default, walk() will not walk down into symbolic links that resolve to
directories. Set followlinks to True to visit directories pointed to by
symlinks, on systems that support them.
Note
Be aware that setting followlinks to True can lead to infinite recursion if a
link points to a parent directory of itself. walk() does not keep track of
the directories it visited already.
Note
If you pass a relative pathname, don’t change the current working directory
between resumptions of walk(). walk() never changes the current
directory, and assumes that its caller doesn’t either.
This example displays the number of bytes taken by non-directory files in each
directory under the starting directory, except that it doesn’t look under any
CVS subdirectory:
import os
from os.path import join, getsize
for root, dirs, files in os.walk('python/Lib/email'):
    print(root, "consumes", end=" ")
    print(sum(getsize(join(root, name)) for name in files), end=" ")
    print("bytes in", len(files), "non-directory files")
    if 'CVS' in dirs:
        dirs.remove('CVS')  # don't visit CVS directories
In the next example, walking the tree bottom-up is essential: rmdir()
doesn’t allow deleting a directory before the directory is empty:
# Delete everything reachable from the directory named in "top",
# assuming there are no symbolic links.
# CAUTION: This is dangerous! For example, if top == '/', it
# could delete all your disk files.
import os
for root, dirs, files in os.walk(top, topdown=False):
    for name in files:
        os.remove(os.path.join(root, name))
    for name in dirs:
        os.rmdir(os.path.join(root, name))
These functions may be used to create and manage processes.
The various exec*() functions take a list of arguments for the new
program loaded into the process. In each case, the first of these arguments is
passed to the new program as its own name rather than as an argument a user may
have typed on a command line. For the C programmer, this is the argv[0]
passed to a program’s main(). For example,
os.execv('/bin/echo', ['foo', 'bar']) will only print bar on standard
output; foo will seem to be ignored.
Generate a SIGABRT signal to the current process. On Unix, the default
behavior is to produce a core dump; on Windows, the process immediately returns
an exit code of 3. Be aware that calling this function will not call the
Python signal handler registered for SIGABRT with
signal.signal().
These functions all execute a new program, replacing the current process; they
do not return. On Unix, the new executable is loaded into the current process,
and will have the same process id as the caller. Errors will be reported as
OSError exceptions.
The current process is replaced immediately. Open file objects and
descriptors are not flushed, so if there may be data buffered
on these open files, you should flush them using
sys.stdout.flush() or os.fsync() before calling an
exec*() function.
The “l” and “v” variants of the exec*() functions differ in how
command-line arguments are passed. The “l” variants are perhaps the easiest
to work with if the number of parameters is fixed when the code is written; the
individual parameters simply become additional parameters to the execl*()
functions. The “v” variants are good when the number of parameters is
variable, with the arguments being passed in a list or tuple as the args
parameter. In either case, the arguments to the child process should start with
the name of the command being run, but this is not enforced.
The variants which include a “p” near the end (execlp(),
execlpe(), execvp(), and execvpe()) will use the
PATH environment variable to locate the program file. When the
environment is being replaced (using one of the exec*e() variants,
discussed in the next paragraph), the new environment is used as the source of
the PATH variable. The other variants, execl(), execle(),
execv(), and execve(), will not use the PATH variable to
locate the executable; path must contain an appropriate absolute or relative
path.
For execle(), execlpe(), execve(), and execvpe() (note
that these all end in “e”), the env parameter must be a mapping which is
used to define the environment variables for the new process (these are used
instead of the current process’ environment); the functions execl(),
execlp(), execv(), and execvp() all cause the new process to
inherit the environment of the current process.
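A minimal sketch; execvp() replaces the current process, so on success this
script ends as soon as the call is made:

import os

# The first list element becomes argv[0], the program's own name;
# only 'hello' is printed by echo, which is located via PATH.
os.execvp('echo', ['echo', 'hello'])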
Exit the process with status n, without calling cleanup handlers, flushing
stdio buffers, etc.
Availability: Unix, Windows.
Note
The standard way to exit is sys.exit(n). _exit() should
normally only be used in the child process after a fork().
The following exit codes are defined and can be used with _exit(),
although they are not required. These are typically used for system programs
written in Python, such as a mail server’s external command delivery program.
Note
Some of these may not be available on all Unix platforms, since there is some
variation. These constants are defined where they are defined by the underlying
platform.
Exit code that means a temporary failure occurred. This indicates something
that may not really be an error, such as a network connection that couldn’t be
made during a retryable operation.
Fork a child process, using a new pseudo-terminal as the child’s controlling
terminal. Return a pair of (pid, fd), where pid is 0 in the child, the
new child’s process id in the parent, and fd is the file descriptor of the
master end of the pseudo-terminal. For a more portable approach, use the
pty module. If an error occurs OSError is raised.
Send signal sig to the process pid. Constants for the specific signals
available on the host platform are defined in the signal module.
Windows: The signal.CTRL_C_EVENT and
signal.CTRL_BREAK_EVENT signals are special signals which can
only be sent to console processes which share a common console window,
e.g., some subprocesses. Any other value for sig will cause the process
to be unconditionally killed by the TerminateProcess API, and the exit code
will be set to sig. The Windows version of kill() additionally takes
process handles to be killed.
(Note that the subprocess module provides more powerful facilities for
spawning new processes and retrieving their results; using that module is
preferable to using these functions. Check especially the
Replacing Older Functions with the subprocess Module section.)
If mode is P_NOWAIT, this function returns the process id of the new
process; if mode is P_WAIT, returns the process’s exit code if it
exits normally, or -signal, where signal is the signal that killed the
process. On Windows, the process id will actually be the process handle, so can
be used with the waitpid() function.
The “l” and “v” variants of the spawn*() functions differ in how
command-line arguments are passed. The “l” variants are perhaps the easiest
to work with if the number of parameters is fixed when the code is written; the
individual parameters simply become additional parameters to the
spawnl*() functions. The “v” variants are good when the number of
parameters is variable, with the arguments being passed in a list or tuple as
the args parameter. In either case, the arguments to the child process must
start with the name of the command being run.
The variants which include a second “p” near the end (spawnlp(),
spawnlpe(), spawnvp(), and spawnvpe()) will use the
PATH environment variable to locate the program file. When the
environment is being replaced (using one of the spawn*e() variants,
discussed in the next paragraph), the new environment is used as the source of
the PATH variable. The other variants, spawnl(),
spawnle(), spawnv(), and spawnve(), will not use the
PATH variable to locate the executable; path must contain an
appropriate absolute or relative path.
For spawnle(), spawnlpe(), spawnve(), and spawnvpe()
(note that these all end in “e”), the env parameter must be a mapping
which is used to define the environment variables for the new process (they are
used instead of the current process’ environment); the functions
spawnl(), spawnlp(), spawnv(), and spawnvp() all cause
the new process to inherit the environment of the current process. Note that
keys and values in the env dictionary must be strings; invalid keys or
values will cause the function to fail, with a return value of 127.
As an example, the following calls to spawnlp() and spawnvpe() are
equivalent:
import os
os.spawnlp(os.P_WAIT, 'cp', 'cp', 'index.html', '/dev/null')
L = ['cp', 'index.html', '/dev/null']
os.spawnvpe(os.P_WAIT, 'cp', L, os.environ)
Possible values for the mode parameter to the spawn*() family of
functions. If either of these values is given, the spawn*() functions
will return as soon as the new process has been created, with the process id as
the return value.
Possible value for the mode parameter to the spawn*() family of
functions. If this is given as mode, the spawn*() functions will not
return until the new process has run to completion, and will return the exit
code of the process if the run is successful, or -signal if a signal kills
the process.
Possible values for the mode parameter to the spawn*() family of
functions. These are less portable than those listed above. P_DETACH
is similar to P_NOWAIT, but the new process is detached from the
console of the calling process. If P_OVERLAY is used, the current
process will be replaced; the spawn*() function will not return.
When operation is not specified or 'open', this acts like double-clicking
the file in Windows Explorer, or giving the file name as an argument to the
start command from the interactive command shell: the file is opened
with whatever application (if any) its extension is associated.
When another operation is given, it must be a “command verb” that specifies
what should be done with the file. Common verbs documented by Microsoft are
'print' and 'edit' (to be used on files) as well as 'explore' and
'find' (to be used on directories).
startfile() returns as soon as the associated application is launched.
There is no option to wait for the application to close, and no way to retrieve
the application’s exit status. The path parameter is relative to the current
directory. If you want to use an absolute path, make sure the first character
is not a slash ('/'); the underlying Win32 ShellExecute() function
doesn’t work if it is. Use the os.path.normpath() function to ensure that
the path is properly encoded for Win32.
Execute the command (a string) in a subshell. This is implemented by calling
the Standard C function system(), and has the same limitations.
Changes to sys.stdin, etc. are not reflected in the environment of
the executed command. If command generates any output, it will be sent to
the interpreter standard output stream.
On Unix, the return value is the exit status of the process encoded in the
format specified for wait(). Note that POSIX does not specify the
meaning of the return value of the C system() function, so the return
value of the Python function is system-dependent.
On Windows, the return value is that returned by the system shell after
running command. The shell is given by the Windows environment variable
COMSPEC: it is usually cmd.exe, which returns the exit
status of the command run; on systems using a non-native shell, consult your
shell documentation.
The subprocess module provides more powerful facilities for spawning
new processes and retrieving their results; using that module is preferable
to using this function. See the Replacing Older Functions with the subprocess Module section in
the subprocess documentation for some helpful recipes.
Return a 5-tuple of floating point numbers indicating accumulated (processor
or other) times, in seconds. The items are: user time, system time,
children’s user time, children’s system time, and elapsed real time since a
fixed point in the past, in that order. See the Unix manual page
times(2) or the corresponding Windows Platform API documentation.
On Windows, only the first two items are filled, the others are zero.
Wait for completion of a child process, and return a tuple containing its pid
and exit status indication: a 16-bit number, whose low byte is the signal number
that killed the process, and whose high byte is the exit status (if the signal
number is zero); the high bit of the low byte is set if a core file was
produced.
The details of this function differ on Unix and Windows.
On Unix: Wait for completion of a child process given by process id pid, and
return a tuple containing its process id and exit status indication (encoded as
for wait()). The semantics of the call are affected by the value of the
integer options, which should be 0 for normal operation.
If pid is greater than 0, waitpid() requests status information for
that specific process. If pid is 0, the request is for the status of any
child in the process group of the current process. If pid is -1, the
request pertains to any child of the current process. If pid is less than
-1, status is requested for any process in the process group -pid (the
absolute value of pid).
An OSError is raised with the value of errno when the syscall
returns -1.
On Windows: Wait for completion of a process given by process handle pid, and
return a tuple containing pid, and its exit status shifted left by 8 bits
(shifting makes cross-platform use of the function easier). A pid less than or
equal to 0 has no special meaning on Windows, and raises an exception. The
value of integer options has no effect. pid can refer to any process whose
id is known, not necessarily a child process. The spawn() functions called
with P_NOWAIT return suitable process handles.
Similar to waitpid(), except no process id argument is given and a
3-element tuple containing the child’s process id, exit status indication, and
resource usage information is returned. Refer to resource.getrusage() for details on resource usage information. The option
argument is the same as that provided to waitpid() and wait4().
Similar to waitpid(), except a 3-element tuple, containing the child’s
process id, exit status indication, and resource usage information is returned.
Refer to resource.getrusage() for details on resource usage
information. The arguments to wait4() are the same as those provided to
waitpid().
This option causes child processes to be reported if they have been stopped but
their current state has not been reported since they were stopped.
Availability: Unix.
The following functions take a process status code as returned by
system(), wait(), or waitpid() as a parameter. They may be
used to determine the disposition of a process.
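For example, on Unix a child's status word can be decoded with the helpers
described below (a sketch; os.fork() is Unix-only):
import os
pid = os.fork()
if pid == 0:                 # child process
    os._exit(7)
pid, status = os.waitpid(pid, 0)
if os.WIFEXITED(status):
    print('child exited with', os.WEXITSTATUS(status))   # prints 7
elif os.WIFSIGNALED(status):
    print('child killed by signal', os.WTERMSIG(status))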
Return string-valued system configuration values. name specifies the
configuration value to retrieve; it may be a string which is the name of a
defined system value; these names are specified in a number of standards (POSIX,
Unix 95, Unix 98, and others). Some platforms define additional names as well.
The names known to the host operating system are given as the keys of the
confstr_names dictionary. For configuration variables not included in that
mapping, passing an integer for name is also accepted.
If the configuration value specified by name isn’t defined, None is
returned.
If name is a string and is not known, ValueError is raised. If a
specific value for name is not supported by the host system, even if it is
included in confstr_names, an OSError is raised with
errno.EINVAL for the error number.
Dictionary mapping names accepted by confstr() to the integer values
defined for those names by the host operating system. This can be used to
determine the set of names known to the system.
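For example (a Unix-only sketch; 'CS_PATH' is one of the POSIX-standard
names):
import os
print(os.confstr('CS_PATH'))           # e.g. '/bin:/usr/bin'
print('CS_PATH' in os.confstr_names)   # True on conforming systems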
Return the number of processes in the system run queue averaged over the last
1, 5, and 15 minutes or raises OSError if the load average was
unobtainable.
Return integer-valued system configuration values. If the configuration value
specified by name isn’t defined, -1 is returned. The comments regarding
the name parameter for confstr() apply here as well; the dictionary that
provides information on the known names is given by sysconf_names.
Dictionary mapping names accepted by sysconf() to the integer values
defined for those names by the host operating system. This can be used to
determine the set of names known to the system.
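For example (a Unix-only sketch; which names are defined varies by
platform):
import os
print(os.sysconf('SC_PAGE_SIZE'))          # e.g. 4096
print('SC_PAGE_SIZE' in os.sysconf_names)  # True where the name is defined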
Availability: Unix.
The following data values are used to support path manipulation operations. These
are defined for all platforms.
Higher-level operations on pathnames are defined in the os.path module.
The character used by the operating system to separate pathname components.
This is '/' for POSIX and '\\' for Windows. Note that knowing this
is not sufficient to be able to parse or concatenate pathnames — use
os.path.split() and os.path.join() — but it is occasionally
useful. Also available via os.path.
An alternative character used by the operating system to separate pathname
components, or None if only one separator character exists. This is set to
'/' on Windows systems where sep is a backslash. Also available via
os.path.
The character conventionally used by the operating system to separate search
path components (as in PATH), such as ':' for POSIX or ';' for
Windows. Also available via os.path.
The string used to separate (or, rather, terminate) lines on the current
platform. This may be a single character, such as '\n' for POSIX, or
multiple characters, for example, '\r\n' for Windows. Do not use
os.linesep as a line terminator when writing files opened in text mode (the
default); use a single '\n' instead, on all platforms.
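For example (a short sketch; the printed values depend on the platform):
import os
# Prefer os.path.join() over concatenating with os.sep by hand.
print(os.path.join('usr', 'share', 'zoneinfo'))  # 'usr/share/zoneinfo' on POSIX
print(repr(os.sep), repr(os.pathsep), repr(os.linesep))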
Return a string of n random bytes suitable for cryptographic use.
This function returns random bytes from an OS-specific randomness source. The
returned data should be unpredictable enough for cryptographic applications,
though its exact quality depends on the OS implementation. On a UNIX-like
system this will query /dev/urandom, and on Windows it will use CryptGenRandom.
If a randomness source is not found, NotImplementedError will be raised.
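For example, sixteen random bytes can be obtained and hex-encoded for
display (a minimal sketch):
import binascii
import os
token = os.urandom(16)
print(binascii.hexlify(token))   # 32 hex digits, different on every run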
The io module provides Python’s main facilities for dealing with various
types of I/O. There are three main types of I/O: text I/O, binary I/O, and raw
I/O. These are generic categories, and various backing stores can be used for
each of them. Concrete objects belonging to any of these categories will often
be called streams; another common term is file-like objects.
Independently of its category, each concrete stream object will also have
various capabilities: it can be read-only, write-only, or read-write. It can
also allow arbitrary random access (seeking forwards or backwards to any
location), or only sequential access (for example in the case of a socket or
pipe).
All streams are careful about the type of data you give to them. For example
giving a str object to the write() method of a binary stream
will raise a TypeError. So will giving a bytes object to the
write() method of a text stream.
Text I/O expects and produces str objects. This means that whenever
the backing store is natively made of bytes (such as in the case of a file),
encoding and decoding of data is made transparently as well as optional
translation of platform-specific newline characters.
The easiest way to create a text stream is with open(), optionally
specifying an encoding:
f = open("myfile.txt", "r", encoding="utf-8")
In-memory text streams are also available as StringIO objects:
f = io.StringIO("some initial text data")
The text stream API is described in detail in the documentation of
TextIOBase.
Binary I/O (also called buffered I/O) expects and produces bytes
objects. No encoding, decoding, or newline translation is performed. This
category of streams can be used for all kinds of non-text data, and also when
manual control over the handling of text data is desired.
The easiest way to create a binary stream is with open() with 'b' in
the mode string:
f = open("myfile.jpg", "rb")
In-memory binary streams are also available as BytesIO objects:
f = io.BytesIO(b"some initial binary data: \x00\x01")
The binary stream API is described in detail in the docs of
BufferedIOBase.
Other library modules may provide additional ways to create text or binary
streams. See socket.socket.makefile() for example.
Raw I/O (also called unbuffered I/O) is generally used as a low-level
building-block for binary and text streams; it is rarely useful to directly
manipulate a raw stream from user code. Nevertheless, you can create a raw
stream by opening a file in binary mode with buffering disabled:
f = open("myfile.jpg", "rb", buffering=0)
The raw stream API is described in detail in the docs of RawIOBase.
An int containing the default buffer size used by the module’s buffered I/O
classes. open() uses the file’s blksize (as obtained by
os.stat()) if possible.
It is also possible to use a str or bytes-like object as a
file for both reading and writing. For strings StringIO can be used
like a file opened in text mode. BytesIO can be used like a file
opened in binary mode. Both provide full read-write capabilities with random
access.
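For example (a sketch of the common write-then-rewind pattern):
import io
buf = io.BytesIO()
buf.write(b'some data')
buf.seek(0)           # rewind before reading back
print(buf.read())     # b'some data'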
The implementation of I/O streams is organized as a hierarchy of classes. First
abstract base classes (ABCs), which are used to
specify the various categories of streams, then concrete classes providing the
standard stream implementations.
Note
The abstract base classes also provide default implementations of some
methods in order to help implementation of concrete stream classes. For
example, BufferedIOBase provides unoptimized implementations of
readinto() and readline().
At the top of the I/O hierarchy is the abstract base class IOBase. It
defines the basic interface to a stream. Note, however, that there is no
separation between reading and writing to streams; implementations are allowed
to raise UnsupportedOperation if they do not support a given operation.
The RawIOBase ABC extends IOBase. It deals with the reading
and writing of bytes to a stream. FileIO subclasses RawIOBase
to provide an interface to files in the machine’s file system.
The TextIOBase ABC, another subclass of IOBase, deals with
streams whose bytes represent text, and handles encoding and decoding to and
from strings. TextIOWrapper, which extends it, is a buffered text
interface to a buffered raw stream (BufferedIOBase). Finally,
StringIO is an in-memory stream for text.
Argument names are not part of the specification, and only the arguments of
open() are intended to be used as keyword arguments.
The abstract base class for all I/O classes, acting on streams of bytes.
There is no public constructor.
This class provides empty abstract implementations for many methods
that derived classes can override selectively; the default
implementations represent a file that cannot be read, written or
seeked.
Even though IOBase does not declare read(), readinto(),
or write() because their signatures will vary, implementations and
clients should consider those methods part of the interface. Also,
implementations may raise an IOError when operations they do not
support are called.
The basic type used for binary data read from or written to a file is
bytes. bytearrays are accepted too, and in some cases
(such as readinto) required. Text I/O classes work with
str data.
Note that calling any method (even inquiries) on a closed stream is
undefined. Implementations may raise IOError in this case.
IOBase (and its subclasses) support the iterator protocol, meaning that an
IOBase object can be iterated over yielding the lines in a stream.
Lines are defined slightly differently depending on whether the stream is
a binary stream (yielding bytes), or a text stream (yielding character
strings). See readline() below.
IOBase is also a context manager and therefore supports the
with statement. In this example, file is closed after the
with statement’s suite is finished—even if an exception occurs:
with open('spam.txt', 'w') as file:
file.write('Spam and eggs!')
IOBase provides these data attributes and methods:
Flush and close this stream. This method has no effect if the file is
already closed. Once the file is closed, any operation on the file
(e.g. reading or writing) will raise a ValueError.
As a convenience, it is allowed to call this method more than once;
only the first call, however, will have an effect.
Read and return one line from the stream. If limit is specified, at
most limit bytes will be read.
The line terminator is always b'\n' for binary files; for text files,
the newlines argument to open() can be used to select the line
terminator(s) recognized.
Read and return a list of lines from the stream. hint can be specified
to control the number of lines read: no more lines will be read if the
total size (in bytes/characters) of all lines so far exceeds hint.
Resize the stream to the given size in bytes (or the current position
if size is not specified). The current stream position isn’t changed.
This resizing can extend or reduce the current file size. In case of
extension, the contents of the new file area depend on the platform
(on most systems, additional bytes are zero-filled, on Windows they’re
undetermined). The new file size is returned.
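For example (a sketch using a throwaway binary file named spam.bin):
with open('spam.bin', 'w+b') as f:
    f.write(b'0123456789')
    f.truncate(4)         # shrink the file to its first four bytes
    f.seek(0)
    print(f.read())       # b'0123'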
Base class for raw binary I/O. It inherits IOBase. There is no
public constructor.
Raw binary I/O typically provides low-level access to an underlying OS
device or API, and does not try to encapsulate it in high-level primitives
(this is left to Buffered I/O and Text I/O, described later in this page).
In addition to the attributes and methods from IOBase,
RawIOBase provides the following methods:
Read up to n bytes from the object and return them. As a convenience,
if n is unspecified or -1, readall() is called. Otherwise,
only one system call is ever made. Fewer than n bytes may be
returned if the operating system call returns fewer than n bytes.
If 0 bytes are returned, and n was not 0, this indicates end of file.
If the object is in non-blocking mode and no bytes are available,
None is returned.
Read up to len(b) bytes into bytearray b and return the number
of bytes read. If the object is in non-blocking mode and no
bytes are available, None is returned.
Write the given bytes or bytearray object, b, to the underlying raw
stream and return the number of bytes written. This can be less than
len(b), depending on specifics of the underlying raw stream, and
especially if it is in non-blocking mode. None is returned if the
raw stream is set not to block and no single byte could be readily
written to it.
Base class for binary streams that support some kind of buffering.
It inherits IOBase. There is no public constructor.
The main difference with RawIOBase is that methods read(),
readinto() and write() will try (respectively) to read as much
input as requested or to consume all given output, at the expense of
making perhaps more than one system call.
In addition, those methods can raise BlockingIOError if the
underlying raw stream is in non-blocking mode and cannot take or give
enough data; unlike their RawIOBase counterparts, they will
never return None.
Besides, the read() method does not have a default
implementation that defers to readinto().
The underlying raw stream (a RawIOBase instance) that
BufferedIOBase deals with. This is not part of the
BufferedIOBase API and may not exist on some implementations.
Read and return up to n bytes. If the argument is omitted, None, or
negative, data is read and returned until EOF is reached. An empty bytes
object is returned if the stream is already at EOF.
If the argument is positive, and the underlying raw stream is not
interactive, multiple raw reads may be issued to satisfy the byte count
(unless EOF is reached first). But for interactive raw streams, at most
one raw read will be issued, and a short result does not imply that EOF is
imminent.
A BlockingIOError is raised if the underlying raw stream is in
non blocking-mode, and has no data available at the moment.
Read and return up to n bytes, with at most one call to the underlying
raw stream’s read() method. This can be useful if you
are implementing your own buffering on top of a BufferedIOBase
object.
Write the given bytes or bytearray object, b, and return the number
of bytes written (never less than len(b), since if the write fails
an IOError will be raised). Depending on the actual
implementation, these bytes may be readily written to the underlying
stream, or held in a buffer for performance and latency reasons.
When in non-blocking mode, a BlockingIOError is raised if the
data needed to be written to the raw stream but it couldn’t accept
all the data without blocking.
FileIO represents an OS-level file containing bytes data.
It implements the RawIOBase interface (and therefore the
IOBase interface, too).
The name can be one of two things:
a character string or bytes object representing the path to the file
which will be opened;
an integer representing the number of an existing OS-level file descriptor
to which the resulting FileIO object will give access.
The mode can be 'r', 'w' or 'a' for reading (default), writing,
or appending. The file will be created if it doesn’t exist when opened for
writing or appending; it will be truncated when opened for writing. Add a
'+' to the mode to allow simultaneous reading and writing.
The read() (when called with a positive argument), readinto()
and write() methods on this class will only make one system call.
In addition to the attributes and methods from IOBase and
RawIOBase, FileIO provides the following data
attributes and methods:
mode
The mode as given in the constructor.
name
The file name. This is the file descriptor of the file when no name is
given in the constructor.
class io.BytesIO([initial_bytes])
A stream implementation using an in-memory bytes buffer. It inherits
BufferedIOBase. Its getbuffer() method returns a readable and
writable view over the contents of the buffer without copying them.
Mutating the view will transparently update the contents of the buffer:
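>>> b = io.BytesIO(b"abcdef")
>>> view = b.getbuffer()
>>> view[2:4] = b"56"
>>> b.getvalue()
b'ab56ef'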
class io.BufferedReader(raw, buffer_size=DEFAULT_BUFFER_SIZE)
A buffer providing higher-level access to a readable, sequential
RawIOBase object. It inherits BufferedIOBase.
When reading data from this object, a larger amount of data may be
requested from the underlying raw stream, and kept in an internal buffer.
The buffered data can then be returned directly on subsequent reads.
The constructor creates a BufferedReader for the given readable
raw stream and buffer_size. If buffer_size is omitted,
DEFAULT_BUFFER_SIZE is used.
Return bytes from the stream without advancing the position. At most one
single read on the raw stream is done to satisfy the call. The number of
bytes returned may be less or more than requested.
Read and return up to n bytes with only one call on the raw stream. If
at least one byte is buffered, only buffered bytes are returned.
Otherwise, one raw stream read call is made.
class io.BufferedWriter(raw, buffer_size=DEFAULT_BUFFER_SIZE)
A buffer providing higher-level access to a writeable, sequential
RawIOBase object. It inherits BufferedIOBase.
When writing to this object, data is normally held into an internal
buffer. The buffer will be written out to the underlying RawIOBase
object under various conditions, including:
when the buffer gets too small for all pending data;
when flush() is called;
when a seek() is requested (for BufferedRandom objects);
when the BufferedWriter object is closed or destroyed.
Write the bytes or bytearray object, b, and return the number of bytes
written. When in non-blocking mode, a BlockingIOError is raised
if the buffer needs to be written out but the raw stream blocks.
class io.BufferedRandom(raw, buffer_size=DEFAULT_BUFFER_SIZE)
A buffered interface to random access streams. It inherits
BufferedReader and BufferedWriter, and further supports
seek() and tell() functionality.
The constructor creates a reader and writer for a seekable raw stream, given
in the first argument. If the buffer_size is omitted it defaults to
DEFAULT_BUFFER_SIZE.
A third argument, max_buffer_size, is supported, but unused and deprecated.
class io.BufferedRWPair(reader, writer, buffer_size=DEFAULT_BUFFER_SIZE)
A buffered I/O object combining two unidirectional RawIOBase
objects – one readable, the other writeable – into a single bidirectional
endpoint. It inherits BufferedIOBase.
reader and writer are RawIOBase objects that are readable and
writeable respectively. If the buffer_size is omitted it defaults to
DEFAULT_BUFFER_SIZE.
A fourth argument, max_buffer_size, is supported, but unused and
deprecated.
BufferedRWPair does not attempt to synchronize accesses to
its underlying raw streams. You should not pass it the same object
as reader and writer; use BufferedRandom instead.
Base class for text streams. This class provides a character and line based
interface to stream I/O. There is no readinto() method because
Python’s character strings are immutable. It inherits IOBase.
There is no public constructor.
TextIOBase provides or overrides these data attributes and
methods in addition to those from IOBase:
A string, a tuple of strings, or None, indicating the newlines
translated so far. Depending on the implementation and the initial
constructor flags, this may not be available.
The underlying binary buffer (a BufferedIOBase instance) that
TextIOBase deals with. This is not part of the
TextIOBase API and may not exist on some implementations.
Separate the underlying binary buffer from the TextIOBase and
return it.
After the underlying buffer has been detached, the TextIOBase is
in an unusable state.
Some TextIOBase implementations, like StringIO, may not
have the concept of an underlying buffer and calling this method will
raise UnsupportedOperation.
class io.TextIOWrapper(buffer, encoding=None, errors=None, newline=None, line_buffering=False)
A buffered text stream over a BufferedIOBase binary stream, buffer.
encoding gives the name of the encoding that the stream will be decoded or
encoded with. It defaults to locale.getpreferredencoding().
errors is an optional string that specifies how encoding and decoding
errors are to be handled. Pass 'strict' to raise a ValueError
exception if there is an encoding error (the default of None has the same
effect), or pass 'ignore' to ignore errors. (Note that ignoring encoding
errors can lead to data loss.) 'replace' causes a replacement marker
(such as '?') to be inserted where there is malformed data. When
writing, 'xmlcharrefreplace' (replace with the appropriate XML character
reference) or 'backslashreplace' (replace with backslashed escape
sequences) can be used. Any other error handling name that has been
registered with codecs.register_error() is also valid.
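For instance, 'replace' substitutes U+FFFD for undecodable bytes (a sketch
with a deliberately invalid UTF-8 byte):
import io
raw = io.BytesIO(b'caf\xff')   # \xff is not valid UTF-8
text = io.TextIOWrapper(raw, encoding='utf-8', errors='replace')
print(text.read())             # 'caf\ufffd'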
newline can be None, '', '\n', '\r', or '\r\n'. It
controls the handling of line endings. If it is None, universal newlines
is enabled. With this enabled, on input, the lines endings '\n',
'\r', or '\r\n' are translated to '\n' before being returned to
the caller. Conversely, on output, '\n' is translated to the system
default line separator, os.linesep. If newline is any other of its
legal values, that newline becomes the newline when the file is read and it
is returned untranslated. On output, '\n' is converted to the newline.
If line_buffering is True, flush() is implied when a call to
write contains a newline character.
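For example, newline='\r\n' translates written '\n' characters on output
(a minimal sketch):
import io
buf = io.BytesIO()
text = io.TextIOWrapper(buf, encoding='ascii', newline='\r\n')
text.write('one\ntwo\n')
text.flush()
print(buf.getvalue())   # b'one\r\ntwo\r\n'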
class io.StringIO(initial_value='', newline=None)
An in-memory stream for text I/O.
The initial value of the buffer (an empty string by default) can be set by
providing initial_value. The newline argument works like that of
TextIOWrapper. The default is to do no newline translation.
StringIO provides this method in addition to those from
TextIOBase and its parents:
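import io

output = io.StringIO()
output.write('First line.\n')
print('Second line.', file=output)

# Retrieve the whole buffer contents --
# this will be 'First line.\nSecond line.\n'
contents = output.getvalue()

# Close object and discard memory buffer --
# .getvalue() will now raise an exception.
output.close()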
By reading and writing only large chunks of data even when the user asks for a
single byte, buffered I/O hides any inefficiency in calling and executing the
operating system’s unbuffered I/O routines. The gain depends on the OS and the
kind of I/O which is performed. For example, on some modern OSes such as Linux,
unbuffered disk I/O can be as fast as buffered I/O. The bottom line, however,
is that buffered I/O offers predictable performance regardless of the platform
and the backing device. Therefore, it is almost always preferable to use
buffered I/O rather than unbuffered I/O for binary data.
Text I/O over a binary storage (such as a file) is significantly slower than
binary I/O over the same storage, because it requires conversions between
unicode and binary data using a character codec. This can become noticeable
handling huge amounts of text data like large log files. Also,
TextIOWrapper.tell() and TextIOWrapper.seek() are both quite slow
due to the reconstruction algorithm used.
StringIO, however, is a native in-memory unicode container and will
exhibit similar speed to BytesIO.
Binary buffered objects (instances of BufferedReader,
BufferedWriter, BufferedRandom and BufferedRWPair)
are not reentrant. While reentrant calls will not happen in normal situations,
they can arise from doing I/O in a signal handler. If a thread tries to
re-enter a buffered object which it is already accessing, a RuntimeError is
raised. Note this doesn’t prohibit a different thread from entering the
buffered object.
The above implicitly extends to text files, since the open() function
will wrap a buffered object inside a TextIOWrapper. This includes
standard streams and therefore affects the built-in function print() as
well.
This module provides various time-related functions. For related
functionality, see also the datetime and calendar modules.
Although this module is always available,
not all functions are available on all platforms. Most of the functions
defined in this module call platform C library functions with the same name. It
may sometimes be helpful to consult the platform documentation, because the
semantics of these functions vary among platforms.
An explanation of some terminology and conventions is in order.
The epoch is the point where the time starts. On January 1st of that
year, at 0 hours, the “time since the epoch” is zero. For Unix, the epoch is
1970. To find out what the epoch is, look at gmtime(0).
The functions in this module may not handle dates and times before the epoch or
far in the future. The cut-off point in the future is determined by the C
library; for 32-bit systems, it is typically in 2038.
Year 2000 (Y2K) issues: Python depends on the platform’s C library, which
generally doesn’t have year 2000 issues, since all dates and times are
represented internally as seconds since the epoch. Function strptime()
can parse 2-digit years when given %y format code. When 2-digit years are
parsed, they are converted according to the POSIX and ISO C standards: values
69–99 are mapped to 1969–1999, and values 0–68 are mapped to 2000–2068.
For backward compatibility, years with less than 4 digits are treated
specially by asctime(), mktime(), and strftime() functions
that operate on a 9-tuple or struct_time values. If year (the first
value in the 9-tuple) is specified with less than 4 digits, its interpretation
depends on the value of accept2dyear variable.
If accept2dyear is true (default), a backward compatibility behavior is
invoked as follows:
for a 2-digit year, the century is guessed according to the POSIX rules for
the %y strptime format. A deprecation warning is issued when century
information is guessed in this way.
for a 3-digit or negative year, a ValueError exception is raised.
If accept2dyear is false (set by the program or as a result of a
non-empty value assigned to PYTHONY2K environment variable) all year
values are interpreted as given.
UTC is Coordinated Universal Time (formerly known as Greenwich Mean Time, or
GMT). The acronym UTC is not a mistake but a compromise between English and
French.
DST is Daylight Saving Time, an adjustment of the timezone by (usually) one
hour during part of the year. DST rules are magic (determined by local law) and
can change from year to year. The C library has a table containing the local
rules (often it is read from a system file for flexibility) and is the only
source of True Wisdom in this respect.
The precision of the various real-time functions may be less than suggested by
the units in which their value or argument is expressed. E.g. on most Unix
systems, the clock “ticks” only 50 or 100 times a second.
On the other hand, the precision of time() and sleep() is better
than their Unix equivalents: times are expressed as floating point numbers,
time() returns the most accurate time available (using Unix
gettimeofday() where available), and sleep() will accept a time
with a nonzero fraction (Unix select() is used to implement this, where
available).
Boolean value indicating whether two-digit year values will be
mapped to 1969–2068 range by asctime(), mktime(), and
strftime() functions. This is true by default, but will be
set to false if the environment variable PYTHONY2K has
been set to a non-empty string. It may also be modified at run
time.
Deprecated since version 3.2: Mapping of 2-digit year values by asctime(),
mktime(), and strftime() functions to 1969–2068
range is deprecated. Programs that need to process 2-digit
years should use %y code available in strptime()
function or convert 2-digit year values to 4-digit themselves.
The offset of the local DST timezone, in seconds west of UTC, if one is defined.
This is negative if the local DST timezone is east of UTC (as in Western Europe,
including the UK). Only use this if daylight is nonzero.
Convert a tuple or struct_time representing a time as returned by
gmtime() or localtime() to a string of the following
form: 'Sun Jun 20 23:21:05 1993'. If t is not provided, the current time
as returned by localtime() is used. Locale information is not used by
asctime().
Note
Unlike the C function of the same name, there is no trailing newline.
On Unix, return the current processor time as a floating point number expressed
in seconds. The precision, and in fact the very definition of the meaning of
“processor time”, depends on that of the C function of the same name, but in any
case, this is the function to use for benchmarking Python or timing algorithms.
On Windows, this function returns wall-clock seconds elapsed since the first
call to this function, as a floating point number, based on the Win32 function
QueryPerformanceCounter(). The resolution is typically better than one
microsecond.
Convert a time expressed in seconds since the epoch to a string representing
local time. If secs is not provided or None, the current time as
returned by time() is used. ctime(secs) is equivalent to
asctime(localtime(secs)). Locale information is not used by ctime().
Convert a time expressed in seconds since the epoch to a struct_time in
UTC in which the dst flag is always zero. If secs is not provided or
None, the current time as returned by time() is used. Fractions
of a second are ignored. See above for a description of the
struct_time object. See calendar.timegm() for the inverse of this
function.
Like gmtime() but converts to local time. If secs is not provided or
None, the current time as returned by time() is used. The dst
flag is set to 1 when DST applies to the given time.
This is the inverse function of localtime(). Its argument is the
struct_time or full 9-tuple (since the dst flag is needed; use -1
as the dst flag if it is unknown) which expresses the time in local time, not
UTC. It returns a floating point number, for compatibility with time().
If the input value cannot be represented as a valid time, either
OverflowError or ValueError will be raised (which depends on
whether the invalid value is caught by Python or the underlying C libraries).
The earliest date for which it can generate a time is platform-dependent.
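For example, localtime() and mktime() round-trip (a sketch; assumes the
clock and the timezone rules do not change between calls):
import time
now = time.localtime()
seconds = time.mktime(now)               # back to seconds since the epoch
print(time.localtime(seconds) == now)    # usually True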
Suspend execution for the given number of seconds. The argument may be a
floating point number to indicate a more precise sleep time. The actual
suspension time may be less than that requested because any caught signal will
terminate the sleep() following execution of that signal’s catching
routine. Also, the suspension time may be longer than requested by an arbitrary
amount because of the scheduling of other activity in the system.
Convert a tuple or struct_time representing a time as returned by
gmtime() or localtime() to a string as specified by the format
argument. If t is not provided, the current time as returned by
localtime() is used. format must be a string. ValueError is
raised if any field in t is outside of the allowed range.
0 is a legal argument for any position in the time tuple; if it is normally
illegal the value is forced to a correct one.
The following directives can be embedded in the format string. They are shown
without the optional field width and precision specification, and are replaced
by the indicated characters in the strftime() result:
Directive   Meaning                                            Notes

%a          Locale’s abbreviated weekday name.
%A          Locale’s full weekday name.
%b          Locale’s abbreviated month name.
%B          Locale’s full month name.
%c          Locale’s appropriate date and time representation.
%d          Day of the month as a decimal number [01,31].
%H          Hour (24-hour clock) as a decimal number [00,23].
%I          Hour (12-hour clock) as a decimal number [01,12].
%j          Day of the year as a decimal number [001,366].
%m          Month as a decimal number [01,12].
%M          Minute as a decimal number [00,59].
%p          Locale’s equivalent of either AM or PM.            (1)
%S          Second as a decimal number [00,61].                (2)
%U          Week number of the year (Sunday as the first day
            of the week) as a decimal number [00,53]. All days
            in a new year preceding the first Sunday are
            considered to be in week 0.                        (3)
%w          Weekday as a decimal number [0(Sunday),6].
%W          Week number of the year (Monday as the first day
            of the week) as a decimal number [00,53]. All days
            in a new year preceding the first Monday are
            considered to be in week 0.                        (3)
%x          Locale’s appropriate date representation.
%X          Locale’s appropriate time representation.
%y          Year without century as a decimal number [00,99].
%Y          Year with century as a decimal number.             (4)
%Z          Time zone name (no characters if no time zone
            exists).
%%          A literal '%' character.
Notes:
(1) When used with the strptime() function, the %p directive only affects
    the output hour field if the %I directive is used to parse the hour.
(2) The range really is 0 to 61; value 60 is valid in timestamps
    representing leap seconds and value 61 is supported for historical
    reasons.
(3) When used with the strptime() function, %U and %W are only used in
    calculations when the day of the week and the year are specified.
(4) Produces different results depending on the value of the
    time.accept2dyear variable. See Year 2000 (Y2K) issues for details.
Here is an example, a format for dates compatible with that specified in the
RFC 2822 Internet email standard. [1]
>>> from time import gmtime, strftime
>>> strftime("%a, %d %b %Y %H:%M:%S +0000", gmtime())
'Thu, 28 Jun 2001 14:17:15 +0000'
Additional directives may be supported on certain platforms, but only the ones
listed here have a meaning standardized by ANSI C.
On some platforms, an optional field width and precision specification can
immediately follow the initial '%' of a directive in the following order;
this is also not portable. The field width is normally 2 except for %j where
it is 3.
Parse a string representing a time according to a format. The return value
is a struct_time as returned by gmtime() or
localtime().
The format parameter uses the same directives as those used by
strftime(); it defaults to "%a %b %d %H:%M:%S %Y" which matches the
formatting returned by ctime(). If string cannot be parsed according
to format, or if it has excess data after parsing, ValueError is
raised. The default values used to fill in any missing data when more
accurate values cannot be inferred are (1900, 1, 1, 0, 0, 0, 0, 1, -1).
Both string and format must be strings.
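For example:
>>> import time
>>> time.strptime("30 Nov 00", "%d %b %y")
time.struct_time(tm_year=2000, tm_mon=11, tm_mday=30, tm_hour=0, tm_min=0,
                 tm_sec=0, tm_wday=3, tm_yday=335, tm_isdst=-1)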
Support for the %Z directive is based on the values contained in tzname
and whether daylight is true. Because of this, it is platform-specific
except for recognizing UTC and GMT which are always known (and are considered to
be non-daylight savings timezones).
Only the directives specified in the documentation are supported. Because
strftime() is implemented per platform it can sometimes offer more
directives than those listed. But strptime() is independent of any platform
and thus does not necessarily support all directives available that are not
documented as supported.
The type of the time value sequence returned by gmtime(),
localtime(), and strptime(). It is an object with a named
tuple interface: values can be accessed by index and by attribute name. The
following values are present:

Index   Attribute   Values
0       tm_year     (for example, 1993)
1       tm_mon      range [1, 12]
2       tm_mday     range [1, 31]
3       tm_hour     range [0, 23]
4       tm_min      range [0, 59]
5       tm_sec      range [0, 61]; see note (2) in the strftime() description
6       tm_wday     range [0, 6], Monday is 0
7       tm_yday     range [1, 366]
8       tm_isdst    0, 1 or -1; see below
Note that unlike the C structure, the month value is a range of [1, 12], not
[0, 11]. A year value will be handled as described under Year 2000
(Y2K) issues above. A -1 argument as the daylight
savings flag, passed to mktime(), will usually result in the correct
daylight savings state being filled in.
When a tuple with an incorrect length is passed to a function expecting a
struct_time, or having elements of the wrong type, a
TypeError is raised.
Return the time as a floating point number expressed in seconds since the epoch,
in UTC. Note that even though the time is always returned as a floating point
number, not all systems provide time with a better precision than 1 second.
While this function normally returns non-decreasing values, it can return a
lower value than a previous call if the system clock has been set back between
the two calls.
A tuple of two strings: the first is the name of the local non-DST timezone, the
second is the name of the local DST timezone. If no DST timezone is defined,
the second string should not be used.
Resets the time conversion rules used by the library routines. The environment
variable TZ specifies how this is done.
Availability: Unix.
Note
Although in many cases, changing the TZ environment variable may
affect the output of functions like localtime() without calling
tzset(), this behavior should not be relied on.
The TZ environment variable should contain no whitespace.
The standard format of the TZ environment variable is (whitespace
added for clarity):
std offset [dst [offset [,start[/time], end[/time]]]]
Where the components are:
std and dst
Three or more alphanumerics giving the timezone abbreviations. These will be
propagated into time.tzname
offset
The offset has the form: ±hh[:mm[:ss]]. This indicates the value
added to the local time to arrive at UTC. If preceded by a '-', the timezone
is east of the Prime Meridian; otherwise, it is west. If no offset follows
dst, summer time is assumed to be one hour ahead of standard time.
start[/time],end[/time]
Indicates when to change to and back from DST. The format of the
start and end dates are one of the following:
Jn
The Julian day n (1 <= n <= 365). Leap days are not counted, so in
all years February 28 is day 59 and March 1 is day 60.
n
The zero-based Julian day (0 <= n <= 365). Leap days are counted, and
it is possible to refer to February 29.
Mm.n.d
The d'th day (0 <= d <= 6) of week n of month m of the year (1
<= n <= 5, 1 <= m <= 12, where week 5 means "the last d day in
month m" which may occur in either the fourth or the fifth
week). Week 1 is the first week in which the d'th day occurs. Day
zero is Sunday.
time has the same format as offset except that no leading sign
('-' or '+') is allowed. The default, if time is not given, is 02:00:00.
On many Unix systems (including *BSD, Linux, Solaris, and Darwin), it is more
convenient to use the system’s zoneinfo (tzfile(5)) database to
specify the timezone rules. To do this, set the TZ environment
variable to the path of the required timezone datafile, relative to the root of
the system's zoneinfo timezone database, usually located at
/usr/share/zoneinfo. For example, 'US/Eastern',
'Australia/Melbourne', 'Egypt' or 'Europe/Amsterdam'.
The use of %Z is now deprecated, but the %z escape that expands to the
preferred hour/minute offset is not supported by all ANSI C libraries. Also, a
strict reading of the original 1982 RFC 822 standard calls for a two-digit
year (%y rather than %Y), but practice moved to 4-digit years long before the
year 2000. After that, RFC 822 became obsolete and the 4-digit year was
first recommended by RFC 1123 and then mandated by RFC 2822.
argparse — Parser for command-line options, arguments and sub-commands
The argparse module makes it easy to write user-friendly command-line
interfaces. The program defines what arguments it requires, and argparse
will figure out how to parse those out of sys.argv. The argparse
module also automatically generates help and usage messages and issues errors
when users give the program invalid arguments.
The following code is a Python program that takes a list of integers and
produces either the sum or the max:
import argparse
parser = argparse.ArgumentParser(description='Process some integers.')
parser.add_argument('integers', metavar='N', type=int, nargs='+',
help='an integer for the accumulator')
parser.add_argument('--sum', dest='accumulate', action='store_const',
const=sum, default=max,
help='sum the integers (default: find the max)')
args = parser.parse_args()
print(args.accumulate(args.integers))
Assuming the Python code above is saved into a file called prog.py, it can
be run at the command line and provides useful help messages:
$ prog.py -h
usage: prog.py [-h] [--sum] N [N ...]
Process some integers.
positional arguments:
N an integer for the accumulator
optional arguments:
-h, --help show this help message and exit
--sum sum the integers (default: find the max)
When run with the appropriate arguments, it prints either the sum or the max of
the command-line integers:
$ prog.py 1 2 3 4
4
$ prog.py 1 2 3 4 --sum
10
If invalid arguments are passed in, it will issue an error:
$ prog.py a b c
usage: prog.py [-h] [--sum] N [N ...]
prog.py: error: argument N: invalid int value: 'a'
The following sections walk you through this example.
Filling an ArgumentParser with information about program arguments is
done by making calls to the add_argument() method.
Generally, these calls tell the ArgumentParser how to take the strings
on the command line and turn them into objects. This information is stored and
used when parse_args() is called. For example:
>>> parser.add_argument('integers', metavar='N', type=int, nargs='+',
... help='an integer for the accumulator')
>>> parser.add_argument('--sum', dest='accumulate', action='store_const',
... const=sum, default=max,
... help='sum the integers (default: find the max)')
Later, calling parse_args() will return an object with
two attributes, integers and accumulate. The integers attribute
will be a list of one or more ints, and the accumulate attribute will be
either the sum() function, if --sum was specified at the command line,
or the max() function if it was not.
ArgumentParser parses arguments through the
parse_args() method. This will inspect the command line,
convert each arg to the appropriate type and then invoke the appropriate action.
In most cases, this means a simple Namespace object will be built up from
attributes parsed out of the command line:
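>>> parser.parse_args(['--sum', '7', '-1', '42'])
Namespace(accumulate=<built-in function sum>, integers=[7, -1, 42])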
In a script, parse_args() will typically be called with no
arguments, and the ArgumentParser will automatically determine the
command-line arguments from sys.argv.
Most calls to the ArgumentParser constructor will use the
description= keyword argument. This argument gives a brief description of
what the program does and how it works. In help messages, the description is
displayed between the command-line usage string and the help messages for the
various arguments:
>>> parser = argparse.ArgumentParser(description='A foo that bars')
>>> parser.print_help()
usage: argparse.py [-h]
A foo that bars
optional arguments:
-h, --help show this help message and exit
By default, the description will be line-wrapped so that it fits within the
given space. To change this behavior, see the formatter_class argument.
Some programs like to display additional description of the program after the
description of the arguments. Such text can be specified using the epilog=
argument to ArgumentParser:
>>> parser = argparse.ArgumentParser(
... description='A foo that bars',
... epilog="And that's how you'd foo a bar")
>>> parser.print_help()
usage: argparse.py [-h]
A foo that bars
optional arguments:
-h, --help show this help message and exit
And that's how you'd foo a bar
As with the description argument, the epilog= text is by default
line-wrapped, but this behavior can be adjusted with the formatter_class
argument to ArgumentParser.
By default, ArgumentParser objects add an option which simply displays
the parser’s help message. For example, consider a file named
myprogram.py containing the following code:
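import argparse
parser = argparse.ArgumentParser()
parser.add_argument('--foo', help='foo help')
args = parser.parse_args()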
If -h or --help is supplied at the command line, the ArgumentParser
help will be printed:
$ python myprogram.py --help
usage: myprogram.py [-h] [--foo FOO]
optional arguments:
-h, --help show this help message and exit
--foo FOO foo help
Occasionally, it may be useful to disable the addition of this help option.
This can be achieved by passing False as the add_help= argument to
ArgumentParser:
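>>> parser = argparse.ArgumentParser(prog='PROG', add_help=False)
>>> parser.add_argument('--foo', help='foo help')
>>> parser.print_help()
usage: PROG [--foo FOO]

optional arguments:
  --foo FOO  foo help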
The help option is typically -h/--help. The exception to this is
if the prefix_chars= is specified and does not include '-', in
which case -h and --help are not valid options. In
this case, the first character in prefix_chars is used to prefix
the help options:
>>> parser = argparse.ArgumentParser(prog='PROG', prefix_chars='+/')
>>> parser.print_help()
usage: PROG [+h]
optional arguments:
+h, ++help show this help message and exit
Most command-line options will use '-' as the prefix, e.g. -f/--foo.
Parsers that need to support different or additional prefix
characters, e.g. for options
like +f or /foo, may specify them using the prefix_chars= argument
to the ArgumentParser constructor:
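>>> parser = argparse.ArgumentParser(prog='PROG', prefix_chars='-+')
>>> parser.add_argument('+f')
>>> parser.add_argument('++bar')
>>> parser.parse_args('+f X ++bar Y'.split())
Namespace(bar='Y', f='X')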
Sometimes, for example when dealing with particularly long argument lists, it
may make sense to keep the list of arguments in a file rather than typing it out
at the command line. If the fromfile_prefix_chars= argument is given to the
ArgumentParser constructor, then arguments that start with any of the
specified characters will be treated as files, and will be replaced by the
arguments they contain. For example:
>>> with open('args.txt', 'w') as fp:
... fp.write('-f\nbar')
>>> parser = argparse.ArgumentParser(fromfile_prefix_chars='@')
>>> parser.add_argument('-f')
>>> parser.parse_args(['-f', 'foo', '@args.txt'])
Namespace(f='bar')
Arguments read from a file must by default be one per line (but see also
convert_arg_line_to_args()) and are treated as if they
were in the same place as the original file referencing argument on the command
line. So in the example above, the expression ['-f','foo','@args.txt']
is considered equivalent to the expression ['-f','foo','-f','bar'].
The fromfile_prefix_chars= argument defaults to None, meaning that
arguments will never be treated as file references.
Generally, argument defaults are specified either by passing a default to
add_argument() or by calling the
set_defaults() methods with a specific set of name-value
pairs. Sometimes however, it may be useful to specify a single parser-wide
default for arguments. This can be accomplished by passing the
argument_default= keyword argument to ArgumentParser. For example,
to globally suppress attribute creation on parse_args()
calls, we supply argument_default=SUPPRESS:
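>>> parser = argparse.ArgumentParser(argument_default=argparse.SUPPRESS)
>>> parser.add_argument('--foo')
>>> parser.add_argument('bar', nargs='?')
>>> parser.parse_args(['--foo', '1', 'BAR'])
Namespace(bar='BAR', foo='1')
>>> parser.parse_args([])
Namespace()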
Sometimes, several parsers share a common set of arguments. Rather than
repeating the definitions of these arguments, a single parser with all the
shared arguments, passed via the parents= argument to ArgumentParser,
can be used. The parents= argument takes a list of ArgumentParser
objects, collects all the positional and optional actions from them, and adds
these actions to the ArgumentParser object being constructed:
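>>> parent_parser = argparse.ArgumentParser(add_help=False)
>>> parent_parser.add_argument('--parent', type=int)
>>> foo_parser = argparse.ArgumentParser(parents=[parent_parser])
>>> foo_parser.add_argument('foo')
>>> foo_parser.parse_args(['--parent', '2', 'XXX'])
Namespace(foo='XXX', parent=2)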
Note that most parent parsers will specify add_help=False. Otherwise, the
ArgumentParser will see two -h/--help options (one in the parent
and one in the child) and raise an error.
Note
You must fully initialize the parsers before passing them via parents=.
If you change the parent parsers after the child parser, those changes will
not be reflected in the child.
ArgumentParser objects allow the help formatting to be customized by
specifying an alternate formatting class. Currently, there are three such
classes: RawDescriptionHelpFormatter, RawTextHelpFormatter, and
ArgumentDefaultsHelpFormatter.
The first two allow more control over how textual descriptions are displayed,
while the last automatically adds information about argument default values.
>>> parser = argparse.ArgumentParser(
... prog='PROG',
... description='''this description
... was indented weird
... but that is okay''',
... epilog='''
... likewise for this epilog whose whitespace will
... be cleaned up and whose words will be wrapped
... across a couple lines''')
>>> parser.print_help()
usage: PROG [-h]
this description was indented weird but that is okay
optional arguments:
-h, --help show this help message and exit
likewise for this epilog whose whitespace will be cleaned up and whose words
will be wrapped across a couple lines
>>> parser = argparse.ArgumentParser(
... prog='PROG',
... formatter_class=argparse.RawDescriptionHelpFormatter,
... description=textwrap.dedent('''\
... Please do not mess up this text!
... --------------------------------
... I have indented it
... exactly the way
... I want it
... '''))
>>> parser.print_help()
usage: PROG [-h]
Please do not mess up this text!
--------------------------------
I have indented it
exactly the way
I want it
optional arguments:
-h, --help show this help message and exit
RawTextHelpFormatter maintains whitespace for all sorts of help text
including argument descriptions.
The other formatter class available, ArgumentDefaultsHelpFormatter,
will add information about the default value of each of the arguments:
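>>> parser = argparse.ArgumentParser(
...     prog='PROG',
...     formatter_class=argparse.ArgumentDefaultsHelpFormatter)
>>> parser.add_argument('--foo', type=int, default=42, help='FOO!')
>>> parser.add_argument('bar', nargs='*', default=[1, 2, 3], help='BAR!')
>>> parser.print_help()
usage: PROG [-h] [--foo FOO] [bar [bar ...]]

positional arguments:
  bar         BAR! (default: [1, 2, 3])

optional arguments:
  -h, --help  show this help message and exit
  --foo FOO   FOO! (default: 42)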
ArgumentParser objects do not allow two actions with the same option
string. By default, ArgumentParser objects raise an exception if an
attempt is made to create an argument with an option string that is already in
use:
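>>> parser = argparse.ArgumentParser(prog='PROG')
>>> parser.add_argument('-f', '--foo', help='old foo help')
>>> parser.add_argument('--foo', help='new foo help')
Traceback (most recent call last):
 ..
ArgumentError: argument --foo: conflicting option string(s): --foo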
Sometimes (e.g. when using parents) it may be useful to simply override any
older arguments with the same option string. To get this behavior, the value
'resolve' can be supplied to the conflict_handler= argument of
ArgumentParser:
>>> parser = argparse.ArgumentParser(prog='PROG', conflict_handler='resolve')
>>> parser.add_argument('-f', '--foo', help='old foo help')
>>> parser.add_argument('--foo', help='new foo help')
>>> parser.print_help()
usage: PROG [-h] [-f FOO] [--foo FOO]
optional arguments:
-h, --help show this help message and exit
-f FOO old foo help
--foo FOO new foo help
Note that ArgumentParser objects only remove an action if all of its
option strings are overridden. So, in the example above, the old -f/--foo
action is retained as the -f action, because only the --foo option
string was overridden.
By default, ArgumentParser objects use sys.argv[0] to determine
how to display the name of the program in help messages. This default is almost
always desirable because it will make the help messages match how the program was
invoked on the command line. For example, consider a file named
myprogram.py with the following code:
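import argparse
parser = argparse.ArgumentParser()
parser.add_argument('--foo', help='foo help')
args = parser.parse_args()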
The help for this program will display myprogram.py as the program name
(regardless of where the program was invoked from):
$ python myprogram.py --help
usage: myprogram.py [-h] [--foo FOO]
optional arguments:
-h, --help show this help message and exit
--foo FOO foo help
$ cd ..
$ python subdir\myprogram.py --help
usage: myprogram.py [-h] [--foo FOO]
optional arguments:
-h, --help show this help message and exit
--foo FOO foo help
To change this default behavior, another value can be supplied using the
prog= argument to ArgumentParser:
>>> parser = argparse.ArgumentParser(prog='myprogram')
>>> parser.print_help()
usage: myprogram [-h]
optional arguments:
-h, --help show this help message and exit
Note that the program name, whether determined from sys.argv[0] or from the
prog= argument, is available to help messages using the %(prog)s format
specifier.
>>> parser = argparse.ArgumentParser(prog='myprogram')
>>> parser.add_argument('--foo', help='foo of the %(prog)s program')
>>> parser.print_help()
usage: myprogram [-h] [--foo FOO]
optional arguments:
-h, --help show this help message and exit
--foo FOO foo of the myprogram program
By default, ArgumentParser calculates the usage message from the
arguments it contains:
>>> parser = argparse.ArgumentParser(prog='PROG')
>>> parser.add_argument('--foo', nargs='?', help='foo help')
>>> parser.add_argument('bar', nargs='+', help='bar help')
>>> parser.print_help()
usage: PROG [-h] [--foo [FOO]] bar [bar ...]
positional arguments:
bar bar help
optional arguments:
-h, --help show this help message and exit
--foo [FOO] foo help
The default message can be overridden with the usage= keyword argument:
>>> parser = argparse.ArgumentParser(prog='PROG', usage='%(prog)s [options]')
>>> parser.add_argument('--foo', nargs='?', help='foo help')
>>> parser.add_argument('bar', nargs='+', help='bar help')
>>> parser.print_help()
usage: PROG [options]
positional arguments:
bar bar help
optional arguments:
-h, --help show this help message and exit
--foo [FOO] foo help
The %(prog)s format specifier is available to fill in the program name in
your usage messages.
The add_argument() method must know whether an optional
argument, like -f or --foo, or a positional argument, like a list of
filenames, is expected. The first arguments passed to
add_argument() must therefore be either a series of
flags, or a simple argument name. For example, an optional argument could
be created like:
>>> parser.add_argument('-f', '--foo')
while a positional argument could be created like:
>>> parser.add_argument('bar')
When parse_args() is called, optional arguments will be
identified by the - prefix, and the remaining arguments will be assumed to
be positional:
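>>> parser = argparse.ArgumentParser(prog='PROG')
>>> parser.add_argument('-f')
>>> parser.add_argument('bar')
>>> parser.parse_args(['BAR'])
Namespace(bar='BAR', f=None)
>>> parser.parse_args(['BAR', '-f', 'FOO'])
Namespace(bar='BAR', f='FOO')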
ArgumentParser objects associate command-line arguments with actions. These
actions can do just about anything with the command-line arguments associated with
them, though most actions simply add an attribute to the object returned by
parse_args(). The action keyword argument specifies
how the command-line arguments should be handled. The supported actions are:
'store' - This just stores the argument’s value. This is the default
action. For example:
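>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--foo')
>>> parser.parse_args('--foo 1'.split())
Namespace(foo='1')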
'store_const' - This stores the value specified by the const keyword
argument. (Note that the const keyword argument defaults to the rather
unhelpful None.) The 'store_const' action is most commonly used with
optional arguments that specify some sort of flag. For example:
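>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--foo', action='store_const', const=42)
>>> parser.parse_args('--foo'.split())
Namespace(foo=42)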
'append' - This stores a list, and appends each argument value to the
list. This is useful to allow an option to be specified multiple times.
Example usage:
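>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--foo', action='append')
>>> parser.parse_args('--foo 1 --foo 2'.split())
Namespace(foo=['1', '2'])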
'append_const' - This stores a list, and appends the value specified by
the const keyword argument to the list. (Note that the const keyword
argument defaults to None.) The 'append_const' action is typically
useful when multiple arguments need to store constants to the same list. For
example:
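>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--str', dest='types', action='append_const', const=str)
>>> parser.add_argument('--int', dest='types', action='append_const', const=int)
>>> parser.parse_args('--str --int'.split())
Namespace(types=[<class 'str'>, <class 'int'>])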
You can also specify an arbitrary action by passing an object that implements
the Action API. The easiest way to do this is to extend
argparse.Action, supplying an appropriate __call__ method. The
__call__ method should accept four parameters:
parser - The ArgumentParser object which contains this action.
namespace - The Namespace object that will be returned by
parse_args(). Most actions add an attribute to this
object.
values - The associated command-line arguments, with any type conversions
applied. (Type conversions are specified with the type keyword argument to
add_argument().)
option_string - The option string that was used to invoke this action.
The option_string argument is optional, and will be absent if the action
is associated with a positional argument.
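For example (a sketch of a custom action that reports each invocation
before storing the value):
>>> class FooAction(argparse.Action):
...     def __call__(self, parser, namespace, values, option_string=None):
...         print('%r %r %r' % (namespace, values, option_string))
...         setattr(namespace, self.dest, values)
...
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--foo', action=FooAction)
>>> parser.add_argument('bar', action=FooAction)
>>> parser.parse_args('1 --foo 2'.split())
Namespace(bar=None, foo=None) '1' None
Namespace(bar='1', foo=None) '2' '--foo'
Namespace(bar='1', foo='2')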
ArgumentParser objects usually associate a single command-line argument with a
single action to be taken. The nargs keyword argument associates a
different number of command-line arguments with a single action. The supported
values are:
N (an integer). N arguments from the command line will be gathered together into a
list. For example:
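>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--foo', nargs=2)
>>> parser.add_argument('bar', nargs=1)
>>> parser.parse_args('c --foo a b'.split())
Namespace(bar=['c'], foo=['a', 'b'])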
Note that nargs=1 produces a list of one item. This is different from
the default, in which the item is produced by itself.
'?'. One arg will be consumed from the command line if possible, and
produced as a single item. If no command-line arg is present, the value from
default will be produced. Note that for optional arguments, there is an
additional case - the option string is present but not followed by a
command-line arg. In this case the value from const will be produced. Some
examples to illustrate this:
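>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--foo', nargs='?', const='c', default='d')
>>> parser.add_argument('bar', nargs='?', default='d')
>>> parser.parse_args('XX --foo YY'.split())
Namespace(bar='XX', foo='YY')
>>> parser.parse_args('XX --foo'.split())
Namespace(bar='XX', foo='c')
>>> parser.parse_args(''.split())
Namespace(bar='d', foo='d')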
'*'. All command-line arguments present are gathered into a list. Note that
it generally doesn’t make much sense to have more than one positional argument
with nargs='*', but multiple optional arguments with nargs='*' is
possible. For example:
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--foo', nargs='*')
>>> parser.add_argument('--bar', nargs='*')
>>> parser.add_argument('baz', nargs='*')
>>> parser.parse_args('a b --foo x y --bar 1 2'.split())
Namespace(bar=['1', '2'], baz=['a', 'b'], foo=['x', 'y'])
'+'. Just like '*', all command-line args present are gathered into a
list. Additionally, an error message will be generated if there wasn’t at
least one command-line arg present. For example:
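>>> parser = argparse.ArgumentParser(prog='PROG')
>>> parser.add_argument('foo', nargs='+')
>>> parser.parse_args('a b'.split())
Namespace(foo=['a', 'b'])
>>> parser.parse_args(''.split())
usage: PROG [-h] foo [foo ...]
PROG: error: too few arguments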
If the nargs keyword argument is not provided, the number of arguments consumed
is determined by the action. Generally this means a single command-line arg
will be consumed and a single item (not a list) will be produced.
The const argument of add_argument() is used to hold
constant values that are not read from the command line but are required for
the various ArgumentParser actions. The two most common uses of it are:
When add_argument() is called with
action='store_const' or action='append_const'. These actions add the
const value to one of the attributes of the object returned by parse_args(). See the action description for examples.
When add_argument() is called with option strings
(like -f or --foo) and nargs='?'. This creates an optional
argument that can be followed by zero or one command-line arguments.
When parsing the command line, if the option string is encountered with no
command-line arg following it, the value of const will be assumed instead.
See the nargs description for examples.
All optional arguments and some positional arguments may be omitted at the
command line. The default keyword argument of
add_argument(), whose value defaults to None,
specifies what value should be used if the command-line arg is not present.
For optional arguments, the default value is used when the option string
was not present at the command line:
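>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--foo', default=42)
>>> parser.parse_args('--foo 2'.split())
Namespace(foo='2')
>>> parser.parse_args(''.split())
Namespace(foo=42)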
By default, ArgumentParser objects read command-line arguments in as simple
strings. However, quite often the command-line string should instead be
interpreted as another type, like a float or int. The
type keyword argument of add_argument() allows any
necessary type-checking and type conversions to be performed. Common built-in
types and functions can be used directly as the value of the type argument:
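>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('foo', type=int)
>>> parser.add_argument('bar', type=float)
>>> parser.parse_args('2 3.5'.split())
Namespace(bar=3.5, foo=2)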
To ease the use of various types of files, the argparse module provides the
factory FileType which takes the mode= and bufsize= arguments of the
open() function. For example, FileType('w') can be used to create a
writable file:
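For example (the exact repr of the opened file object varies by platform
and default encoding):
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('out', type=argparse.FileType('w'))
>>> parser.parse_args(['out.txt'])
Namespace(out=<_io.TextIOWrapper name='out.txt' encoding='UTF-8'>)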
Some command-line arguments should be selected from a restricted set of values.
These can be handled by passing a container object as the choices keyword
argument to add_argument(). When the command line is
parsed, arg values will be checked, and an error message will be displayed if
the arg was not one of the acceptable values:
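>>> parser = argparse.ArgumentParser(prog='PROG')
>>> parser.add_argument('foo', choices='abc')
>>> parser.parse_args('c'.split())
Namespace(foo='c')
>>> parser.parse_args('X'.split())
usage: PROG [-h] {a,b,c}
PROG: error: argument foo: invalid choice: 'X' (choose from 'a', 'b', 'c')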
Note that inclusion in the choices container is checked after any type
conversions have been performed, so the type of the objects in the choices
container should match the type specified:
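>>> parser = argparse.ArgumentParser(prog='PROG')
>>> parser.add_argument('foo', type=complex, choices=[1, 1j])
>>> parser.parse_args('1j'.split())
Namespace(foo=1j)
>>> parser.parse_args('-- -4'.split())
usage: PROG [-h] {1,1j}
PROG: error: argument foo: invalid choice: (-4+0j) (choose from 1, 1j)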
In general, the argparse module assumes that flags like -f and --bar
indicate optional arguments, which can always be omitted at the command line.
To make an option required, True can be specified for the required=
keyword argument to add_argument():
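>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--foo', required=True)
>>> parser.parse_args(['--foo', 'BAR'])
Namespace(foo='BAR')
>>> parser.parse_args([])
usage: argparse.py [-h] [--foo FOO]
argparse.py: error: option --foo is required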
The help value is a string containing a brief description of the argument.
When a user requests help (usually by using -h or --help at the
command line), these help descriptions will be displayed with each
argument:
>>> parser = argparse.ArgumentParser(prog='frobble')
>>> parser.add_argument('--foo', action='store_true',
... help='foo the bars before frobbling')
>>> parser.add_argument('bar', nargs='+',
... help='one of the bars to be frobbled')
>>> parser.parse_args('-h'.split())
usage: frobble [-h] [--foo] bar [bar ...]
positional arguments:
bar one of the bars to be frobbled
optional arguments:
-h, --help show this help message and exit
--foo foo the bars before frobbling
The help strings can include various format specifiers to avoid repetition
of things like the program name or the argument default. The available
specifiers include the program name, %(prog)s and most keyword arguments to
add_argument(), e.g. %(default)s, %(type)s, etc.:
>>> parser = argparse.ArgumentParser(prog='frobble')
>>> parser.add_argument('bar', nargs='?', type=int, default=42,
... help='the bar to %(prog)s (default: %(default)s)')
>>> parser.print_help()
usage: frobble [-h] [bar]
positional arguments:
bar the bar to frobble (default: 42)
optional arguments:
-h, --help show this help message and exit
When ArgumentParser generates help messages, it needs some way to refer
to each expected argument. By default, ArgumentParser objects use the dest
value as the “name” of each object. By default, for positional argument
actions, the dest value is used directly, and for optional argument actions,
the dest value is uppercased. So, a single positional argument with
dest='bar' will be referred to as bar. A single
optional argument --foo that should be followed by a single command-line arg
will be referred to as FOO. An example:
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--foo')
>>> parser.add_argument('bar')
>>> parser.parse_args('X --foo Y'.split())
Namespace(bar='X', foo='Y')
>>> parser.print_help()
usage: [-h] [--foo FOO] bar
positional arguments:
bar
optional arguments:
-h, --help show this help message and exit
--foo FOO
An alternative name can be specified with metavar:
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--foo', metavar='YYY')
>>> parser.add_argument('bar', metavar='XXX')
>>> parser.parse_args('X --foo Y'.split())
Namespace(bar='X', foo='Y')
>>> parser.print_help()
usage: [-h] [--foo YYY] XXX
positional arguments:
XXX
optional arguments:
-h, --help show this help message and exit
--foo YYY
Note that metavar only changes the displayed name - the name of the
attribute on the parse_args() object is still determined
by the dest value.
Different values of nargs may cause the metavar to be used multiple times.
Providing a tuple to metavar specifies a different display for each of the
arguments:
>>> parser = argparse.ArgumentParser(prog='PROG')
>>> parser.add_argument('-x', nargs=2)
>>> parser.add_argument('--foo', nargs=2, metavar=('bar', 'baz'))
>>> parser.print_help()
usage: PROG [-h] [-x X X] [--foo bar baz]
optional arguments:
-h, --help show this help message and exit
-x X X
--foo bar baz
Most ArgumentParser actions add some value as an attribute of the
object returned by parse_args(). The name of this
attribute is determined by the dest keyword argument of
add_argument(). For positional argument actions,
dest is normally supplied as the first argument to
add_argument():
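For example (a minimal sketch):
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('bar')
>>> parser.parse_args(['XXX'])
Namespace(bar='XXX')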
For optional argument actions, the value of dest is normally inferred from
the option strings. ArgumentParser generates the value of dest by
taking the first long option string and stripping away the initial '--'
string. If no long option strings were supplied, dest will be derived from
the first short option string by stripping the initial '-' character. Any
internal '-' characters will be converted to '_' characters to make sure
the string is a valid attribute name. The examples below illustrate this
behavior:
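A sketch of the inference rules (the option names are illustrative):
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('-f', '--foo-bar', '--foo')
>>> parser.add_argument('-x', '-y')
>>> parser.parse_args('-f 1 -x 2'.split())
Namespace(foo_bar='1', x='2')
>>> parser.parse_args('--foo 1 -y 2'.split())
Namespace(foo_bar='1', x='2')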
Convert argument strings to objects and assign them as attributes of the
namespace. Return the populated namespace.
Previous calls to add_argument() determine exactly what objects are
created and how they are assigned. See the documentation for
add_argument() for details.
By default, the arg strings are taken from sys.argv, and a new empty
Namespace object is created for the attributes.
The parse_args() method supports several ways of
specifying the value of an option (if it takes one). In the simplest case, the
option and its value are passed as two separate arguments:
For long options (options with names longer than a single character), the option
and value can also be passed as a single command-line argument, using = to
separate them:
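Both styles in one sketch (short options may additionally be joined directly
with their value):
>>> parser = argparse.ArgumentParser(prog='PROG')
>>> parser.add_argument('-x')
>>> parser.add_argument('--foo')
>>> parser.parse_args('-x X'.split())
Namespace(foo=None, x='X')
>>> parser.parse_args('--foo=FOO'.split())
Namespace(foo='FOO', x=None)
>>> parser.parse_args('-xX'.split())
Namespace(foo=None, x='X')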
While parsing the command line, parse_args() checks for a
variety of errors, including ambiguous options, invalid types, invalid options,
wrong number of positional arguments, etc. When it encounters such an error,
it exits and prints the error along with a usage message:
>>> parser = argparse.ArgumentParser(prog='PROG')
>>> parser.add_argument('--foo', type=int)
>>> parser.add_argument('bar', nargs='?')
>>> # invalid type
>>> parser.parse_args(['--foo', 'spam'])
usage: PROG [-h] [--foo FOO] [bar]
PROG: error: argument --foo: invalid int value: 'spam'
>>> # invalid option
>>> parser.parse_args(['--bar'])
usage: PROG [-h] [--foo FOO] [bar]
PROG: error: no such option: --bar
>>> # wrong number of arguments
>>> parser.parse_args(['spam', 'badger'])
usage: PROG [-h] [--foo FOO] [bar]
PROG: error: extra arguments found: badger
The parse_args() method attempts to give errors whenever
the user has clearly made a mistake, but some situations are inherently
ambiguous. For example, the command-line arg '-1' could either be an
attempt to specify an option or an attempt to provide a positional argument.
The parse_args() method is cautious here: positional
arguments may only begin with '-' if they look like negative numbers and
there are no options in the parser that look like negative numbers:
>>> parser = argparse.ArgumentParser(prog='PROG')
>>> parser.add_argument('-x')
>>> parser.add_argument('foo', nargs='?')
>>> # no negative number options, so -1 is a positional argument
>>> parser.parse_args(['-x', '-1'])
Namespace(foo=None, x='-1')
>>> # no negative number options, so -1 and -5 are positional arguments
>>> parser.parse_args(['-x', '-1', '-5'])
Namespace(foo='-5', x='-1')
>>> parser = argparse.ArgumentParser(prog='PROG')
>>> parser.add_argument('-1', dest='one')
>>> parser.add_argument('foo', nargs='?')
>>> # negative number options present, so -1 is an option
>>> parser.parse_args(['-1', 'X'])
Namespace(foo=None, one='X')
>>> # negative number options present, so -2 is an option
>>> parser.parse_args(['-2'])
usage: PROG [-h] [-1 ONE] [foo]
PROG: error: no such option: -2
>>> # negative number options present, so both -1s are options
>>> parser.parse_args(['-1', '-1'])
usage: PROG [-h] [-1 ONE] [foo]
PROG: error: argument -1: expected one argument
If you have positional arguments that must begin with '-' and don’t look
like negative numbers, you can insert the pseudo-argument '--' which tells
parse_args() that everything after that is a positional
argument:
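For example (a sketch):
>>> parser = argparse.ArgumentParser(prog='PROG')
>>> parser.add_argument('-f')
>>> parser.add_argument('bar')
>>> parser.parse_args(['--', '-f'])
Namespace(bar='-f', f=None)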
Sometimes it may be useful to have an ArgumentParser parse arguments other than those
of sys.argv. This can be accomplished by passing a list of strings to
parse_args(). This is useful for testing at the
interactive prompt:
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument(
... 'integers', metavar='int', type=int, choices=range(10),
... nargs='+', help='an integer in the range 0..9')
>>> parser.add_argument(
... '--sum', dest='accumulate', action='store_const', const=sum,
... default=max, help='sum the integers (default: find the max)')
>>> parser.parse_args(['1', '2', '3', '4'])
Namespace(accumulate=<built-in function max>, integers=[1, 2, 3, 4])
>>> parser.parse_args('1 2 3 4 --sum'.split())
Namespace(accumulate=<built-in function sum>, integers=[1, 2, 3, 4])
Simple class used by default by parse_args() to create
an object holding attributes and return it.
This class is deliberately simple, just an object subclass with a
readable string representation. If you prefer to have dict-like view of the
attributes, you can use the standard Python idiom, vars():
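A short sketch:
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--foo')
>>> args = parser.parse_args(['--foo', 'BAR'])
>>> vars(args)
{'foo': 'BAR'}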
It may also be useful to have an ArgumentParser assign attributes to an
already existing object, rather than a new Namespace object. This can
be achieved by specifying the namespace= keyword argument:
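For instance, attributes can be assigned to an instance of an ordinary class
(a sketch):
>>> class C:
...     pass
...
>>> c = C()
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--foo')
>>> ns = parser.parse_args(args=['--foo', 'BAR'], namespace=c)
>>> c.foo
'BAR'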
Many programs split up their functionality into a number of sub-commands,
for example, the svn program can invoke sub-commands like svn checkout,
svn update, and svn commit. Splitting up functionality
this way can be a particularly good idea when a program performs several
different functions which require different kinds of command-line arguments.
ArgumentParser supports the creation of such sub-commands with the
add_subparsers() method. The add_subparsers() method is normally
called with no arguments and returns a special action object. This object
has a single method, add_parser(), which takes a
command name and any ArgumentParser constructor arguments, and
returns an ArgumentParser object that can be modified as usual.
Some example usage:
>>> # create the top-level parser
>>> parser = argparse.ArgumentParser(prog='PROG')
>>> parser.add_argument('--foo', action='store_true', help='foo help')
>>> subparsers = parser.add_subparsers(help='sub-command help')
>>>
>>> # create the parser for the "a" command
>>> parser_a = subparsers.add_parser('a', help='a help')
>>> parser_a.add_argument('bar', type=int, help='bar help')
>>>
>>> # create the parser for the "b" command
>>> parser_b = subparsers.add_parser('b', help='b help')
>>> parser_b.add_argument('--baz', choices='XYZ', help='baz help')
>>>
>>> # parse some arg lists
>>> parser.parse_args(['a', '12'])
Namespace(bar=12, foo=False)
>>> parser.parse_args(['--foo', 'b', '--baz', 'Z'])
Namespace(baz='Z', foo=True)
Note that the object returned by parse_args() will only contain
attributes for the main parser and the subparser that was selected by the
command line (and not any other subparsers). So in the example above, when
the "a" command is specified, only the foo and bar attributes are
present, and when the "b" command is specified, only the foo and
baz attributes are present.
Similarly, when a help message is requested from a subparser, only the help
for that particular parser will be printed. The help message will not
include parent parser or sibling parser messages. (A help message for each
subparser command, however, can be given by supplying the help= argument
to add_parser() as above.)
>>> parser.parse_args(['--help'])
usage: PROG [-h] [--foo] {a,b} ...
positional arguments:
{a,b} sub-command help
a a help
b b help
optional arguments:
-h, --help show this help message and exit
--foo foo help
>>> parser.parse_args(['a', '--help'])
usage: PROG a [-h] bar
positional arguments:
bar bar help
optional arguments:
-h, --help show this help message and exit
>>> parser.parse_args(['b', '--help'])
usage: PROG b [-h] [--baz {X,Y,Z}]
optional arguments:
-h, --help show this help message and exit
--baz {X,Y,Z} baz help
The add_subparsers() method also supports title and description
keyword arguments. When either is present, the subparser’s commands will
appear in their own group in the help output. For example:
>>> parser = argparse.ArgumentParser()
>>> subparsers = parser.add_subparsers(title='subcommands',
... description='valid subcommands',
... help='additional help')
>>> subparsers.add_parser('foo')
>>> subparsers.add_parser('bar')
>>> parser.parse_args(['-h'])
usage: [-h] {foo,bar} ...
optional arguments:
-h, --help show this help message and exit
subcommands:
valid subcommands
{foo,bar} additional help
Furthermore, add_parser supports an additional aliases argument,
which allows multiple strings to refer to the same subparser. This example,
like svn, aliases co as a shorthand for checkout:
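A sketch:
>>> parser = argparse.ArgumentParser()
>>> subparsers = parser.add_subparsers()
>>> checkout = subparsers.add_parser('checkout', aliases=['co'])
>>> checkout.add_argument('foo')
>>> parser.parse_args(['co', 'bar'])
Namespace(foo='bar')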
One particularly effective way of handling sub-commands is to combine the use
of the add_subparsers() method with calls to set_defaults() so
that each subparser knows which Python function it should execute. For
example:
>>> # sub-command functions
>>> def foo(args):
... print(args.x * args.y)
...
>>> def bar(args):
... print('((%s))' % args.z)
...
>>> # create the top-level parser
>>> parser = argparse.ArgumentParser()
>>> subparsers = parser.add_subparsers()
>>>
>>> # create the parser for the "foo" command
>>> parser_foo = subparsers.add_parser('foo')
>>> parser_foo.add_argument('-x', type=int, default=1)
>>> parser_foo.add_argument('y', type=float)
>>> parser_foo.set_defaults(func=foo)
>>>
>>> # create the parser for the "bar" command
>>> parser_bar = subparsers.add_parser('bar')
>>> parser_bar.add_argument('z')
>>> parser_bar.set_defaults(func=bar)
>>>
>>> # parse the args and call whatever function was selected
>>> args = parser.parse_args('foo 1 -x 2'.split())
>>> args.func(args)
2.0
>>>
>>> # parse the args and call whatever function was selected
>>> args = parser.parse_args('bar XYZYX'.split())
>>> args.func(args)
((XYZYX))
This way, you can let parse_args() do the job of calling the
appropriate function after argument parsing is complete. Associating
functions with actions like this is typically the easiest way to handle the
different actions for each of your subparsers. However, if it is necessary
to check the name of the subparser that was invoked, the dest keyword
argument to the add_subparsers() call will work:
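For example (subparser_name is an illustrative dest):
>>> parser = argparse.ArgumentParser()
>>> subparsers = parser.add_subparsers(dest='subparser_name')
>>> subparser1 = subparsers.add_parser('1')
>>> subparser1.add_argument('-x')
>>> subparser2 = subparsers.add_parser('2')
>>> subparser2.add_argument('y')
>>> parser.parse_args(['2', 'frobble'])
Namespace(subparser_name='2', y='frobble')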
The FileType factory creates objects that can be passed to the type
argument of ArgumentParser.add_argument(). Arguments that have
FileType objects as their type will open command-line arguments as files
with the requested modes and buffer sizes:
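For example (a sketch; a buffer size of 0 requires a binary mode, and the
repr is approximate):
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--raw', type=argparse.FileType('wb', 0))
>>> parser.parse_args(['--raw', 'raw.dat'])
Namespace(raw=<_io.FileIO name='raw.dat' mode='wb'>)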
FileType objects understand the pseudo-argument '-' and automatically
convert this into sys.stdin for readable FileType objects and
sys.stdout for writable FileType objects:
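For example (repr abridged):
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('infile', type=argparse.FileType('r'))
>>> parser.parse_args(['-'])
Namespace(infile=<_io.TextIOWrapper name='<stdin>' ...>)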
By default, ArgumentParser groups command-line arguments into
“positional arguments” and “optional arguments” when displaying help
messages. When there is a better conceptual grouping of arguments than this
default one, appropriate groups can be created using the
add_argument_group() method:
>>> parser = argparse.ArgumentParser(prog='PROG', add_help=False)
>>> group = parser.add_argument_group('group')
>>> group.add_argument('--foo', help='foo help')
>>> group.add_argument('bar', help='bar help')
>>> parser.print_help()
usage: PROG [--foo FOO] bar
group:
bar bar help
--foo FOO foo help
The add_argument_group() method returns an argument group object which
has an add_argument() method just like a regular
ArgumentParser. When an argument is added to the group, the parser
treats it just like a normal argument, but displays the argument in a
separate group for help messages. The add_argument_group() method
accepts title and description arguments which can be used to
customize this display:
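For example (a sketch, with the help output shown approximately):
>>> parser = argparse.ArgumentParser(prog='PROG', add_help=False)
>>> group1 = parser.add_argument_group('group1', 'group1 description')
>>> group1.add_argument('foo', help='foo help')
>>> group2 = parser.add_argument_group('group2', 'group2 description')
>>> group2.add_argument('--bar', help='bar help')
>>> parser.print_help()
usage: PROG [--bar BAR] foo
group1:
  group1 description
  foo    foo help
group2:
  group2 description
  --bar BAR  bar help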
Create a mutually exclusive group. argparse will make sure that only
one of the arguments in the mutually exclusive group was present on the
command line:
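For example (a sketch):
>>> parser = argparse.ArgumentParser(prog='PROG')
>>> group = parser.add_mutually_exclusive_group()
>>> group.add_argument('--foo', action='store_true')
>>> group.add_argument('--bar', action='store_false')
>>> parser.parse_args(['--foo'])
Namespace(bar=True, foo=True)
>>> parser.parse_args(['--foo', '--bar'])
usage: PROG [-h] [--foo | --bar]
PROG: error: argument --bar: not allowed with argument --foo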
The add_mutually_exclusive_group() method also accepts a required
argument, to indicate that at least one of the mutually exclusive arguments
is required:
>>> parser = argparse.ArgumentParser(prog='PROG')
>>> group = parser.add_mutually_exclusive_group(required=True)
>>> group.add_argument('--foo', action='store_true')
>>> group.add_argument('--bar', action='store_false')
>>> parser.parse_args([])
usage: PROG [-h] (--foo | --bar)
PROG: error: one of the arguments --foo --bar is required
Note that currently mutually exclusive argument groups do not support the
title and description arguments of
add_argument_group().
Most of the time, the attributes of the object returned by parse_args()
will be fully determined by inspecting the command-line arguments and the argument
actions. set_defaults() allows some additional
attributes that are determined without any inspection of the command line to
be added:
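For example (a sketch):
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('foo', type=int)
>>> parser.set_defaults(bar=42, baz='badger')
>>> parser.parse_args(['736'])
Namespace(bar=42, baz='badger', foo=736)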
In most typical applications, parse_args() will take
care of formatting and printing any usage or error messages. However, several
formatting methods are available:
Print a help message, including the program usage and information about the
arguments registered with the ArgumentParser. If file is
None, sys.stdout is assumed.
There are also variants of these methods that simply return a string instead of
printing it:
Sometimes a script may only parse a few of the command-line arguments, passing
the remaining arguments on to another script or program. In these cases, the
parse_known_args() method can be useful. It works much like
parse_args() except that it does not produce an error when
extra arguments are present. Instead, it returns a two item tuple containing
the populated namespace and the list of remaining argument strings.
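For example (a sketch):
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--foo', action='store_true')
>>> parser.add_argument('bar')
>>> parser.parse_known_args(['--foo', '--badger', 'BAR', 'spam'])
(Namespace(bar='BAR', foo=True), ['--badger', 'spam'])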
Arguments that are read from a file (see the fromfile_prefix_chars
keyword argument to the ArgumentParser constructor) are read one
argument per line. convert_arg_line_to_args() can be overridden for
fancier reading.
This method takes a single argument arg_line which is a string read from
the argument file. It returns a list of arguments parsed from this string.
The method is called once per line read from the argument file, in order.
A useful override of this method is one that treats each space-separated word
as an argument:
def convert_arg_line_to_args(self, arg_line):
    for arg in arg_line.split():
        if not arg.strip():
            continue
        yield arg
Originally, the argparse module had attempted to maintain compatibility
with optparse. However, optparse was difficult to extend
transparently, particularly with the changes required to support the new
nargs= specifiers and better usage messages. When most everything in
optparse had either been copy-pasted over or monkey-patched, it no
longer seemed practical to try to maintain the backwards compatibility.
Replace (options, args) = parser.parse_args() with args =
parser.parse_args() and add additional ArgumentParser.add_argument()
calls for the positional arguments.
Replace callback actions and the callback_* keyword arguments with
type or action arguments.
Replace string names for type keyword arguments with the corresponding
type objects (e.g. int, float, complex, etc).
Replace optparse.Values with Namespace and
optparse.OptionError and optparse.OptionValueError with
ArgumentError.
Replace strings with implicit arguments such as %default or %prog with
the standard Python syntax to use dictionaries to format strings, that is,
%(default)s and %(prog)s.
Replace the OptionParser constructor version argument with a call to
parser.add_argument('--version', action='version', version='<the version>')
Deprecated since version 2.7: The optparse module is deprecated and will not be developed further;
development will continue with the argparse module.
optparse is a more convenient, flexible, and powerful library for parsing
command-line options than the old getopt module. optparse uses a
more declarative style of command-line parsing: you create an instance of
OptionParser, populate it with options, and parse the command
line. optparse allows users to specify options in the conventional
GNU/POSIX syntax, and additionally generates usage and help messages for you.
Here’s an example of using optparse in a simple script:
from optparse import OptionParser
[...]
parser = OptionParser()
parser.add_option("-f", "--file", dest="filename",
                  help="write report to FILE", metavar="FILE")
parser.add_option("-q", "--quiet",
                  action="store_false", dest="verbose", default=True,
                  help="don't print status messages to stdout")
(options, args) = parser.parse_args()
With these few lines of code, users of your script can now do the “usual thing”
on the command-line, for example:
<yourscript> --file=outfile -q
As it parses the command line, optparse sets attributes of the
options object returned by parse_args() based on user-supplied
command-line values. When parse_args() returns from parsing this command
line, options.filename will be "outfile" and options.verbose will be
False. optparse supports both long and short options, allows short
options to be merged together, and allows options to be associated with their
arguments in a variety of ways. Thus, the following command lines are all
equivalent to the above example:
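<yourscript> -f outfile --quiet
<yourscript> --quiet --file outfile
<yourscript> -q -foutfile
<yourscript> --file=outfile -q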
Additionally, users can run one of <yourscript> -h or <yourscript> --help,
and optparse will print out a brief summary of your script’s options:
Usage: <yourscript> [options]
Options:
-h, --help show this help message and exit
-f FILE, --file=FILE write report to FILE
-q, --quiet don't print status messages to stdout
where the value of yourscript is determined at runtime (normally from
sys.argv[0]).
optparse was explicitly designed to encourage the creation of programs
with straightforward, conventional command-line interfaces. To that end, it
supports only the most common command-line syntax and semantics conventionally
used under Unix. If you are unfamiliar with these conventions, read this
section to acquaint yourself with them.
argument
a string entered on the command-line, and passed by the shell to execl()
or execv(). In Python, arguments are elements of sys.argv[1:]
(sys.argv[0] is the name of the program being executed). Unix shells
also use the term “word”.
It is occasionally desirable to substitute an argument list other than
sys.argv[1:], so you should read “argument” as “an element of
sys.argv[1:], or of some other list provided as a substitute for
sys.argv[1:]”.
option
an argument used to supply extra information to guide or customize the
execution of a program. There are many different syntaxes for options; the
traditional Unix syntax is a hyphen (“-”) followed by a single letter,
e.g. -x or -F. Also, traditional Unix syntax allows multiple
options to be merged into a single argument, e.g. -x -F is equivalent
to -xF. The GNU project introduced -- followed by a series of
hyphen-separated words, e.g. --file or --dry-run. These are the
only two option syntaxes provided by optparse.
Some other option syntaxes that the world has seen include:
a hyphen followed by a few letters, e.g. -pf (this is not the same
as multiple options merged into a single argument)
a hyphen followed by a whole word, e.g. -file (this is technically
equivalent to the previous syntax, but they aren’t usually seen in the same
program)
a plus sign followed by a single letter, or a few letters, or a word, e.g.
+f, +rgb
a slash followed by a letter, or a few letters, or a word, e.g. /f,
/file
These option syntaxes are not supported by optparse, and they never
will be. This is deliberate: the first three are non-standard on any
environment, and the last only makes sense if you’re exclusively targeting
VMS, MS-DOS, and/or Windows.
option argument
an argument that follows an option, is closely associated with that option,
and is consumed from the argument list when that option is. With
optparse, option arguments may either be in a separate argument from
their option:
-f foo
--file foo
or included in the same argument:
-ffoo
--file=foo
Typically, a given option either takes an argument or it doesn’t. Lots of
people want an “optional option arguments” feature, meaning that some options
will take an argument if they see it, and won’t if they don’t. This is
somewhat controversial, because it makes parsing ambiguous: if -a takes
an optional argument and -b is another option entirely, how do we
interpret -ab? Because of this ambiguity, optparse does not
support this feature.
positional argument
something leftover in the argument list after options have been parsed, i.e.
after options and their arguments have been parsed and removed from the
argument list.
required option
an option that must be supplied on the command-line; note that the phrase
“required option” is self-contradictory in English. optparse doesn’t
prevent you from implementing required options, but doesn’t give you much
help at it either.
For example, consider this hypothetical command-line:
prog -v --report /tmp/report.txt foo bar
-v and --report are both options. Assuming that --report
takes one argument, /tmp/report.txt is an option argument. foo and
bar are positional arguments.
Options are used to provide extra information to tune or customize the execution
of a program. In case it wasn’t clear, options are usually optional. A
program should be able to run just fine with no options whatsoever. (Pick a
random program from the Unix or GNU toolsets. Can it run without any options at
all and still make sense? The main exceptions are find, tar, and
dd—all of which are mutant oddballs that have been rightly criticized
for their non-standard syntax and confusing interfaces.)
Lots of people want their programs to have “required options”. Think about it.
If it’s required, then it’s not optional! If there is a piece of information
that your program absolutely requires in order to run successfully, that’s what
positional arguments are for.
As an example of good command-line interface design, consider the humble cp
utility, for copying files. It doesn’t make much sense to try to copy files
without supplying a destination and at least one source. Hence, cp fails if
you run it with no arguments. However, it has a flexible, useful syntax that
does not require any options at all:
cp SOURCE DEST
cp SOURCE ... DEST-DIR
You can get pretty far with just that. Most cp implementations provide a
bunch of options to tweak exactly how the files are copied: you can preserve
mode and modification time, avoid following symlinks, ask before clobbering
existing files, etc. But none of this distracts from the core mission of
cp, which is to copy either one file to another, or several files to another
directory.
Positional arguments are for those pieces of information that your program
absolutely, positively requires to run.
A good user interface should have as few absolute requirements as possible. If
your program requires 17 distinct pieces of information in order to run
successfully, it doesn’t much matter how you get that information from the
user—most people will give up and walk away before they successfully run the
program. This applies whether the user interface is a command-line, a
configuration file, or a GUI: if you make that many demands on your users, most
of them will simply give up.
In short, try to minimize the amount of information that users are absolutely
required to supply—use sensible defaults whenever possible. Of course, you
also want to make your programs reasonably flexible. That’s what options are
for. Again, it doesn’t matter if they are entries in a config file, widgets in
the “Preferences” dialog of a GUI, or command-line options—the more options
you implement, the more flexible your program is, and the more complicated its
implementation becomes. Too much flexibility has drawbacks as well, of course;
too many options can overwhelm users and make your code much harder to maintain.
While optparse is quite flexible and powerful, it’s also straightforward
to use in most cases. This section covers the code patterns that are common to
any optparse-based program.
First, you need to import the OptionParser class; then, early in the main
program, create an OptionParser instance:
from optparse import OptionParser
[...]
parser = OptionParser()
Then you can start defining options. The basic syntax is:
parser.add_option(opt_str, ...,
                  attr=value, ...)
Each option has one or more option strings, such as -f or --file,
and several option attributes that tell optparse what to expect and what
to do when it encounters that option on the command line.
Typically, each option will have one short option string and one long option
string, e.g.:
parser.add_option("-f", "--file", ...)
You’re free to define as many short option strings and as many long option
strings as you like (including zero), as long as there is at least one option
string overall.
The option strings passed to add_option() are effectively labels for the
option defined by that call. For brevity, we will frequently refer to
encountering an option on the command line; in reality, optparse
encounters option strings and looks up options from them.
Once all of your options are defined, instruct optparse to parse your
program’s command line:
(options, args) = parser.parse_args()
(If you like, you can pass a custom argument list to parse_args(), but
that’s rarely necessary: by default it uses sys.argv[1:].)
parse_args() returns two values:
options, an object containing values for all of your options—e.g. if
--file takes a single string argument, then options.file will be the
filename supplied by the user, or None if the user did not supply that
option
args, the list of positional arguments leftover after parsing options
This tutorial section only covers the four most important option attributes:
action, type, dest
(destination), and help. Of these, action is the
most fundamental.
Actions tell optparse what to do when it encounters an option on the
command line. There is a fixed set of actions hard-coded into optparse;
adding new actions is an advanced topic covered in section
Extending optparse. Most actions tell optparse to store
a value in some variable—for example, take a string from the command line and
store it in an attribute of options.
If you don’t specify an option action, optparse defaults to store.
The most common option action is store, which tells optparse to take
the next argument (or the remainder of the current argument), ensure that it is
of the correct type, and store it to your chosen destination.
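For example, given an option defined along these lines and a fake argument
list to parse (a sketch):
parser.add_option("-f", "--file",
                  action="store", type="string", dest="filename")
args = ["-f", "foo.txt"]
(options, args) = parser.parse_args(args)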
When optparse sees the option string -f, it consumes the next
argument, foo.txt, and stores it in options.filename. So, after this
call to parse_args(), options.filename is "foo.txt".
Some other option types supported by optparse are int and float.
Here’s an option that expects an integer argument:
parser.add_option("-n", type="int", dest="num")
Note that this option has no long option string, which is perfectly acceptable.
Also, there’s no explicit action, since the default is store.
Let’s parse another fake command-line. This time, we’ll jam the option argument
right up against the option: since -n42 (one argument) is equivalent to
-n 42 (two arguments), the following code will print 42:
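(options, args) = parser.parse_args(["-n42"])
print(options.num)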
If you don’t specify a type, optparse assumes string. Combined with
the fact that the default action is store, that means our first example can
be a lot shorter:
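parser.add_option("-f", "--file", dest="filename")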
If you don’t supply a destination, optparse figures out a sensible
default from the option strings: if the first long option string is
--foo-bar, then the default destination is foo_bar. If there are no
long option strings, optparse looks at the first short option string: the
default destination for -f is f.
optparse also includes the built-in complex type. Adding
types is covered in section Extending optparse.
Flag options—set a variable to true or false when a particular option is seen
—are quite common. optparse supports them with two separate actions,
store_true and store_false. For example, you might have a verbose
flag that is turned on with -v and off with -q:
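parser.add_option("-v", action="store_true", dest="verbose")
parser.add_option("-q", action="store_false", dest="verbose")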
Here we have two different options with the same destination, which is perfectly
OK. (It just means you have to be a bit careful when setting default values—
see below.)
When optparse encounters -v on the command line, it sets
options.verbose to True; when it encounters -q,
options.verbose is set to False.
All of the above examples involve setting some variable (the “destination”) when
certain command-line options are seen. What happens if those options are never
seen? Since we didn’t supply any defaults, they are all set to None. This
is usually fine, but sometimes you want more control. optparse lets you
supply a default value for each destination, which is assigned before the
command line is parsed.
First, consider the verbose/quiet example. If we want optparse to set
verbose to True unless -q is seen, then we can do this:
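parser.add_option("-v", action="store_true", dest="verbose", default=True)
parser.add_option("-q", action="store_false", dest="verbose")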
Since default values apply to the destination rather than to any particular
option, and these two options happen to have the same destination, this is
exactly equivalent:
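parser.add_option("-v", action="store_true", dest="verbose")
parser.add_option("-q", action="store_false", dest="verbose", default=True)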
As before, the last value specified for a given option destination is the one
that counts. For clarity, try to use one method or the other of setting default
values, not both.
optparse’s ability to generate help and usage text automatically is
useful for creating user-friendly command-line interfaces. All you have to do
is supply a help value for each option, and optionally a short
usage message for your whole program. Here’s an OptionParser populated with
user-friendly (documented) options:
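One plausible definition, consistent with the help output shown below:
usage = "usage: %prog [options] arg1 arg2"
parser = OptionParser(usage=usage)
parser.add_option("-v", "--verbose",
                  action="store_true", dest="verbose", default=True,
                  help="make lots of noise [default]")
parser.add_option("-q", "--quiet",
                  action="store_false", dest="verbose",
                  help="be vewwy quiet (I'm hunting wabbits)")
parser.add_option("-f", "--filename",
                  metavar="FILE", help="write output to FILE")
parser.add_option("-m", "--mode",
                  default="intermediate",
                  help="interaction mode: novice, intermediate, "
                       "or expert [default: %default]")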
If optparse encounters either -h or --help on the
command-line, or if you just call parser.print_help(), it prints the
following to standard output:
Usage: <yourscript> [options] arg1 arg2
Options:
-h, --help show this help message and exit
-v, --verbose make lots of noise [default]
-q, --quiet be vewwy quiet (I'm hunting wabbits)
-f FILE, --filename=FILE
write output to FILE
-m MODE, --mode=MODE interaction mode: novice, intermediate, or
expert [default: intermediate]
(If the help output is triggered by a help option, optparse exits after
printing the help text.)
There’s a lot going on here to help optparse generate the best possible
help message:
the script defines its own usage message:
usage = "usage: %prog [options] arg1 arg2"
optparse expands %prog in the usage string to the name of the
current program, i.e. os.path.basename(sys.argv[0]). The expanded string
is then printed before the detailed option help.
If you don’t supply a usage string, optparse uses a bland but sensible
default: "usage: %prog [options]", which is fine if your script doesn’t
take any positional arguments.
every option defines a help string, and doesn’t worry about line-wrapping—
optparse takes care of wrapping lines and making the help output look
good.
options that take a value indicate this fact in their automatically-generated
help message, e.g. for the “mode” option:
-m MODE, --mode=MODE
Here, “MODE” is called the meta-variable: it stands for the argument that the
user is expected to supply to -m/--mode. By default,
optparse converts the destination variable name to uppercase and uses
that for the meta-variable. Sometimes, that’s not what you want—for
example, the --filename option explicitly sets metavar="FILE",
resulting in this automatically-generated option description:
-f FILE, --filename=FILE
This is important for more than just saving space, though: the manually
written help text uses the meta-variable FILE to clue the user in that
there’s a connection between the semi-formal syntax -f FILE and the informal
semantic description “write output to FILE”. This is a simple but effective
way to make your help text a lot clearer and more useful for end users.
options that have a default value can include %default in the help
string—optparse will replace it with str() of the option’s
default value. If an option has no default value (or the default value is
None), %default expands to none.
When dealing with many options, it is convenient to group these options for
better help output. An OptionParser can contain several option groups,
each of which can contain several options.
An option group is obtained using the class OptionGroup:
class optparse.OptionGroup(parser, title, description=None)
where
parser is the OptionParser instance the group will be inserted into
title is the group title
description, optional, is a long description of the group
OptionGroup inherits from OptionContainer (like
OptionParser) and so the add_option() method can be used to add
an option to the group.
Once all the options are declared, using the OptionParser method
add_option_group() the group is added to the previously defined parser.
Continuing with the parser defined in the previous section, adding an
OptionGroup to a parser is easy:
group = OptionGroup(parser, "Dangerous Options",
                    "Caution: use these options at your own risk. "
                    "It is believed that some of them bite.")
group.add_option("-g", action="store_true", help="Group option.")
parser.add_option_group(group)
This would result in the following help output:
Usage: <yourscript> [options] arg1 arg2
Options:
-h, --help show this help message and exit
-v, --verbose make lots of noise [default]
-q, --quiet be vewwy quiet (I'm hunting wabbits)
-f FILE, --filename=FILE
write output to FILE
-m MODE, --mode=MODE interaction mode: novice, intermediate, or
expert [default: intermediate]
Dangerous Options:
Caution: use these options at your own risk. It is believed that some
of them bite.
-g Group option.
A more complete example might involve using more than one group, still
extending the previous example:
group = OptionGroup(parser, "Dangerous Options",
                    "Caution: use these options at your own risk. "
                    "It is believed that some of them bite.")
group.add_option("-g", action="store_true", help="Group option.")
parser.add_option_group(group)
group = OptionGroup(parser, "Debug Options")
group.add_option("-d", "--debug", action="store_true",
                 help="Print debug information")
group.add_option("-s", "--sql", action="store_true",
                 help="Print all SQL statements executed")
group.add_option("-e", action="store_true", help="Print every action done")
parser.add_option_group(group)
that results in the following output:
Usage: <yourscript> [options] arg1 arg2
Options:
-h, --help show this help message and exit
-v, --verbose make lots of noise [default]
-q, --quiet be vewwy quiet (I'm hunting wabbits)
-f FILE, --filename=FILE
write output to FILE
-m MODE, --mode=MODE interaction mode: novice, intermediate, or expert
[default: intermediate]
Dangerous Options:
Caution: use these options at your own risk. It is believed that some
of them bite.
-g Group option.
Debug Options:
-d, --debug Print debug information
-s, --sql Print all SQL statements executed
-e Print every action done
Another interesting method, particularly when working programmatically with
option groups, is get_option_group():
Return the OptionGroup to which the short or long option
string opt_str (e.g. '-o' or '--option') belongs. If
there’s no such OptionGroup, return None.
Similar to the brief usage string, optparse can also print a version
string for your program. You have to supply the string as the version
argument to OptionParser:
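For example (a sketch):
parser = OptionParser(usage="%prog [-f] [-q]", version="%prog 1.0")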
%prog is expanded just like it is in usage. Apart from that,
version can contain anything you like. When you supply it, optparse
automatically adds a --version option to your parser. If it encounters
this option on the command line, it expands your version string (by
replacing %prog), prints it to stdout, and exits.
For example, if your script is called /usr/bin/foo:
$ /usr/bin/foo --version
foo 1.0
The following two methods can be used to print and get the version string:
Print the version message for the current program (self.version) to
file (default stdout). As with print_usage(), any occurrence
of %prog in self.version is replaced with the name of the current
program. Does nothing if self.version is empty or undefined.
There are two broad classes of errors that optparse has to worry about:
programmer errors and user errors. Programmer errors are usually erroneous
calls to OptionParser.add_option(), e.g. invalid option strings, unknown
option attributes, missing option attributes, etc. These are dealt with in the
usual way: raise an exception (either optparse.OptionError or
TypeError) and let the program crash.
Handling user errors is much more important, since they are guaranteed to happen
no matter how stable your code is. optparse can automatically detect
some user errors, such as bad option arguments (passing -n 4x where
-n takes an integer argument), missing arguments (-n at the end
of the command line, where -n takes an argument of any type). Also,
you can call OptionParser.error() to signal an application-defined error
condition:
(options, args) = parser.parse_args()
[...]
if options.a and options.b:
    parser.error("options -a and -b are mutually exclusive")
In either case, optparse handles the error the same way: it prints the
program’s usage message and an error message to standard error and exits with
error status 2.
Consider the first example above, where the user passes 4x to an option
that takes an integer:
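The output would be approximately:
$ /usr/bin/foo -n 4x
Usage: foo [options]
foo: error: option -n: invalid integer value: '4x'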
optparse-generated error messages take care always to mention the
option involved in the error; be sure to do the same when calling
OptionParser.error() from your application code.
If optparse’s default error-handling behaviour does not suit your needs,
you’ll need to subclass OptionParser and override its exit()
and/or error() methods.
The OptionParser constructor has no required arguments, but a number of
optional keyword arguments. You should always pass them as keyword
arguments, i.e. do not rely on the order in which the arguments are declared.
usage (default: "%prog [options]")
The usage summary to print when your program is run incorrectly or with a
help option. When optparse prints the usage string, it expands
%prog to os.path.basename(sys.argv[0]) (or to prog if you
passed that keyword argument). To suppress a usage message, pass the
special value optparse.SUPPRESS_USAGE.
option_list (default: [])
A list of Option objects to populate the parser with. The options in
option_list are added after any options in standard_option_list (a
class attribute that may be set by OptionParser subclasses), but before
any version or help options. Deprecated; use add_option() after
creating the parser instead.
option_class (default: optparse.Option)
Class to use when adding options to the parser in add_option().
version (default: None)
A version string to print when the user supplies a version option. If you
supply a true value for version, optparse automatically adds a
version option with the single option string --version. The
substring %prog is expanded the same as for usage.
conflict_handler (default: "error")
Specifies what to do when options with conflicting option strings are
added to the parser; see section
Conflicts between options.
description (default: None)
A paragraph of text giving a brief overview of your program.
optparse reformats this paragraph to fit the current terminal width
and prints it when the user requests help (after usage, but before the
list of options).
formatter (default: a new IndentedHelpFormatter)
An instance of optparse.HelpFormatter that will be used for printing help
text. optparse provides two concrete classes for this purpose:
IndentedHelpFormatter and TitledHelpFormatter.
add_help_option (default: True)
If true, optparse will add a help option (with option strings -h
and --help) to the parser.
prog
The string to use when expanding %prog in usage and version
instead of os.path.basename(sys.argv[0]).
epilog (default: None)
A paragraph of help text to print after the option help.
There are several ways to populate the parser with options. The preferred way
is by using OptionParser.add_option(), as shown in section
Tutorial. add_option() can be called in one of two ways:
pass it an Option instance (as returned by make_option())
pass it any combination of positional and keyword arguments that are
acceptable to make_option() (i.e., to the Option constructor), and it
will create the Option instance for you
The other alternative is to pass a list of pre-constructed Option instances to
the OptionParser constructor, as in:
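A sketch, assuming make_option has also been imported from optparse:
option_list = [
    make_option("-f", "--filename",
                action="store", type="string", dest="filename"),
    make_option("-q", "--quiet",
                action="store_false", dest="verbose"),
    ]
parser = OptionParser(option_list=option_list)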
(make_option() is a factory function for creating Option instances;
currently it is an alias for the Option constructor. A future version of
optparse may split Option into several classes, and make_option()
will pick the right class to instantiate. Do not instantiate Option directly.)
Each Option instance represents a set of synonymous command-line option strings,
e.g. -f and --file. You can specify any number of short or
long option strings, but you must specify at least one overall option string.
The canonical way to create an Option instance is with the
add_option() method of OptionParser.
To define an option with only a short option string:
parser.add_option("-f", attr=value, ...)
And to define an option with only a long option string:
parser.add_option("--foo", attr=value, ...)
The keyword arguments define attributes of the new Option object. The most
important option attribute is action, and it largely
determines which other attributes are relevant or required. If you pass
irrelevant option attributes, or fail to pass required ones, optparse
raises an OptionError exception explaining your mistake.
An option’s action determines what optparse does when it encounters
this option on the command-line. The standard option actions hard-coded into
optparse are:
"store"
store this option’s argument (default)
"store_const"
store a constant value
"store_true"
store a true value
"store_false"
store a false value
"append"
append this option’s argument to a list
"append_const"
append a constant value to a list
"count"
increment a counter by one
"callback"
call a specified function
"help"
print a usage message including all options and the documentation for them
(If you don’t supply an action, the default is "store". For this action,
you may also supply type and dest option
attributes; see Standard option actions.)
As you can see, most actions involve storing or updating a value somewhere.
optparse always creates a special object for this, conventionally called
options (it happens to be an instance of optparse.Values). Option
arguments (and various other values) are stored as attributes of this object,
according to the dest (destination) option attribute.
For example, when you call
parser.parse_args()
one of the first things optparse does is create the options object:
options = Values()
For example, suppose one of the options in this parser is defined with:
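parser.add_option("-f", "--file", action="store", type="string", dest="filename")
and the command line being parsed includes any of the following:
-ffoo
-f foo
--file=foo
--file foo
then optparse, on seeing this option, will do the equivalent of
options.filename = "foo"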
The following option attributes may be passed as keyword arguments to
OptionParser.add_option(). If you pass an option attribute that is not
relevant to a particular option, or fail to pass a required option attribute,
optparse raises OptionError.
If the option’s action implies writing or modifying a value somewhere, this
tells optparse where to write it: dest names an
attribute of the options object that optparse builds as it parses
the command line.
For options with action "callback", the callable to call when this option
is seen. See section Option Callbacks for detail on the
arguments passed to the callable.
Help text to print for this option when listing all available options after
the user supplies a help option (such as --help). If
no help text is supplied, the option will be listed without help text. To
hide this option, use the special value optparse.SUPPRESS_HELP.
The various option actions all have slightly different requirements and effects.
Most actions have several relevant option attributes which you may specify to
guide optparse’s behaviour; a few have required attributes, which you
must specify for any option using that action.
The option must be followed by an argument, which is converted to a value
according to type and stored in dest. If
nargs > 1, multiple arguments will be consumed from the
command line; all will be converted according to type and
stored to dest as a tuple. See the
Standard option types section.
If choices is supplied (a list or tuple of strings), the type
defaults to "choice".
If dest is not supplied, optparse derives a destination
from the first long option string (e.g., --foo-bar implies
foo_bar). If there are no long option strings, optparse derives a
destination from the first short option string (e.g., -f implies f).
The option must be followed by an argument, which is appended to the list in
dest. If no default value for dest is
supplied, an empty list is automatically created when optparse first
encounters this option on the command-line. If nargs > 1,
multiple arguments are consumed, and a tuple of length nargs
is appended to dest.
The defaults for type and dest are the same as
for the "store" action.
Like "store_const", but the value const is appended to
dest; as with "append", dest defaults to
None, and an empty list is automatically created the first time the option
is encountered.
Prints a complete help message for all the options in the current option
parser. The help message is constructed from the usage string passed to
OptionParser’s constructor and the help string passed to every
option.
If no help string is supplied for an option, it will still be
listed in the help message. To omit an option entirely, use the special value
optparse.SUPPRESS_HELP.
optparse automatically adds a help option to all
OptionParsers, so you do not normally need to create one.
Example:
from optparse import OptionParser, SUPPRESS_HELP
# usually, a help option is added automatically, but that can
# be suppressed using the add_help_option argument
parser = OptionParser(add_help_option=False)
parser.add_option("-h", "--help", action="help")
parser.add_option("-v", action="store_true", dest="verbose",
help="Be moderately verbose")
parser.add_option("--file", dest="filename",
help="Input file to read data from")
parser.add_option("--secret", help=SUPPRESS_HELP)
If optparse sees either -h or --help on the command line,
it will print something like the following help message to stdout (assuming
sys.argv[0] is "foo.py"):
Usage: foo.py [options]
Options:
-h, --help Show this help message and exit
-v Be moderately verbose
--file=FILENAME Input file to read data from
After printing the help message, optparse terminates your process with
sys.exit(0).
"version"
Prints the version number supplied to the OptionParser to stdout and exits.
The version number is actually formatted and printed by the
print_version() method of OptionParser. Generally only relevant if the
version argument is supplied to the OptionParser constructor. As with
help options, you will rarely create version options,
since optparse automatically adds them when needed.
optparse has five built-in option types: "string", "int",
"choice", "float" and "complex". If you need to add new
option types, see section Extending optparse.
Arguments to string options are not checked or converted in any way: the text on
the command line is stored in the destination (or passed to the callback) as-is.
Integer arguments (type "int") are parsed as follows:
if the number starts with 0x, it is parsed as a hexadecimal number
if the number starts with 0, it is parsed as an octal number
if the number starts with 0b, it is parsed as a binary number
otherwise, the number is parsed as a decimal number
The conversion is done by calling int() with the appropriate base (2, 8,
10, or 16). If this fails, so will optparse, although with a more useful
error message.
"float" and "complex" option arguments are converted directly with
float() and complex(), with similar error-handling.
"choice" options are a subtype of "string" options. The
choices option attribute (a sequence of strings) defines the
set of allowed option arguments. optparse.check_choice() compares
user-supplied option arguments against this master list and raises
OptionValueError if an invalid string is given.
args
the list of arguments to process (default: sys.argv[1:])
values
an optparse.Values object to store option arguments in (default: a
new instance of Values) – if you give an existing object, the
option defaults will not be initialized on it
and the return values are
options
the same object that was passed in as values, or the optparse.Values
instance created by optparse
args
the leftover positional arguments after all options have been processed
The most common usage is to supply neither keyword argument. If you supply
values, it will be modified with repeated setattr() calls (roughly one
for every option argument stored to an option destination) and returned by
parse_args().
If parse_args() encounters any errors in the argument list, it calls the
OptionParser’s error() method with an appropriate end-user error message.
This ultimately terminates your process with an exit status of 2 (the
traditional Unix exit status for command-line errors).
The default behavior of the option parser can be customized slightly, and you
can also poke around your option parser and see what’s there. OptionParser
provides several methods to help you out:
Set parsing to stop on the first non-option. For example, if -a and
-b are both simple options that take no arguments, optparse
normally accepts this syntax:
prog -a arg1 -b arg2
and treats it as equivalent to
prog -a -b arg1 arg2
To disable this feature, call disable_interspersed_args(). This
restores traditional Unix syntax, where option parsing stops with the first
non-option argument.
Use this if you have a command processor which runs another command which has
options of its own and you want to make sure these options don’t get
confused. For example, each command might have a different set of options.
If the OptionParser has an option corresponding to opt_str, that
option is removed. If that option provided any other option strings, all of
those option strings become invalid. If opt_str does not occur in any
option belonging to this OptionParser, raises ValueError.
(This is particularly true if you’ve defined your own OptionParser subclass with
some standard options.)
Every time you add an option, optparse checks for conflicts with existing
options. If it finds any, it invokes the current conflict-handling mechanism.
You can set the conflict-handling mechanism either in the constructor:
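parser = OptionParser(conflict_handler="resolve")
or with a separate call (a sketch; "resolve" is the alternative to the
default handler, "error"):
parser.set_conflict_handler("resolve")
Suppose two options with a conflicting -n string are then added (the
destinations here are illustrative):
parser.add_option("-n", "--dry-run", action="store_true", dest="dry_run",
                  help="do no harm")
parser.add_option("-n", "--noisy", action="store_true", dest="noisy",
                  help="be noisy")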
At this point, optparse detects that a previously-added option is already
using the -n option string. Since conflict_handler is "resolve",
it resolves the situation by removing -n from the earlier option’s list of
option strings. Now --dry-run is the only way for the user to activate
that option. If the user asks for help, the help message will reflect that:
Options:
--dry-run do no harm
[...]
-n, --noisy be noisy
It’s possible to whittle away the option strings for a previously-added option
until there are none left, and the user has no way of invoking that option from
the command-line. In that case, optparse removes that option completely,
so it doesn’t show up in help text or anywhere else. Carrying on with our
existing OptionParser:
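Continuing the sketch (the destination dry_run2 is illustrative):
parser.add_option("--dry-run", action="store_true", dest="dry_run2",
                  help="new dry-run option")
Now the earlier option has lost its last remaining option string, --dry-run,
so optparse removes it entirely.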
OptionParser instances have several cyclic references. This should not be a
problem for Python’s garbage collector, but you may wish to break the cyclic
references explicitly by calling destroy() on your
OptionParser once you are done with it. This is particularly useful in
long-running applications where large object graphs are reachable from your
OptionParser.
Set the usage string according to the rules described above for the usage
constructor keyword argument. Passing None sets the default usage
string; use optparse.SUPPRESS_USAGE to suppress a usage message.
Print the usage message for the current program (self.usage) to file
(default stdout). Any occurrence of the string %prog in self.usage
is replaced with the name of the current program. Does nothing if
self.usage is empty or not defined.
Set default values for several option destinations at once. Using
set_defaults() is the preferred way to set default values for options,
since multiple options can share the same destination. For example, if
several “mode” options all set the same destination, any one of them can set
the default, and the last one wins:
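For instance (a sketch; "mode" and its values are illustrative):
parser.add_option("--advanced", action="store_const",
                  dest="mode", const="advanced",
                  default="novice")    # overridden below
parser.add_option("--novice", action="store_const",
                  dest="mode", const="novice",
                  default="advanced")  # overrides the above setting
The preferred spelling avoids the duplicated (and conflicting) defaults:
parser.set_defaults(mode="advanced")
parser.add_option("--advanced", action="store_const",
                  dest="mode", const="advanced")
parser.add_option("--novice", action="store_const",
                  dest="mode", const="novice")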
When optparse’s built-in actions and types aren’t quite enough for your
needs, you have two choices: extend optparse or define a callback option.
Extending optparse is more general, but overkill for a lot of simple
cases. Quite often a simple callback is all you need.
There are two steps to defining a callback option:
define the option itself using the "callback" action
write the callback; this is a function (or method) that takes at least four
arguments, as described below
As always, the easiest way to define a callback option is by using the
OptionParser.add_option() method. Apart from action, the
only option attribute you must specify is callback, the function to call:
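parser.add_option("-c", action="callback", callback=my_callback)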
callback is a function (or other callable object), so you must have already
defined my_callback() when you create this callback option. In this simple
case, optparse doesn’t even know if -c takes any arguments,
which usually means that the option takes no arguments—the mere presence of
-c on the command-line is all it needs to know. In some
circumstances, though, you might want your callback to consume an arbitrary
number of command-line arguments. This is where writing callbacks gets tricky;
it’s covered later in this section.
optparse always passes four particular arguments to your callback, and it
will only pass additional arguments if you specify them via
callback_args and callback_kwargs. Thus, the
minimal callback function signature is:
def my_callback(option, opt, value, parser):
The four arguments to a callback are described below.
There are several other option attributes that you can supply when you define a
callback option:
type
has its usual meaning: as with the "store" or "append" actions, it
instructs optparse to consume one argument and convert it to
type. Rather than storing the converted value(s) anywhere,
though, optparse passes it to your callback function.
also has its usual meaning: if it is supplied and > 1, optparse will
consume nargs arguments, each of which must be convertible to
type. It then passes a tuple of converted values to your
callback.
option
is the Option instance that’s calling the callback
opt_str
is the option string seen on the command-line that’s triggering the callback.
(If an abbreviated long option was used, opt_str will be the full,
canonical option string—e.g. if the user puts --foo on the
command-line as an abbreviation for --foobar, then opt_str will be
"--foobar".)
value
is the argument to this option seen on the command-line. optparse will
only expect an argument if type is set; the type of value will be
the type implied by the option’s type. If type for this option is
None (no argument expected), then value will be None. If nargs
> 1, value will be a tuple of values of the appropriate type.
parser
is the OptionParser instance driving the whole thing, mainly useful because
you can access some other interesting data through its instance attributes:
parser.largs
the current list of leftover arguments, i.e. arguments that have been
consumed but are neither options nor option arguments. Feel free to modify
parser.largs, e.g. by adding more arguments to it. (This list will
become args, the second return value of parse_args().)
parser.rargs
the current list of remaining arguments, i.e. with opt_str and
value (if applicable) removed, and only the arguments following them
still there. Feel free to modify parser.rargs, e.g. by consuming more
arguments.
parser.values
the object where option values are by default stored (an instance of
optparse.Values). This lets callbacks use the same mechanism as the
rest of optparse for storing option values; you don’t need to mess
around with globals or closures. You can also access or modify the
value(s) of any options already encountered on the command-line.
args
is a tuple of arbitrary positional arguments supplied via the
callback_args option attribute.
kwargs
is a dictionary of arbitrary keyword arguments supplied via
callback_kwargs.
The callback function should raise OptionValueError if there are any
problems with the option or its argument(s). optparse catches this and
terminates the program, printing the error message you supply to stderr. Your
message should be clear, concise, accurate, and mention the option at fault.
Otherwise, the user will have a hard time figuring out what they did wrong.
Here’s a slightly more interesting example: record the fact that -a is
seen, but blow up if it comes after -b in the command-line.
def check_order(option, opt_str, value, parser):
    if parser.values.b:
        raise OptionValueError("can't use -a after -b")
    parser.values.a = 1
[...]
parser.add_option("-a", action="callback", callback=check_order)
parser.add_option("-b", action="store_true", dest="b")
Callback example 3: check option order (generalized)
If you want to re-use this callback for several similar options (set a flag, but
blow up if -b has already been seen), it needs a bit of work: the error
message and the flag that it sets must be generalized.
def check_order(option, opt_str, value, parser):
    if parser.values.b:
        raise OptionValueError("can't use %s after -b" % opt_str)
    setattr(parser.values, option.dest, 1)
[...]
parser.add_option("-a", action="callback", callback=check_order, dest='a')
parser.add_option("-b", action="store_true", dest="b")
parser.add_option("-c", action="callback", callback=check_order, dest='c')
Of course, you could put any condition in there—you’re not limited to checking
the values of already-defined options. For example, if you have options that
should not be called when the moon is full, all you have to do is this:
def check_moon(option, opt_str, value, parser):
    if is_moon_full():
        raise OptionValueError("%s option invalid when moon is full"
                               % opt_str)
    setattr(parser.values, option.dest, 1)
[...]
parser.add_option("--foo",
                  action="callback", callback=check_moon, dest="foo")
(The definition of is_moon_full() is left as an exercise for the reader.)
Things get slightly more interesting when you define callback options that take
a fixed number of arguments. Specifying that a callback option takes arguments
is similar to defining a "store" or "append" option: if you define
type, then the option takes one argument that must be
convertible to that type; if you further define nargs, then the
option takes nargs arguments.
Here’s an example that just emulates the standard "store" action:
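def store_value(option, opt_str, value, parser):
    # store_value is an illustrative helper emulating the "store" action
    setattr(parser.values, option.dest, value)
[...]
parser.add_option("--foo",
                  action="callback", callback=store_value,
                  type="int", nargs=3, dest="foo")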
Note that optparse takes care of consuming 3 arguments and converting
them to integers for you; all you have to do is store them. (Or whatever;
obviously you don’t need a callback for this example.)
Things get hairy when you want an option to take a variable number of arguments.
For this case, you must write a callback, as optparse doesn’t provide any
built-in capabilities for it. And you have to deal with certain intricacies of
conventional Unix command-line parsing that optparse normally handles for
you. In particular, callbacks should implement the conventional rules for bare
-- and - arguments:
either -- or - can be option arguments
bare -- (if not the argument to some option): halt command-line
processing and discard the --
bare - (if not the argument to some option): halt command-line
processing but keep the - (append it to parser.largs)
If you want an option that takes a variable number of arguments, there are
several subtle, tricky issues to worry about. The exact implementation you
choose will be based on which trade-offs you’re willing to make for your
application (which is why optparse doesn’t support this sort of thing
directly).
Nevertheless, here’s a stab at a callback for an option with variable
arguments:
def vararg_callback(option, opt_str, value, parser):
    assert value is None
    value = []

    def floatable(str):
        try:
            float(str)
            return True
        except ValueError:
            return False

    for arg in parser.rargs:
        # stop on --foo like options
        if arg[:2] == "--" and len(arg) > 2:
            break
        # stop on -a, but not on -3 or -3.0
        if arg[:1] == "-" and len(arg) > 1 and not floatable(arg):
            break
        value.append(arg)

    del parser.rargs[:len(value)]
    setattr(parser.values, option.dest, value)

[...]
parser.add_option("-c", "--callback", dest="vararg_attr",
                  action="callback", callback=vararg_callback)
Since the two major controlling factors in how optparse interprets
command-line options are the action and type of each option, the most likely
direction of extension is to add new actions and new types.
To add new types, you need to define your own subclass of optparse’s
Option class. This class has a couple of attributes that define
optparse’s types: TYPES and TYPE_CHECKER.
TYPES
A tuple of type names; in your subclass, simply define a new tuple
TYPES that builds on the standard one.
TYPE_CHECKER
A dictionary mapping type names to type-checking functions. A type-checking
function has the following signature:
def check_mytype(option, opt, value)
where option is an Option instance, opt is an option string
(e.g., -f), and value is the string from the command line that must
be checked and converted to your desired type. check_mytype() should
return an object of the hypothetical type mytype. The value returned by
a type-checking function will wind up in the Values instance returned
by OptionParser.parse_args(), or be passed to a callback as the
value parameter.
Your type-checking function should raise OptionValueError if it
encounters any problems. OptionValueError takes a single string
argument, which is passed as-is to OptionParser’s error()
method, which in turn prepends the program name and the string "error:"
and prints everything to stderr before terminating the process.
Here’s a silly example that demonstrates adding a "complex" option type to
parse Python-style complex numbers on the command line. (This is even sillier
than it used to be, because optparse 1.3 added built-in support for
complex numbers, but never mind.)
First, the necessary imports:
from copy import copy
from optparse import Option, OptionValueError
You need to define your type-checker first, since it’s referred to later (in the
TYPE_CHECKER class attribute of your Option subclass):
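def check_complex(option, opt, value):
    try:
        return complex(value)
    except ValueError:
        raise OptionValueError(
            "option %s: invalid complex value: %r" % (opt, value))

Finally, the Option subclass:

class MyOption(Option):
    TYPES = Option.TYPES + ("complex",)
    TYPE_CHECKER = copy(Option.TYPE_CHECKER)
    TYPE_CHECKER["complex"] = check_complex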
(If we didn’t make a copy() of Option.TYPE_CHECKER, we would end
up modifying the TYPE_CHECKER attribute of optparse’s
Option class. This being Python, nothing stops you from doing that except good
manners and common sense.)
That’s it! Now you can write a script that uses the new option type just like
any other optparse-based script, except you have to instruct your
OptionParser to use MyOption instead of Option:
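parser = OptionParser(option_class=MyOption)
parser.add_option("-c", type="complex")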
Alternately, you can build your own option list and pass it to OptionParser; if
you don’t use add_option() in the above way, you don’t need to tell
OptionParser which option class to use:
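option_list = [MyOption("-c", action="store", type="complex", dest="c")]
parser = OptionParser(option_list=option_list)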
Adding new actions is a bit trickier, because you have to understand that
optparse has a couple of classifications for actions:
“store” actions
actions that result in optparse storing a value to an attribute of the
current OptionValues instance; these options require a dest
attribute to be supplied to the Option constructor.
“typed” actions
actions that take a value from the command line and expect it to be of a
certain type; or rather, a string that can be converted to a certain type.
These options require a type attribute to be supplied to the Option
constructor.
These are overlapping sets: some default “store” actions are "store",
"store_const", "append", and "count", while the default “typed”
actions are "store", "append", and "callback".
When you add an action, you need to categorize it by listing it in at least one
of the following class attributes of Option (all are lists of strings):
ACTIONS
All actions must be listed in ACTIONS.
STORE_ACTIONS
"store" actions are additionally listed here.
TYPED_ACTIONS
"typed" actions are additionally listed here.
ALWAYS_TYPED_ACTIONS
Actions that always take a type (i.e. whose options always take a value) are
additionally listed here. The only effect of this is that optparse
assigns the default type, "string", to options with no explicit type
whose action is listed in ALWAYS_TYPED_ACTIONS.
In order to actually implement your new action, you must override Option’s
take_action() method and add a case that recognizes your action.
For example, let’s add an "extend" action. This is similar to the standard
"append" action, but instead of taking a single value from the command-line
and appending it to an existing list, "extend" will take multiple values in
a single comma-delimited string, and extend an existing list with them. That
is, if --names is an "extend" option of type "string", the command
line
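--names=foo,bar --names blah --names ding,dong

would result in a list:

["foo", "bar", "blah", "ding", "dong"]

To implement this: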
"extend" both expects a value on the command-line and stores that value
somewhere, so it goes in both STORE_ACTIONS and
TYPED_ACTIONS.
to ensure that optparse assigns the default type of "string" to
"extend" actions, we put the "extend" action in
ALWAYS_TYPED_ACTIONS as well.
MyOption.take_action() implements just this one new action, and passes
control back to Option.take_action() for the standard optparse
actions.
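Putting it together, here is a sketch of such an Option subclass:

class MyOption(Option):

    ACTIONS = Option.ACTIONS + ("extend",)
    STORE_ACTIONS = Option.STORE_ACTIONS + ("extend",)
    TYPED_ACTIONS = Option.TYPED_ACTIONS + ("extend",)
    ALWAYS_TYPED_ACTIONS = Option.ALWAYS_TYPED_ACTIONS + ("extend",)

    def take_action(self, action, dest, opt, value, values, parser):
        if action == "extend":
            lvalue = value.split(",")
            values.ensure_value(dest, []).extend(lvalue)
        else:
            Option.take_action(
                self, action, dest, opt, value, values, parser)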
values is an instance of the optparse.Values class, which provides
the very useful ensure_value() method. ensure_value() is
essentially getattr() with a safety valve; it is called as
values.ensure_value(attr, value)
If the attr attribute of values doesn’t exist or is None, then
ensure_value() first sets it to value, and then returns value. This is
very handy for actions like "extend", "append", and "count", all
of which accumulate data in a variable and expect that variable to be of a
certain type (a list for the first two, an integer for the last). Using
ensure_value() means that scripts using your action don’t have to worry
about setting a default value for the option destinations in question; they
can just leave the default as None and ensure_value() will take care of
getting it right when it’s needed.
The getopt module is a parser for command line options whose API is
designed to be familiar to users of the C getopt() function. Users who
are unfamiliar with the C getopt() function or who would like to write
less code and get better help and error messages should consider using the
argparse module instead.
This module helps scripts to parse the command line arguments in sys.argv.
It supports the same conventions as the Unix getopt() function (including
the special meanings of arguments of the form '-' and '--'). Long
options similar to those supported by GNU software may be used as well via an
optional third argument.
This module provides two functions and an
exception:
getopt.getopt(args, shortopts, longopts=[])
Parses command line options and parameter list. args is the argument list to
be parsed, without the leading reference to the running program. Typically, this
means sys.argv[1:]. shortopts is the string of option letters that the
script wants to recognize, with options that require an argument followed by a
colon (':'; i.e., the same format that Unix getopt() uses).
Note
Unlike GNU getopt(), after a non-option argument, all further
arguments are considered also non-options. This is similar to the way
non-GNU Unix systems work.
longopts, if specified, must be a list of strings with the names of the
long options which should be supported. The leading '--' characters
should not be included in the option name. Long options which require an
argument should be followed by an equal sign ('='). Optional arguments
are not supported. To accept only long options, shortopts should be an
empty string. Long options on the command line can be recognized so long as
they provide a prefix of the option name that matches exactly one of the
accepted options. For example, if longopts is ['foo','frob'], the
option --fo will match as --foo, but --f will
not match uniquely, so GetoptError will be raised.
The return value consists of two elements: the first is a list of
(option, value) pairs; the second is the list of program arguments left after the
option list was stripped (this is a trailing slice of args). Each
option-and-value pair returned has the option as its first element, prefixed
with a hyphen for short options (e.g., '-x') or two hyphens for long
options (e.g., '--long-option'), and the option argument as its
second element, or an empty string if the option has no argument. The
options occur in the list in the same order in which they were found, thus
allowing multiple occurrences. Long and short options may be mixed.
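For example:

>>> import getopt
>>> args = '-a -b -cfoo -d bar a1 a2'.split()
>>> optlist, args = getopt.getopt(args, 'abc:d:')
>>> optlist
[('-a', ''), ('-b', ''), ('-c', 'foo'), ('-d', 'bar')]
>>> args
['a1', 'a2']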
getopt.gnu_getopt(args, shortopts, longopts=[])
This function works like getopt(), except that GNU style scanning mode is
used by default. This means that option and non-option arguments may be
intermixed. The getopt() function stops processing options as soon as a
non-option argument is encountered.
If the first character of the option string is '+', or if the environment
variable POSIXLY_CORRECT is set, then option processing stops as
soon as a non-option argument is encountered.
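For example, note how gnu_getopt() accepts the non-option argument a1
before -a, while getopt() stops at it:

>>> import getopt
>>> getopt.gnu_getopt(['a1', '-a', 'a2'], 'a')
([('-a', '')], ['a1', 'a2'])
>>> getopt.getopt(['a1', '-a', 'a2'], 'a')
([], ['a1', '-a', 'a2'])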
exception getopt.GetoptError
This is raised when an unrecognized option is found in the argument list or when
an option requiring an argument is given none. The argument to the exception is
a string indicating the cause of the error. For long options, an argument given
to an option which does not require one will also cause this exception to be
raised. The attributes msg and opt give the error message and
related option; if there is no specific option to which the exception relates,
opt is an empty string.
In a script, typical usage is something like this:
import getopt, sys

def main():
    try:
        opts, args = getopt.getopt(sys.argv[1:], "ho:v", ["help", "output="])
    except getopt.GetoptError as err:
        # print help information and exit:
        print(err)  # will print something like "option -a not recognized"
        usage()
        sys.exit(2)
    output = None
    verbose = False
    for o, a in opts:
        if o == "-v":
            verbose = True
        elif o in ("-h", "--help"):
            usage()
            sys.exit()
        elif o in ("-o", "--output"):
            output = a
        else:
            assert False, "unhandled option"
    # ...

if __name__ == "__main__":
    main()
Note that an equivalent command line interface could be produced with less code
and more informative help and error messages by using the argparse module:
import argparse

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('-o', '--output')
    parser.add_argument('-v', dest='verbose', action='store_true')
    args = parser.parse_args()
    # ... do something with args.output ...
    # ... do something with args.verbose ...
This module defines functions and classes which implement a flexible event
logging system for applications and libraries.
The key benefit of having the logging API provided by a standard library module
is that all Python modules can participate in logging, so your application log
can include your own messages integrated with messages from third-party
modules.
The module provides a lot of functionality and flexibility. If you are
unfamiliar with logging, the best way to get to grips with it is to work
through the basic and advanced logging tutorials.
The basic classes defined by the module, together with their functions, are
listed below.
Loggers expose the interface that application code directly uses.
Handlers send the log records (created by loggers) to the appropriate
destination.
Filters provide a finer grained facility for determining which log records
to output.
Formatters specify the layout of log records in the final output.
Loggers have the following attributes and methods. Note that Loggers are never
instantiated directly, but always through the module-level function
logging.getLogger(name).
propagate
If this evaluates to false, logging messages are not passed by this logger or by
its child loggers to the handlers of higher level (ancestor) loggers. The
constructor sets this attribute to 1.
setLevel(lvl)
Sets the threshold for this logger to lvl. Logging messages which are less
severe than lvl will be ignored. When a logger is created, the level is set to
NOTSET (which causes all messages to be processed when the logger is
the root logger, or delegation to the parent when the logger is a non-root
logger). Note that the root logger is created with level WARNING.
The term ‘delegation to the parent’ means that if a logger has a level of
NOTSET, its chain of ancestor loggers is traversed until either an ancestor with
a level other than NOTSET is found, or the root is reached.
If an ancestor is found with a level other than NOTSET, then that ancestor’s
level is treated as the effective level of the logger where the ancestor search
began, and is used to determine how a logging event is handled.
If the root is reached, and it has a level of NOTSET, then all messages will be
processed. Otherwise, the root’s level will be used as the effective level.
isEnabledFor(lvl)
Indicates if a message of severity lvl would be processed by this logger.
This method checks first the module-level level set by
logging.disable(lvl) and then the logger’s effective level as determined
by getEffectiveLevel().
getEffectiveLevel()
Indicates the effective level for this logger. If a value other than
NOTSET has been set using setLevel(), it is returned. Otherwise,
the hierarchy is traversed towards the root until a value other than
NOTSET is found, and that value is returned.
getChild(suffix)
Returns a logger which is a descendant to this logger, as determined by the suffix.
Thus, logging.getLogger('abc').getChild('def.ghi') would return the same
logger as would be returned by logging.getLogger('abc.def.ghi'). This is a
convenience method, useful when the parent logger is named using e.g. __name__
rather than a literal string.
debug(msg, *args, **kwargs)
Logs a message with level DEBUG on this logger. The msg is the
message format string, and the args are the arguments which are merged into
msg using the string formatting operator. (Note that this means that you can
use keywords in the format string, together with a single dictionary argument.)
There are three keyword arguments in kwargs which are inspected: exc_info
which, if it does not evaluate as false, causes exception information to be
added to the logging message. If an exception tuple (in the format returned by
sys.exc_info()) is provided, it is used; otherwise, sys.exc_info()
is called to get the exception information.
The second optional keyword argument is stack_info, which defaults to
False. If specified as True, stack information is added to the logging
message, including the actual logging call. Note that this is not the same
stack information as that displayed through specifying exc_info: The
former is stack frames from the bottom of the stack up to the logging call
in the current thread, whereas the latter is information about stack frames
which have been unwound, following an exception, while searching for
exception handlers.
You can specify stack_info independently of exc_info, e.g. to just show
how you got to a certain point in your code, even when no exceptions were
raised. The stack frames are printed following a header line which says:
Stack (most recent call last):
This mimics the Traceback (most recent call last): which is used when
displaying exception frames.
The third optional keyword argument is extra which can be used to pass a
dictionary which is used to populate the __dict__ of the LogRecord created for
the logging event with user-defined attributes. These custom attributes can then
be used as you like. For example, they could be incorporated into logged
messages:
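FORMAT = '%(asctime)-15s %(clientip)s %(user)-8s %(message)s'
logging.basicConfig(format=FORMAT)
d = {'clientip': '192.168.0.1', 'user': 'fbloggs'}
logger = logging.getLogger('tcpserver')
logger.warning('Protocol problem: %s', 'connection reset', extra=d)

would print something like:

2006-02-08 22:20:02,165 192.168.0.1 fbloggs  Protocol problem: connection reset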
The keys in the dictionary passed in extra should not clash with the keys used
by the logging system. (See the Formatter documentation for more
information on which keys are used by the logging system.)
If you choose to use these attributes in logged messages, you need to exercise
some care. In the above example, for instance, the Formatter has been
set up with a format string which expects ‘clientip’ and ‘user’ in the attribute
dictionary of the LogRecord. If these are missing, the message will not be
logged because a string formatting exception will occur. So in this case, you
always need to pass the extra dictionary with these keys.
While this might be annoying, this feature is intended for use in specialized
circumstances, such as multi-threaded servers where the same code executes in
many contexts, and interesting conditions which arise are dependent on this
context (such as remote client IP address and authenticated user name, in the
above example). In such circumstances, it is likely that specialized
Formatters would be used with particular Handlers.
New in version 3.2: The stack_info parameter was added.
exception(msg, *args)
Logs a message with level ERROR on this logger. The arguments are
interpreted as for debug(). Exception info is added to the logging
message. This method should only be called from an exception handler.
findCaller(stack_info=False)
Finds the caller’s source filename and line number. Returns the filename, line
number, function name and stack information as a 4-element tuple. The stack
information is returned as None unless stack_info is True.
handle(record)
Handles a record by passing it to all handlers associated with this logger and
its ancestors (until a false value of propagate is found). This method is used
for unpickled records received from a socket, as well as those created locally.
Logger-level filtering is applied using filter().
hasHandlers()
Checks to see if this logger has any handlers configured. This is done by
looking for handlers in this logger and its parents in the logger hierarchy.
Returns True if a handler was found, else False. The method stops searching
up the hierarchy whenever a logger with the propagate attribute set to
False is found; that will be the last logger which is checked for the
existence of handlers.
Handlers have the following attributes and methods. Note that Handler
is never instantiated directly; this class acts as a base for more useful
subclasses. However, the __init__() method in subclasses needs to call
Handler.__init__().
__init__(level=NOTSET)
Initializes the Handler instance by setting its level, setting the list
of filters to the empty list and creating a lock (using createLock()) for
serializing access to an I/O mechanism.
setLevel(lvl)
Sets the threshold for this handler to lvl. Logging messages which are less
severe than lvl will be ignored. When a handler is created, the level is set
to NOTSET (which causes all messages to be processed).
close()
Tidy up any resources used by the handler. This version does no output but
removes the handler from an internal list of handlers which is closed when
shutdown() is called. Subclasses should ensure that this gets called
from overridden close() methods.
handle(record)
Conditionally emits the specified logging record, depending on filters which may
have been added to the handler. Wraps the actual emission of the record with
acquisition/release of the I/O thread lock.
handleError(record)
This method should be called from handlers when an exception is encountered
during an emit() call. By default it does nothing, which means that
exceptions get silently ignored. This is what is mostly wanted for a logging
system - most users will not care about errors in the logging system, they are
more interested in application errors. You could, however, replace this with a
custom handler if you wish. The specified record is the one which was being
processed when the exception occurred.
emit(record)
Do whatever it takes to actually log the specified logging record. This version
is intended to be implemented by subclasses and so raises a
NotImplementedError.
For a list of handlers included as standard, see logging.handlers.
Formatter objects have the following attributes and methods. They are
responsible for converting a LogRecord to (usually) a string which can
be interpreted by either a human or an external system. The base
Formatter allows a formatting string to be specified. If none is
supplied, the default value of '%(message)s' is used.
A Formatter can be initialized with a format string which makes use of knowledge
of the LogRecord attributes - such as the default value mentioned above
making use of the fact that the user’s message and arguments are pre-formatted
into a LogRecord’s message attribute. This format string contains
standard Python %-style mapping keys. See section Old String Formatting Operations
for more information on string formatting.
class logging.Formatter(fmt=None, datefmt=None, style='%')
Returns a new instance of the Formatter class. The instance is
initialized with a format string for the message as a whole, as well as a
format string for the date/time portion of a message. If no fmt is
specified, '%(message)s' is used. If no datefmt is specified, the
ISO8601 date format is used.
The style parameter can be one of '%', '{' or '$' and determines how
the format string will be merged with its data: using one of %-formatting,
str.format() or string.Template.
Changed in version 3.2: The style parameter was added.
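A minimal sketch of the new style parameter in use:

import logging

handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter('{asctime} {levelname} {message}',
                                       style='{'))
logging.getLogger().addHandler(handler)
logging.getLogger().warning('str.format()-style formatting in use')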
format(record)
The record’s attribute dictionary is used as the operand to a string
formatting operation. Returns the resulting string. Before formatting the
dictionary, a couple of preparatory steps are carried out. The message
attribute of the record is computed using msg % args. If the
formatting string contains '(asctime)', formatTime() is called
to format the event time. If there is exception information, it is
formatted using formatException() and appended to the message. Note
that the formatted exception information is cached in attribute
exc_text. This is useful because the exception information can be
pickled and sent across the wire, but you should be careful if you have
more than one Formatter subclass which customizes the formatting
of exception information. In this case, you will have to clear the cached
value after a formatter has done its formatting, so that the next
formatter to handle the event doesn’t use the cached value but
recalculates it afresh.
If stack information is available, it’s appended after the exception
information, using formatStack() to transform it if necessary.
formatTime(record, datefmt=None)
This method should be called from format() by a formatter which
wants to make use of a formatted time. This method can be overridden in
formatters to provide for any specific requirement, but the basic behavior
is as follows: if datefmt (a string) is specified, it is used with
time.strftime() to format the creation time of the
record. Otherwise, the ISO8601 format is used. The resulting string is
returned.
This function uses a user-configurable function to convert the creation
time to a tuple. By default, time.localtime() is used; to change
this for a particular formatter instance, set the converter attribute
to a function with the same signature as time.localtime() or
time.gmtime(). To change it for all formatters, for example if you
want all logging times to be shown in GMT, set the converter
attribute in the Formatter class.
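For example, to show all times in GMT for one formatter instance (a sketch):

import logging
import time

formatter = logging.Formatter('%(asctime)s %(message)s')
formatter.converter = time.gmtime  # this instance now formats times in GMT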
formatException(exc_info)
Formats the specified exception information (a standard exception tuple as
returned by sys.exc_info()) as a string. This default implementation
just uses traceback.print_exception(). The resulting string is
returned.
formatStack(stack_info)
Formats the specified stack information (a string as returned by
traceback.print_stack(), but with the last newline removed) as a
string. This default implementation just returns the input value.
Filters can be used by Handlers and Loggers for more sophisticated
filtering than is provided by levels. The base filter class only allows events
which are below a certain point in the logger hierarchy. For example, a filter
initialized with 'A.B' will allow events logged by loggers 'A.B', 'A.B.C',
'A.B.C.D', 'A.B.D' etc. but not 'A.BB', 'B.A.B' etc. If initialized with the
empty string, all events are passed.
class logging.Filter(name='')
Returns an instance of the Filter class. If name is specified, it
names a logger which, together with its children, will have its events allowed
through the filter. If name is the empty string, allows every event.
filter(record)
Is the specified record to be logged? Returns zero for no, nonzero for
yes. If deemed appropriate, the record may be modified in-place by this
method.
Note that filters attached to handlers are consulted whenever an event is
emitted by the handler, whereas filters attached to loggers are consulted
whenever an event is logged (using debug(), info(), etc.), before
sending an event to handlers. This means that events which have been
generated by descendant loggers will not be filtered by a logger’s filter
setting, unless the filter has also been applied to those descendant
loggers.
You don’t actually need to subclass Filter: you can pass any instance
which has a filter method with the same semantics.
Changed in version 3.2: You don’t need to create specialized Filter classes, or use other
classes with a filter method: you can use a function (or other
callable) as a filter. The filtering logic will check to see if the filter
object has a filter attribute: if it does, it’s assumed to be a
Filter and its filter() method is called. Otherwise, it’s
assumed to be a callable and called with the record as the single
parameter. The returned value should conform to that returned by
filter().
Although filters are used primarily to filter records based on more
sophisticated criteria than levels, they get to see every record which is
processed by the handler or logger they’re attached to: this can be useful if
you want to do things like counting how many records were processed by a
particular logger or handler, or adding, changing or removing attributes in
the LogRecord being processed. Obviously changing the LogRecord needs to be
done with some care, but it does allow the injection of contextual information
into logs (see Using Filters to impart contextual information).
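For example, a callable filter that counts records and injects a custom
attribute (a sketch; the attribute and function names are illustrative):

import logging

def counting_filter(record):
    counting_filter.count += 1
    record.count = counting_filter.count  # inject a custom attribute
    return True  # a true return value lets every record through
counting_filter.count = 0

handler = logging.StreamHandler()
handler.addFilter(counting_filter)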
LogRecord instances are created automatically by the Logger
every time something is logged, and can be created manually via
makeLogRecord() (for example, from a pickled event received over the
wire).
class logging.LogRecord(name, level, pathname, lineno, msg, args, exc_info, func=None, sinfo=None)
Contains all the information pertinent to the event being logged.
The primary information is passed in msg and args, which
are combined using msg % args to create the message field of the
record.
Parameters:
name – The name of the logger used to log the event represented by
this LogRecord.
level – The numeric level of the logging event (one of DEBUG, INFO etc.)
Note that this is converted to two attributes of the LogRecord:
levelno for the numeric value and levelname for the
corresponding level name.
pathname – The full pathname of the source file where the logging call
was made.
lineno – The line number in the source file where the logging call was
made.
msg – The event description message, possibly a format string with
placeholders for variable data.
args – Variable data to merge into the msg argument to obtain the
event description.
exc_info – An exception tuple with the current exception information,
or None if no exception information is available.
func – The name of the function or method from which the logging call
was invoked.
sinfo – A text string representing stack information from the base of
the stack in the current thread, up to the logging call.
getMessage()
Returns the message for this LogRecord instance after merging any
user-supplied arguments with the message. If the user-supplied message
argument to the logging call is not a string, str() is called on it to
convert it to a string. This allows use of user-defined classes as
messages, whose __str__ method can return the actual format string to
be used.
Changed in version 3.2: The creation of a LogRecord has been made more configurable by
providing a factory which is used to create the record. The factory can be
set using getLogRecordFactory() and setLogRecordFactory()
(see setLogRecordFactory() for the factory’s signature).
This functionality can be used to inject your own values into a
LogRecord at creation time. You can use the following pattern:
old_factory = logging.getLogRecordFactory()

def record_factory(*args, **kwargs):
    record = old_factory(*args, **kwargs)
    record.custom_attribute = 0xdecafbad
    return record

logging.setLogRecordFactory(record_factory)
With this pattern, multiple factories could be chained, and as long
as they don’t overwrite each other’s attributes or unintentionally
overwrite the standard attributes listed above, there should be no
surprises.
The LogRecord has a number of attributes, most of which are derived from the
parameters to the constructor. (Note that the names do not always correspond
exactly between the LogRecord constructor parameters and the LogRecord
attributes.) These attributes can be used to merge data from the record into
the format string. The following table lists (in alphabetical order) the
attribute names, their meanings and the corresponding placeholder in a %-style
format string.
If you are using {}-formatting (str.format()), you can use
{attrname} as the placeholder in the format string. If you are using
$-formatting (string.Template), use the form ${attrname}. In
both cases, of course, replace attrname with the actual attribute name
you want to use.
In the case of {}-formatting, you can specify formatting flags by placing them
after the attribute name, separated from it with a colon. For example: a
placeholder of {msecs:03d} would format a millisecond value of 4 as
004. Refer to the str.format() documentation for full details on
the options available to you.
Attribute name    Format                  Description
args              You shouldn’t need to   The tuple of arguments merged into msg to
                  format this yourself.   produce message.
asctime           %(asctime)s             Human-readable time when the LogRecord
                                          was created. By default this is of the
                                          form '2003-07-08 16:49:45,896' (the
                                          numbers after the comma are the
                                          millisecond portion of the time).
exc_info          You shouldn’t need to   Exception tuple (à la sys.exc_info) or,
                  format this yourself.   if no exception has occurred, None.
filename          %(filename)s            Filename portion of pathname.
funcName          %(funcName)s            Name of function containing the logging
                                          call.
levelname         %(levelname)s           Text logging level for the message
                                          ('DEBUG', 'INFO', 'WARNING', 'ERROR',
                                          'CRITICAL').
levelno           %(levelno)s             Numeric logging level for the message
                                          (DEBUG, INFO, WARNING, ERROR,
                                          CRITICAL).
lineno            %(lineno)d              Source line number where the logging
                                          call was issued (if available).
module            %(module)s              Module (name portion of filename).
msecs             %(msecs)d               Millisecond portion of the time when the
                                          LogRecord was created.
message           %(message)s             The logged message, computed as
                                          msg % args. This is set when
                                          Formatter.format() is invoked.
msg               You shouldn’t need to   The format string passed in the original
                  format this yourself.   logging call. Merged with args to
                                          produce message, or an arbitrary object
                                          (see Using arbitrary objects as
                                          messages).
name              %(name)s                Name of the logger used to log the call.
pathname          %(pathname)s            Full pathname of the source file where
                                          the logging call was issued (if
                                          available).
process           %(process)d             Process ID (if available).
processName       %(processName)s         Process name (if available).
relativeCreated   %(relativeCreated)d     Time in milliseconds when the LogRecord
                                          was created, relative to the time the
                                          logging module was loaded.
stack_info        You shouldn’t need to   Stack frame information (where
                  format this yourself.   available) from the bottom of the stack
                                          in the current thread, up to and
                                          including the stack frame of the
                                          logging call which resulted in the
                                          creation of this record.
process(msg, kwargs)
Modifies the message and/or keyword arguments passed to a logging call in
order to insert contextual information. This implementation takes the object
passed as extra to the constructor and adds it to kwargs using key
'extra'. The return value is a (msg, kwargs) tuple which has the
(possibly modified) versions of the arguments passed in.
In addition to the above, LoggerAdapter supports the following
methods of Logger, i.e. debug(), info(), warning(),
error(), exception(), critical(), log(),
isEnabledFor(), getEffectiveLevel(), setLevel(),
hasHandlers(). These methods have the same signatures as their
counterparts in Logger, so you can use the two types of instances
interchangeably.
Changed in version 3.2: The isEnabledFor(), getEffectiveLevel(), setLevel() and
hasHandlers() methods were added to LoggerAdapter. These
methods delegate to the underlying logger.
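A minimal sketch (connid is an illustrative contextual key):

import logging

logging.basicConfig(format='%(connid)s %(message)s')
logger = logging.getLogger(__name__)
adapter = logging.LoggerAdapter(logger, {'connid': '1234'})
adapter.warning('Connection reset')  # prints: 1234 Connection reset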
The logging module is intended to be thread-safe without any special work
needing to be done by its clients. It achieves this through using threading
locks; there is one lock to serialize access to the module’s shared data, and
each handler also creates a lock to serialize access to its underlying I/O.
If you are implementing asynchronous signal handlers using the signal
module, you may not be able to use logging from within such handlers. This is
because lock implementations in the threading module are not always
re-entrant, and so cannot be invoked from such signal handlers.
logging.getLogger(name=None)
Return a logger with the specified name or, if name is None, return a
logger which is the root logger of the hierarchy. If specified, the name is
typically a dot-separated hierarchical name like 'a', 'a.b' or 'a.b.c.d'.
Choice of these names is entirely up to the developer who is using logging.
All calls to this function with a given name return the same logger instance.
This means that logger instances never need to be passed between different parts
of an application.
logging.getLoggerClass()
Return either the standard Logger class, or the last class passed to
setLoggerClass(). This function may be called from within a new class
definition, to ensure that installing a customised Logger class will
not undo customisations already applied by other code. For example:
class MyLogger(logging.getLoggerClass()):
    # ... override behaviour here
    pass
logging.getLogRecordFactory()
Return a callable which is used to create a LogRecord.
New in version 3.2: This function has been provided, along with setLogRecordFactory(),
to allow developers more control over how the LogRecord
representing a logging event is constructed.
logging.debug(msg, *args, **kwargs)
Logs a message with level DEBUG on the root logger. The msg is the
message format string, and the args are the arguments which are merged into
msg using the string formatting operator. (Note that this means that you can
use keywords in the format string, together with a single dictionary argument.)
There are three keyword arguments in kwargs which are inspected: exc_info
which, if it does not evaluate as false, causes exception information to be
added to the logging message. If an exception tuple (in the format returned by
sys.exc_info()) is provided, it is used; otherwise, sys.exc_info()
is called to get the exception information.
The second optional keyword argument is stack_info, which defaults to
False. If specified as True, stack information is added to the logging
message, including the actual logging call. Note that this is not the same
stack information as that displayed through specifying exc_info: The
former is stack frames from the bottom of the stack up to the logging call
in the current thread, whereas the latter is information about stack frames
which have been unwound, following an exception, while searching for
exception handlers.
You can specify stack_info independently of exc_info, e.g. to just show
how you got to a certain point in your code, even when no exceptions were
raised. The stack frames are printed following a header line which says:
Stack (most recent call last):
This mimics the Traceback (most recent call last): which is used when
displaying exception frames.
The third optional keyword argument is extra which can be used to pass a
dictionary which is used to populate the __dict__ of the LogRecord created for
the logging event with user-defined attributes. These custom attributes can then
be used as you like. For example, they could be incorporated into logged
messages:
FORMAT = '%(asctime)-15s %(clientip)s %(user)-8s %(message)s'
logging.basicConfig(format=FORMAT)
d = {'clientip': '192.168.0.1', 'user': 'fbloggs'}
logging.warning('Protocol problem: %s', 'connection reset', extra=d)
The keys in the dictionary passed in extra should not clash with the keys used
by the logging system. (See the Formatter documentation for more
information on which keys are used by the logging system.)
If you choose to use these attributes in logged messages, you need to exercise
some care. In the above example, for instance, the Formatter has been
set up with a format string which expects ‘clientip’ and ‘user’ in the attribute
dictionary of the LogRecord. If these are missing, the message will not be
logged because a string formatting exception will occur. So in this case, you
always need to pass the extra dictionary with these keys.
While this might be annoying, this feature is intended for use in specialized
circumstances, such as multi-threaded servers where the same code executes in
many contexts, and interesting conditions which arise are dependent on this
context (such as remote client IP address and authenticated user name, in the
above example). In such circumstances, it is likely that specialized
Formatters would be used with particular Handlers.
New in version 3.2: The stack_info parameter was added.
logging.exception(msg, *args)
Logs a message with level ERROR on the root logger. The arguments are
interpreted as for debug(). Exception info is added to the logging
message. This function should only be called from an exception handler.
logging.log(level, msg, *args, **kwargs)
Logs a message with level level on the root logger. The other arguments are
interpreted as for debug().
PLEASE NOTE: The above module-level functions which delegate to the root
logger should not be used in threads, in versions of Python earlier than
2.7.1 and 3.2, unless at least one handler has been added to the root
logger before the threads are started. These convenience functions call
basicConfig() to ensure that at least one handler is available; in
earlier versions of Python, this can (under rare circumstances) lead to
handlers being added multiple times to the root logger, which can in turn
lead to multiple messages for the same event.
logging.disable(lvl)
Provides an overriding level lvl for all loggers which takes precedence over
the logger’s own level. When the need arises to temporarily throttle logging
output down across the whole application, this function can be useful. Its
effect is to disable all logging calls of severity lvl and below, so that
if you call it with a value of INFO, then all INFO and DEBUG events would be
discarded, whereas those of severity WARNING and above would be processed
according to the logger’s effective level.
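For example:

import logging

logging.disable(logging.INFO)     # discard all INFO and DEBUG events
logging.warning('still shown')    # WARNING is above the disabled threshold
logging.info('silently dropped')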
logging.addLevelName(lvl, levelName)
Associates level lvl with text levelName in an internal dictionary, which is
used to map numeric levels to a textual representation, for example when a
Formatter formats a message. This function can also be used to define
your own levels. The only constraints are that all levels used must be
registered using this function, levels should be positive integers, and
severity should increase with the numeric value of the level.
NOTE: If you are thinking of defining your own levels, please see the section
on Custom Levels.
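A minimal sketch defining a hypothetical TRACE level below DEBUG:

import logging

TRACE = 5  # hypothetical level, more verbose than DEBUG (10)
logging.addLevelName(TRACE, 'TRACE')
logging.basicConfig(level=TRACE)
logging.log(TRACE, 'very detailed message')  # shown as TRACE:root:...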
logging.getLevelName(lvl)
Returns the textual representation of logging level lvl. If the level is one
of the predefined levels CRITICAL, ERROR, WARNING,
INFO or DEBUG then you get the corresponding string. If you
have associated levels with names using addLevelName() then the name you
have associated with lvl is returned. If a numeric value corresponding to one
of the defined levels is passed in, the corresponding string representation is
returned. Otherwise, the string 'Level %s' % lvl is returned.
logging.makeLogRecord(attrdict)
Creates and returns a new LogRecord instance whose attributes are
defined by attrdict. This function is useful for taking a pickled
LogRecord attribute dictionary, sent over a socket, and reconstituting
it as a LogRecord instance at the receiving end.
logging.basicConfig(**kwargs)
Does basic configuration for the logging system by creating a
StreamHandler with a default Formatter and adding it to the
root logger. This function does nothing if the root logger already has handlers
configured for it.
PLEASE NOTE: This function should be called from the main thread
before other threads are started. In versions of Python prior to
2.7.1 and 3.2, if this function is called from multiple threads,
it is possible (in rare circumstances) that a handler will be added
to the root logger more than once, leading to unexpected results
such as messages being duplicated in the log.
The following keyword arguments are supported.
Format     Description
filename   Specifies that a FileHandler be created, using the specified
           filename, rather than a StreamHandler.
filemode   Specifies the mode to open the file, if filename is specified
           (if filemode is unspecified, it defaults to 'a').
format     Use the specified format string for the handler.
datefmt    Use the specified date/time format.
style      If format is specified, use this style for the format string.
           One of '%', '{' or '$' for %-formatting, str.format() or
           string.Template respectively, and defaulting to '%' if not
           specified.
level      Set the root logger level to the specified level.
stream     Use the specified stream to initialize the StreamHandler. Note
           that this argument is incompatible with 'filename' - if both
           are present, 'stream' is ignored.
Changed in version 3.2: The style argument was added.
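A minimal sketch (example.log is an illustrative file name):

import logging

logging.basicConfig(filename='example.log', filemode='w',
                    format='%(asctime)s %(levelname)s %(message)s',
                    level=logging.DEBUG)
logging.debug('This message goes to example.log')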
logging.shutdown()
Informs the logging system to perform an orderly shutdown by flushing and
closing all handlers. This should be called at application exit and no
further use of the logging system should be made after this call.
logging.setLoggerClass(klass)
Tells the logging system to use the class klass when instantiating a logger.
The class should define __init__() such that only a name argument is
required, and the __init__() should call Logger.__init__(). This
function is typically called before any loggers are instantiated by applications
which need to use custom logger behavior.
logging.setLogRecordFactory(factory)
Set a callable which is used to create a LogRecord.
Parameters:
factory – The factory callable to be used to instantiate a log record.
New in version 3.2: This function has been provided, along with getLogRecordFactory(), to
allow developers more control over how the LogRecord representing
a logging event is constructed.
logging.captureWarnings(capture)
This function is used to turn the capture of warnings by logging on and
off.
If capture is True, warnings issued by the warnings module will
be redirected to the logging system. Specifically, a warning will be
formatted using warnings.formatwarning() and the resulting string
logged to a logger named 'py.warnings' with a severity of WARNING.
If capture is False, the redirection of warnings to the logging system
will stop, and warnings will be redirected to their original destinations
(i.e. those in effect before captureWarnings(True) was called).
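For example:

import logging
import warnings

logging.basicConfig()
logging.captureWarnings(True)
warnings.warn('this ends up in the log')  # logged via the 'py.warnings' logger
logging.captureWarnings(False)            # restore normal warning display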
This is the original source for the logging package. The standalone version
of the package is suitable for use with Python 1.5.2, 2.1.x
and 2.2.x, which do not include the logging package in the standard
library.
The following functions configure the logging module. They are located in the
logging.config module. Their use is optional — you can configure the
logging module using these functions or by making calls to the main API (defined
in logging itself) and defining handlers which are declared either in
logging or logging.handlers.
logging.config.dictConfig(config)
Takes the logging configuration from a dictionary. The contents of
this dictionary are described in Configuration dictionary schema
below.
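A minimal sketch of a configuration dictionary and its use:

import logging
import logging.config

config = {
    'version': 1,
    'formatters': {
        'brief': {'format': '%(levelname)s:%(name)s:%(message)s'},
    },
    'handlers': {
        'console': {'class': 'logging.StreamHandler', 'formatter': 'brief'},
    },
    'root': {'level': 'INFO', 'handlers': ['console']},
}
logging.config.dictConfig(config)
logging.getLogger('demo').info('configured via dictConfig')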
If an error is encountered during configuration, this function will
raise a ValueError, TypeError, AttributeError
or ImportError with a suitably descriptive message. The
following is a (possibly incomplete) list of conditions which will
raise an error:
A level which is not a string or which is a string not
corresponding to an actual logging level.
A propagate value which is not a boolean.
An id which does not have a corresponding destination.
A non-existent handler id found during an incremental call.
An invalid logger name.
Inability to resolve to an internal or external object.
Parsing is performed by the DictConfigurator class, whose
constructor is passed the dictionary used for configuration, and
has a configure() method. The logging.config module
has a callable attribute dictConfigClass
which is initially set to DictConfigurator.
You can replace the value of dictConfigClass with a
suitable implementation of your own.
dictConfig() calls dictConfigClass passing
the specified dictionary, and then calls the configure() method on
the returned object to put the configuration into effect:
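def dictConfig(config):
    dictConfigClass(config).configure()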
For example, a subclass of DictConfigurator could call
DictConfigurator.__init__() in its own __init__(), then
set up custom prefixes which would be usable in the subsequent
configure() call. dictConfigClass would be bound to
this new subclass, and then dictConfig() could be called exactly as
in the default, uncustomized state.
logging.config.fileConfig(fname, defaults=None, disable_existing_loggers=True)
Reads the logging configuration from a configparser-format file
named fname. This function can be called several times from an
application, allowing an end user to select from various pre-canned
configurations (if the developer provides a mechanism to present the choices
and load the chosen configuration).
Parameters:
defaults – Defaults to be passed to the ConfigParser can be specified
in this argument.
disable_existing_loggers – If specified as False, loggers which exist when
this call is made are left alone. The default is True because this enables
old behaviour in a backward-compatible way. This behaviour is to disable
any existing loggers unless they or their ancestors are explicitly named
in the logging configuration.
logging.config.listen(port=DEFAULT_LOGGING_CONFIG_PORT)
Starts up a socket server on the specified port, and listens for new
configurations. If no port is specified, the module’s default
DEFAULT_LOGGING_CONFIG_PORT is used. Logging configurations will be
sent as a file suitable for processing by fileConfig(). Returns a
Thread instance on which you can call start() to start the
server, and which you can join() when appropriate. To stop the server,
call stopListening().
To send a configuration to the socket, read in the configuration file and
send it to the socket as a string of bytes preceded by a four-byte length
string packed in binary using struct.pack('>L', n).
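A sketch of a sender (logconf.ini is an illustrative file name):

import socket
import struct
import logging.config

with open('logconf.ini', 'rb') as f:
    data_to_send = f.read()

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(('localhost', logging.config.DEFAULT_LOGGING_CONFIG_PORT))
sock.send(struct.pack('>L', len(data_to_send)))  # four-byte length prefix
sock.send(data_to_send)
sock.close()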
logging.config.stopListening()
Stops the listening server which was created with a call to listen().
This is typically called before calling join() on the return value from
listen().
Describing a logging configuration requires listing the various
objects to create and the connections between them; for example, you
may create a handler named ‘console’ and then say that the logger
named ‘startup’ will send its messages to the ‘console’ handler.
These objects aren’t limited to those provided by the logging
module because you might write your own formatter or handler class.
The parameters to these classes may also need to include external
objects such as sys.stderr. The syntax for describing these
objects and connections is defined in Object connections
below.
The dictionary passed to dictConfig() must contain the following
keys:
version - to be set to an integer value representing the schema
version. The only valid value at present is 1, but having this key
allows the schema to evolve while still preserving backwards
compatibility.
All other keys are optional, but if present they will be interpreted
as described below. In all cases below where a ‘configuring dict’ is
mentioned, it will be checked for the special '()' key to see if a
custom instantiation is required. If so, the mechanism described in
User-defined objects below is used to create an instance;
otherwise, the context is used to determine what to instantiate.
formatters - the corresponding value will be a dict in which each
key is a formatter id and each value is a dict describing how to
configure the corresponding Formatter instance.
The configuring dict is searched for keys format and datefmt
(with defaults of None) and these are used to construct a
logging.Formatter instance.
filters - the corresponding value will be a dict in which each key
is a filter id and each value is a dict describing how to configure
the corresponding Filter instance.
The configuring dict is searched for the key name (defaulting to the
empty string) and this is used to construct a logging.Filter
instance.
handlers - the corresponding value will be a dict in which each
key is a handler id and each value is a dict describing how to
configure the corresponding Handler instance.
The configuring dict is searched for the following keys:
class (mandatory). This is the fully qualified name of the
handler class.
level (optional). The level of the handler.
formatter (optional). The id of the formatter for this
handler.
filters (optional). A list of ids of the filters for this
handler.
All other keys are passed through as keyword arguments to the
handler’s constructor. For example, given the snippet:
handlers:
  console:
    class : logging.StreamHandler
    formatter: brief
    level   : INFO
    filters: [allow_foo]
    stream  : ext://sys.stdout
  file:
    class : logging.handlers.RotatingFileHandler
    formatter: precise
    filename: logconfig.log
    maxBytes: 1024
    backupCount: 3
the handler with id console is instantiated as a
logging.StreamHandler, using sys.stdout as the underlying
stream. The handler with id file is instantiated as a
logging.handlers.RotatingFileHandler with the keyword arguments
filename='logconfig.log', maxBytes=1024, backupCount=3.
loggers - the corresponding value will be a dict in which each key
is a logger name and each value is a dict describing how to
configure the corresponding Logger instance.
The configuring dict is searched for the following keys:
level (optional). The level of the logger.
propagate (optional). The propagation setting of the logger.
filters (optional). A list of ids of the filters for this
logger.
handlers (optional). A list of ids of the handlers for this
logger.
The specified loggers will be configured according to the level,
propagation, filters and handlers specified.
root - this will be the configuration for the root logger.
Processing of the configuration will be as for any logger, except
that the propagate setting will not be applicable.
incremental - whether the configuration is to be interpreted as
incremental to the existing configuration. This value defaults to
False, which means that the specified configuration replaces the
existing configuration with the same semantics as used by the
existing fileConfig() API.
If the specified value is True, the configuration is processed
as described in the section on Incremental Configuration.
disable_existing_loggers - whether any existing loggers are to be
disabled. This setting mirrors the parameter of the same name in
fileConfig(). If absent, this parameter defaults to True.
This value is ignored if incremental is True.
It is difficult to provide complete flexibility for incremental
configuration. For example, because objects such as filters
and formatters are anonymous, once a configuration is set up, it is
not possible to refer to such anonymous objects when augmenting a
configuration.
Furthermore, there is not a compelling case for arbitrarily altering
the object graph of loggers, handlers, filters, formatters at
run-time, once a configuration is set up; the verbosity of loggers and
handlers can be controlled just by setting levels (and, in the case of
loggers, propagation flags). Changing the object graph arbitrarily in
a safe way is problematic in a multi-threaded environment; while not
impossible, the benefits are not worth the complexity it adds to the
implementation.
Thus, when the incremental key of a configuration dict is present
and is True, the system will completely ignore any formatters and
filters entries, and process only the level
settings in the handlers entries, and the level and
propagate settings in the loggers and root entries.
Because incremental configuration is triggered by a simple value in the
configuration dict, configurations can be sent over the wire as pickled
dicts to a socket listener. Thus, the logging verbosity of a long-running
application can be altered over time with no need to stop and restart the
application.
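For instance, a minimal client sketch, assuming a listener has been started
in the target process with logging.config.listen() on the default port
(9030), and sending the configuration as JSON rather than pickle:

import json
import socket
import struct

# An incremental configuration that only raises the root logger's level.
config = {
    'version': 1,
    'incremental': True,
    'root': {'level': 'WARNING'},
}

payload = json.dumps(config).encode('utf-8')
sock = socket.create_connection(('localhost', 9030))
try:
    # The listener expects a 4-byte big-endian length prefix followed by
    # the serialized configuration dict.
    sock.sendall(struct.pack('>L', len(payload)) + payload)
finally:
    sock.close()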
The schema describes a set of logging objects - loggers,
handlers, formatters, filters - which are connected to each other in
an object graph. Thus, the schema needs to represent connections
between the objects. For example, say that, once configured, a
particular logger has attached to it a particular handler. For the
purposes of this discussion, we can say that the logger represents the
source, and the handler the destination, of a connection between the
two. Of course in the configured objects this is represented by the
logger holding a reference to the handler. In the configuration dict,
this is done by giving each destination object an id which identifies
it unambiguously, and then using the id in the source object’s
configuration to indicate that a connection exists between the source
and the destination object with that id.
So, for example, consider the following YAML snippet:
formatters:
  brief:
    # configuration for formatter with id 'brief' goes here
  precise:
    # configuration for formatter with id 'precise' goes here
handlers:
  h1: # This is an id
    # configuration of handler with id 'h1' goes here
    formatter: brief
  h2: # This is another id
    # configuration of handler with id 'h2' goes here
    formatter: precise
loggers:
  foo.bar.baz:
    # other configuration for logger 'foo.bar.baz'
    handlers: [h1, h2]
(Note: YAML is used here because it’s a little more readable than the
equivalent Python source form for the dictionary.)
The ids for loggers are the logger names which would be used
programmatically to obtain a reference to those loggers, e.g.
foo.bar.baz. The ids for Formatters and Filters can be any string
value (such as brief, precise above) and they are transient,
in that they are only meaningful for processing the configuration
dictionary and used to determine connections between objects, and are
not persisted anywhere when the configuration call is complete.
The above snippet indicates that the logger named foo.bar.baz should
have two handlers attached to it, which are described by the handler
ids h1 and h2. The formatter for h1 is that described by id
brief, and the formatter for h2 is that described by id
precise.
The schema supports user-defined objects for handlers, filters and
formatters. (Loggers do not need to have different types for
different instances, so there is no support in this configuration
schema for user-defined logger classes.)
Objects to be configured are described by dictionaries
which detail their configuration. In some places, the logging system
will be able to infer from the context how an object is to be
instantiated, but when a user-defined object is to be instantiated,
the system will not know how to do this. In order to provide complete
flexibility for user-defined object instantiation, the user needs
to provide a ‘factory’ - a callable which is called with a
configuration dictionary and which returns the instantiated object.
This is signalled by an absolute import path to the factory being
made available under the special key '()'. Here’s a concrete
example:
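formatters:
  brief:
    format: '%(message)s'
  default:
    format: '%(asctime)s %(levelname)-8s %(name)-15s %(message)s'
    datefmt: '%Y-%m-%d %H:%M:%S'
  custom:
    (): my.package.customFormatterFactory
    bar: baz
    spam: 99.9
    answer: 42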
The above YAML snippet defines three formatters. The first, with id
brief, is a standard logging.Formatter instance with the
specified format string. The second, with id default, has a
longer format and also defines the time format explicitly, and will
result in a logging.Formatter initialized with those two format
strings. Shown in Python source form, the brief and default
formatters have configuration sub-dictionaries:
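{
  'format': '%(message)s'
}

and

{
  'format': '%(asctime)s %(levelname)-8s %(name)-15s %(message)s',
  'datefmt': '%Y-%m-%d %H:%M:%S'
}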
respectively, and as these dictionaries do not contain the special key
'()', the instantiation is inferred from the context: as a result,
standard logging.Formatter instances are created. The
configuration sub-dictionary for the third formatter, with id
custom, is:
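{
  '()': 'my.package.customFormatterFactory',
  'bar': 'baz',
  'spam': 99.9,
  'answer': 42
}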
and this contains the special key '()', which means that
user-defined instantiation is wanted. In this case, the specified
factory callable will be used. If it is an actual callable it will be
used directly - otherwise, if you specify a string (as in the example)
the actual callable will be located using normal import mechanisms.
The callable will be called with the remaining items in the
configuration sub-dictionary as keyword arguments. In the above
example, the formatter with id custom will be assumed to be
returned by the call:
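my.package.customFormatterFactory(bar='baz', spam=99.9, answer=42)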
The key '()' has been used as the special key because it is not a
valid keyword parameter name, and so will not clash with the names of
the keyword arguments used in the call. The '()' also serves as a
mnemonic that the corresponding value is a callable.
There are times where a configuration needs to refer to objects
external to the configuration, for example sys.stderr. If the
configuration dict is constructed using Python code, this is
straightforward, but a problem arises when the configuration is
provided via a text file (e.g. JSON, YAML). In a text file, there is
no standard way to distinguish sys.stderr from the literal string
'sys.stderr'. To facilitate this distinction, the configuration
system looks for certain special prefixes in string values and
treats them specially. For example, if the literal string
'ext://sys.stderr' is provided as a value in the configuration,
then the ext:// will be stripped off and the remainder of the
value processed using normal import mechanisms.
The handling of such prefixes is done in a way analogous to protocol
handling: there is a generic mechanism to look for prefixes which
match the regular expression ^(?P<prefix>[a-z]+)://(?P<suffix>.*)$
whereby, if the prefix is recognised, the suffix is processed
in a prefix-dependent manner and the result of the processing replaces
the string value. If the prefix is not recognised, then the string
value will be left as-is.
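As an illustration, the following sketch shows how such a prefix might be
recognised and dispatched; the resolve() helper and the toy ext:// handler
here are hypothetical, not the actual implementation:

import re
import sys

PREFIX_PATTERN = re.compile(r'^(?P<prefix>[a-z]+)://(?P<suffix>.*)$')

def resolve(value, prefix_handlers):
    """Apply a matching prefix handler, or return the value unchanged."""
    m = PREFIX_PATTERN.match(value)
    if m and m.group('prefix') in prefix_handlers:
        return prefix_handlers[m.group('prefix')](m.group('suffix'))
    return value

# A toy 'ext://' handler that only resolves attributes of the sys module.
prefix_handlers = {'ext': lambda suffix: getattr(sys, suffix.split('.', 1)[1])}

print(resolve('ext://sys.stderr', prefix_handlers))  # the sys.stderr object
print(resolve('sys.stderr', prefix_handlers))        # the literal string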
As well as external objects, there is sometimes also a need to refer
to objects in the configuration. This will be done implicitly by the
configuration system for things that it knows about. For example, the
string value 'DEBUG' for a level in a logger or handler will
automatically be converted to the value logging.DEBUG, and the
handlers, filters and formatter entries will take an
object id and resolve to the appropriate destination object.
However, a more generic mechanism is needed for user-defined
objects which are not known to the logging module. For
example, consider logging.handlers.MemoryHandler, which takes
a target argument which is another handler to delegate to. Since
the system already knows about this class, then in the configuration,
the given target just needs to be the object id of the relevant
target handler, and the system will resolve to the handler from the
id. If, however, a user defines a my.package.MyHandler which has
an alternate handler, the configuration system would not know that
the alternate referred to a handler. To cater for this, a generic
resolution system allows the user to specify:
handlers:
  file:
    # configuration of file handler goes here
  custom:
    (): my.package.MyHandler
    alternate: cfg://handlers.file
The literal string 'cfg://handlers.file' will be resolved in an
analogous way to strings with the ext:// prefix, but looking
in the configuration itself rather than the import namespace. The
mechanism allows access by dot or by index, in a similar way to
that provided by str.format. Thus, given the following snippet:
handlers:
  email:
    class: logging.handlers.SMTPHandler
    mailhost: localhost
    fromaddr: my_app@domain.tld
    toaddrs:
      - support_team@domain.tld
      - dev_team@domain.tld
    subject: Houston, we have a problem.
in the configuration, the string 'cfg://handlers' would resolve to
the dict with key handlers, the string 'cfg://handlers.email'
would resolve to the dict with key email in the handlers dict,
and so on. The string 'cfg://handlers.email.toaddrs[1]' would
resolve to 'dev_team@domain.tld' and the string
'cfg://handlers.email.toaddrs[0]' would resolve to the value
'support_team@domain.tld'. The subject value could be accessed
using either 'cfg://handlers.email.subject' or, equivalently,
'cfg://handlers.email[subject]'. The latter form only needs to be
used if the key contains spaces or non-alphanumeric characters. If an
index value consists only of decimal digits, access will be attempted
using the corresponding integer value, falling back to the string
value if needed.
Given a string cfg://handlers.myhandler.mykey.123, this will
resolve to config_dict['handlers']['myhandler']['mykey']['123'].
If the string is specified as cfg://handlers.myhandler.mykey[123],
the system will attempt to retrieve the value from
config_dict['handlers']['myhandler']['mykey'][123], and fall back
to config_dict['handlers']['myhandler']['mykey']['123'] if that
fails.
Import resolution, by default, uses the builtin __import__() function
to do its importing. You may want to replace this with your own importing
mechanism: if so, you can replace the importer attribute of the
DictConfigurator or its superclass, the
BaseConfigurator class. However, you need to be
careful because of the way functions are accessed from classes via
descriptors. If you are using a Python callable to do your imports, and you
want to define it at class level rather than instance level, you need to wrap
it with staticmethod(). For example:
from importlib import import_module
from logging.config import BaseConfigurator
BaseConfigurator.importer = staticmethod(import_module)
You don’t need to wrap with staticmethod() if you’re setting the import
callable on a configurator instance.
The configuration file format understood by fileConfig() is based on
configparser functionality. The file must contain sections called
[loggers], [handlers] and [formatters] which identify by name the
entities of each type which are defined in the file. For each such entity, there
is a separate section which identifies how that entity is configured. Thus, for
a logger named log01 in the [loggers] section, the relevant
configuration details are held in a section [logger_log01]. Similarly, a
handler called hand01 in the [handlers] section will have its
configuration held in a section called [handler_hand01], while a formatter
called form01 in the [formatters] section will have its configuration
specified in a section called [formatter_form01]. The root logger
configuration must be specified in a section called [logger_root].
Examples of these sections in the file are given below.
The root logger must specify a level and a list of handlers. An example of a
root logger section is given below.
[logger_root]
level=NOTSET
handlers=hand01
The level entry can be one of DEBUG, INFO, WARNING, ERROR, CRITICAL or
NOTSET. For the root logger only, NOTSET means that all messages will be
logged. Level values are eval()uated in the context of the logging
package’s namespace.
The handlers entry is a comma-separated list of handler names, which must
appear in the [handlers] section and have corresponding sections in the
configuration file.
For loggers other than the root logger, some additional information is required.
This is illustrated by the following example.
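[logger_parser]
level=DEBUG
handlers=hand01
propagate=1
qualname=compiler.parser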
The level and handlers entries are interpreted as for the root logger,
except that if a non-root logger’s level is specified as NOTSET, the system
consults loggers higher up the hierarchy to determine the effective level of the
logger. The propagate entry is set to 1 to indicate that messages must
propagate to handlers higher up the logger hierarchy from this logger, or 0 to
indicate that messages are not propagated to handlers up the hierarchy. The
qualname entry is the hierarchical channel name of the logger, that is to
say the name used by the application to get the logger.
Sections which specify handler configuration are exemplified by the following.
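[handler_hand01]
class=StreamHandler
level=NOTSET
formatter=form01
args=(sys.stdout,)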
The class entry indicates the handler’s class (as determined by eval()
in the logging package’s namespace). The level is interpreted as for
loggers, and NOTSET is taken to mean ‘log everything’.
The formatter entry indicates the key name of the formatter for this
handler. If blank, a default formatter (logging._defaultFormatter) is used.
If a name is specified, it must appear in the [formatters] section and have
a corresponding section in the configuration file.
The args entry, when eval()uated in the context of the logging
package’s namespace, is the list of arguments to the constructor for the handler
class. Refer to the constructors for the relevant handlers, or to the examples
below, to see how typical entries are constructed.
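Sections which specify formatter configuration are typified by the following.
[formatter_form01]
format=F1 %(asctime)s %(levelname)s %(message)s
datefmt=
class=logging.Formatter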
The format entry is the overall format string, and the datefmt entry is
the strftime()-compatible date/time format string. If empty, the
package substitutes ISO8601 format date/times, which is almost equivalent to
specifying the date format string '%Y-%m-%d %H:%M:%S'. The ISO8601 format
also specifies milliseconds, which are appended to the result of using the above
format string, with a comma separator. An example time in ISO8601 format is
2003-01-23 00:29:50,411.
The class entry is optional. It indicates the name of the formatter’s class
(as a dotted module and class name.) This option is useful for instantiating a
Formatter subclass. Subclasses of Formatter can present
exception tracebacks in an expanded or condensed format.
The following useful handlers are provided in the package. Note that three of
the handlers (StreamHandler, FileHandler and
NullHandler) are actually defined in the logging module itself,
but have been documented here along with the other handlers.
The StreamHandler class, located in the core logging package,
sends logging output to streams such as sys.stdout, sys.stderr or any
file-like object (or, more precisely, any object which supports write()
and flush() methods).
Returns a new instance of the StreamHandler class. If stream is
specified, the instance will use it for logging output; otherwise, sys.stderr
will be used.
If a formatter is specified, it is used to format the record. The record
is then written to the stream with a terminator. If exception information
is present, it is formatted using traceback.print_exception() and
appended to the stream.
Flushes the stream by calling its flush() method. Note that the
close() method is inherited from Handler and so does
no output, so an explicit flush() call may be needed at times.
Changed in version 3.2: The StreamHandler class now has a terminator attribute, default
value '\n', which is used as the terminator when writing a formatted
record to a stream. If you don’t want this newline termination, you can
set the handler instance’s terminator attribute to the empty string.
In earlier versions, the terminator was hardcoded as '\n'.
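For example, a minimal sketch that suppresses the newline terminator:

import logging

handler = logging.StreamHandler()
handler.terminator = ''   # records are written without a trailing newline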
The FileHandler class, located in the core logging package,
sends logging output to a disk file. It inherits the output functionality from
StreamHandler.
class logging.FileHandler(filename, mode='a', encoding=None, delay=False)
Returns a new instance of the FileHandler class. The specified file is
opened and used as the stream for logging. If mode is not specified,
'a' is used. If encoding is not None, it is used to open the file
with that encoding. If delay is true, then file opening is deferred until the
first call to emit(). By default, the file grows indefinitely.
The NullHandler class, located in the core logging package,
does not do any formatting or output. It is essentially a ‘no-op’ handler
for use by library developers.
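For example, a library might attach one to its top-level logger so that
logging calls are silently discarded unless the using application configures
logging (the logger name here is hypothetical):

import logging

logging.getLogger('mylibrary').addHandler(logging.NullHandler())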
The WatchedFileHandler class, located in the logging.handlers
module, is a FileHandler which watches the file it is logging to. If
the file changes, it is closed and reopened using the file name.
A file change can happen because of usage of programs such as newsyslog and
logrotate which perform log file rotation. This handler, intended for use
under Unix/Linux, watches the file to see if it has changed since the last emit.
(A file is deemed to have changed if its device or inode have changed.) If the
file has changed, the old file stream is closed, and the file opened to get a
new stream.
This handler is not appropriate for use under Windows, because under Windows
open log files cannot be moved or renamed - logging opens the files with
exclusive locks - and so there is no need for such a handler. Furthermore,
ST_INO is not supported under Windows; stat() always returns zero for
this value.
class logging.handlers.WatchedFileHandler(filename[, mode[, encoding[, delay]]])
Returns a new instance of the WatchedFileHandler class. The specified
file is opened and used as the stream for logging. If mode is not specified,
'a' is used. If encoding is not None, it is used to open the file
with that encoding. If delay is true, then file opening is deferred until the
first call to emit(). By default, the file grows indefinitely.
Outputs the record to the file, but first checks to see if the file has
changed. If it has, the existing stream is flushed and closed and the
file opened again, before outputting the record to the file.
class logging.handlers.RotatingFileHandler(filename, mode='a', maxBytes=0, backupCount=0, encoding=None, delay=0)
Returns a new instance of the RotatingFileHandler class. The specified
file is opened and used as the stream for logging. If mode is not specified,
'a' is used. If encoding is not None, it is used to open the file
with that encoding. If delay is true, then file opening is deferred until the
first call to emit(). By default, the file grows indefinitely.
You can use the maxBytes and backupCount values to allow the file to
rollover at a predetermined size. When the size is about to be exceeded,
the file is closed and a new file is silently opened for output. Rollover occurs
whenever the current log file is nearly maxBytes in length; if maxBytes is
zero, rollover never occurs. If backupCount is non-zero, the system will save
old log files by appending the extensions ‘.1’, ‘.2’ etc., to the filename. For
example, with a backupCount of 5 and a base file name of app.log, you
would get app.log, app.log.1, app.log.2, up to
app.log.5. The file being written to is always app.log. When
this file is filled, it is closed and renamed to app.log.1, and if files
app.log.1, app.log.2, etc. exist, then they are renamed to
app.log.2, app.log.3 etc. respectively.
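A minimal sketch of such a setup (the file name, size limit and backup count
are illustrative):

import logging
from logging.handlers import RotatingFileHandler

# Keep app.log plus up to five rotated backups of roughly 10 KB each.
handler = RotatingFileHandler('app.log', maxBytes=10240, backupCount=5)
logging.getLogger().addHandler(handler)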
class logging.handlers.TimedRotatingFileHandler(filename, when='h', interval=1, backupCount=0, encoding=None, delay=False, utc=False)
Returns a new instance of the TimedRotatingFileHandler class. The
specified file is opened and used as the stream for logging. On rotating it also
sets the filename suffix. Rotating happens based on the product of when and
interval.
You can use the when argument to specify the type of interval. The list of possible
values is below. Note that they are not case sensitive.
Value        Type of interval
'S'          Seconds
'M'          Minutes
'H'          Hours
'D'          Days
'W'          Week day (0=Monday)
'midnight'   Roll over at midnight
The system will save old log files by appending extensions to the filename.
The extensions are date-and-time based, using the strftime format
%Y-%m-%d_%H-%M-%S or a leading portion thereof, depending on the
rollover interval.
When computing the next rollover time for the first time (when the handler
is created), the last modification time of an existing log file, or else
the current time, is used to compute when the next rotation will occur.
If the utc argument is true, times in UTC will be used; otherwise
local time is used.
If backupCount is nonzero, at most backupCount files
will be kept, and if more would be created when rollover occurs, the oldest
one is deleted. The deletion logic uses the interval to determine which
files to delete, so changing the interval may leave old files lying around.
If delay is true, then file opening is deferred until the first call to
emit().
Pickles the record’s attribute dictionary and writes it to the socket in
binary format. If there is an error with the socket, silently drops the
packet. If the connection was previously lost, re-establishes the
connection. To unpickle the record at the receiving end into a
LogRecord, use the makeLogRecord() function.
Handles an error which has occurred during emit(). The most likely
cause is a lost connection. Closes the socket so that we can retry on the
next event.
This is a factory method which allows subclasses to define the precise
type of socket they want. The default implementation creates a TCP socket
(socket.SOCK_STREAM).
Pickles the record’s attribute dictionary in binary format with a length
prefix, and returns it ready for transmission across the socket.
Note that pickles aren’t completely secure. If you are concerned about
security, you may want to override this method to implement a more secure
mechanism. For example, you can sign pickles using HMAC and then verify
them on the receiving end, or alternatively you can disable unpickling of
global objects on the receiving end.
Tries to create a socket; on failure, uses an exponential back-off
algorithm. On initial failure, the handler will drop the message it was
trying to send. When subsequent messages are handled by the same
instance, it will not try connecting until some time has passed. The
default parameters are such that the initial delay is one second, and if
after that delay the connection still can’t be made, the handler will
double the delay each time up to a maximum of 30 seconds.
This behaviour is controlled by the following handler attributes:
retryStart (initial delay, defaulting to 1.0 seconds).
retryFactor (multiplier, defaulting to 2.0).
retryMax (maximum delay, defaulting to 30.0 seconds).
This means that if the remote listener starts up after the handler has
been used, you could lose messages (since the handler won’t even attempt
a connection until the delay has elapsed, but just silently drop messages
during the delay period).
Pickles the record’s attribute dictionary and writes it to the socket in
binary format. If there is an error with the socket, silently drops the
packet. To unpickle the record at the receiving end into a
LogRecord, use the makeLogRecord() function.
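A receiving end might rebuild and handle records along these lines (a
sketch, assuming the 4-byte length prefix used by these handlers and
trusting the sender - see the security note above; the handler class name
is hypothetical):

import logging
import pickle
import socketserver
import struct

class LogRecordStreamHandler(socketserver.StreamRequestHandler):
    def handle(self):
        while True:
            header = self.rfile.read(4)
            if len(header) < 4:
                break
            size = struct.unpack('>L', header)[0]
            # The payload is a pickled dict of LogRecord attributes.
            attrs = pickle.loads(self.rfile.read(size))
            record = logging.makeLogRecord(attrs)
            logging.getLogger(record.name).handle(record)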
The SysLogHandler class, located in the logging.handlers module,
supports sending logging messages to a remote or local Unix syslog.
class logging.handlers.SysLogHandler(address=('localhost', SYSLOG_UDP_PORT), facility=LOG_USER, socktype=socket.SOCK_DGRAM)
Returns a new instance of the SysLogHandler class intended to
communicate with a remote Unix machine whose address is given by address in
the form of a (host, port) tuple. If address is not specified,
('localhost', 514) is used. The address is used to open a socket. An
alternative to providing a (host, port) tuple is providing an address as a
string, for example ‘/dev/log’. In this case, a Unix domain socket is used to
send the message to the syslog. If facility is not specified,
LOG_USER is used. The type of socket opened depends on the
socktype argument, which defaults to socket.SOCK_DGRAM and thus
opens a UDP socket. To open a TCP socket (for use with the newer syslog
daemons such as rsyslog), specify a value of socket.SOCK_STREAM.
Note that if your server is not listening on UDP port 514,
SysLogHandler may appear not to work. In that case, check what
address you should be using for a domain socket - it’s system dependent.
For example, on Linux it’s usually ‘/dev/log’ but on OS/X it’s
‘/var/run/syslog’. You’ll need to check your platform and use the
appropriate address (you may need to do this check at runtime if your
application needs to run on several platforms). On Windows, you pretty
much have to use the UDP option.
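For example (a minimal sketch; the Unix domain socket address is platform
dependent, as noted above):

import logging
from logging.handlers import SysLogHandler

handler = SysLogHandler(address='/dev/log')   # e.g. '/var/run/syslog' on OS X
logging.getLogger().addHandler(handler)
logging.warning('something to report to syslog')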
The record is formatted, and then sent to the syslog server. If exception
information is present, it is not sent to the server.
Changed in version 3.2.1: (See: issue 12168.) In earlier versions, the message sent to the
syslog daemons was always terminated with a NUL byte, because early
versions of these daemons expected a NUL terminated message - even
though it’s not in the relevant specification (RFC 5424). More recent
versions of these daemons don’t expect the NUL byte but strip it off
if it’s there, and even more recent daemons (which adhere more closely
to RFC 5424) pass the NUL byte on as part of the message.
To enable easier handling of syslog messages in the face of all these
differing daemon behaviours, the appending of the NUL byte has been
made configurable, through the use of a class-level attribute,
append_nul. This defaults to True (preserving the existing
behaviour) but can be set to False on a SysLogHandler instance
in order for that instance to not append the NUL terminator.
Encodes the facility and priority into an integer. You can pass in strings
or integers - if strings are passed, internal mapping dictionaries are
used to convert them to integers.
The symbolic LOG_ values are defined in SysLogHandler and
mirror the values defined in the sys/syslog.h header file.
Maps a logging level name to a syslog priority name.
You may need to override this if you are using custom levels, or
if the default algorithm is not suitable for your needs. The
default algorithm maps DEBUG, INFO, WARNING, ERROR and
CRITICAL to the equivalent syslog names, and all other level
names to ‘warning’.
The NTEventLogHandler class, located in the logging.handlers
module, supports sending logging messages to a local Windows NT, Windows 2000 or
Windows XP event log. Before you can use it, you need Mark Hammond’s Win32
extensions for Python installed.
class logging.handlers.NTEventLogHandler(appname, dllname=None, logtype='Application')
Returns a new instance of the NTEventLogHandler class. The appname is
used to define the application name as it appears in the event log. An
appropriate registry entry is created using this name. The dllname should give
the fully qualified pathname of a .dll or .exe which contains message
definitions to hold in the log (if not specified, 'win32service.pyd' is used
- this is installed with the Win32 extensions and contains some basic
placeholder message definitions. Note that use of these placeholders will make
your event logs big, as the entire message source is held in the log. If you
want slimmer logs, you have to pass in the name of your own .dll or .exe which
contains the message definitions you want to use in the event log). The
logtype is one of 'Application', 'System' or 'Security', and
defaults to 'Application'.
At this point, you can remove the application name from the registry as a
source of event log entries. However, if you do this, you will not be able
to see the events as you intended in the Event Log Viewer - it needs to be
able to access the registry to get the .dll name. The current version does
not do this.
Returns the event type for the record. Override this if you want to
specify your own types. This version does a mapping using the handler’s
typemap attribute, which is set up in __init__() to a dictionary
which contains mappings for DEBUG, INFO,
WARNING, ERROR and CRITICAL. If you are using
your own levels, you will either need to override this method or place a
suitable dictionary in the handler’s typemap attribute.
Returns the message ID for the record. If you are using your own messages,
you could do this by having the msg passed to the logger being an ID
rather than a format string. Then, in here, you could use a dictionary
lookup to get the message ID. This version returns 1, which is the base
message ID in win32service.pyd.
The SMTPHandler class, located in the logging.handlers module,
supports sending logging messages to an email address via SMTP.
class logging.handlers.SMTPHandler(mailhost, fromaddr, toaddrs, subject, credentials=None, secure=None)
Returns a new instance of the SMTPHandler class. The instance is
initialized with the from and to addresses and subject line of the email. The
toaddrs should be a list of strings. To specify a non-standard SMTP port, use
the (host, port) tuple format for the mailhost argument. If you use a string,
the standard SMTP port is used. If your SMTP server requires authentication, you
can specify a (username, password) tuple for the credentials argument.
To specify the use of a secure protocol (TLS), pass in a tuple to the
secure argument. This will only be used when authentication credentials are
supplied. The tuple should be either an empty tuple, or a single-value tuple
with the name of a keyfile, or a 2-value tuple with the names of the keyfile
and certificate file. (This tuple is passed to the
smtplib.SMTP.starttls() method.)
The MemoryHandler class, located in the logging.handlers module,
supports buffering of logging records in memory, periodically flushing them to a
target handler. Flushing occurs whenever the buffer is full, or when an
event of a certain severity or greater is seen.
MemoryHandler is a subclass of the more general
BufferingHandler, which is an abstract class. This buffers logging
records in memory. Whenever a record is added to the buffer, a check is made
by calling shouldFlush() to see if the buffer should be flushed. If it
should, then flush() is expected to do the flushing.
class logging.handlers.BufferingHandler(capacity)
Initializes the handler with a buffer of the specified capacity.
Returns true if the buffer is up to capacity. This method can be
overridden to implement custom flushing strategies.
class logging.handlers.MemoryHandler(capacity, flushLevel=ERROR, target=None)
Returns a new instance of the MemoryHandler class. The instance is
initialized with a buffer size of capacity. If flushLevel is not specified,
ERROR is used. If no target is specified, the target will need to be
set using setTarget() before this handler does anything useful.
For a MemoryHandler, flushing means just sending the buffered
records to the target, if there is one. The buffer is also cleared when
this happens. Override if you want different behavior.
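A minimal sketch (the capacity and target are illustrative):

import logging
from logging.handlers import MemoryHandler

console = logging.StreamHandler()
# Buffer up to 100 records; flush to the console when the buffer fills
# or when a record of severity ERROR or higher arrives.
buffered = MemoryHandler(100, flushLevel=logging.ERROR, target=console)
logging.getLogger().addHandler(buffered)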
The HTTPHandler class, located in the logging.handlers module,
supports sending logging messages to a Web server, using either GET or
POST semantics.
class logging.handlers.HTTPHandler(host, url, method='GET', secure=False, credentials=None)
Returns a new instance of the HTTPHandler class. The host can be
of the form host:port, should you need to use a specific port number.
If no method is specified, GET is used. If secure is True, an HTTPS
connection will be used. If credentials is specified, it should be a
2-tuple consisting of userid and password, which will be placed in an HTTP
‘Authorization’ header using Basic authentication. If you specify
credentials, you should also specify secure=True so that your userid and
password are not passed in cleartext across the wire.
Along with the QueueListener class, QueueHandler can be used
to let handlers do their work on a separate thread from the one which does the
logging. This is important in Web applications and also other service
applications where threads servicing clients need to respond as quickly as
possible, while any potentially slow operations (such as sending an email via
SMTPHandler) are done on a separate thread.
Returns a new instance of the QueueHandler class. The instance is
initialized with the queue to send messages to. The queue can be any queue-
like object; it’s used as-is by the enqueue() method, which needs
to know how to send messages to it.
Prepares a record for queuing. The object returned by this
method is enqueued.
The base implementation formats the record to merge the message
and arguments, and removes unpickleable items from the record
in-place.
You might want to override this method if you want to convert
the record to a dict or JSON string, or send a modified copy
of the record while leaving the original intact.
Enqueues the record on the queue using put_nowait(); you may
want to override this if you want to use blocking behaviour, or a
timeout, or a customised queue implementation.
The QueueListener class, located in the logging.handlers
module, supports receiving logging messages from a queue, such as those
implemented in the queue or multiprocessing modules. The
messages are received from a queue in an internal thread and passed, on
the same thread, to one or more handlers for processing. While
QueueListener is not itself a handler, it is documented here
because it works hand-in-hand with QueueHandler.
Along with the QueueHandler class, QueueListener can be used
to let handlers do their work on a separate thread from the one which does the
logging. This is important in Web applications and also other service
applications where threads servicing clients need to respond as quickly as
possible, while any potentially slow operations (such as sending an email via
SMTPHandler) are done on a separate thread.
class logging.handlers.QueueListener(queue, *handlers)
Returns a new instance of the QueueListener class. The instance is
initialized with the queue to send messages to and a list of handlers which
will handle entries placed on the queue. The queue can be any queue-
like object; it’s passed as-is to the dequeue() method, which needs
to know how to get messages from it.
This implementation just returns the passed-in record. You may want to
override this method if you need to do any custom marshalling or
manipulation of the record before passing it to the handlers.
This just loops through the handlers offering them the record
to handle. The actual object passed to the handlers is that which
is returned from prepare().
This asks the thread to terminate, and then waits for it to do so.
Note that if you don’t call this before your application exits, there
may be some records still left on the queue, which won’t be processed.
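Putting the two classes together (a minimal sketch):

import logging
import queue
from logging.handlers import QueueHandler, QueueListener

q = queue.Queue(-1)   # unbounded queue shared by handler and listener
logging.getLogger().addHandler(QueueHandler(q))

# The listener's internal thread pulls records off the queue and
# passes them to the (potentially slow) real handlers.
listener = QueueListener(q, logging.StreamHandler())
listener.start()
logging.warning('handled on the listener thread')
listener.stop()   # flush remaining records before the application exits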
Prompt the user for a password without echoing. The user is prompted using
the string prompt, which defaults to 'Password:'. On Unix, the prompt
is written to the file-like object stream. stream defaults to the
controlling terminal (/dev/tty) or if that is unavailable to
sys.stderr (this argument is ignored on Windows).
If echo-free input is unavailable, getpass() falls back to printing
a warning message to stream, reading from sys.stdin, and
issuing a GetPassWarning.
Availability: Macintosh, Unix, Windows.
Note
If you call getpass from within IDLE, the input may be done in the
terminal you launched IDLE from rather than the IDLE window itself.
Return the “login name” of the user. Availability: Unix, Windows.
This function checks the environment variables LOGNAME,
USER, LNAME and USERNAME, in order, and returns
the value of the first one which is set to a non-empty string. If none are set,
the login name from the password database is returned on systems which support
the pwd module, otherwise, an exception is raised.
curses — Terminal handling for character-cell displays
The curses module provides an interface to the curses library, the
de-facto standard for portable advanced terminal handling.
While curses is most widely used in the Unix environment, versions are available
for DOS, OS/2, and possibly other systems as well. This extension module is
designed to match the API of ncurses, an open-source curses library hosted on
Linux and the BSD variants of Unix.
Note
Since version 5.4, the ncurses library decides how to interpret non-ASCII data
using the nl_langinfo function. That means that you have to call
locale.setlocale() in the application and encode Unicode strings
using one of the system’s available encodings. This example uses the
system’s default encoding:
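import locale
locale.setlocale(locale.LC_ALL, '')
code = locale.getpreferredencoding()
Then use code as the encoding for str.encode() calls.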
Exception raised when a curses library function returns an error.
Note
Whenever x or y arguments to a function or a method are optional, they
default to the current cursor location. Whenever attr is optional, it defaults
to A_NORMAL.
The module curses defines the following functions:
Return the output speed of the terminal in bits per second. On software
terminal emulators it will have a fixed high value. Included for historical
reasons; in former times, it was used to write output loops for time delays and
occasionally to change interfaces depending on the line speed.
Enter cbreak mode. In cbreak mode (sometimes called “rare” mode) normal tty
line buffering is turned off and characters are available to be read one by one.
However, unlike raw mode, special characters (interrupt, quit, suspend, and flow
control) retain their effects on the tty driver and calling program. Calling
first raw() then cbreak() leaves the terminal in cbreak mode.
Return the intensity of the red, green, and blue (RGB) components in the color
color_number, which must be between 0 and COLORS. A 3-tuple is
returned, containing the R,G,B values for the given color, which will be between
0 (no component) and 1000 (maximum amount of component).
Return the attribute value for displaying text in the specified color. This
attribute value can be combined with A_STANDOUT, A_REVERSE,
and the other A_* attributes. pair_number() is the counterpart
to this function.
Set the cursor state. visibility can be set to 0, 1, or 2, for invisible,
normal, or very visible. If the terminal supports the visibility requested, the
previous cursor state is returned; otherwise, an exception is raised. On many
terminals, the “visible” mode is an underline cursor and the “very visible” mode
is a block cursor.
Save the current terminal mode as the “program” mode, the mode when the running
program is using curses. (Its counterpart is the “shell” mode, for when the
program is not in curses.) Subsequent calls to reset_prog_mode() will
restore this mode.
Save the current terminal mode as the “shell” mode, the mode when the running
program is not using curses. (Its counterpart is the “program” mode, when the
program is using curses capabilities.) Subsequent calls to
reset_shell_mode() will restore this mode.
Update the physical screen. The curses library keeps two data structures, one
representing the current physical screen contents and a virtual screen
representing the desired next state. The doupdate() routine updates the
physical screen to match the virtual screen.
The virtual screen may be updated by a noutrefresh() call after write
operations such as addstr() have been performed on a window. The normal
refresh() call is simply noutrefresh() followed by doupdate();
if you have to update multiple windows, you can speed performance and perhaps
reduce screen flicker by issuing noutrefresh() calls on all windows,
followed by a single doupdate().
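For example (a minimal sketch with two windows):

import curses

def main(stdscr):
    win1 = curses.newwin(5, 20, 0, 0)
    win2 = curses.newwin(5, 20, 6, 0)
    win1.addstr(0, 0, 'first window')
    win2.addstr(0, 0, 'second window')
    # Stage both windows on the virtual screen, then update the
    # physical screen with a single doupdate() call.
    win1.noutrefresh()
    win2.noutrefresh()
    curses.doupdate()
    stdscr.getch()

curses.wrapper(main)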
Return the user’s current erase character. Under Unix operating systems this
is a property of the controlling tty of the curses program, and is not set by
the curses library itself.
The filter() routine, if used, must be called before initscr() is
called. The effect is that, during those calls, LINES is set to 1; the
capabilities clear, cup, cud, cud1, cuu1, cuu, vpa are disabled; and the home
string is set to the value of cr. The effect is that the cursor is confined to
the current line, and so are screen updates. This may be used for enabling
character-at-a-time line editing without touching the rest of the screen.
Flash the screen. That is, change it to reverse-video and then change it back
in a short interval. Some people prefer such a ‘visible bell’ to the audible
attention signal produced by beep().
After getch() returns KEY_MOUSE to signal a mouse event, this
method should be called to retrieve the queued mouse event, represented as a
5-tuple (id, x, y, z, bstate). id is an ID value used to distinguish
multiple devices, and x, y, z are the event’s coordinates. (z is
currently unused.) bstate is an integer value whose bits will be set to
indicate the type of event, and will be the bitwise OR of one or more of the
following constants, where n is the button number from 1 to 4:
BUTTONn_PRESSED, BUTTONn_RELEASED, BUTTONn_CLICKED,
BUTTONn_DOUBLE_CLICKED, BUTTONn_TRIPLE_CLICKED,
BUTTON_SHIFT, BUTTON_CTRL, BUTTON_ALT.
Read window related data stored in the file by an earlier putwin() call.
The routine then creates and initializes a new window using that data, returning
the new window object.
Return True if the terminal has insert- and delete-character capabilities.
This function is included for historical reasons only, as all modern software
terminal emulators have such capabilities.
Return True if the terminal has insert- and delete-line capabilities, or can
simulate them using scrolling regions. This function is included for
historical reasons only, as all modern software terminal emulators have such
capabilities.
Used for half-delay mode, which is similar to cbreak mode in that characters
typed by the user are immediately available to the program. However, after
blocking for tenths tenths of a second, an exception is raised if nothing has
been typed. The value of tenths must be a number between 1 and 255. Use
nocbreak() to leave half-delay mode.
Change the definition of a color, taking the number of the color to be changed
followed by three RGB values (for the amounts of red, green, and blue
components). The value of color_number must be between 0 and
COLORS. Each of r, g, b, must be a value between 0 and
1000. When init_color() is used, all occurrences of that color on the
screen immediately change to the new definition. This function is a no-op on
most terminals; it is active only if can_change_color() returns 1.
Change the definition of a color-pair. It takes three arguments: the number of
the color-pair to be changed, the foreground color number, and the background
color number. The value of pair_number must be between 1 and
COLOR_PAIRS-1 (the 0 color pair is wired to white on black and cannot
be changed). The value of fg and bg arguments must be between 0 and
COLORS. If the color-pair was previously initialized, the screen is
refreshed and all occurrences of that color-pair are changed to the new
definition.
Return the name of the key numbered k. The name of a key generating printable
ASCII character is the key’s character. The name of a control-key combination
is a two-character string consisting of a caret followed by the corresponding
printable ASCII character. The name of an alt-key combination (128-255) is a
string consisting of the prefix ‘M-‘ followed by the name of the corresponding
ASCII character.
Return the user’s current line kill character. Under Unix operating systems
this is a property of the controlling tty of the curses program, and is not set
by the curses library itself.
Return a string containing the terminfo long name field describing the current
terminal. The maximum length of a verbose description is 128 characters. It is
defined only after the call to initscr().
Set the maximum time in milliseconds that can elapse between press and release
events in order for them to be recognized as a click, and return the previous
interval value. The default value is 200 msec, or one fifth of a second.
Set the mouse events to be reported, and return a tuple (availmask, oldmask). availmask indicates which of the specified mouse events can be
reported; on complete failure it returns 0. oldmask is the previous value of
the given window’s mouse event mask. If this function is never called, no mouse
events are ever reported.
Create and return a pointer to a new pad data structure with the given number
of lines and columns. A pad is returned as a window object.
A pad is like a window, except that it is not restricted by the screen size, and
is not necessarily associated with a particular part of the screen. Pads can be
used when a large window is needed, and only a part of the window will be on the
screen at one time. Automatic refreshes of pads (such as from scrolling or
echoing of input) do not occur. The refresh() and noutrefresh()
methods of a pad require 6 arguments to specify the part of the pad to be
displayed and the location on the screen to be used for the display. The
arguments are pminrow, pmincol, sminrow, smincol, smaxrow, smaxcol; the p
arguments refer to the upper left corner of the pad region to be displayed and
the s arguments define a clipping box on the screen within which the pad region
is to be displayed.
Enter newline mode. This mode translates the return key into newline on input,
and translates newline into return and line-feed on output. Newline mode is
initially on.
Leave newline mode. Disable translation of return into newline on input, and
disable low-level translation of newline into newline/return on output (but this
does not change the behavior of addch('\n'), which always does the
equivalent of return and line feed on the virtual screen). With translation
off, curses can sometimes speed up vertical motion a little; also, it will be
able to detect the return key on input.
When the noqiflush() routine is used, normal flush of input and output queues
associated with the INTR, QUIT and SUSP characters will not be done. You may
want to call noqiflush() in a signal handler if you want output to
continue as though the interrupt had not occurred, after the handler exits.
Equivalent to tputs(str, 1, putchar); emit the value of a specified
terminfo capability for the current terminal. Note that the output of putp()
always goes to standard output.
If flag is False, the effect is the same as calling noqiflush(). If
flag is True, or no argument is provided, the queues will be flushed when
these control characters are read.
Enter raw mode. In raw mode, normal line buffering and processing of
interrupt, quit, suspend, and flow control keys are turned off; characters are
presented to curses input functions one by one.
Backend function used by resizeterm(), performing most of the work;
when resizing the windows, resize_term() blank-fills the areas that are
extended. The calling application should fill in these areas with
appropriate data. The resize_term() function attempts to resize all
windows. However, due to the calling convention of pads, it is not possible
to resize these without additional interaction with the application.
Resize the standard and current windows to the specified dimensions, and
adjusts other bookkeeping data used by the curses library that record the
window dimensions (in particular the SIGWINCH handler).
Initialize the terminal. termstr is a string giving the terminal name; if
omitted, the value of the TERM environment variable will be used. fd is the
file descriptor to which any initialization sequences will be sent; if not
supplied, the file descriptor for sys.stdout will be used.
Must be called if the programmer wants to use colors, and before any other color
manipulation routine is called. It is good practice to call this routine right
after initscr().
start_color() initializes eight basic colors (black, red, green, yellow,
blue, magenta, cyan, and white), and two global variables in the curses
module, COLORS and COLOR_PAIRS, containing the maximum number
of colors and color-pairs the terminal can support. It also restores the colors
on the terminal to the values they had when the terminal was just turned on.
Return a logical OR of all video attributes supported by the terminal. This
information is useful when a curses program needs complete control over the
appearance of the screen.
Return the value of the Boolean capability corresponding to the terminfo
capability name capname. The value -1 is returned if capname is not a
Boolean capability, or 0 if it is canceled or absent from the terminal
description.
Return the value of the numeric capability corresponding to the terminfo
capability name capname. The value -2 is returned if capname is not a
numeric capability, or -1 if it is canceled or absent from the terminal
description.
Return the value of the string capability corresponding to the terminfo
capability name capname. None is returned if capname is not a string
capability, or is canceled or absent from the terminal description.
Instantiate the string str with the supplied parameters, where str should
be a parameterized string obtained from the terminfo database. E.g.
tparm(tigetstr("cup"), 5, 3) could result in '\033[6;4H', the exact
result depending on terminal type.
Specify that the file descriptor fd be used for typeahead checking. If fd
is -1, then no typeahead checking is done.
The curses library does “line-breakout optimization” by looking for typeahead
periodically while updating the screen. If input is found, and it is coming
from a tty, the current update is postponed until refresh or doupdate is called
again, allowing faster response to commands typed in advance. This function
allows specifying a different file descriptor for typeahead checking.
Return a string which is a printable representation of the character ch.
Control characters are displayed as a caret followed by the character, for
example as ^C. Printing characters are left as they are.
If used, this function should be called before initscr() or newterm are
called. When flag is False, the values of lines and columns specified in the
terminfo database will be used, even if environment variables LINES
and COLUMNS (used by default) are set, or if curses is running in a
window (in which case default behavior would be to use the window size if
LINES and COLUMNS are not set).
Allow use of default values for colors on terminals supporting this feature. Use
this to support transparency in your application. The default color is assigned
to the color number -1. After calling this function,
init_pair(x, curses.COLOR_RED, -1) initializes, for instance, color pair x
to a red foreground color on the default background.
Initialize curses and call another callable object, func, which should be the
rest of your curses-using application. If the application raises an exception,
this function will restore the terminal to a sane state before re-raising the
exception and generating a traceback. The callable object func is then passed
the main window ‘stdscr’ as its first argument, followed by any other arguments
passed to wrapper(). Before calling func, wrapper() turns on
cbreak mode, turns off echo, enables the terminal keypad, and initializes colors
if the terminal has color support. On exit (whether normally or by exception)
it restores cooked mode, turns on echo, and disables the terminal keypad.
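For example (a minimal sketch):

import curses

def main(stdscr):
    # stdscr is the window supplied by wrapper(); cbreak mode, echo off,
    # keypad handling and colors have already been set up.
    stdscr.clear()
    stdscr.addstr(0, 0, 'Hello, curses world!')
    stdscr.refresh()
    stdscr.getch()   # wait for a keypress before restoring the terminal

curses.wrapper(main)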
A character means a C character (an ASCII code), rather than a Python
character (a string of length 1). (This note is true whenever the
documentation mentions a character.) The built-in ord() is handy for
conveying strings to codes.
Paint character ch at (y, x) with attributes attr, overwriting any
character previously painted at that location. By default, the character
position and attributes are the current settings for the window object.
Set the background property of the window to the character ch, with
attributes attr. The change is then applied to every character position in
that window:
The attribute of every character in the window is changed to the new
background attribute.
Wherever the former background character appears, it is changed to the new
background character.
Set the window’s background. A window’s background consists of a character and
any combination of attributes. The attribute part of the background is combined
(OR’ed) with all non-blank characters that are written into the window. Both
the character and attribute parts of the background are combined with the blank
characters. The background becomes a property of the character and moves with
the character through any scrolling and insert/delete line/character operations.
Draw a border around the edges of the window. Each parameter specifies the
character to use for a specific part of the border; see the table below for more
details. The characters can be specified as integers or as one-character
strings.
Note
A 0 value for any parameter will cause the default character to be used for
that parameter. Keyword parameters can not be used. The defaults are listed
in this table:
Set the attributes of num characters at the current cursor position, or at
position (y,x) if supplied. If no value of num is given or num = -1,
the attribute will be set on all the characters to the end of the line. This
function does not move the cursor. The changed line will be touched using the
touchline() method so that the contents will be redisplayed by the next
window refresh.
An abbreviation for “derive window”, derwin() is the same as calling
subwin(), except that begin_y and begin_x are relative to the origin
of the window, rather than relative to the entire screen. Return a window
object for the derived window.
Test whether the given pair of screen-relative character-cell coordinates are
enclosed by the given window, returning True or False. It is useful for
determining what subset of the screen windows enclose the location of a mouse
event.
Get a character. Note that the integer returned does not have to be in ASCII
range: function keys, keypad keys and so on return numbers higher than 256. In
no-delay mode, -1 is returned if there is no input, else getch() waits
until a key is pressed.
Get a character, returning a string instead of an integer, as getch()
does. Function keys, keypad keys and so on return a multibyte string containing
the key name. In no-delay mode, an exception is raised if there is no input.
Return the beginning coordinates of this window relative to its parent window
into two integer variables y and x. Return -1,-1 if this window has no
parent.
If flag is False, curses no longer considers using the hardware insert/delete
character feature of the terminal; if flag is True, use of character insertion
and deletion is enabled. When curses is first initialized, use of character
insert/delete is enabled by default.
If flag is True, any change in the window image automatically causes the
window to be refreshed; you no longer have to call refresh() yourself.
However, it may degrade performance considerably, due to repeated calls to
wrefresh. This option is disabled by default.
Insert nlines lines into the specified window above the current line. The
nlines bottom lines are lost. For negative nlines, delete nlines lines
starting with the one under the cursor, and move the remaining lines up. The
bottom nlines lines are cleared. The current cursor position remains the
same.
Insert a character string (as many characters as will fit on the line) before
the character under the cursor, up to n characters. If n is zero or
negative, the entire string is inserted. All characters to the right of the
cursor are shifted right, with the rightmost characters on the line being lost.
The cursor position does not change (after moving to y, x, if specified).
Insert a character string (as many characters as will fit on the line) before
the character under the cursor. All characters to the right of the cursor are
shifted right, with the rightmost characters on the line being lost. The cursor
position does not change (after moving to y, x, if specified).
Return a string of characters, extracted from the window starting at the
current cursor position, or at y, x if specified. Attributes are stripped
from the characters. If n is specified, instr() returns a string
at most n characters long (exclusive of the trailing NUL).
Return True if the specified line was modified since the last call to
refresh(); otherwise return False. Raise a curses.error
exception if line is not valid for the given window.
If yes is 1, escape sequences generated by some keys (keypad, function keys)
will be interpreted by curses. If yes is 0, escape sequences will be
left as is in the input stream.
If yes is 1, cursor is left where it is on update, instead of being at “cursor
position.” This reduces cursor movement where possible. If possible the cursor
will be made invisible.
If yes is 0, cursor will always be at “cursor position” after an update.
Move the window inside its parent window. The screen-relative parameters of
the window are not changed. This routine is used to display different parts of
the parent window at the same physical position on the screen.
Mark for refresh but wait. This function updates the data structure
representing the desired state of the window, but does not force an update of
the physical screen. To accomplish that, call doupdate().
Overlay the window on top of destwin. The windows need not be the same size,
only the overlapping region is copied. This copy is non-destructive, which means
that the current background character does not overwrite the old contents of
destwin.
To get fine-grained control over the copied region, the second form of
overlay() can be used. sminrow and smincol are the upper-left
coordinates of the source window, and the other variables mark a rectangle in
the destination window.
Overwrite the window on top of destwin. The windows need not be the same size,
in which case only the overlapping region is copied. This copy is destructive,
which means that the current background character overwrites the old contents of
destwin.
To get fine-grained control over the copied region, the second form of
overwrite() can be used. sminrow and smincol are the upper-left
coordinates of the source window, the other variables mark a rectangle in the
destination window.
Update the display immediately (sync actual screen with previous
drawing/deleting methods).
The 6 optional arguments can only be specified when the window is a pad created
with newpad(). The additional parameters are needed to indicate what part
of the pad and screen are involved. pminrow and pmincol specify the upper
left-hand corner of the rectangle to be displayed in the pad. sminrow,
smincol, smaxrow, and smaxcol specify the edges of the rectangle to be
displayed on the screen. The lower right-hand corner of the rectangle to be
displayed in the pad is calculated from the screen coordinates, since the
rectangles must be the same size. Both rectangles must be entirely contained
within their respective structures. Negative values of pminrow, pmincol,
sminrow, or smincol are treated as if they were zero.
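For example, a minimal sketch (the pad size and screen coordinates here are
arbitrary):
import curses

stdscr = curses.initscr()
pad = curses.newpad(100, 100)       # a pad may be larger than the screen
pad.addstr(0, 0, "pad contents")
# Display the part of the pad starting at pad coordinate (0, 0) in the
# screen rectangle whose corners are (5, 5) and (20, 75).
pad.refresh(0, 0, 5, 5, 20, 75)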
Reallocate storage for a curses window to adjust its dimensions to the
specified values. If either dimension is larger than the current values, the
window’s data is filled with blanks that have the current background
rendition (as set by bkgdset()) merged into them.
Control what happens when the cursor of a window is moved off the edge of the
window or scrolling region, either as a result of a newline action on the bottom
line, or typing the last character of the last line. If flag is false, the
cursor is left on the bottom line. If flag is true, the window is scrolled up
one line. Note that in order to get the physical scrolling effect on the
terminal, it is also necessary to call idlok().
Touch each location in the window that has been touched in any of its ancestor
windows. This routine is called by refresh(), so it should almost never
be necessary to call it manually.
Set blocking or non-blocking read behavior for the window. If delay is
negative, blocking read is used (which will wait indefinitely for input). If
delay is zero, then non-blocking read is used, and -1 will be returned by
getch() if no input is waiting. If delay is positive, then
getch() will block for delay milliseconds, and return -1 if there is
still no input at the end of that time.
Pretend count lines have been changed, starting with line start. If
changed is supplied, it specifies whether the affected lines are marked as
having been changed (changed=1) or unchanged (changed=0).
A string representing the current version of the module. Also available as
__version__.
Several constants are available to specify character cell attributes:
A_ALTCHARSET: Alternate character set mode.
A_BLINK: Blink mode.
A_BOLD: Bold mode.
A_DIM: Dim mode.
A_NORMAL: Normal attribute.
A_REVERSE: Reverse background and foreground colors.
A_STANDOUT: Standout mode.
A_UNDERLINE: Underline mode.
Keys are referred to by integer constants with names starting with KEY_.
The exact keycaps available are system dependent.
KEY_MIN: Minimum key value
KEY_BREAK: Break key (unreliable)
KEY_DOWN: Down-arrow
KEY_UP: Up-arrow
KEY_LEFT: Left-arrow
KEY_RIGHT: Right-arrow
KEY_HOME: Home key (upward+left arrow)
KEY_BACKSPACE: Backspace (unreliable)
KEY_F0: Function keys; up to 64 function keys are supported
KEY_Fn: Value of function key n
KEY_DL: Delete line
KEY_IL: Insert line
KEY_DC: Delete character
KEY_IC: Insert char or enter insert mode
KEY_EIC: Exit insert char mode
KEY_CLEAR: Clear screen
KEY_EOS: Clear to end of screen
KEY_EOL: Clear to end of line
KEY_SF: Scroll 1 line forward
KEY_SR: Scroll 1 line backward (reverse)
KEY_NPAGE: Next page
KEY_PPAGE: Previous page
KEY_STAB: Set tab
KEY_CTAB: Clear tab
KEY_CATAB: Clear all tabs
KEY_ENTER: Enter or send (unreliable)
KEY_SRESET: Soft (partial) reset (unreliable)
KEY_RESET: Reset or hard reset (unreliable)
KEY_PRINT: Print
KEY_LL: Home down or bottom (lower left)
KEY_A1: Upper left of keypad
KEY_A3: Upper right of keypad
KEY_B2: Center of keypad
KEY_C1: Lower left of keypad
KEY_C3: Lower right of keypad
KEY_BTAB: Back tab
KEY_BEG: Beg (beginning)
KEY_CANCEL: Cancel
KEY_CLOSE: Close
KEY_COMMAND: Cmd (command)
KEY_COPY: Copy
KEY_CREATE: Create
KEY_END: End
KEY_EXIT: Exit
KEY_FIND: Find
KEY_HELP: Help
KEY_MARK: Mark
KEY_MESSAGE: Message
KEY_MOVE: Move
KEY_NEXT: Next
KEY_OPEN: Open
KEY_OPTIONS: Options
KEY_PREVIOUS: Prev (previous)
KEY_REDO: Redo
KEY_REFERENCE: Ref (reference)
KEY_REFRESH: Refresh
KEY_REPLACE: Replace
KEY_RESTART: Restart
KEY_RESUME: Resume
KEY_SAVE: Save
KEY_SBEG: Shifted Beg (beginning)
KEY_SCANCEL: Shifted Cancel
KEY_SCOMMAND: Shifted Command
KEY_SCOPY: Shifted Copy
KEY_SCREATE: Shifted Create
KEY_SDC: Shifted Delete char
KEY_SDL: Shifted Delete line
KEY_SELECT: Select
KEY_SEND: Shifted End
KEY_SEOL: Shifted Clear line
KEY_SEXIT: Shifted Exit
KEY_SFIND: Shifted Find
KEY_SHELP: Shifted Help
KEY_SHOME: Shifted Home
KEY_SIC: Shifted Input
KEY_SLEFT: Shifted Left arrow
KEY_SMESSAGE: Shifted Message
KEY_SMOVE: Shifted Move
KEY_SNEXT: Shifted Next
KEY_SOPTIONS: Shifted Options
KEY_SPREVIOUS: Shifted Prev
KEY_SPRINT: Shifted Print
KEY_SREDO: Shifted Redo
KEY_SREPLACE: Shifted Replace
KEY_SRIGHT: Shifted Right arrow
KEY_SRSUME: Shifted Resume
KEY_SSAVE: Shifted Save
KEY_SSUSPEND: Shifted Suspend
KEY_SUNDO: Shifted Undo
KEY_SUSPEND: Suspend
KEY_UNDO: Undo
KEY_MOUSE: Mouse event has occurred
KEY_RESIZE: Terminal resize event
KEY_MAX: Maximum key value
On VT100s and their software emulations, such as X terminal emulators, there are
normally at least four function keys (KEY_F1, KEY_F2,
KEY_F3, KEY_F4) available, and the arrow keys mapped to
KEY_UP, KEY_DOWN, KEY_LEFT and KEY_RIGHT in
the obvious way. If your machine has a PC keyboard, it is safe to expect arrow
keys and twelve function keys (older PC keyboards may have only ten function
keys); also, the following keypad mappings are standard:
Insert: KEY_IC
Delete: KEY_DC
Home: KEY_HOME
End: KEY_END
Page Up: KEY_PPAGE
Page Down: KEY_NPAGE
The following table lists characters from the alternate character set. These are
inherited from the VT100 terminal, and will generally be available on software
emulations such as X terminals. When there is no graphic available, curses
falls back on a crude printable ASCII approximation.
Note
These are available only after initscr() has been called.
The curses.textpad module provides a Textbox class that handles
elementary text editing in a curses window, supporting a set of keybindings
resembling those of Emacs (thus, also of Netscape Navigator, BBedit 6.x,
FrameMaker, and many other programs). The module also provides a
rectangle-drawing function useful for framing text boxes or for other purposes.
The module curses.textpad defines the following function:
Draw a rectangle. The first argument must be a window object; the remaining
arguments are coordinates relative to that window. The second and third
arguments are the y and x coordinates of the upper left hand corner of the
rectangle to be drawn; the fourth and fifth arguments are the y and x
coordinates of the lower right hand corner. The rectangle will be drawn using
VT100/IBM PC forms characters on terminals that make this possible (including
xterm and most other software terminal emulators). Otherwise it will be drawn
with ASCII dashes, vertical bars, and plus signs.
Return a textbox widget object. The win argument should be a curses
WindowObject in which the textbox is to be contained. The edit cursor
of the textbox is initially located at the upper left hand corner of the
containing window, with coordinates (0,0). The instance’s
stripspaces flag is initially on.
This is the entry point you will normally use. It accepts editing
keystrokes until one of the termination keystrokes is entered. If
validator is supplied, it must be a function. It will be called for
each keystroke entered with the keystroke as a parameter; command dispatch
is done on the result. This method returns the window contents as a
string; whether blanks in the window are included is affected by the
stripspaces attribute.
This attribute is a flag which controls the interpretation of blanks in
the window. When it is on, trailing blanks on each line are ignored; any
cursor motion that would land the cursor on a trailing blank goes to the
end of that line instead, and trailing blanks are stripped when the window
contents are gathered.
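As a sketch of typical use (the window sizes and prompt text here are
arbitrary), a small message box can be framed with rectangle() and edited
with a Textbox:
import curses
from curses.textpad import Textbox, rectangle

def main(stdscr):
    stdscr.addstr(0, 0, "Enter IM message: (hit Ctrl-G to send)")
    editwin = curses.newwin(5, 30, 2, 1)
    rectangle(stdscr, 1, 0, 1 + 5 + 1, 1 + 30 + 1)
    stdscr.refresh()
    box = Textbox(editwin)
    box.edit()              # let the user edit until Ctrl-G is struck
    return box.gather()     # window contents, as affected by stripspaces

message = curses.wrapper(main)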
The curses.ascii module supplies name constants for ASCII characters and
functions to test membership in various ASCII character classes. The constants
supplied are names for control characters as follows:
NUL: Null
SOH: Start of heading, console interrupt
STX: Start of text
ETX: End of text
EOT: End of transmission
ENQ: Enquiry, goes with ACK flow control
ACK: Acknowledgement
BEL: Bell
BS: Backspace
TAB: Tab
HT: Alias for TAB: "Horizontal tab"
LF: Line feed
NL: Alias for LF: "New line"
VT: Vertical tab
FF: Form feed
CR: Carriage return
SO: Shift-out, begin alternate character set
SI: Shift-in, resume default character set
DLE: Data-link escape
DC1: XON, for flow control
DC2: Device control 2, block-mode flow control
DC3: XOFF, for flow control
DC4: Device control 4
NAK: Negative acknowledgement
SYN: Synchronous idle
ETB: End transmission block
CAN: Cancel
EM: End of medium
SUB: Substitute
ESC: Escape
FS: File separator
GS: Group separator
RS: Record separator, block-mode terminator
US: Unit separator
SP: Space
DEL: Delete
Note that many of these have little practical significance in modern usage. The
mnemonics derive from teleprinter conventions that predate digital computers.
The module supplies the following functions, patterned on those in the standard
C library:
Checks for a non-ASCII character (ordinal values 0x80 and above).
These functions accept either integers or strings; when the argument is a
string, it is first converted using the built-in function ord().
Note that all these functions check ordinal bit values derived from the first
character of the string you pass in; they do not actually know anything about
the host machine’s character encoding. For functions that know about the
character encoding (and handle internationalization properly) see the
string module.
The following two functions take either a single-character string or integer
byte value; they return a value of the same type.
Return a string representation of the ASCII character c. If c is printable,
this string is the character itself. If the character is a control character
(0x00-0x1f) the string consists of a caret ('^') followed by the
corresponding uppercase letter. If the character is an ASCII delete (0x7f) the
string is '^?'. If the character has its meta bit (0x80) set, the meta bit
is stripped, the preceding rules applied, and '!' prepended to the result.
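For example (these functions accept either integers or single-character
strings):
>>> import curses.ascii
>>> curses.ascii.isalnum('a'), curses.ascii.isalnum('@')
(True, False)
>>> curses.ascii.unctrl('\x01')    # control character, caret notation
'^A'
>>> curses.ascii.unctrl(0x7f)      # ASCII delete
'^?'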
A 33-element string array that contains the ASCII mnemonics for the thirty-two
ASCII control characters from 0 (NUL) to 0x1f (US), in order, plus the mnemonic
SP for the space character.
Panels are windows with the added feature of depth, so they can be stacked on
top of each other, and only the visible portions of each window will be
displayed. Panels can be added, moved up or down in the stack, and removed.
Returns a panel object, associating it with the given window win. Be aware
that you need to keep the returned panel object referenced explicitly. If you
don’t, the panel object is garbage collected and removed from the panel stack.
Panel objects, as returned by new_panel() above, are windows with a
stacking order. There’s always a window associated with a panel which determines
the content, while the panel methods are responsible for the window’s depth in
the panel stack.
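A minimal sketch of stacking two panels (the window sizes are arbitrary):
import curses
import curses.panel

stdscr = curses.initscr()
win1 = curses.newwin(10, 40, 2, 2)
win2 = curses.newwin(10, 40, 5, 10)
# Keep references to the panel objects; otherwise they are garbage
# collected and silently removed from the panel stack.
pan1 = curses.panel.new_panel(win1)
pan2 = curses.panel.new_panel(win2)
pan1.top()                    # move pan1 to the top of the stack
curses.panel.update_panels()  # write panel changes to the virtual screen
curses.doupdate()             # and update the physical screen
curses.endwin()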
Queries the given executable (defaults to the Python interpreter binary) for
various architecture information.
Returns a tuple (bits, linkage) which contains information about the bit
architecture and the linkage format used for the executable. Both values are
returned as strings.
Values that cannot be determined are returned as given by the parameter presets.
If bits is given as '', sizeof(pointer) (or sizeof(long) on Python
versions earlier than 1.5.2) is used as an indicator for the supported
pointer size.
The function relies on the system’s file command to do the actual work.
This is available on most if not all Unix platforms and some non-Unix platforms
and then only if the executable points to the Python interpreter. Reasonable
defaults are used when the above needs are not met.
Note
On Mac OS X (and perhaps other platforms), executable files may be
universal files containing multiple architectures.
To get at the “64-bitness” of the current interpreter, it is more
reliable to query the sys.maxsize attribute:
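import sys
is_64bits = sys.maxsize > 2**32   # True on a 64-bit interpreter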
Returns a single string identifying the underlying platform with as much useful
information as possible.
The output is intended to be human readable rather than machine parseable. It
may look different on different platforms and this is intended.
If aliased is true, the function will use aliases for various platforms that
report system names which differ from their common names, for example SunOS will
be reported as Solaris. The system_alias() function is used to implement
this.
Setting terse to true causes the function to return only the absolute minimum
information needed to identify the platform.
An empty string is returned if the value cannot be determined. Note that many
platforms do not provide this information or simply return the same value as for
machine(). NetBSD does this.
Returns (system, release, version) aliased to common marketing names used
for some systems. It also does some reordering of the information in some cases
where it would otherwise cause confusion.
Returns a tuple (release, vendor, vminfo, osinfo) with vminfo being a
tuple (vm_name, vm_release, vm_vendor) and osinfo being a tuple
(os_name, os_version, os_arch). Values which cannot be determined are set to
the defaults given as parameters (which all default to '').
Get additional version information from the Windows Registry and return a tuple
(version, csd, ptype) referring to version number, CSD level and OS type
(multi/single processor).
As a hint: ptype is 'UniprocessorFree' on single processor NT machines
and 'MultiprocessorFree' on multi processor machines. The ‘Free’ refers
to the OS version being free of debugging code. It could also state ‘Checked’
which means the OS version uses debugging code, i.e. code that checks arguments,
ranges, etc.
Note
This function works best with Mark Hammond's
win32all package installed, but also works on Python 2.3 and
later (support for this was added in Python 2.6). It obviously
only runs on Win32 compatible platforms.
Portable popen() interface. Find a working popen implementation
preferring win32pipe.popen(). On Windows NT, win32pipe.popen()
should work; on Windows 9x it hangs due to bugs in the MS C library.
Get Mac OS version information and return it as a tuple
(release, versioninfo, machine), with versioninfo being a tuple
(version, dev_stage, non_release_version). Entries which cannot be determined
are set to ''. All tuple entries are strings.
Tries to determine the name of the Linux OS distribution.
supported_dists may be given to define the set of Linux distributions to
look for. It defaults to a list of currently supported Linux distributions
identified by their release file name.
If full_distribution_name is true (default), the full distribution read
from the OS is returned. Otherwise the short name taken from
supported_dists is used.
Returns a tuple (distname, version, id) which defaults to the args given as
parameters. id is the item in parentheses after the version number. It
is usually the version codename.
Tries to determine the libc version against which the file executable (defaults
to the Python interpreter) is linked. Returns a tuple of strings
(lib, version), which default to the given parameters in case the lookup fails.
Note that this function has intimate knowledge of how different libc versions
add symbols to the executable and is probably only usable for executables
compiled using gcc.
The file is read and scanned in chunks of chunksize bytes.
This module makes available standard errno system symbols. The value of each
symbol is the corresponding integer value. The names and descriptions are
borrowed from linux/include/errno.h, which should be pretty
all-inclusive.
Dictionary providing a mapping from the errno value to the string name in the
underlying system. For instance, errno.errorcode[errno.EPERM] maps to
'EPERM'.
To translate a numeric error code to an error message, use os.strerror().
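For example (the exact message text varies by platform):
>>> import errno, os
>>> errno.errorcode[errno.EPERM]
'EPERM'
>>> os.strerror(errno.EPERM)
'Operation not permitted'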
Of the following list, symbols that are not used on the current platform are not
defined by the module. The specific list of defined symbols is available as
errno.errorcode.keys(). Symbols available can include:
ctypes is a foreign function library for Python. It provides C compatible
data types, and allows calling functions in DLLs or shared libraries. It can be
used to wrap these libraries in pure Python.
Note: The code samples in this tutorial use doctest to make sure that
they actually work. Since some code samples behave differently under Linux,
Windows, or Mac OS X, they contain doctest directives in comments.
Note: Some code samples reference the ctypes c_int type. On 32-bit
systems it is an alias for the c_long type, so you should not be
confused if c_long is printed where you expect c_int;
they are actually the same type.
ctypes exports the cdll, and on Windows windll and oledll
objects, for loading dynamic link libraries.
You load libraries by accessing them as attributes of these objects. cdll
loads libraries which export functions using the standard cdecl calling
convention, while windll libraries call functions using the stdcall
calling convention. oledll also uses the stdcall calling convention, and
assumes the functions return a Windows HRESULT error code. The error
code is used to automatically raise a WindowsError exception when the
function call fails.
Here are some examples for Windows. Note that msvcrt is the MS standard C
library containing most standard C functions, and uses the cdecl calling
convention:
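>>> from ctypes import *
>>> print(windll.kernel32)    # doctest: +WINDOWS
<WinDLL 'kernel32', handle ... at ...>
>>> print(cdll.msvcrt)        # doctest: +WINDOWS
<CDLL 'msvcrt', handle ... at ...>
>>> libc = cdll.msvcrt        # doctest: +WINDOWS
>>>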
Windows appends the usual .dll file suffix automatically.
On Linux, it is required to specify the filename including the extension to
load a library, so attribute access cannot be used to load libraries. Either the
LoadLibrary() method of the dll loaders should be used, or you should load
the library by creating an instance of CDLL by calling the constructor:
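>>> cdll.LoadLibrary("libc.so.6")   # doctest: +LINUX
<CDLL 'libc.so.6', handle ... at ...>
>>> libc = CDLL("libc.so.6")        # doctest: +LINUX
>>> libc                            # doctest: +LINUX
<CDLL 'libc.so.6', handle ... at ...>
>>>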
Functions are accessed as attributes of dll objects:
>>> from ctypes import *
>>> libc.printf
<_FuncPtr object at 0x...>
>>> print(windll.kernel32.GetModuleHandleA) # doctest: +WINDOWS
<_FuncPtr object at 0x...>
>>> print(windll.kernel32.MyOwnFunction) # doctest: +WINDOWS
Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "ctypes.py", line 239, in __getattr__
func = _StdcallFuncPtr(name, self)
AttributeError: function 'MyOwnFunction' not found
>>>
Note that win32 system dlls like kernel32 and user32 often export ANSI
as well as UNICODE versions of a function. The UNICODE version is exported with
a W appended to the name, while the ANSI version is exported with an A
appended to the name. The win32 GetModuleHandle function, which returns a
module handle for a given module name, has the following C prototype, and a
macro is used to expose one of them as GetModuleHandle depending on whether
UNICODE is defined or not:
/* ANSI version */
HMODULE GetModuleHandleA(LPCSTR lpModuleName);
/* UNICODE version */
HMODULE GetModuleHandleW(LPCWSTR lpModuleName);
windll does not try to select one of them by magic, you must access the
version you need by specifying GetModuleHandleA or GetModuleHandleW
explicitly, and then call it with bytes or string objects respectively.
Sometimes, dlls export functions with names which aren’t valid Python
identifiers, like "??2@YAPAXI@Z". In this case you have to use
getattr() to retrieve the function:
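>>> getattr(cdll.msvcrt, "??2@YAPAXI@Z")   # doctest: +WINDOWS
<_FuncPtr object at 0x...>
>>>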
You can call these functions like any other Python callable. This example uses
the time() function, which returns system time in seconds since the Unix
epoch, and the GetModuleHandleA() function, which returns a win32 module
handle.
This example calls both functions with a NULL pointer (None should be used
as the NULL pointer):
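>>> print(libc.time(None))   # doctest: +SKIP
1150640792
>>> print(hex(windll.kernel32.GetModuleHandleA(None)))   # doctest: +WINDOWS
0x1d000000
>>>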
ctypes tries to protect you from calling functions with the wrong number
of arguments or the wrong calling convention. Unfortunately this only works on
Windows. It does this by examining the stack after the function returns, so
although an error is raised the function has been called:
>>> windll.kernel32.GetModuleHandleA() # doctest: +WINDOWS
Traceback (most recent call last):
File "<stdin>", line 1, in ?
ValueError: Procedure probably called with not enough arguments (4 bytes missing)
>>> windll.kernel32.GetModuleHandleA(0, 0) # doctest: +WINDOWS
Traceback (most recent call last):
File "<stdin>", line 1, in ?
ValueError: Procedure probably called with too many arguments (4 bytes in excess)
>>>
The same exception is raised when you call an stdcall function with the
cdecl calling convention, or vice versa:
>>> cdll.kernel32.GetModuleHandleA(None) # doctest: +WINDOWS
Traceback (most recent call last):
File "<stdin>", line 1, in ?
ValueError: Procedure probably called with not enough arguments (4 bytes missing)
>>>
>>> windll.msvcrt.printf(b"spam") # doctest: +WINDOWS
Traceback (most recent call last):
File "<stdin>", line 1, in ?
ValueError: Procedure probably called with too many arguments (4 bytes in excess)
>>>
To find out the correct calling convention you have to look into the C header
file or the documentation for the function you want to call.
On Windows, ctypes uses win32 structured exception handling to prevent
crashes from general protection faults when functions are called with invalid
argument values:
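>>> windll.kernel32.GetModuleHandleA(32)   # doctest: +WINDOWS
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
WindowsError: exception: access violation reading 0x00000020
>>>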
There are, however, enough ways to crash Python with ctypes, so you
should be careful anyway.
None, integers, bytes objects and (unicode) strings are the only native
Python objects that can directly be used as parameters in these function calls.
None is passed as a C NULL pointer; bytes objects and strings are passed
as a pointer to the memory block that contains their data (char* or
wchar_t*). Python integers are passed as the platform's default C
int type; their value is masked to fit into the C type.
Before we move on to calling functions with other parameter types, we have to learn
more about ctypes data types.
Assigning a new value to instances of the pointer types c_char_p,
c_wchar_p, and c_void_p changes the memory location they
point to, not the contents of the memory block (of course not, because Python
bytes objects are immutable):
>>> s = "Hello, World"
>>> c_s = c_wchar_p(s)
>>> print(c_s)
c_wchar_p('Hello, World')
>>> c_s.value = "Hi, there"
>>> print(c_s)
c_wchar_p('Hi, there')
>>> print(s) # first object is unchanged
Hello, World
>>>
You should be careful, however, not to pass them to functions expecting pointers
to mutable memory. If you need mutable memory blocks, ctypes has a
create_string_buffer() function which creates these in various ways. The
current memory block contents can be accessed (or changed) with the raw
property; if you want to access it as NUL terminated string, use the value
property:
>>> from ctypes import *
>>> p = create_string_buffer(3) # create a 3 byte buffer, initialized to NUL bytes
>>> print(sizeof(p), repr(p.raw))
3 b'\x00\x00\x00'
>>> p = create_string_buffer(b"Hello") # create a buffer containing a NUL terminated string
>>> print(sizeof(p), repr(p.raw))
6 b'Hello\x00'
>>> print(repr(p.value))
b'Hello'
>>> p = create_string_buffer(b"Hello", 10) # create a 10 byte buffer
>>> print(sizeof(p), repr(p.raw))
10 b'Hello\x00\x00\x00\x00\x00'
>>> p.value = b"Hi"
>>> print(sizeof(p), repr(p.raw))
10 b'Hi\x00lo\x00\x00\x00\x00\x00'
>>>
The create_string_buffer() function replaces the c_buffer() function
(which is still available as an alias), as well as the c_string() function
from earlier ctypes releases. To create a mutable memory block containing
unicode characters of the C type wchar_t use the
create_unicode_buffer() function.
Note that printf prints to the real standard output channel, not to
sys.stdout, so these examples will only work at the console prompt, not
from within IDLE or PythonWin:
>>> printf = libc.printf
>>> printf(b"Hello, %s\n", b"World!")
Hello, World!
14
>>> printf(b"Hello, %S\n", "World!")
Hello, World!
14
>>> printf(b"%d bottles of beer\n", 42)
42 bottles of beer
19
>>> printf(b"%f bottles of beer\n", 42.5)
Traceback (most recent call last):
File "<stdin>", line 1, in ?
ArgumentError: argument 2: exceptions.TypeError: Don't know how to convert parameter 2
>>>
As has been mentioned before, all Python types except integers, strings, and
bytes objects have to be wrapped in their corresponding ctypes type, so
that they can be converted to the required C data type:
>>> printf(b"An int %d, a double %f\n", 1234, c_double(3.14))
An int 1234, a double 3.140000
31
>>>
Calling functions with your own custom data types
You can also customize ctypes argument conversion to allow instances of
your own classes to be used as function arguments. ctypes looks for an
_as_parameter_ attribute and uses this as the function argument. Of
course, it must be an integer, string, or bytes object:
>>> class Bottles:
... def __init__(self, number):
... self._as_parameter_ = number
...
>>> bottles = Bottles(42)
>>> printf(b"%d bottles of beer\n", bottles)
42 bottles of beer
19
>>>
If you don’t want to store the instance’s data in the _as_parameter_
instance variable, you could define a property which makes the
attribute available on request.
Specifying the required argument types (function prototypes)
It is possible to specify the required argument types of functions exported from
DLLs by setting the argtypes attribute.
argtypes must be a sequence of C data types (the printf function is
probably not a good example here, because it takes a variable number of
parameters of different types depending on the format string; on the other hand,
it is quite handy for experimenting with this feature):
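>>> printf.argtypes = [c_char_p, c_char_p, c_int, c_double]
>>> printf(b"String '%s', Int %d, Double %f\n", b"Hi", 10, 2.2)
String 'Hi', Int 10, Double 2.200000
37
>>>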
Specifying a format protects against incompatible argument types (just as a
prototype for a C function), and tries to convert the arguments to valid types:
>>> printf(b"%d %d %d", 1, 2, 3)
Traceback (most recent call last):
File "<stdin>", line 1, in ?
ArgumentError: argument 2: exceptions.TypeError: wrong type
>>> printf(b"%s %d %f\n", b"X", 2, 3)
X 2 3.000000
13
>>>
If you have defined your own classes which you pass to function calls, you have
to implement a from_param() class method for them to be usable in the
argtypes sequence. The from_param() class method receives
the Python object passed to the function call; it should do a typecheck or
whatever is needed to make sure this object is acceptable, and then return the
object itself, its _as_parameter_ attribute, or whatever you want to
pass as the C function argument in this case. Again, the result should be an
integer, string, bytes, a ctypes instance, or an object with an
_as_parameter_ attribute.
By default functions are assumed to return the C int type. Other
return types can be specified by setting the restype attribute of the
function object.
Here is a more advanced example; it uses the strchr function, which expects
a string pointer and a char, and returns a pointer to a string:
>>> strchr = libc.strchr
>>> strchr(b"abcdef", ord("d")) # doctest: +SKIP
8059983
>>> strchr.restype = c_char_p # c_char_p is a pointer to a string
>>> strchr(b"abcdef", ord("d"))
b'def'
>>> print(strchr(b"abcdef", ord("x")))
None
>>>
If you want to avoid the ord("x") calls above, you can set the
argtypes attribute, and the second argument will be converted from a
single character Python bytes object into a C char:
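>>> strchr.restype = c_char_p
>>> strchr.argtypes = [c_char_p, c_char]
>>> strchr(b"abcdef", b"d")
b'def'
>>> strchr(b"abcdef", b"def")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ArgumentError: argument 2: exceptions.TypeError: one character string expected
>>> print(strchr(b"abcdef", b"x"))
None
>>>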
You can also use a callable Python object (a function or a class for example) as
the restype attribute, if the foreign function returns an integer. The
callable will be called with the integer the C function returns, and the
result of this call will be used as the result of your function call. This is
useful to check for error return values and automatically raise an exception:
>>> GetModuleHandle = windll.kernel32.GetModuleHandleA # doctest: +WINDOWS
>>> def ValidHandle(value):
... if value == 0:
... raise WinError()
... return value
...
>>>
>>> GetModuleHandle.restype = ValidHandle # doctest: +WINDOWS
>>> GetModuleHandle(None) # doctest: +WINDOWS
486539264
>>> GetModuleHandle("something silly") # doctest: +WINDOWS
Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "<stdin>", line 3, in ValidHandle
WindowsError: [Errno 126] The specified module could not be found.
>>>
WinError is a function which will call the Windows FormatMessage() api to
get the string representation of an error code, and returns an exception.
WinError takes an optional error code parameter; if none is given, it calls
GetLastError() to retrieve it.
Please note that a much more powerful error checking mechanism is available
through the errcheck attribute; see the reference manual for details.
Passing pointers (or: passing parameters by reference)
Sometimes a C api function expects a pointer to a data type as parameter,
probably to write into the corresponding location, or if the data is too large
to be passed by value. This is also known as passing parameters by reference.
ctypes exports the byref() function which is used to pass parameters
by reference. The same effect can be achieved with the pointer() function,
although pointer() does a lot more work since it constructs a real pointer
object, so it is faster to use byref() if you don’t need the pointer
object in Python itself:
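>>> i = c_int()
>>> f = c_float()
>>> s = create_string_buffer(b'\000' * 32)
>>> print(i.value, f.value, repr(s.value))
0 0.0 b''
>>> libc.sscanf(b"1 3.14 Hello", b"%d %f %s",
...             byref(i), byref(f), s)
3
>>> print(i.value, f.value, repr(s.value))
1 3.1400001049041748 b'Hello'
>>>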
Structures and unions must derive from the Structure and Union
base classes which are defined in the ctypes module. Each subclass must
define a _fields_ attribute. _fields_ must be a list of
2-tuples, containing a field name and a field type.
The field type must be a ctypes type like c_int, or any other
derived ctypes type: structure, union, array, pointer.
Here is a simple example of a POINT structure, which contains two integers named
x and y, and also shows how to initialize a structure in the constructor:
>>> from ctypes import *
>>> class POINT(Structure):
... _fields_ = [("x", c_int),
... ("y", c_int)]
...
>>> point = POINT(10, 20)
>>> print(point.x, point.y)
10 20
>>> point = POINT(y=5)
>>> print(point.x, point.y)
0 5
>>> POINT(1, 2, 3)
Traceback (most recent call last):
File "<stdin>", line 1, in ?
ValueError: too many initializers
>>>
You can, however, build much more complicated structures. A structure can itself
contain other structures by using a structure as a field type.
Here is a RECT structure which contains two POINTs named upperleft and
lowerright:
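>>> class RECT(Structure):
...     _fields_ = [("upperleft", POINT),
...                 ("lowerright", POINT)]
...
>>> rc = RECT(point)
>>> print(rc.upperleft.x, rc.upperleft.y)
0 5
>>> print(rc.lowerright.x, rc.lowerright.y)
0 0
>>>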
By default, Structure and Union fields are aligned in the same way the C
compiler does it. It is possible to override this behavior by specifying a
_pack_ class attribute in the subclass definition. This must be set to a
positive integer and specifies the maximum alignment for the fields. This is
what #pragma pack(n) also does in MSVC.
ctypes uses the native byte order for Structures and Unions. To build
structures with non-native byte order, you can use one of the
BigEndianStructure, LittleEndianStructure,
BigEndianUnion, and LittleEndianUnion base classes. These
classes cannot contain pointer fields.
It is possible to create structures and unions containing bit fields. Bit fields
are only possible for integer fields, the bit width is specified as the third
item in the _fields_ tuples:
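>>> class Int(Structure):
...     _fields_ = [("first_16", c_int, 16),
...                 ("second_16", c_int, 16)]
...
>>> print(Int.first_16)
<Field type=c_long, ofs=0:0, bits=16>
>>> print(Int.second_16)
<Field type=c_long, ofs=0:16, bits=16>
>>>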
It is also possible to use indexes different from 0, but you must know what
you’re doing, just as in C: You can access or change arbitrary memory locations.
Generally you only use this feature if you receive a pointer from a C function,
and you know that the pointer actually points to an array instead of a single
item.
Behind the scenes, the pointer() function does more than simply create
pointer instances, it has to create pointer types first. This is done with the
POINTER() function, which accepts any ctypes type, and returns a
new type:
>>> PI = POINTER(c_int)
>>> PI
<class 'ctypes.LP_c_long'>
>>> PI(42)
Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: expected c_long instead of int
>>> PI(c_int(42))
<ctypes.LP_c_long object at 0x...>
>>>
Calling the pointer type without an argument creates a NULL pointer.
NULL pointers have a False boolean value:
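>>> null_ptr = POINTER(c_int)()
>>> print(bool(null_ptr))
False
>>>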
Usually, ctypes does strict type checking. This means, if you have
POINTER(c_int) in the argtypes list of a function or as the type of
a member field in a structure definition, only instances of exactly the same
type are accepted. There are some exceptions to this rule, where ctypes accepts
other objects. For example, you can pass compatible array instances instead of
pointer types. So, for POINTER(c_int), ctypes accepts an array of c_int:
>>> class Bar(Structure):
... _fields_ = [("count", c_int), ("values", POINTER(c_int))]
...
>>> bar = Bar()
>>> bar.values = (c_int * 3)(1, 2, 3)
>>> bar.count = 3
>>> for i in range(bar.count):
... print(bar.values[i])
...
1
2
3
>>>
To set a POINTER type field to NULL, you can assign None:
>>> bar.values = None
>>>
Sometimes you have instances of incompatible types. In C, you can cast one type
into another type. ctypes provides a cast() function which can be
used in the same way. The Bar structure defined above accepts
POINTER(c_int) pointers or c_int arrays for its values field,
but not instances of other types:
>>> bar.values = (c_byte * 4)()
Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: incompatible types, c_byte_Array_4 instance instead of LP_c_long instance
>>>
The cast() function can be used to cast a ctypes instance into a pointer
to a different ctypes data type. cast() takes two parameters, a ctypes
object that is or can be converted to a pointer of some kind, and a ctypes
pointer type. It returns an instance of the second argument, which references
the same memory block as the first argument:
>>> a = (c_byte * 4)()
>>> cast(a, POINTER(c_int))
<ctypes.LP_c_long object at ...>
>>>
So, cast() can be used to assign to the values field of the Bar
structure:
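>>> bar = Bar()
>>> bar.values = cast((c_byte * 4)(), POINTER(c_int))
>>> print(bar.values[0])
0
>>>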
Incomplete Types are structures, unions or arrays whose members are not yet
specified. In C, they are specified by forward declarations that are defined
later.
The straightforward translation into ctypes code would be this, but it does not
work:
>>> class cell(Structure):
... _fields_ = [("name", c_char_p),
... ("next", POINTER(cell))]
...
Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "<stdin>", line 2, in cell
NameError: name 'cell' is not defined
>>>
because the new class cell is not available in the class statement itself.
In ctypes, we can define the cell class and set the _fields_
attribute later, after the class statement:
>>> from ctypes import *
>>> class cell(Structure):
... pass
...
>>> cell._fields_ = [("name", c_char_p),
... ("next", POINTER(cell))]
>>>
Let's try it. We create two instances of cell, let them point to each
other, and finally follow the pointer chain a few times:
>>> c1 = cell()
>>> c1.name = b"foo"
>>> c2 = cell()
>>> c2.name = b"bar"
>>> c1.next = pointer(c2)
>>> c2.next = pointer(c1)
>>> p = c1
>>> for i in range(8):
...     print(p.name.decode(), end=" ")
...     p = p.next[0]
...
foo bar foo bar foo bar foo bar
>>>
ctypes allows creating C callable function pointers from Python callables.
These are sometimes called callback functions.
First, you must create a class for the callback function. The class knows the
calling convention, the return type, and the number and types of arguments this
function will receive.
The CFUNCTYPE factory function creates types for callback functions using the
normal cdecl calling convention, and, on Windows, the WINFUNCTYPE factory
function creates types for callback functions using the stdcall calling
convention.
Both of these factory functions are called with the result type as first
argument, and the callback functions expected argument types as the remaining
arguments.
I will present an example here which uses the standard C library's
qsort() function; it sorts items with the help of a callback
function. qsort() will be used to sort an array of integers:
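>>> IntArray5 = c_int * 5
>>> ia = IntArray5(5, 1, 7, 33, 99)
>>> qsort = libc.qsort
>>> qsort.restype = None
>>>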
qsort() must be called with a pointer to the data to sort, the number of
items in the data array, the size of one item, and a pointer to the comparison
function, the callback. The callback will then be called with two pointers to
items, and it must return a negative integer if the first item is smaller than
the second, zero if they are equal, and a positive integer otherwise.
So our callback function receives pointers to integers, and must return an
integer. First we create the type for the callback function:
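>>> CMPFUNC = CFUNCTYPE(c_int, POINTER(c_int), POINTER(c_int))
>>>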
For the first implementation of the callback function, we simply print the
arguments we get, and return 0 (incremental development ;-):
>>> def py_cmp_func(a, b):
... print("py_cmp_func", a, b)
... return 0
...
>>>
Create the C callable callback:
>>> cmp_func = CMPFUNC(py_cmp_func)
>>>
And we’re ready to go:
>>> qsort(ia, len(ia), sizeof(c_int), cmp_func) # doctest: +WINDOWS
py_cmp_func <ctypes.LP_c_long object at 0x00...> <ctypes.LP_c_long object at 0x00...>
py_cmp_func <ctypes.LP_c_long object at 0x00...> <ctypes.LP_c_long object at 0x00...>
py_cmp_func <ctypes.LP_c_long object at 0x00...> <ctypes.LP_c_long object at 0x00...>
py_cmp_func <ctypes.LP_c_long object at 0x00...> <ctypes.LP_c_long object at 0x00...>
py_cmp_func <ctypes.LP_c_long object at 0x00...> <ctypes.LP_c_long object at 0x00...>
py_cmp_func <ctypes.LP_c_long object at 0x00...> <ctypes.LP_c_long object at 0x00...>
py_cmp_func <ctypes.LP_c_long object at 0x00...> <ctypes.LP_c_long object at 0x00...>
py_cmp_func <ctypes.LP_c_long object at 0x00...> <ctypes.LP_c_long object at 0x00...>
py_cmp_func <ctypes.LP_c_long object at 0x00...> <ctypes.LP_c_long object at 0x00...>
py_cmp_func <ctypes.LP_c_long object at 0x00...> <ctypes.LP_c_long object at 0x00...>
>>>
We know how to access the contents of a pointer, so let's redefine our callback:
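>>> def py_cmp_func(a, b):
...     print("py_cmp_func", a[0], b[0])
...     return a[0] - b[0]     # negative, zero, or positive comparison result
...
>>> qsort(ia, len(ia), sizeof(c_int), CMPFUNC(py_cmp_func))   # doctest: +SKIP
>>>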
It is quite interesting to see that the Windows qsort() function needs
more comparisons than the Linux version!
As we can easily check, our array is sorted now:
>>> for i in ia: print(i, end=" ")
...
1 5 7 33 99
>>>
Important note for callback functions:
Make sure you keep references to CFUNCTYPE objects as long as they are used from
C code. ctypes doesn't keep them alive itself, and if you don't, they may be garbage collected,
crashing your program when a callback is made.
Some shared libraries not only export functions, they also export variables. An
example in the Python library itself is the Py_OptimizeFlag, an integer
set to 0, 1, or 2, depending on the -O or -OO flag given on
startup.
ctypes can access values like this with the in_dll() class methods of
the type. pythonapi is a predefined symbol giving access to the Python C
api:
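>>> opt_flag = c_int.in_dll(pythonapi, "Py_OptimizeFlag")
>>> print(opt_flag)
c_long(0)
>>>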
If the interpreter had been started with -O, the sample would
have printed c_long(1), or c_long(2) if -OO had been
specified.
An extended example which also demonstrates the use of pointers accesses the
PyImport_FrozenModules pointer exported by Python.
Quoting the docs for that value:
This pointer is initialized to point to an array of struct_frozen
records, terminated by one whose members are all NULL or zero. When a frozen
module is imported, it is searched in this table. Third-party code could play
tricks with this to provide a dynamically created collection of frozen modules.
So manipulating this pointer could even prove useful. To restrict the example
size, we show only how this table can be read with ctypes:
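>>> from ctypes import *
>>> class struct_frozen(Structure):
...     _fields_ = [("name", c_char_p),
...                 ("code", POINTER(c_ubyte)),
...                 ("size", c_int)]
...
>>> FrozenTable = POINTER(struct_frozen)
>>> table = FrozenTable.in_dll(pythonapi, "PyImport_FrozenModules")
>>>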
Since table is a pointer to the array of struct_frozen records, we
can iterate over it, but we just have to make sure that our loop terminates,
because pointers have no size. Sooner or later it would probably crash with an
access violation or whatever, so it’s better to break out of the loop when we
hit the NULL entry:
>>> for item in table:
... print(item.name, item.size)
... if item.name is None:
... break
...
__hello__ 104
__phello__ -104
__phello__.spam 104
None 0
>>>
The fact that standard Python has a frozen module and a frozen package
(indicated by the negative size member) is not well known; it is only used for
testing. Try it out with import __hello__, for example.
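The discussion below assumes, as a sketch, a POINT structure with x and y
fields and a RECT structure with two POINT fields named a and b:
>>> from ctypes import *
>>> class POINT(Structure):
...     _fields_ = ("x", c_int), ("y", c_int)
...
>>> class RECT(Structure):
...     _fields_ = ("a", POINT), ("b", POINT)
...
>>> p1 = POINT(1, 2)
>>> p2 = POINT(3, 4)
>>> rc = RECT(p1, p2)
>>> print(rc.a.x, rc.a.y, rc.b.x, rc.b.y)
1 2 3 4
>>> # swap the two points; this executes roughly as
>>> # temp0 = rc.b; temp1 = rc.a; rc.a = temp0; rc.b = temp1
>>> rc.a, rc.b = rc.b, rc.a
>>> print(rc.a.x, rc.a.y, rc.b.x, rc.b.y)
3 4 3 4
>>>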
Note that temp0 and temp1 are objects still using the internal buffer of
the rc object above. So executing rc.a = temp0 copies the buffer
contents of temp0 into rc's buffer. This, in turn, changes the
contents of temp1. So the last assignment, rc.b = temp1, doesn't have
the expected effect.
Keep in mind that retrieving sub-objects from Structures, Unions, and Arrays
doesn't copy the sub-object; instead it retrieves a wrapper object accessing
the root object's underlying buffer.
Another example that may behave differently from what one would expect is this:
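>>> s = c_char_p()
>>> s.value = b"abc def ghi"
>>> s.value
b'abc def ghi'
>>> s.value is s.value
False
>>>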
Why is it printing False? ctypes instances are objects containing a memory
block plus some descriptors accessing the contents of the memory.
Storing a Python object in the memory block does not store the object itself,
instead, the contents of the object are stored. Accessing the contents again
constructs a new Python object each time!
ctypes provides some support for variable-sized arrays and structures.
The resize() function can be used to resize the memory buffer of an
existing ctypes object. The function takes the object as first argument, and
the requested size in bytes as the second argument. The memory block cannot be
made smaller than the natural memory block specified by the object's type; a
ValueError is raised if this is tried:
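>>> short_array = (c_short * 4)()
>>> print(sizeof(short_array))
8
>>> resize(short_array, 4)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: minimum size is 8
>>> resize(short_array, 32)
>>> sizeof(short_array)
8
>>> sizeof(type(short_array))
8
>>>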
This is nice and fine, but how would one access the additional elements
contained in this array? Since the type still only knows about 4 elements, we
get errors accessing other elements:
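>>> short_array[:]
[0, 0, 0, 0]
>>> short_array[7]
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
IndexError: invalid index
>>>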
Another way to use variable-sized data types with ctypes is to use the
dynamic nature of Python, and (re-)define the data type after the required size
is already known, on a case by case basis.
When programming in a compiled language, shared libraries are accessed when
compiling/linking a program, and when the program is run.
The purpose of the find_library() function is to locate a library in a way
similar to what the compiler does (on platforms with several versions of a
shared library, the most recent should be loaded), while the ctypes library
loaders act as a running program does and call the runtime loader directly.
The ctypes.util module provides a function which can help to determine
the library to load.
ctypes.util.find_library(name)
Try to find a library and return a pathname. name is the library name without
any prefix like lib, suffix like .so, .dylib or version number (this
is the form used for the posix linker option -l). If no library can
be found, returns None.
The exact functionality is system dependent.
On Linux, find_library() tries to run external programs
(/sbin/ldconfig, gcc, and objdump) to find the library file. It
returns the filename of the library file. Here are some examples:
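>>> from ctypes.util import find_library   # doctest: +LINUX
>>> find_library("m")     # exact filenames vary from system to system
'libm.so.6'
>>> find_library("c")
'libc.so.6'
>>> find_library("bz2")
'libbz2.so.1.0'
>>>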
On Windows, find_library() searches along the system search path, and
returns the full pathname, but since there is no predefined naming scheme a call
like find_library("c") will fail and return None.
If wrapping a shared library with ctypes, it may be better to determine
the shared library name at development time, and hardcode that into the wrapper
module instead of using find_library() to locate the library at runtime.
There are several ways to load shared libraries into the Python process. One
way is to instantiate one of the following classes:
class ctypes.CDLL(name, mode=DEFAULT_MODE, handle=None, use_errno=False, use_last_error=False)
Instances of this class represent loaded shared libraries. Functions in these
libraries use the standard C calling convention, and are assumed to return
int.
class ctypes.OleDLL(name, mode=DEFAULT_MODE, handle=None, use_errno=False, use_last_error=False)
Windows only: Instances of this class represent loaded shared libraries,
functions in these libraries use the stdcall calling convention, and are
assumed to return the windows specific HRESULT code. HRESULT
values contain information specifying whether the function call failed or
succeeded, together with an additional error code. If the return value signals a
failure, a WindowsError is automatically raised.
class ctypes.WinDLL(name, mode=DEFAULT_MODE, handle=None, use_errno=False, use_last_error=False)
Windows only: Instances of this class represent loaded shared libraries,
functions in these libraries use the stdcall calling convention, and are
assumed to return int by default.
On Windows CE only the standard calling convention is used; for convenience,
WinDLL and OleDLL use the standard calling convention on this
platform.
The Python global interpreter lock is released before calling any
function exported by these libraries, and reacquired afterwards.
class ctypes.PyDLL(name, mode=DEFAULT_MODE, handle=None)
Instances of this class behave like CDLL instances, except that the
Python GIL is not released during the function call, and after the function
execution the Python error flag is checked. If the error flag is set, a Python
exception is raised.
Thus, this is only useful to call Python C api functions directly.
All these classes can be instantiated by calling them with at least one
argument, the pathname of the shared library. If you have an existing handle to
an already loaded shared library, it can be passed as the handle named
parameter; otherwise the underlying platform's dlopen or LoadLibrary
function is used to load the library into the process, and to get a handle to
it.
The mode parameter can be used to specify how the library is loaded. For
details, consult the dlopen(3) manpage; on Windows, mode is
ignored.
The use_errno parameter, when set to True, enables a ctypes mechanism that
allows accessing the system errno error number in a safe way.
ctypes maintains a thread-local copy of the system's errno
variable; if you call foreign functions created with use_errno=True, then the
errno value before the function call is swapped with the ctypes private
copy, and the same happens immediately after the function call.
The function ctypes.get_errno() returns the value of the ctypes private
copy, and the function ctypes.set_errno() changes the ctypes private copy
to a new value and returns the former value.
The use_last_error parameter, when set to True, enables the same mechanism for
the Windows error code which is managed by the GetLastError() and
SetLastError() Windows API functions; ctypes.get_last_error() and
ctypes.set_last_error() are used to request and change the ctypes private
copy of the windows error code.
ctypes.RTLD_GLOBAL
Flag to use as mode parameter. On platforms where this flag is not available,
it is defined as the integer zero.
ctypes.RTLD_LOCAL
Flag to use as mode parameter. On platforms where this is not available, it
is the same as RTLD_GLOBAL.
ctypes.DEFAULT_MODE
The default mode which is used to load shared libraries. On OSX 10.3, this is
RTLD_GLOBAL, otherwise it is the same as RTLD_LOCAL.
Instances of these classes have no public methods; however, __getattr__()
and __getitem__() have special behavior: functions exported by the shared
library can be accessed as attributes or by index. Please note that both
__getattr__() and __getitem__() cache their result, so calling them
repeatedly returns the same object each time.
The following public attributes are available; their names start with an
underscore so as not to clash with exported function names:
The name of the library passed in the constructor.
Shared libraries can also be loaded by using one of the prefabricated objects,
which are instances of the LibraryLoader class, either by calling the
LoadLibrary() method, or by retrieving the library as an attribute of the
loader instance.
Class which loads shared libraries. dlltype should be one of the
CDLL, PyDLL, WinDLL, or OleDLL types.
__getattr__() has special behavior: it allows loading a shared library by
accessing it as an attribute of a library loader instance. The result is cached,
so repeated attribute accesses return the same library each time.
For accessing the C Python api directly, a ready-to-use Python shared library
object is available:
ctypes.pythonapi
An instance of PyDLL that exposes Python C API functions as
attributes. Note that all these functions are assumed to return C
int, which is of course not always the case, so you have to assign
the correct restype attribute to use these functions.
As explained in the previous section, foreign functions can be accessed as
attributes of loaded shared libraries. The function objects created in this way
by default accept any number of arguments, accept any ctypes data instances as
arguments, and return the default result type specified by the library loader.
They are instances of a private class:
Assign a ctypes type to specify the result type of the foreign function.
Use None for void, a function not returning anything.
It is possible to assign a callable Python object that is not a ctypes
type; in this case the function is assumed to return a C int, and
the callable will be called with this integer, allowing further
processing or error checking. Using this is deprecated; for more flexible
post-processing or error checking use a ctypes data type as
restype and assign a callable to the errcheck attribute.
Assign a tuple of ctypes types to specify the argument types that the
function accepts. Functions using the stdcall calling convention can
only be called with the same number of arguments as the length of this
tuple; functions using the C calling convention accept additional,
unspecified arguments as well.
When a foreign function is called, each actual argument is passed to the
from_param() class method of the items in the argtypes
tuple; this method allows adapting the actual argument to an object that
the foreign function accepts. For example, a c_char_p item in
the argtypes tuple will convert a string passed as argument into
a bytes object using ctypes conversion rules.
New: It is now possible to put items in argtypes which are not ctypes
types, but each item must have a from_param() method which returns a
value usable as argument (integer, string, ctypes instance). This allows
defining adapters that can adapt custom objects as function parameters.
Assign a Python function or another callable to this attribute. The
callable will be called with three or more arguments:
callable(result, func, arguments)
result is what the foreign function returns, as specified by the
restype attribute.
func is the foreign function object itself; this allows reusing the
same callable object to check or post-process the results of several
functions.
arguments is a tuple containing the parameters originally passed to
the function call; this allows specializing the behavior on the
arguments used.
The object that this function returns will be returned from the
foreign function call, but it can also check the result value
and raise an exception if the foreign function call failed.
Foreign functions can also be created by instantiating function prototypes.
Function prototypes are similar to function prototypes in C; they describe a
function (return type, argument types, calling convention) without defining an
implementation. The factory functions must be called with the desired result
type and the argument types of the function.
The returned function prototype creates functions that use the standard C
calling convention. The function will release the GIL during the call. If
use_errno is set to True, the ctypes private copy of the system
errno variable is exchanged with the real errno value before
and after the call; use_last_error does the same for the Windows error
code.
Windows only: The returned function prototype creates functions that use the
stdcall calling convention, except on Windows CE where
WINFUNCTYPE() is the same as CFUNCTYPE(). The function will
release the GIL during the call. use_errno and use_last_error have the
same meaning as above.
The returned function prototype creates functions that use the Python calling
convention. The function will not release the GIL during the call.
Function prototypes created by these factory functions can be instantiated in
different ways, depending on the type and number of the parameters in the call:
prototype(address)
Returns a foreign function at the specified address which must be an integer.
prototype(callable)
Create a C callable function (a callback function) from a Python callable.
prototype(func_spec[, paramflags])
Returns a foreign function exported by a shared library. func_spec must
be a 2-tuple (name_or_ordinal,library). The first item is the name of
the exported function as string, or the ordinal of the exported function
as small integer. The second item is the shared library instance.
prototype(vtbl_index, name[, paramflags[, iid]])
Returns a foreign function that will call a COM method. vtbl_index is
the index into the virtual function table, a small non-negative
integer. name is the name of the COM method. iid is an optional pointer to
the interface identifier which is used in extended error reporting.
COM methods use a special calling convention: They require a pointer to
the COM interface as first argument, in addition to those parameters that
are specified in the argtypes tuple.
The optional paramflags parameter creates foreign function wrappers with much
more functionality than the features described above.
paramflags must be a tuple of the same length as argtypes.
Each item in this tuple contains further information about a parameter; it must
be a tuple containing one, two, or three items.
The first item is an integer containing a combination of direction
flags for the parameter:
1: Specifies an input parameter to the function.
2: Output parameter. The foreign function fills in a value.
4: Input parameter which defaults to the integer zero.
The optional second item is the parameter name as string. If this is specified,
the foreign function can be called with named parameters.
The optional third item is the default value for this parameter.
This example demonstrates how to wrap the Windows MessageBoxA function so
that it supports default parameters and named arguments. The C declaration in
the windows header file is, roughly, int WINAPI MessageBoxA(HWND hWnd,
LPCSTR lpText, LPCSTR lpCaption, UINT uType).
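A sketch of such a wrapper (the default text and caption values here are
arbitrary):
>>> from ctypes import c_int, WINFUNCTYPE, windll
>>> from ctypes.wintypes import HWND, LPCSTR, UINT
>>> prototype = WINFUNCTYPE(c_int, HWND, LPCSTR, LPCSTR, UINT)
>>> paramflags = (1, "hwnd", 0), (1, "text", b"Hi"), (1, "caption", None), (1, "flags", 0)
>>> MessageBox = prototype(("MessageBoxA", windll.user32), paramflags)
>>> MessageBox(text=b"Spam, spam, spam")   # doctest: +SKIP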
A second example demonstrates output parameters. The win32 GetWindowRect
function retrieves the dimensions of a specified window by copying them into a
RECT structure that the caller has to supply. Its C declaration is,
roughly, BOOL WINAPI GetWindowRect(HWND hWnd, LPRECT lpRect).
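>>> from ctypes import POINTER, WINFUNCTYPE, windll, WinError
>>> from ctypes.wintypes import BOOL, HWND, RECT
>>> prototype = WINFUNCTYPE(BOOL, HWND, POINTER(RECT))
>>> paramflags = (1, "hwnd"), (2, "lprect")
>>> GetWindowRect = prototype(("GetWindowRect", windll.user32), paramflags)
>>>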
Functions with output parameters will automatically return the output parameter
value if there is a single one, or a tuple containing the output parameter
values when there are more than one, so the GetWindowRect function now returns a
RECT instance, when called.
Output parameters can be combined with the errcheck protocol to do
further output processing and error checking. The win32 GetWindowRect api
function returns a BOOL to signal success or failure, so this function could
do the error checking and raise an exception when the api call failed:
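>>> def errcheck(result, func, args):
...     if not result:
...         raise WinError()
...     return args
...
>>> GetWindowRect.errcheck = errcheck
>>>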
If the errcheck function returns the argument tuple it receives
unchanged, ctypes continues the normal processing it does on the output
parameters. If you want to return a tuple of window coordinates instead of a
RECT instance, you can retrieve the fields in the function and return them
instead, the normal processing will no longer take place:
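>>> def errcheck(result, func, args):
...     if not result:
...         raise WinError()
...     rc = args[1]
...     return rc.left, rc.top, rc.bottom, rc.right
...
>>> GetWindowRect.errcheck = errcheck
>>>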
Returns a light-weight pointer to obj, which must be an instance of a
ctypes type. offset defaults to zero, and must be an integer that will be
added to the internal pointer value.
byref(obj, offset) corresponds to this C code:
(((char *)&obj) + offset)
The returned object can only be used as a foreign function call parameter.
It behaves similarly to pointer(obj), but the construction is a lot faster.
This function is similar to the cast operator in C. It returns a new instance
of type which points to the same memory block as obj. type must be a
pointer type, and obj must be an object that can be interpreted as a
pointer.
This function creates a mutable character buffer. The returned object is a
ctypes array of c_char.
init_or_size must be an integer which specifies the size of the array, or a
bytes object which will be used to initialize the array items.
If a bytes object is specified as first argument, the buffer is made one item
larger than its length so that the last element in the array is a NUL
termination character. An integer can be passed as second argument which allows
specifying the size of the array if the length of the bytes should not be used.
If the first parameter is a string, it is converted into a bytes object
according to ctypes conversion rules.
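A small sketch of the three calling styles:

from ctypes import create_string_buffer, sizeof

buf = create_string_buffer(3)             # 3 NUL bytes
buf = create_string_buffer(b"Hello")      # 6 bytes, trailing NUL added
print(sizeof(buf), repr(buf.raw))         # 6 b'Hello\x00'
buf = create_string_buffer(b"Hello", 10)  # explicit size of 10 bytes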
This function creates a mutable unicode character buffer. The returned object is
a ctypes array of c_wchar.
init_or_size must be an integer which specifies the size of the array, or a
string which will be used to initialize the array items.
If a string is specified as the first argument, the buffer is made one item
larger than the length of the string so that the last element in the array is
a NUL termination character. An integer can be passed as the second argument
to specify the size of the array if the length of the string should not be
used.
If the first parameter is a bytes object, it is converted into a unicode
string according to ctypes conversion rules.
Windows only: This function is a hook which makes it possible to implement in-process
COM servers with ctypes. It is called from the DllCanUnloadNow function that
the _ctypes extension dll exports.
Windows only: This function is a hook which makes it possible to implement in-process
COM servers with ctypes. It is called from the DllGetClassObject function
that the _ctypes extension dll exports.
Try to find a library and return a pathname. name is the library name
without any prefix like lib, suffix like .so, .dylib or version
number (this is the form used for the posix linker option -l). If
no library can be found, returns None.
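For example, on a typical Linux system (exact results vary by platform and
installed libraries):

from ctypes.util import find_library

print(find_library("m"))    # e.g. 'libm.so.6'
print(find_library("c"))    # e.g. 'libc.so.6'
print(find_library("bz2"))  # e.g. 'libbz2.so.1.0', or None if absent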
Windows only: return the filename of the VC runtime library used by Python,
and by the extension modules. If the name of the library cannot be
determined, None is returned.
If you need to free memory that was, for example, allocated by an extension
module with a call to free(void *), it is important that you use the free
function from the same library that allocated the memory.
Windows only: Returns a textual description of the error code code. If no
error code is specified, the last error code is used by calling the Windows
api function GetLastError.
Windows only: Returns the last error code set by Windows in the calling thread.
This function calls the Windows GetLastError() function directly,
it does not return the ctypes-private copy of the error code.
Same as the standard C memmove library function: copies count bytes from
src to dst. dst and src must be integers or ctypes instances that can
be converted to pointers.
Same as the standard C memset library function: fills the memory block at
address dst with count bytes of value c. dst must be an integer
specifying an address, or a ctypes instance.
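A small sketch of both helpers operating on a mutable buffer:

from ctypes import create_string_buffer, memmove, memset

buf = create_string_buffer(10)
memmove(buf, b"Hello", 5)    # copy 5 bytes into the buffer
memset(buf, 0x2a, 3)         # overwrite the first 3 bytes with b'*'
print(buf.raw)               # b'***lo\x00\x00\x00\x00\x00'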
This factory function creates and returns a new ctypes pointer type. Pointer
types are cached and reused internally, so calling this function repeatedly is
cheap. type must be a ctypes type.
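The caching means repeated calls return the very same type object, as this
small sketch illustrates:

from ctypes import POINTER, c_int

LP_c_int = POINTER(c_int)
assert POINTER(c_int) is LP_c_int    # cached, not recreated
p = LP_c_int(c_int(42))              # pointer instance to a c_int
print(p.contents.value)              # 42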
This function resizes the internal memory buffer of obj, which must be an
instance of a ctypes type. It is not possible to make the buffer smaller
than the native size of the object's type, as given by sizeof(type(obj)),
but it is possible to enlarge the buffer.
Windows only: set the current value of the ctypes-private copy of the system
LastError variable in the calling thread to value and return the
previous value.
This function returns the C string starting at memory address address as a
bytes object. If size is specified, it is used as the size; otherwise the
string is assumed to be zero-terminated.
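For example (a sketch using a buffer created in-process):

from ctypes import addressof, create_string_buffer, string_at

buf = create_string_buffer(b"Hello world")
print(string_at(addressof(buf)))       # b'Hello world' (stops at NUL)
print(string_at(addressof(buf), 5))    # b'Hello'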
Windows only: this function is probably the worst-named thing in ctypes. It
creates an instance of WindowsError. If code is not specified,
GetLastError is called to determine the error code. If descr is not
specified, FormatError() is called to get a textual description of the
error.
This function returns the wide character string starting at memory address
address as a string. If size is specified, it is used as the number of
characters of the string, otherwise the string is assumed to be
zero-terminated.
This non-public class is the common base class of all ctypes data types.
Among other things, all ctypes type instances contain a memory block that
holds C compatible data; the address of the memory block is returned by the
addressof() helper function. Another instance variable is exposed as
_objects; this contains other Python objects that need to be kept
alive in case the memory block contains pointers.
Common methods of ctypes data types; these are all class methods (to be
exact, they are methods of the metaclass):
This method returns a ctypes instance that shares the buffer of the
source object. The source object must support the writeable buffer
interface. The optional offset parameter specifies an offset into the
source buffer in bytes; the default is zero. If the source buffer is not
large enough a ValueError is raised.
This method creates a ctypes instance, copying the buffer from the
source object buffer which must be readable. The optional offset
parameter specifies an offset into the source buffer in bytes; the default
is zero. If the source buffer is not large enough a ValueError is
raised.
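A small sketch of from_buffer_copy (the printed value assumes a
little-endian platform):

from ctypes import c_int32

x = c_int32.from_buffer_copy(b'\x01\x00\x00\x00')
print(x.value)    # 1 on little-endian platforms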
This method adapts obj to a ctypes type. It is called with the actual
object used in a foreign function call when the type is present in the
foreign function’s argtypes tuple; it must return an object that
can be used as a function call parameter.
All ctypes data types have a default implementation of this classmethod
that normally returns obj if that is an instance of the type. Some
types accept other objects as well.
This method returns a ctypes type instance exported by a shared
library. name is the name of the symbol that exports the data, library
is the loaded shared library.
Sometimes ctypes data instances do not own the memory block they contain,
instead they share part of the memory block of a base object. The
_b_base_ read-only member is the root ctypes object that owns the
memory block.
This member is either None or a dictionary containing Python objects
that need to be kept alive so that the memory block contents are kept
valid. This object is only exposed for debugging; never modify the
contents of this dictionary.
This non-public class is the base class of all fundamental ctypes data
types. It is mentioned here because it contains the common attributes of the
fundamental ctypes data types. _SimpleCData is a subclass of
_CData, so it inherits its methods and attributes. ctypes data
types that are not pointers and do not contain pointers can now be pickled.
This attribute contains the actual value of the instance. For integer and
pointer types, it is an integer, for character types, it is a single
character bytes object or string, for character pointer types it is a
Python bytes object or string.
When the value attribute is retrieved from a ctypes instance, usually
a new object is returned each time. ctypes does not implement
original object return; a new object is constructed on each access. The same
is true for all other ctypes object instances.
Fundamental data types, when returned as foreign function call results, or, for
example, by retrieving structure field members or array items, are transparently
converted to native Python types. In other words, if a foreign function has a
restype of c_char_p, you will always receive a Python bytes
object, not a c_char_p instance.
Subclasses of fundamental data types do not inherit this behavior. So, if a
foreign function's restype is a subclass of c_void_p, you will
receive an instance of this subclass from the function call. Of course, you can
get the value of the pointer by accessing the value attribute.
Represents the C signed char datatype, and interprets the value as a
small integer. The constructor accepts an optional integer initializer; no
overflow checking is done.
Represents the C char datatype, and interprets the value as a single
character. The constructor accepts an optional string initializer; the
length of the string must be exactly one character.
Represents the C char* datatype when it points to a zero-terminated
string. For a general character pointer that may also point to binary data,
POINTER(c_char) must be used. The constructor accepts an integer
address, or a bytes object.
Represents the C long double datatype. The constructor accepts an
optional float initializer. On platforms where
sizeof(long double) == sizeof(double) it is an alias to c_double.
Represents the C signed int datatype. The constructor accepts an
optional integer initializer; no overflow checking is done. On platforms
where sizeof(int) == sizeof(long) it is an alias to c_long.
Represents the C unsigned char datatype, and interprets the value as a
small integer. The constructor accepts an optional integer initializer; no
overflow checking is done.
Represents the C unsigned int datatype. The constructor accepts an
optional integer initializer; no overflow checking is done. On platforms
where sizeof(int) == sizeof(long) it is an alias for c_ulong.
Represents the C wchar_t datatype, and interprets the value as a
single character unicode string. The constructor accepts an optional string
initializer; the length of the string must be exactly one character.
Represents the C wchar_t* datatype, which must be a pointer to a
zero-terminated wide character string. The constructor accepts an integer
address, or a string.
Represents the C bool datatype (more accurately, _Bool from
C99). Its value can be True or False, and the constructor accepts any object
that has a truth value.
Represents the C PyObject * datatype. Calling this without an
argument creates a NULL PyObject * pointer.
The ctypes.wintypes module provides a number of other Windows-specific
data types, for example HWND, WPARAM, or DWORD. Some
useful structures like MSG or RECT are also defined.
Abstract base class for structures in native byte order.
Concrete structure and union types must be created by subclassing one of these
types, and must at least define a _fields_ class variable. ctypes will
create descriptors which allow reading and writing the fields by direct
attribute access.
A sequence defining the structure fields. The items must be 2-tuples or
3-tuples. The first item is the name of the field, the second item
specifies the type of the field; it can be any ctypes data type.
For integer type fields like c_int, a third optional item can be
given. It must be a small positive integer defining the bit width of the
field.
Field names must be unique within one structure or union. This is not
checked; when names are repeated, only one of the fields can be accessed.
It is possible to define the _fields_ class variable after the
class statement that defines the Structure subclass; this makes it possible
to create data types that directly or indirectly reference themselves:
class List(Structure):
    pass
List._fields_ = [("pnext", POINTER(List)),
                 ...
                ]
The _fields_ class variable must, however, be defined before the
type is first used (an instance is created, sizeof() is called on it,
and so on). Later assignments to the _fields_ class variable will
raise an AttributeError.
Structure and union subclass constructors accept both positional and named
arguments. Positional arguments are used to initialize the fields in the
same order as they appear in the _fields_ definition, named
arguments are used to initialize the fields with the corresponding name.
It is possible to define sub-subclasses of structure types; they inherit
the fields of the base class plus the _fields_ defined in the
sub-subclass, if any.
An optional small integer that overrides the alignment of
structure fields in the instance. _pack_ must already be defined
when _fields_ is assigned, otherwise it will have no effect.
An optional sequence that lists the names of unnamed (anonymous) fields.
_anonymous_ must already be defined when _fields_ is
assigned, otherwise it will have no effect.
The fields listed in this variable must be structure or union type fields.
ctypes will create descriptors in the structure type that allow
access to the nested fields directly, without the need to go through the
structure or union field.
The TYPEDESC structure describes a COM data type; the vt field
specifies which one of the union fields is valid. Since the u field
is defined as an anonymous field, it is now possible to access the members
directly off the TYPEDESC instance. td.lptdesc and td.u.lptdesc
are equivalent, but the former is faster since it does not need to create
a temporary union instance:
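The real TYPEDESC comes from the COM type libraries; the following is a
minimal, runnable sketch of the same idea using placeholder field types
(_U, ival, dval and vt are illustrative names):

from ctypes import Structure, Union, c_double, c_int

class _U(Union):
    _fields_ = [("ival", c_int),
                ("dval", c_double)]

class TYPEDESC(Structure):
    _anonymous_ = ("u",)          # expose _U's fields on TYPEDESC itself
    _fields_ = [("u", _U),
                ("vt", c_int)]

td = TYPEDESC()
td.ival = 42                      # same as td.u.ival = 42, without the
assert td.u.ival == 42            # temporary union instance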
It is possible to define sub-subclasses of structures; they inherit the
fields of the base class. If the subclass definition has a separate
_fields_ variable, the fields specified in it are appended to the
fields of the base class.
Structure and union constructors accept both positional and keyword
arguments. Positional arguments are used to initialize member fields in the
same order as they appear in _fields_. Keyword arguments in the
constructor are interpreted as attribute assignments, so they will initialize
_fields_ with the same name, or create new attributes for names not
present in _fields_.
The modules described in this chapter provide interfaces to operating system
features that are available on selected operating systems only. The interfaces
are generally modeled after the Unix or C interfaces but they are available on
some other systems as well (e.g. Windows). Here’s an overview:
This module provides access to the select() and poll() functions
available in most operating systems, epoll() available on Linux 2.5+ and
kqueue() available on most BSD systems.
Note that on Windows, it only works for sockets; on other operating systems,
it also works for other file types (in particular, on Unix, it works on pipes).
It cannot be used on regular files to determine whether a file has grown since
it was last read.
The exception raised when an error occurs. The accompanying value is a pair
containing the numeric error code from errno and the corresponding
string, as would be printed by the C function perror().
(Only supported on Linux 2.5.44 and newer.) Returns an edge polling object,
which can be used as an edge- or level-triggered interface for I/O events; see
section Edge and Level Trigger Polling (epoll) Objects below for the methods supported by epolling
objects.
(Not supported by all operating systems.) Returns a polling object, which
supports registering and unregistering file descriptors, and then polling them
for I/O events; see section Polling Objects below for the methods supported
by polling objects.
This is a straightforward interface to the Unix select() system call.
The first three arguments are sequences of ‘waitable objects’: either
integers representing file descriptors or objects with a parameterless method
named fileno() returning such an integer:
rlist: wait until ready for reading
wlist: wait until ready for writing
xlist: wait for an “exceptional condition” (see the manual page for what
your system considers such a condition)
Empty sequences are allowed, but acceptance of three empty sequences is
platform-dependent. (It is known to work on Unix but not on Windows.) The
optional timeout argument specifies a time-out as a floating point number
in seconds. When the timeout argument is omitted the function blocks until
at least one file descriptor is ready. A time-out value of zero specifies a
poll and never blocks.
The return value is a triple of lists of objects that are ready: subsets of the
first three arguments. When the time-out is reached without a file descriptor
becoming ready, three empty lists are returned.
Among the acceptable object types in the sequences are Python file
objects (e.g. sys.stdin, or objects returned by
open() or os.popen()), socket objects returned by
socket.socket(). You may also define a wrapper class yourself,
as long as it has an appropriate fileno() method (that really returns
a file descriptor, not just a random integer).
Note
File objects on Windows are not acceptable, but sockets are. On Windows,
the underlying select() function is provided by the WinSock
library, and does not handle file descriptors that don’t originate from
WinSock.
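A minimal sketch of a select() call with a one-second timeout
(socket.socketpair() is Unix-only and used here purely for demonstration):

import select
import socket

a, b = socket.socketpair()
b.send(b"ping")
rlist, wlist, xlist = select.select([a], [], [], 1.0)
if rlist:
    print(a.recv(4))    # b'ping'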
The minimum number of bytes which can be written without blocking to a pipe
when the pipe has been reported as ready for writing by select(),
poll() or another interface in this module. This doesn't apply
to other kinds of file-like objects such as sockets.
This value is guaranteed by POSIX to be at least 512. Availability: Unix.
The poll() system call, supported on most Unix systems, provides better
scalability for network servers that service many, many clients at the same
time. poll() scales better because the system call only requires listing
the file descriptors of interest, while select() builds a bitmap, turns
on bits for the fds of interest, and then afterward the whole bitmap has to be
linearly scanned again. select() is O(highest file descriptor), while
poll() is O(number of file descriptors).
Register a file descriptor with the polling object. Future calls to the
poll() method will then check whether the file descriptor has any pending
I/O events. fd can be either an integer, or an object with a fileno()
method that returns an integer. File objects implement fileno(), so they
can also be used as the argument.
eventmask is an optional bitmask describing the type of events you want to
check for, and can be a combination of the constants POLLIN,
POLLPRI, and POLLOUT, described in the table below. If not
specified, the default value used will check for all 3 types of events.
Constant   Meaning
POLLIN     There is data to read
POLLPRI    There is urgent data to read
POLLOUT    Ready for output: writing will not block
POLLERR    Error condition of some sort
POLLHUP    Hung up
POLLNVAL   Invalid request: descriptor not open
Registering a file descriptor that’s already registered is not an error, and has
the same effect as registering the descriptor exactly once.
Modifies an already registered fd. This has the same effect as
register(fd,eventmask). Attempting to modify a file descriptor
that was never registered causes an IOError exception with errno
ENOENT to be raised.
Remove a file descriptor being tracked by a polling object. Just like the
register() method, fd can be an integer or an object with a
fileno() method that returns an integer.
Attempting to remove a file descriptor that was never registered causes a
KeyError exception to be raised.
Polls the set of registered file descriptors, and returns a possibly-empty list
containing (fd,event) 2-tuples for the descriptors that have events or
errors to report. fd is the file descriptor, and event is a bitmask with
bits set for the reported events for that descriptor — POLLIN for
waiting input, POLLOUT to indicate that the descriptor can be written
to, and so forth. An empty list indicates that the call timed out and no file
descriptors had any events to report. If timeout is given, it specifies the
length of time in milliseconds which the system will wait for events before
returning. If timeout is omitted, negative, or None, the call will
block until there is an event for this poll object.
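A small sketch of the register/poll cycle (again using a Unix-only
socketpair purely for demonstration):

import select
import socket

a, b = socket.socketpair()
p = select.poll()
p.register(a, select.POLLIN)     # watch for readable events
b.send(b"ping")
for fd, event in p.poll(1000):   # timeout in milliseconds
    if event & select.POLLIN:
        print(a.recv(4))         # b'ping'
p.unregister(a)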
Value used to identify the event. The interpretation depends on the filter
but it’s usually the file descriptor. In the constructor ident can either
be an int or an object with a fileno() function. kevent stores the integer
internally.
While they are not listed below, the camelCase names used for some
methods and functions in this module in the Python 2.x series are still
supported by this module.
CPython implementation detail: Due to the Global Interpreter Lock, in CPython only one thread
can execute Python code at once (even though certain performance-oriented
libraries might overcome this limitation).
If you want your application to make better use of the computational
resources of multi-core machines, you are advised to use
multiprocessing or concurrent.futures.ProcessPoolExecutor.
However, threading is still an appropriate model if you want to run
multiple I/O-bound tasks simultaneously.
This module defines the following functions and objects:
Return the number of Thread objects currently alive. The returned
count is equal to the length of the list returned by enumerate().
threading.Condition()
A factory function that returns a new condition variable object. A condition
variable allows one or more threads to wait until they are notified by another
thread.
Return the current Thread object, corresponding to the caller’s thread
of control. If the caller’s thread of control was not created through the
threading module, a dummy thread object with limited functionality is
returned.
Return a list of all Thread objects currently alive. The list
includes daemonic threads, dummy thread objects created by
current_thread(), and the main thread. It excludes terminated threads
and threads that have not yet been started.
threading.Event()
A factory function that returns a new event object. An event manages a flag
that can be set to true with the set() method and reset to false
with the clear() method. The wait() method blocks until the flag
is true.
A class that represents thread-local data. Thread-local data are data whose
values are thread specific. To manage thread-local data, just create an
instance of local (or a subclass) and store attributes on it:
mydata = threading.local()
mydata.x = 1
The instance’s values will be different for separate threads.
For more details and extensive examples, see the documentation string of the
_threading_local module.
A factory function that returns a new primitive lock object. Once a thread has
acquired it, subsequent attempts to acquire it block, until it is released; any
thread may release it.
A factory function that returns a new reentrant lock object. A reentrant lock
must be released by the thread that acquired it. Once a thread has acquired a
reentrant lock, the same thread may acquire it again without blocking; the
thread must release it once for each time it has acquired it.
A factory function that returns a new semaphore object. A semaphore manages a
counter representing the number of release() calls minus the number of
acquire() calls, plus an initial value. The acquire() method blocks
if necessary until it can return without making the counter negative. If not
given, value defaults to 1.
A factory function that returns a new bounded semaphore object. A bounded
semaphore checks to make sure its current value doesn’t exceed its initial
value. If it does, ValueError is raised. In most situations semaphores
are used to guard resources with limited capacity. If the semaphore is released
too many times it’s a sign of a bug. If not given, value defaults to 1.
class threading.Thread
A class that represents a thread of control. This class can be safely
subclassed in a limited fashion.
Set a trace function for all threads started from the threading module.
The func will be passed to sys.settrace() for each thread, before its
run() method is called.
Set a profile function for all threads started from the threading module.
The func will be passed to sys.setprofile() for each thread, before its
run() method is called.
Return the thread stack size used when creating new threads. The optional
size argument specifies the stack size to be used for subsequently created
threads, and must be 0 (use platform or configured default) or a positive
integer value of at least 32,768 (32kB). If changing the thread stack size is
unsupported, a ThreadError is raised. If the specified stack size is
invalid, a ValueError is raised and the stack size is unmodified. 32kB
is currently the minimum supported stack size value to guarantee sufficient
stack space for the interpreter itself. Note that some platforms may have
particular restrictions on values for the stack size, such as requiring a
minimum stack size > 32kB or requiring allocation in multiples of the system
memory page size - platform documentation should be referred to for more
information (4kB pages are common; using multiples of 4096 for the stack size is
the suggested approach in the absence of more specific information).
Availability: Windows, systems with POSIX threads.
Detailed interfaces for the objects are documented below.
The design of this module is loosely based on Java’s threading model. However,
where Java makes locks and condition variables basic behavior of every object,
they are separate objects in Python. Python’s Thread class supports a
subset of the behavior of Java’s Thread class; currently, there are no
priorities, no thread groups, and threads cannot be destroyed, stopped,
suspended, resumed, or interrupted. The static methods of Java’s Thread class,
when implemented, are mapped to module-level functions.
All of the methods described below are executed atomically.
This class represents an activity that is run in a separate thread of control.
There are two ways to specify the activity: by passing a callable object to the
constructor, or by overriding the run() method in a subclass. No other
methods (except for the constructor) should be overridden in a subclass. In
other words, only override the __init__() and run() methods of
this class.
Once a thread object is created, its activity must be started by calling the
thread’s start() method. This invokes the run() method in a
separate thread of control.
Once the thread’s activity is started, the thread is considered ‘alive’. It
stops being alive when its run() method terminates – either normally, or
by raising an unhandled exception. The is_alive() method tests whether the
thread is alive.
Other threads can call a thread’s join() method. This blocks the calling
thread until the thread whose join() method is called is terminated.
A thread has a name. The name can be passed to the constructor, and read or
changed through the name attribute.
A thread can be flagged as a “daemon thread”. The significance of this flag is
that the entire Python program exits when only daemon threads are left. The
initial value is inherited from the creating thread. The flag can be set
through the daemon property.
There is a “main thread” object; this corresponds to the initial thread of
control in the Python program. It is not a daemon thread.
There is the possibility that “dummy thread objects” are created. These are
thread objects corresponding to “alien threads”, which are threads of control
started outside the threading module, such as directly from C code. Dummy
thread objects have limited functionality; they are always considered alive and
daemonic, and cannot be join()ed. They are never deleted, since it is
impossible to detect the termination of alien threads.
class threading.Thread(group=None, target=None, name=None, args=(), kwargs={})
This constructor should always be called with keyword arguments. Arguments
are:
group should be None; reserved for future extension when a
ThreadGroup class is implemented.
target is the callable object to be invoked by the run() method.
Defaults to None, meaning nothing is called.
name is the thread name. By default, a unique name is constructed of the
form “Thread-N” where N is a small decimal number.
args is the argument tuple for the target invocation. Defaults to ().
kwargs is a dictionary of keyword arguments for the target invocation.
Defaults to {}.
If the subclass overrides the constructor, it must make sure to invoke the
base class constructor (Thread.__init__()) before doing anything else to
the thread.
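For instance, a minimal sketch of the target/args style (worker and the
thread name are illustrative):

import threading

def worker(n):
    print('worker', n)

t = threading.Thread(target=worker, name='worker-1', args=(1,))
t.start()
t.join()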
You may override this method in a subclass. The standard run()
method invokes the callable object passed to the object’s constructor as
the target argument, if any, with sequential and keyword arguments taken
from the args and kwargs arguments, respectively.
Wait until the thread terminates. This blocks the calling thread until the
thread whose join() method is called terminates – either normally
or through an unhandled exception – or until the optional timeout occurs.
When the timeout argument is present and not None, it should be a
floating point number specifying a timeout for the operation in seconds
(or fractions thereof). As join() always returns None, you must
call is_alive() after join() to decide whether a timeout
happened – if the thread is still alive, the join() call timed out.
When the timeout argument is not present or None, the operation will
block until the thread terminates.
join() raises a RuntimeError if an attempt is made to join
the current thread as that would cause a deadlock. It is also an error to
join() a thread before it has been started and attempts to do so
raises the same exception.
A string used for identification purposes only. It has no semantics.
Multiple threads may be given the same name. The initial name is set by
the constructor.
The ‘thread identifier’ of this thread or None if the thread has not
been started. This is a nonzero integer. See the
thread.get_ident() function. Thread identifiers may be recycled
when a thread exits and another thread is created. The identifier is
available even after the thread has exited.
This method returns True just before the run() method starts
until just after the run() method terminates. The module function
enumerate() returns a list of all alive threads.
A boolean value indicating whether this thread is a daemon thread (True)
or not (False). This must be set before start() is called,
otherwise RuntimeError is raised. Its initial value is inherited
from the creating thread; the main thread is not a daemon thread and
therefore all threads created in the main thread default to daemon
= False.
The entire Python program exits when no alive non-daemon threads are left.
A primitive lock is a synchronization primitive that is not owned by a
particular thread when locked. In Python, it is currently the lowest level
synchronization primitive available, implemented directly by the _thread
extension module.
A primitive lock is in one of two states, “locked” or “unlocked”. It is created
in the unlocked state. It has two basic methods, acquire() and
release(). When the state is unlocked, acquire() changes the state
to locked and returns immediately. When the state is locked, acquire()
blocks until a call to release() in another thread changes it to unlocked,
then the acquire() call resets it to locked and returns. The
release() method should only be called in the locked state; it changes the
state to unlocked and returns immediately. If an attempt is made to release an
unlocked lock, a RuntimeError will be raised.
When more than one thread is blocked in acquire() waiting for the state to
turn to unlocked, only one thread proceeds when a release() call resets
the state to unlocked; which one of the waiting threads proceeds is not defined,
and may vary across implementations.
When invoked without arguments, block until the lock is unlocked, then set it to
locked, and return true.
When invoked with the blocking argument set to true, do the same thing as when
called without arguments, and return true.
When invoked with the blocking argument set to false, do not block. If a call
without an argument would block, return false immediately; otherwise, do the
same thing as when called without arguments, and return true.
When invoked with the floating-point timeout argument set to a positive
value, block for at most the number of seconds specified by timeout
and as long as the lock cannot be acquired. A negative timeout argument
specifies an unbounded wait. It is forbidden to specify a timeout
when blocking is false.
The return value is True if the lock is acquired successfully,
False if not (for example if the timeout expired).
Changed in version 3.2: The timeout parameter is new.
Changed in version 3.2: Lock acquires can now be interrupted by signals on POSIX.
When the lock is locked, reset it to unlocked, and return. If any other threads
are blocked waiting for the lock to become unlocked, allow exactly one of them
to proceed.
Do not call this method when the lock is unlocked.
A reentrant lock is a synchronization primitive that may be acquired multiple
times by the same thread. Internally, it uses the concepts of “owning thread”
and “recursion level” in addition to the locked/unlocked state used by primitive
locks. In the locked state, some thread owns the lock; in the unlocked state,
no thread owns it.
To lock the lock, a thread calls its acquire() method; this returns once
the thread owns the lock. To unlock the lock, a thread calls its
release() method. acquire()/release() call pairs may be
nested; only the final release() (the release() of the outermost
pair) resets the lock to unlocked and allows another thread blocked in
acquire() to proceed.
When invoked without arguments: if this thread already owns the lock, increment
the recursion level by one, and return immediately. Otherwise, if another
thread owns the lock, block until the lock is unlocked. Once the lock is
unlocked (not owned by any thread), then grab ownership, set the recursion level
to one, and return. If more than one thread is blocked waiting until the lock
is unlocked, only one at a time will be able to grab ownership of the lock.
There is no return value in this case.
When invoked with the blocking argument set to true, do the same thing as when
called without arguments, and return true.
When invoked with the blocking argument set to false, do not block. If a call
without an argument would block, return false immediately; otherwise, do the
same thing as when called without arguments, and return true.
When invoked with the floating-point timeout argument set to a positive
value, block for at most the number of seconds specified by timeout
and as long as the lock cannot be acquired. Return true if the lock has
been acquired, false if the timeout has elapsed.
Changed in version 3.2: The timeout parameter is new.
Release a lock, decrementing the recursion level. If after the decrement it is
zero, reset the lock to unlocked (not owned by any thread), and if any other
threads are blocked waiting for the lock to become unlocked, allow exactly one
of them to proceed. If after the decrement the recursion level is still
nonzero, the lock remains locked and owned by the calling thread.
Only call this method when the calling thread owns the lock. A
RuntimeError is raised if this method is called when the lock is
unlocked.
A condition variable is always associated with some kind of lock; this can be
passed in or one will be created by default. (Passing one in is useful when
several condition variables must share the same lock.)
A condition variable has acquire() and release() methods that call
the corresponding methods of the associated lock. It also has a wait()
method, and notify() and notify_all() methods. These three must only
be called when the calling thread has acquired the lock, otherwise a
RuntimeError is raised.
The wait() method releases the lock, and then blocks until it is awakened
by a notify() or notify_all() call for the same condition variable in
another thread. Once awakened, it re-acquires the lock and returns. It is also
possible to specify a timeout.
The notify() method wakes up one of the threads waiting for the condition
variable, if any are waiting. The notify_all() method wakes up all threads
waiting for the condition variable.
Note: the notify() and notify_all() methods don’t release the lock;
this means that the thread or threads awakened will not return from their
wait() call immediately, but only when the thread that called
notify() or notify_all() finally relinquishes ownership of the lock.
Tip: the typical programming style using condition variables uses the lock to
synchronize access to some shared state; threads that are interested in a
particular change of state call wait() repeatedly until they see the
desired state, while threads that modify the state call notify() or
notify_all() when they change the state in such a way that it could
possibly be a desired state for one of the waiters. For example, the following
code is a generic producer-consumer situation with unlimited buffer capacity:
# Consume one item
cv.acquire()
while not an_item_is_available():
    cv.wait()
get_an_available_item()
cv.release()

# Produce one item
cv.acquire()
make_an_item_available()
cv.notify()
cv.release()
To choose between notify() and notify_all(), consider whether one
state change can be interesting for only one or several waiting threads. E.g.
in a typical producer-consumer situation, adding one item to the buffer only
needs to wake up one consumer thread.
Note: Condition variables can be, depending on the implementation, subject
to both spurious wakeups (when wait() returns without a notify()
call) and stolen wakeups (when another thread acquires the lock before the
awoken thread.) For this reason, it is always necessary to verify the state
the thread is waiting for when wait() returns and optionally repeat
the call as often as necessary.
If the lock argument is given and not None, it must be a Lock
or RLock object, and it is used as the underlying lock. Otherwise,
a new RLock object is created and used as the underlying lock.
Wait until notified or until a timeout occurs. If the calling thread has
not acquired the lock when this method is called, a RuntimeError is
raised.
This method releases the underlying lock, and then blocks until it is
awakened by a notify() or notify_all() call for the same
condition variable in another thread, or until the optional timeout
occurs. Once awakened or timed out, it re-acquires the lock and returns.
When the timeout argument is present and not None, it should be a
floating point number specifying a timeout for the operation in seconds
(or fractions thereof).
When the underlying lock is an RLock, it is not released using
its release() method, since this may not actually unlock the lock
when it was acquired multiple times recursively. Instead, an internal
interface of the RLock class is used, which really unlocks it
even when it has been recursively acquired several times. Another internal
interface is then used to restore the recursion level when the lock is
reacquired.
The return value is True unless a given timeout expired, in which
case it is False.
Changed in version 3.2: Previously, the method always returned None.
Wait until a condition evaluates to True. predicate should be a
callable whose result will be interpreted as a boolean value.
A timeout may be provided giving the maximum time to wait.
This utility method may call wait() repeatedly until the predicate
is satisfied, or until a timeout occurs. The return value is
the last return value of the predicate and will evaluate to
False if the method timed out.
Ignoring the timeout feature, calling this method is roughly equivalent to
writing:
while not predicate():
    cv.wait()
Therefore, the same rules apply as with wait(): The lock must be
held when called and is re-acquired on return. The predicate is evaluated
with the lock held.
Using this method, the consumer example above can be written thus:
with cv:
    cv.wait_for(an_item_is_available)
    get_an_available_item()
Wake up a thread waiting on this condition, if any. If the calling thread
has not acquired the lock when this method is called, a
RuntimeError is raised.
This method wakes up one of the threads waiting for the condition
variable, if any are waiting; it is a no-op if no threads are waiting.
The current implementation wakes up exactly one thread, if any are
waiting. However, it’s not safe to rely on this behavior. A future,
optimized implementation may occasionally wake up more than one thread.
Note: the awakened thread does not actually return from its wait()
call until it can reacquire the lock. Since notify() does not
release the lock, its caller should.
Wake up all threads waiting on this condition. This method acts like
notify(), but wakes up all waiting threads instead of one. If the
calling thread has not acquired the lock when this method is called, a
RuntimeError is raised.
This is one of the oldest synchronization primitives in the history of computer
science, invented by the early Dutch computer scientist Edsger W. Dijkstra (he
used P() and V() instead of acquire() and release()).
A semaphore manages an internal counter which is decremented by each
acquire() call and incremented by each release() call. The counter
can never go below zero; when acquire() finds that it is zero, it blocks,
waiting until some other thread calls release().
When invoked without arguments: if the internal counter is larger than
zero on entry, decrement it by one and return immediately. If it is zero
on entry, block, waiting until some other thread has called
release() to make it larger than zero. This is done with proper
interlocking so that if multiple acquire() calls are blocked,
release() will wake exactly one of them up. The implementation may
pick one at random, so the order in which blocked threads are awakened
should not be relied on. Returns true (or blocks indefinitely).
When invoked with blocking set to false, do not block. If a call
without an argument would block, return false immediately; otherwise,
do the same thing as when called without arguments, and return true.
When invoked with a timeout other than None, it will block for at
most timeout seconds. If acquire does not complete successfully in
that interval, return false. Return true otherwise.
Changed in version 3.2: The timeout parameter is new.
Release a semaphore, incrementing the internal counter by one. When it
was zero on entry and another thread is waiting for it to become larger
than zero again, wake up that thread.
Semaphores are often used to guard resources with limited capacity, for example,
a database server. In any situation where the size of the resource is fixed,
you should use a bounded semaphore. Before spawning any worker threads, your
main thread would initialize the semaphore:
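A minimal sketch of that initialization (maxconnections is an illustrative
name):

import threading

maxconnections = 5
pool_sema = threading.BoundedSemaphore(value=maxconnections)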
Once spawned, worker threads call the semaphore’s acquire and release methods
when they need to connect to the server:
pool_sema.acquire()
conn = connectdb()
... use connection ...
conn.close()
pool_sema.release()
The use of a bounded semaphore reduces the chance that a programming error which
causes the semaphore to be released more than it’s acquired will go undetected.
This is one of the simplest mechanisms for communication between threads: one
thread signals an event and other threads wait for it.
An event object manages an internal flag that can be set to true with the
set() method and reset to false with the clear() method. The
wait() method blocks until the flag is true.
Set the internal flag to true. All threads waiting for it to become true
are awakened. Threads that call wait() once the flag is true will
not block at all.
Block until the internal flag is true. If the internal flag is true on
entry, return immediately. Otherwise, block until another thread calls
set() to set the flag to true, or until the optional timeout occurs.
When the timeout argument is present and not None, it should be a
floating point number specifying a timeout for the operation in seconds
(or fractions thereof).
This method returns the internal flag on exit, so it will always return
True except if a timeout is given and the operation times out.
Changed in version 3.1: Previously, the method always returned None.
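A small sketch of the signaling pattern (waiter is an illustrative name):

import threading

evt = threading.Event()

def waiter():
    # wait() returns the flag value: True, or False on timeout
    if evt.wait(timeout=5.0):
        print('event was set')
    else:
        print('timed out')

t = threading.Thread(target=waiter)
t.start()
evt.set()
t.join()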
This class represents an action that should be run only after a certain amount
of time has passed — a timer. Timer is a subclass of Thread
and as such also functions as an example of creating custom threads.
Timers are started, as with threads, by calling their start() method. The
timer can be stopped (before its action has begun) by calling the cancel()
method. The interval the timer will wait before executing its action may not be
exactly the same as the interval specified by the user.
For example:
def hello():
    print("hello, world")

t = Timer(30.0, hello)
t.start() # after 30 seconds, "hello, world" will be printed
class threading.Timer(interval, function, args=[], kwargs={})
Create a timer that will run function with arguments args and keyword
arguments kwargs, after interval seconds have passed.
This class provides a simple synchronization primitive for use by a fixed number
of threads that need to wait for each other. Each of the threads tries to pass
the barrier by calling the wait() method and will block until all of the
threads have made the call. At this point, the threads are released
simultaneously.
The barrier can be reused any number of times for the same number of threads.
As an example, here is a simple way to synchronize a client and server thread:
b = Barrier(2, timeout=5)

def server():
    start_server()
    b.wait()
    while True:
        connection = accept_connection()
        process_server_connection(connection)

def client():
    b.wait()
    while True:
        connection = make_connection()
        process_client_connection(connection)
class threading.Barrier(parties, action=None, timeout=None)
Create a barrier object for parties number of threads. An action, when
provided, is a callable to be called by one of the threads when they are
released. timeout is the default timeout value if none is specified for
the wait() method.
Pass the barrier. When all the threads party to the barrier have called
this function, they are all released simultaneously. If a timeout is
provided, it is used in preference to any that was supplied to the class
constructor.
The return value is an integer in the range 0 to parties – 1, different
for each thread. This can be used to select a thread to do some special
housekeeping, e.g.:
i = barrier.wait()
if i == 0:
    # Only one thread needs to print this
    print("passed the barrier")
If an action was provided to the constructor, one of the threads will
have called it prior to being released. Should this call raise an error,
the barrier is put into the broken state.
If the call times out, the barrier is put into the broken state.
This method may raise a BrokenBarrierError exception if the
barrier is broken or reset while a thread is waiting.
Return the barrier to the default, empty state. Any threads waiting on it
will receive the BrokenBarrierError exception.
Note that using this function can require some external
synchronization if there are other threads whose state is unknown. If a
barrier is broken it may be better to just leave it and create a new one.
Put the barrier into a broken state. This causes any active or future
calls to wait() to fail with the BrokenBarrierError. Use
this for example if one of the threads needs to abort, to avoid deadlocking the
application.
It may be preferable to simply create the barrier with a sensible
timeout value to automatically guard against one of the threads going
awry.
This exception, a subclass of RuntimeError, is raised when the
Barrier object is reset or broken.
Using locks, conditions, and semaphores in the with statement
All of the objects provided by this module that have acquire() and
release() methods can be used as context managers for a with
statement. The acquire() method will be called when the block is entered,
and release() will be called when the block is exited.
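For example (a minimal sketch; some_lock is an illustrative name):

import threading

some_lock = threading.Lock()
with some_lock:
    # some_lock.acquire() was called on entry; some_lock.release()
    # will be called on exit, even if an exception is raised
    print('lock held')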
While the import machinery is thread-safe, there are two key restrictions on
threaded imports due to inherent limitations in the way that thread-safety is
provided:
Firstly, other than in the main module, an import should not have the
side effect of spawning a new thread and then waiting for that thread in
any way. Failing to abide by this restriction can lead to a deadlock if
the spawned thread directly or indirectly attempts to import a module.
Secondly, all import attempts must be completed before the interpreter
starts shutting itself down. This can be most easily achieved by only
performing imports from non-daemon threads created through the threading
module. Daemon threads and threads created directly with the thread
module will require some other form of synchronization to ensure they do
not attempt imports after system shutdown has commenced. Failure to
abide by this restriction will lead to intermittent exceptions and
crashes during interpreter shutdown (as the late imports attempt to
access machinery which is no longer in a valid state).
multiprocessing is a package that supports spawning processes using an
API similar to the threading module. The multiprocessing package
offers both local and remote concurrency, effectively side-stepping the
Global Interpreter Lock by using subprocesses instead of threads. Due
to this, the multiprocessing module allows the programmer to fully
leverage multiple processors on a given machine. It runs on both Unix and
Windows.
Note
Some of this package’s functionality requires a functioning shared semaphore
implementation on the host operating system. Without one, the
multiprocessing.synchronize module will be disabled, and attempts to
import it will result in an ImportError. See
issue 3770 for additional information.
Note
Functionality within this package requires that the __main__ module be
importable by the children. This is covered in Programming guidelines
however it is worth pointing out here. This means that some examples, such
as the multiprocessing.Pool examples will not work in the
interactive interpreter. For example:
>>> from multiprocessing import Pool
>>> p = Pool(5)
>>> def f(x):
...     return x*x
...
>>> p.map(f, [1,2,3])
Process PoolWorker-1:
Process PoolWorker-2:
Process PoolWorker-3:
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
AttributeError: 'module' object has no attribute 'f'
AttributeError: 'module' object has no attribute 'f'
AttributeError: 'module' object has no attribute 'f'
(If you try this it will actually output three full tracebacks
interleaved in a semi-random fashion, and then you may have to
stop the master process somehow.)
In multiprocessing, processes are spawned by creating a Process
object and then calling its start() method. A trivial example:

from multiprocessing import Process

def f(name):
    print('hello', name)

if __name__ == '__main__':
    p = Process(target=f, args=('bob',))
    p.start()
    p.join()
To show the individual process IDs involved, here is an expanded example:
from multiprocessing import Process
import os

def info(title):
    print(title)
    print('module name:', __name__)
    print('parent process:', os.getppid())
    print('process id:', os.getpid())

def f(name):
    info('function f')
    print('hello', name)

if __name__ == '__main__':
    info('main line')
    p = Process(target=f, args=('bob',))
    p.start()
    p.join()
For an explanation of why (on Windows) the if __name__ == '__main__' part is
necessary, see Programming guidelines.
Queues
The Queue class is a near clone of queue.Queue. For example:

from multiprocessing import Process, Queue

def f(q):
    q.put([42, None, 'hello'])

if __name__ == '__main__':
    q = Queue()
    p = Process(target=f, args=(q,))
    p.start()
    print(q.get())    # prints "[42, None, 'hello']"
    p.join()
Queues are thread and process safe, but note that they must never
be instantiated as a side effect of importing a module: this can lead
to a deadlock! (see Importing in threaded code)
Pipes
The Pipe() function returns a pair of connection objects connected by a
pipe which by default is duplex (two-way). For example:
from multiprocessing import Process, Pipe

def f(conn):
    conn.send([42, None, 'hello'])
    conn.close()

if __name__ == '__main__':
    parent_conn, child_conn = Pipe()
    p = Process(target=f, args=(child_conn,))
    p.start()
    print(parent_conn.recv())    # prints "[42, None, 'hello']"
    p.join()
The two connection objects returned by Pipe() represent the two ends of
the pipe. Each connection object has send() and
recv() methods (among others). Note that data in a pipe
may become corrupted if two processes (or threads) try to read from or write
to the same end of the pipe at the same time. Of course there is no risk
of corruption from processes using different ends of the pipe at the same
time.
multiprocessing contains equivalents of all the synchronization
primitives from threading. For instance one can use a lock to ensure
that only one process prints to standard output at a time:
from multiprocessing import Process, Lock

def f(l, i):
    l.acquire()
    print('hello world', i)
    l.release()

if __name__ == '__main__':
    lock = Lock()
    for num in range(10):
        Process(target=f, args=(lock, num)).start()
Without using the lock output from the different processes is liable to get all
mixed up.
As mentioned above, when doing concurrent programming it is usually best to
avoid using shared state as far as possible. This is particularly true when
using multiple processes.
However, if you really do need to use some shared data then
multiprocessing provides a couple of ways of doing so.
Shared memory
Data can be stored in a shared memory map using Value or
Array. For example, the following code
from multiprocessing import Process, Value, Array

def f(n, a):
    n.value = 3.1415927
    for i in range(len(a)):
        a[i] = -a[i]

if __name__ == '__main__':
    num = Value('d', 0.0)
    arr = Array('i', range(10))

    p = Process(target=f, args=(num, arr))
    p.start()
    p.join()

    print(num.value)
    print(arr[:])
will print
3.1415927
[0, -1, -2, -3, -4, -5, -6, -7, -8, -9]
The 'd' and 'i' arguments used when creating num and arr are
typecodes of the kind used by the array module: 'd' indicates a
double precision float and 'i' indicates a signed integer. These shared
objects will be process and thread-safe.
For more flexibility in using shared memory one can use the
multiprocessing.sharedctypes module which supports the creation of
arbitrary ctypes objects allocated from shared memory.
Server process
A manager object returned by Manager() controls a server process which
holds Python objects and allows other processes to manipulate them using
proxies.
Server process managers are more flexible than using shared memory objects
because they can be made to support arbitrary object types. Also, a single
manager can be shared by processes on different computers over a network.
They are, however, slower than using shared memory.
The Pool class represents a pool of worker
processes. It has methods which allows tasks to be offloaded to the worker
processes in a few different ways.
For example:
from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    pool = Pool(processes=4)            # start 4 worker processes
    result = pool.apply_async(f, [10])  # evaluate "f(10)" asynchronously
    print(result.get(timeout=1))        # prints "100" unless your computer is *very* slow
    print(pool.map(f, range(10)))       # prints "[0, 1, 4,..., 81]"
class multiprocessing.Process([group[, target[, name[, args[, kwargs]]]]])
Process objects represent activity that is run in a separate process. The
Process class has equivalents of all the methods of
threading.Thread.
The constructor should always be called with keyword arguments. group
should always be None; it exists solely for compatibility with
threading.Thread. target is the callable object to be invoked by
the run() method. It defaults to None, meaning nothing is
called. name is the process name. By default, a unique name is constructed
of the form 'Process-N1:N2:...:Nk' where N1,N2,...,Nk is a sequence of integers whose length
is determined by the generation of the process. args is the argument
tuple for the target invocation. kwargs is a dictionary of keyword
arguments for the target invocation. By default, no arguments are passed to
target.
If a subclass overrides the constructor, it must make sure it invokes the
base class constructor (Process.__init__()) before doing anything else
to the process.
You may override this method in a subclass. The standard run()
method invokes the callable object passed to the object’s constructor as
the target argument, if any, with sequential and keyword arguments taken
from the args and kwargs arguments, respectively.
The name is a string used for identification purposes only. It has no
semantics. Multiple processes may be given the same name. The initial
name is set by the constructor.
The process’s daemon flag, a Boolean value. This must be set before
start() is called.
The initial value is inherited from the creating process.
When a process exits, it attempts to terminate all of its daemonic child
processes.
Note that a daemonic process is not allowed to create child processes.
Otherwise a daemonic process would leave its children orphaned if it gets
terminated when its parent process exits. Additionally, these are not
Unix daemons or services, they are normal processes that will be
terminated (and not joined) if non-daemonic processes have exited.
In addition to the threading.Thread API, Process objects
also support the following attributes and methods:
The child’s exit code. This will be None if the process has not yet
terminated. A negative value -N indicates that the child was terminated
by signal N.
When multiprocessing is initialized the main process is assigned a
random string using os.urandom().
When a Process object is created, it will inherit the
authentication key of its parent process, although this may be changed by
setting authkey to another byte string.
Terminate the process. On Unix this is done using the SIGTERM signal;
on Windows TerminateProcess() is used. Note that exit handlers and
finally clauses, etc., will not be executed.
Note that descendant processes of the process will not be terminated –
they will simply become orphaned.
Warning
If this method is used when the associated process is using a pipe or
queue then the pipe or queue is liable to become corrupted and may
become unusable by other processes. Similarly, if the process has
acquired a lock or semaphore etc. then terminating it is liable to
cause other processes to deadlock.
Note that the start(), join(), is_alive(),
terminate() and exitcode methods should only be called by
the process that created the process object.
When using multiple processes, one generally uses message passing for
communication between processes and avoids having to use any synchronization
primitives like locks.
For passing messages one can use Pipe() (for a connection between two
processes) or a queue (which allows multiple producers and consumers).
The Queue and JoinableQueue types are multi-producer,
multi-consumer FIFO queues modelled on the queue.Queue class in the
standard library. They differ in that Queue lacks the
task_done() and join() methods introduced
into Python 2.5’s queue.Queue class.
If you use JoinableQueue then you must call
JoinableQueue.task_done() for each task removed from the queue or else the
semaphore used to count the number of unfinished tasks may eventually overflow,
raising an exception.
Note that one can also create a shared queue by using a manager object – see
Managers.
If a process is killed using Process.terminate() or os.kill()
while it is trying to use a Queue, then the data in the queue is
likely to become corrupted. This may cause any other process to get an
exception when it tries to use the queue later on.
Warning
As mentioned above, if a child process has put items on a queue (and it has
not used JoinableQueue.cancel_join_thread()), then that process will
not terminate until all buffered items have been flushed to the pipe.
This means that if you try joining that process you may get a deadlock unless
you are sure that all items which have been put on the queue have been
consumed. Similarly, if the child process is non-daemonic then the parent
process may hang on exit when it tries to join all its non-daemonic children.
Note that a queue created using a manager does not have this issue. See
Programming guidelines.
For an example of the usage of queues for interprocess communication see
Examples.
multiprocessing.Pipe([duplex])
Returns a pair (conn1, conn2) of Connection objects representing
the ends of a pipe.
If duplex is True (the default) then the pipe is bidirectional. If
duplex is False then the pipe is unidirectional: conn1 can only be
used for receiving messages and conn2 can only be used for sending
messages.
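A short sketch of the two-connection pattern; the function and variable names are illustrative:

from multiprocessing import Process, Pipe

def child(conn):
    conn.send(['hello', 42])    # any picklable object can be sent
    conn.close()

if __name__ == '__main__':
    parent_conn, child_conn = Pipe()    # duplex by default
    p = Process(target=child, args=(child_conn,))
    p.start()
    print(parent_conn.recv())           # prints "['hello', 42]"
    p.join()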
class multiprocessing.Queue([maxsize])
Returns a process shared queue implemented using a pipe and a few
locks/semaphores. When a process first puts an item on the queue a feeder
thread is started which transfers objects from a buffer into the pipe.
The usual queue.Empty and queue.Full exceptions from the
standard library’s queue module are raised to signal timeouts.
put(obj[, block[, timeout]])
Put obj into the queue. If the optional argument block is True
(the default) and timeout is None (the default), block if necessary until
a free slot is available. If timeout is a positive number, it blocks at
most timeout seconds and raises the queue.Full exception if no
free slot was available within that time. Otherwise (block is
False), put an item on the queue if a free slot is immediately
available, else raise the queue.Full exception (timeout is
ignored in that case).
get([block[, timeout]])
Remove and return an item from the queue. If optional args block is
True (the default) and timeout is None (the default), block if
necessary until an item is available. If timeout is a positive number,
it blocks at most timeout seconds and raises the queue.Empty
exception if no item was available within that time. Otherwise (block is
False), return an item if one is immediately available, else raise the
queue.Empty exception (timeout is ignored in that case).
close()
Indicate that no more data will be put on this queue by the current
process. The background thread will quit once it has flushed all buffered
data to the pipe. This is called automatically when the queue is garbage
collected.
join_thread()
Join the background thread. This can only be used after close() has
been called. It blocks until the background thread exits, ensuring that
all data in the buffer has been flushed to the pipe.
By default if a process is not the creator of the queue then on exit it
will attempt to join the queue’s background thread. The process can call
cancel_join_thread() to make join_thread() do nothing.
cancel_join_thread()
Prevent join_thread() from blocking. In particular, this prevents
the background thread from being joined automatically when the process
exits – see join_thread().
task_done()
Indicate that a formerly enqueued task is complete. Used by queue consumer
threads. For each get() used to fetch a task, a subsequent
call to task_done() tells the queue that the processing on the task
is complete.
If a join() is currently blocking, it will resume when all
items have been processed (meaning that a task_done() call was
received for every item that had been put() into the queue).
Raises a ValueError if called more times than there were items
placed in the queue.
join()
Block until all items in the queue have been gotten and processed.
The count of unfinished tasks goes up whenever an item is added to the
queue. The count goes down whenever a consumer thread calls
task_done() to indicate that the item was retrieved and all work on
it is complete. When the count of unfinished tasks drops to zero,
join() unblocks.
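A minimal sketch of the task_done()/join() pattern; the consumer function and the None sentinel are illustrative:

from multiprocessing import JoinableQueue, Process

def consumer(q):
    while True:
        item = q.get()
        if item is None:        # sentinel: stop consuming
            q.task_done()
            break
        print('processed', item)
        q.task_done()           # exactly one task_done() per get()

if __name__ == '__main__':
    q = JoinableQueue()
    p = Process(target=consumer, args=(q,))
    p.start()
    for i in range(5):
        q.put(i)
    q.put(None)
    q.join()        # blocks until task_done() was called for every item
    p.join()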
multiprocessing.freeze_support()
Add support for when a program which uses multiprocessing has been
frozen to produce a Windows executable. (Has been tested with py2exe,
PyInstaller and cx_Freeze.)
One needs to call this function straight after the if __name__ == '__main__' line of the main module. For example:
from multiprocessing import Process, freeze_support

def f():
    print('hello world!')

if __name__ == '__main__':
    freeze_support()
    Process(target=f).start()
If the freeze_support() line is omitted then trying to run the frozen
executable will raise RuntimeError.
If the module is being run normally by the Python interpreter then
freeze_support() has no effect.
multiprocessing.set_executable()
Sets the path of the Python interpreter to use when starting a child process.
(By default sys.executable is used). Embedders will probably need to
do something like

set_executable(os.path.join(sys.exec_prefix, 'pythonw.exe'))

before they can create child processes.
recv()
Return an object sent from the other end of the connection using
send(). Raises EOFError if there is nothing left to receive
and the other end was closed.
poll([timeout])
Return whether there is any data available to be read.
If timeout is not specified then it will return immediately. If
timeout is a number then this specifies the maximum time in seconds to
block. If timeout is None then an infinite timeout is used.
send_bytes(buffer[, offset[, size]])
Send byte data from an object supporting the buffer interface as a
complete message.
If offset is given then data is read from that position in buffer. If
size is given then that many bytes will be read from buffer. Very large
buffers (approximately 32 MB+, though it depends on the OS) may raise a
ValueError exception.
recv_bytes([maxlength])
Return a complete message of byte data sent from the other end of the
connection as a string. Raises EOFError if there is nothing left
to receive and the other end has closed.
If maxlength is specified and the message is longer than maxlength
then IOError is raised and the connection will no longer be
readable.
recv_bytes_into(buffer[, offset])
Read into buffer a complete message of byte data sent from the other end
of the connection and return the number of bytes in the message. Raises
EOFError if there is nothing left to receive and the other end was
closed.
buffer must be an object satisfying the writable buffer interface. If
offset is given then the message will be written into the buffer from
that position. Offset must be a non-negative integer less than the
length of buffer (in bytes).
If the buffer is too short then a BufferTooShort exception is
raised and the complete message is available as e.args[0] where e
is the exception instance.
The Connection.recv() method automatically unpickles the data it
receives, which can be a security risk unless you can trust the process
which sent the message.
Therefore, unless the connection object was produced using Pipe() you
should only use the recv() and send()
methods after performing some sort of authentication. See
Authentication keys.
Warning
If a process is killed while it is trying to read or write to a pipe then
the data in the pipe is likely to become corrupted, because it may become
impossible to be sure where the message boundaries lie.
Generally synchronization primitives are not as necessary in a multiprocess
program as they are in a multithreaded program. See the documentation for
the threading module.
Note that one can also create synchronization primitives by using a manager
object – see Managers.
A clone of threading.Event.
The wait() method returns the state of the internal semaphore on exit, so it
will always return True except if a timeout is given and the operation
times out.
Changed in version 3.1: Previously, the method always returned None.
The acquire() method of BoundedSemaphore, Lock,
RLock and Semaphore has a timeout parameter not supported
by the equivalents in threading. The signature is
acquire(block=True, timeout=None) with keyword parameters being
acceptable. If block is True and timeout is not None then it
specifies a timeout in seconds. If block is False then timeout is
ignored.
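For example, a lock acquisition that gives up after two seconds might look like this sketch:

from multiprocessing import Lock

lock = Lock()
if lock.acquire(block=True, timeout=2.0):
    try:
        pass    # ... access the shared resource ...
    finally:
        lock.release()
else:
    print('could not acquire the lock within 2 seconds')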
On Mac OS X, sem_timedwait is unsupported, so calling acquire() with
a timeout will emulate that function’s behavior using a sleeping loop.
Note
If the SIGINT signal generated by Ctrl-C arrives while the main thread is
blocked by a call to BoundedSemaphore.acquire(), Lock.acquire(),
RLock.acquire(), Semaphore.acquire(), Condition.acquire()
or Condition.wait() then the call will be immediately interrupted and
KeyboardInterrupt will be raised.
This differs from the behaviour of threading where SIGINT will be
ignored while the equivalent blocking calls are in progress.
multiprocessing.Value(typecode_or_type, *args, lock=True)
Return a ctypes object allocated from shared memory. By default the
return value is actually a synchronized wrapper for the object.
typecode_or_type determines the type of the returned object: it is either a
ctypes type or a one character typecode of the kind used by the array
module. *args is passed on to the constructor for the type.
If lock is True (the default) then a new lock object is created to
synchronize access to the value. If lock is a Lock or
RLock object then that will be used to synchronize access to the
value. If lock is False then access to the returned object will not be
automatically protected by a lock, so it will not necessarily be
“process-safe”.
multiprocessing.Array(typecode_or_type, size_or_initializer, *, lock=True)
Return a ctypes array allocated from shared memory. By default the return
value is actually a synchronized wrapper for the array.
typecode_or_type determines the type of the elements of the returned array:
it is either a ctypes type or a one character typecode of the kind used by
the array module. If size_or_initializer is an integer, then it
determines the length of the array, and the array will be initially zeroed.
Otherwise, size_or_initializer is a sequence which is used to initialize
the array and whose length determines the length of the array.
If lock is True (the default) then a new lock object is created to
synchronize access to the value. If lock is a Lock or
RLock object then that will be used to synchronize access to the
value. If lock is False then access to the returned object will not be
automatically protected by a lock, so it will not necessarily be
“process-safe”.
Note that lock is a keyword-only argument.
Note that an array of ctypes.c_char has value and raw
attributes which allow one to use it to store and retrieve strings.
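For instance, a short sketch (note that the stored bytes must fit within the array):

from multiprocessing import Array

s = Array('c', b'hello world')   # an array of ctypes.c_char
print(s.value)                   # b'hello world'
print(s.raw)                     # the same bytes, including trailing NULs if any
s.value = b'bye'                 # shorter byte strings may be assigned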
The multiprocessing.sharedctypes module provides functions for allocating
ctypes objects from shared memory which can be inherited by child
processes.
Note
Although it is possible to store a pointer in shared memory remember that
this will refer to a location in the address space of a specific process.
However, the pointer is quite likely to be invalid in the context of a second
process and trying to dereference the pointer from the second process may
cause a crash.
multiprocessing.sharedctypes.RawArray(typecode_or_type, size_or_initializer)
Return a ctypes array allocated from shared memory.
typecode_or_type determines the type of the elements of the returned array:
it is either a ctypes type or a one character typecode of the kind used by
the array module. If size_or_initializer is an integer then it
determines the length of the array, and the array will be initially zeroed.
Otherwise size_or_initializer is a sequence which is used to initialize the
array and whose length determines the length of the array.
Note that setting and getting an element is potentially non-atomic – use
Array() instead to make sure that access is automatically synchronized
using a lock.
multiprocessing.sharedctypes.RawValue(typecode_or_type, *args)
Return a ctypes object allocated from shared memory.
typecode_or_type determines the type of the returned object: it is either a
ctypes type or a one character typecode of the kind used by the array
module. *args is passed on to the constructor for the type.
Note that setting and getting the value is potentially non-atomic – use
Value() instead to make sure that access is automatically synchronized
using a lock.
Note that an array of ctypes.c_char has value and raw
attributes which allow one to use it to store and retrieve strings – see
documentation for ctypes.
multiprocessing.sharedctypes.Array(typecode_or_type, size_or_initializer, *, lock=True)
The same as RawArray() except that depending on the value of lock a
process-safe synchronization wrapper may be returned instead of a raw ctypes
array.
If lock is True (the default) then a new lock object is created to
synchronize access to the value. If lock is a Lock or
RLock object then that will be used to synchronize access to the
value. If lock is False then access to the returned object will not be
automatically protected by a lock, so it will not necessarily be
“process-safe”.
multiprocessing.sharedctypes.Value(typecode_or_type, *args, lock=True)
The same as RawValue() except that depending on the value of lock a
process-safe synchronization wrapper may be returned instead of a raw ctypes
object.
If lock is True (the default) then a new lock object is created to
synchronize access to the value. If lock is a Lock or
RLock object then that will be used to synchronize access to the
value. If lock is False then access to the returned object will not be
automatically protected by a lock, so it will not necessarily be
“process-safe”.
multiprocessing.sharedctypes.synchronized(obj[, lock])
Return a process-safe wrapper object for a ctypes object which uses lock to
synchronize access. If lock is None (the default) then a
multiprocessing.RLock object is created automatically.
A synchronized wrapper will have two methods in addition to those of the
object it wraps: get_obj() returns the wrapped object and
get_lock() returns the lock object used for synchronization.
Note that accessing the ctypes object through the wrapper can be a lot slower
than accessing the raw ctypes object.
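For example, one might hold the wrapper's lock across several operations, as in this sketch:

from multiprocessing import Value

counter = Value('i', 0)          # a synchronized wrapper around a c_int

with counter.get_lock():         # hold one lock across several operations
    counter.value += 1           # avoids a race between the read and the write
    print(counter.get_obj())     # the underlying ctypes object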
The table below compares the syntax for creating shared ctypes objects from
shared memory with the normal ctypes syntax. (In the table MyStruct is some
subclass of ctypes.Structure.)
ctypes                  sharedctypes using type       sharedctypes using typecode
c_double(2.4)           RawValue(c_double, 2.4)       RawValue('d', 2.4)
MyStruct(4, 6)          RawValue(MyStruct, 4, 6)
(c_short * 7)()         RawArray(c_short, 7)          RawArray('h', 7)
(c_int * 3)(9, 2, 8)    RawArray(c_int, (9, 2, 8))    RawArray('i', (9, 2, 8))
Below is an example where a number of ctypes objects are modified by a child
process:
from multiprocessing import Process, Lock
from multiprocessing.sharedctypes import Value, Array
from ctypes import Structure, c_double

class Point(Structure):
    _fields_ = [('x', c_double), ('y', c_double)]

def modify(n, x, s, A):
    n.value **= 2
    x.value **= 2
    s.value = s.value.upper()
    for a in A:
        a.x **= 2
        a.y **= 2

if __name__ == '__main__':
    lock = Lock()

    n = Value('i', 7)
    x = Value(c_double, 1.0/3.0, lock=False)
    s = Array('c', b'hello world', lock=lock)
    A = Array(Point, [(1.875,-6.25), (-5.75,2.0), (2.375,9.5)], lock=lock)

    p = Process(target=modify, args=(n, x, s, A))
    p.start()
    p.join()

    print(n.value)
    print(x.value)
    print(s.value)
    print([(a.x, a.y) for a in A])
The results printed are:
49
0.1111111111111111
b'HELLO WORLD'
[(3.515625, 39.0625), (33.0625, 4.0), (5.640625, 90.25)]
Managers provide a way to create data which can be shared between different
processes. A manager object controls a server process which manages shared
objects. Other processes can access the shared objects by using proxies.
multiprocessing.Manager()
Returns a started SyncManager object which
can be used for sharing objects between processes. The returned manager
object corresponds to a spawned child process and has methods which will
create shared objects and return corresponding proxies.
Manager processes will be shut down as soon as they are garbage collected or
their parent process exits. The manager classes are defined in the
multiprocessing.managers module:
class multiprocessing.managers.BaseManager([address[, authkey]])
Create a BaseManager object.
Once created one should call start() or get_server().serve_forever() to ensure
that the manager object refers to a started manager process.
address is the address on which the manager process listens for new
connections. If address is None then an arbitrary one is chosen.
authkey is the authentication key which will be used to check the validity
of incoming connections to the server process. If authkey is None then
current_process().authkey is used. Otherwise authkey is used and it
must be a string.
register(typeid[, callable[, proxytype[, exposed[, method_to_typeid[, create_method]]]]])
A classmethod which can be used for registering a type or callable with
the manager class.
typeid is a “type identifier” which is used to identify a particular
type of shared object. This must be a string.
callable is a callable used for creating objects for this type
identifier. If a manager instance will be created using the
from_address() classmethod or if the create_method argument is
False then this can be left as None.
proxytype is a subclass of BaseProxy which is used to create
proxies for shared objects with this typeid. If None then a proxy
class is created automatically.
exposed is used to specify a sequence of method names which proxies for
this typeid should be allowed to access using
BaseProxy._callmethod(). (If exposed is None then
proxytype._exposed_ is used instead if it exists.) In the case
where no exposed list is specified, all “public methods” of the shared
object will be accessible. (Here a “public method” means any attribute
which has a __call__() method and whose name does not begin with
'_'.)
method_to_typeid is a mapping used to specify the return type of those
exposed methods which should return a proxy. It maps method names to
typeid strings. (If method_to_typeid is None then
proxytype._method_to_typeid_ is used instead if it exists.) If a
method’s name is not a key of this mapping or if the mapping is None
then the object returned by the method will be copied by value.
create_method determines whether a method should be created with name
typeid which can be used to tell the server process to create a new
shared object and return a proxy for it. By default it is True.
BaseManager instances also have one read-only property:
address
The address used by the manager.
SyncManager instances additionally support methods for creating shared
objects, for example:
list()
Create a shared list object and return a proxy for it.
Note
Modifications to mutable values or items in dict and list proxies will not
be propagated through the manager, because the proxy has no way of knowing
when its values or items are modified. To modify such an item, you can
re-assign the modified object to the container proxy:
# create a list proxy and append a mutable object (a dictionary)
lproxy = manager.list()
lproxy.append({})
# now mutate the dictionary
d = lproxy[0]
d['a'] = 1
d['b'] = 2
# at this point, the changes to d are not yet synced, but by
# reassigning the dictionary, the proxy is notified of the change
lproxy[0] = d
A namespace object has no public methods, but does have writable attributes.
Its representation shows the values of its attributes.
However, when using a proxy for a namespace object, an attribute beginning with
'_' will be an attribute of the proxy and not an attribute of the referent:
>>> manager = multiprocessing.Manager()
>>> Global = manager.Namespace()
>>> Global.x = 10
>>> Global.y = 'hello'
>>> Global._z = 12.3    # this is an attribute of the proxy
>>> print(Global)
Namespace(x=10, y='hello')
To create one’s own manager, one creates a subclass of BaseManager and
uses the register() classmethod to register new types or
callables with the manager class. For example:
from multiprocessing.managers import BaseManager

class MathsClass:
    def add(self, x, y):
        return x + y
    def mul(self, x, y):
        return x * y

class MyManager(BaseManager):
    pass

MyManager.register('Maths', MathsClass)

if __name__ == '__main__':
    manager = MyManager()
    manager.start()
    maths = manager.Maths()
    print(maths.add(4, 3))   # prints 7
    print(maths.mul(7, 8))   # prints 56
A proxy is an object which refers to a shared object which lives (presumably)
in a different process. The shared object is said to be the referent of the
proxy. Multiple proxy objects may have the same referent.
A proxy object has methods which invoke corresponding methods of its referent
(although not every method of the referent will necessarily be available through
the proxy). A proxy can usually be used in most of the same ways that its
referent can:
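A sketch of an interactive session illustrating this (the exact address shown in the repr() will differ):

>>> from multiprocessing import Manager
>>> manager = Manager()
>>> l = manager.list([i*i for i in range(10)])
>>> print(l)
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
>>> print(repr(l))
<ListProxy object, typeid 'list' at 0x...>
>>> l[4]
16
>>> l[2:5]
[4, 9, 16]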
Notice that applying str() to a proxy will return the representation of
the referent, whereas applying repr() will return the representation of
the proxy.
An important feature of proxy objects is that they are picklable so they can be
passed between processes. Note, however, that if a proxy is sent to the
corresponding manager’s process then unpickling it will produce the referent
itself. This means, for example, that one shared object can contain a second:
>>> a=manager.list()>>> b=manager.list()>>> a.append(b)# referent of a now contains referent of b>>> print(a,b)[[]] []>>> b.append('hello')>>> print(a,b)[['hello']] ['hello']
Note
The proxy types in multiprocessing do nothing to support comparisons
by value. So, for instance, we have:
>>> manager.list([1,2,3])==[1,2,3]False
One should just use a copy of the referent instead when making comparisons.
_callmethod(methodname[, args[, kwds]])
Call and return the result of a method of the proxy’s referent.
If proxy is a proxy whose referent is obj then the expression
proxy._callmethod(methodname, args, kwds)
will evaluate the expression
getattr(obj, methodname)(*args, **kwds)
in the manager’s process.
The returned value will be a copy of the result of the call or a proxy to
a new shared object – see documentation for the method_to_typeid
argument of BaseManager.register().
If an exception is raised by the call, then it is re-raised by
_callmethod(). If some other exception is raised in the manager’s
process then this is converted into a RemoteError exception and is
raised by _callmethod().
Note in particular that an exception will be raised if methodname has
not been exposed. An example of the usage of _callmethod():
>>> l = manager.list(range(10))
>>> l._callmethod('__len__')
10
>>> l._callmethod('__getitem__', (slice(2, 7),))   # equiv to `l[2:7]`
[2, 3, 4, 5, 6]
>>> l._callmethod('__getitem__', (20,))            # equiv to `l[20]`
Traceback (most recent call last):
...
IndexError: list index out of range
One can create a pool of processes which will carry out tasks submitted to it
with the Pool class.
class multiprocessing.Pool([processes[, initializer[, initargs[, maxtasksperchild]]]])
A process pool object which controls a pool of worker processes to which jobs
can be submitted. It supports asynchronous results with timeouts and
callbacks and has a parallel map implementation.
processes is the number of worker processes to use. If processes is
None then the number returned by cpu_count() is used. If
initializer is not None then each worker process will call
initializer(*initargs) when it starts.
New in version 3.2: maxtasksperchild is the number of tasks a worker process can complete
before it will exit and be replaced with a fresh worker process, to enable
unused resources to be freed. The default maxtasksperchild is None, which
means worker processes will live as long as the pool.
Note
Worker processes within a Pool typically live for the complete
duration of the Pool’s work queue. A frequent pattern found in other
systems (such as Apache, mod_wsgi, etc) to free resources held by
workers is to allow a worker within a pool to complete only a set
amount of work before exiting, being cleaned up, and being replaced by
a freshly spawned process. The maxtasksperchild argument to the
Pool exposes this ability to the end user.
apply(func[, args[, kwds]])
Call func with arguments args and keyword arguments kwds. It blocks
until the result is ready. Given that it blocks, apply_async() is better
suited for performing work in parallel. Additionally, func is only
executed in one of the workers of the pool.
apply_async(func[, args[, kwds[, callback[, error_callback]]]])
A variant of the apply() method which returns a result object.
If callback is specified then it should be a callable which accepts a
single argument. When the result becomes ready callback is applied to
it, that is unless the call failed, in which case the error_callback
is applied instead.
If error_callback is specified then it should be a callable which
accepts a single argument. If the target function fails, then
the error_callback is called with the exception instance.
Callbacks should complete immediately since otherwise the thread which
handles the results will get blocked.
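A small sketch of the callback pattern; the function names are illustrative:

from multiprocessing import Pool

def square(x):
    return x * x

def on_result(value):
    print('got', value)      # runs in the result-handling thread; keep it quick

def on_error(exc):
    print('failed:', exc)

if __name__ == '__main__':
    pool = Pool(processes=2)
    pool.apply_async(square, (6,), callback=on_result,
                     error_callback=on_error)
    pool.close()
    pool.join()              # prints "got 36"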
map(func, iterable[, chunksize])
A parallel equivalent of the map() built-in function (it supports only
one iterable argument though). It blocks until the result is ready.
This method chops the iterable into a number of chunks which it submits to
the process pool as separate tasks. The (approximate) size of these
chunks can be specified by setting chunksize to a positive integer.
map_async(func, iterable[, chunksize[, callback[, error_callback]]])
A variant of the map() method which returns a result object.
If callback is specified then it should be a callable which accepts a
single argument. When the result becomes ready callback is applied to
it, that is unless the call failed, in which case the error_callback
is applied instead.
If error_callback is specified then it should be a callable which
accepts a single argument. If the target function fails, then
the error_callback is called with the exception instance.
Callbacks should complete immediately since otherwise the thread which
handles the results will get blocked.
imap(func, iterable[, chunksize])
A lazier version of map().
The chunksize argument is the same as the one used by the map()
method. For very long iterables using a large value for chunksize can
make the job complete much faster than using the default value of 1.
Also if chunksize is 1 then the next() method of the iterator
returned by the imap() method has an optional timeout parameter:
next(timeout) will raise multiprocessing.TimeoutError if the
result cannot be returned within timeout seconds.
imap_unordered(func, iterable[, chunksize])
The same as imap() except that the ordering of the results from the
returned iterator should be considered arbitrary. (Only when there is
only one worker process is the order guaranteed to be “correct”.)
terminate()
Stops the worker processes immediately without completing outstanding
work. When the pool object is garbage collected terminate() will be
called immediately.
get([timeout])
Return the result when it arrives. If timeout is not None and the
result does not arrive within timeout seconds then
multiprocessing.TimeoutError is raised. If the remote call raised
an exception then that exception will be reraised by get().
successful()
Return whether the call completed without raising an exception. Will
raise AssertionError if the result is not ready.
The following example demonstrates the use of a pool:
from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    pool = Pool(processes=4)              # start 4 worker processes

    result = pool.apply_async(f, (10,))   # evaluate "f(10)" asynchronously
    print(result.get(timeout=1))          # prints "100" unless your computer is *very* slow

    print(pool.map(f, range(10)))         # prints "[0, 1, 4,..., 81]"

    it = pool.imap(f, range(10))
    print(next(it))                       # prints "0"
    print(next(it))                       # prints "1"
    print(it.next(timeout=1))             # prints "4" unless your computer is *very* slow

    import time
    result = pool.apply_async(time.sleep, (10,))
    print(result.get(timeout=1))          # raises TimeoutError
Usually message passing between processes is done using queues or by using
Connection objects returned by Pipe().
However, the multiprocessing.connection module allows some extra
flexibility. It basically gives a high level message oriented API for dealing
with sockets or Windows named pipes, and also has support for digest
authentication using the hmac module.
multiprocessing.connection.deliver_challenge(connection, authkey)
Send a randomly generated message to the other end of the connection and wait
for a reply.
If the reply matches the digest of the message using authkey as the key
then a welcome message is sent to the other end of the connection. Otherwise
AuthenticationError is raised.
multiprocessing.connection.Client(address[, family[, authenticate[, authkey]]])
Attempt to set up a connection to the listener which is using address
address, returning a Connection.
The type of the connection is determined by family argument, but this can
generally be omitted since it can usually be inferred from the format of
address. (See Address Formats)
If authenticate is True or authkey is a string then digest
authentication is used. The key used for authentication will be either
authkey or current_process().authkey if authkey is None.
If authentication fails then AuthenticationError is raised. See
Authentication keys.
class multiprocessing.connection.Listener([address[, family[, backlog[, authenticate[, authkey]]]]])
A wrapper for a bound socket or Windows named pipe which is ‘listening’ for
connections.
address is the address to be used by the bound socket or named pipe of the
listener object.
Note
If an address of '0.0.0.0' is used, the address will not be a connectable
end point on Windows. If you require a connectable end-point,
you should use '127.0.0.1'.
family is the type of socket (or named pipe) to use. This can be one of
the strings 'AF_INET' (for a TCP socket), 'AF_UNIX' (for a Unix
domain socket) or 'AF_PIPE' (for a Windows named pipe). Of these only
the first is guaranteed to be available. If family is None then the
family is inferred from the format of address. If address is also
None then a default is chosen. This default is the family which is
assumed to be the fastest available. See
Address Formats. Note that if family is
'AF_UNIX' and address is None then the socket will be created in a
private temporary directory created using tempfile.mkstemp().
If the listener object uses a socket then backlog (1 by default) is passed
to the listen() method of the socket once it has been bound.
If authenticate is True (False by default) or authkey is not
None then digest authentication is used.
If authkey is a string then it will be used as the authentication key;
otherwise it must be None.
If authkey is None and authenticate is True then
current_process().authkey is used as the authentication key. If
authkey is None and authenticate is False then no
authentication is done. If authentication fails then
AuthenticationError is raised. See Authentication keys.
accept()
Accept a connection on the bound socket or named pipe of the listener
object and return a Connection object. If authentication is
attempted and fails, then AuthenticationError is raised.
close()
Close the bound socket or named pipe of the listener object. This is
called automatically when the listener is garbage collected. However it
is advisable to call it explicitly.
Listener objects have the following read-only properties:
address
The address which is being used by the Listener object.
last_accepted
The address from which the last accepted connection came. If this is
unavailable then it is None.
exception multiprocessing.AuthenticationError
Exception raised when there is an authentication error.
Examples
The following server code creates a listener which uses 'secret password' as
an authentication key. It then waits for a connection and sends some data to
the client:
from multiprocessing.connection import Listener
from array import array

address = ('localhost', 6000)     # family is deduced to be 'AF_INET'
listener = Listener(address, authkey=b'secret password')

conn = listener.accept()
print('connection accepted from', listener.last_accepted)

conn.send([2.25, None, 'junk', float])

conn.send_bytes(b'hello')

conn.send_bytes(array('i', [42, 1729]))

conn.close()
listener.close()
The following code connects to the server and receives some data from the
server:
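from multiprocessing.connection import Client
from array import array

# a client sketch matching the server example above
address = ('localhost', 6000)
conn = Client(address, authkey=b'secret password')

print(conn.recv())                  # => [2.25, None, 'junk', float]

print(conn.recv_bytes())            # => 'hello'

arr = array('i', [0, 0, 0, 0, 0])
print(conn.recv_bytes_into(arr))    # => 8
print(arr)                          # => array('i', [42, 1729, 0, 0, 0])

conn.close()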
Address Formats
An 'AF_INET' address is a tuple of the form (hostname, port) where
hostname is a string and port is an integer.
An 'AF_UNIX' address is a string representing a filename on the
filesystem.
An 'AF_PIPE' address is a string of the form
r'\\.\pipe\PipeName'. To use Client() to connect to a named
pipe on a remote computer called ServerName one should use an address of the
form r'\\ServerName\pipe\PipeName' instead.
Note that any string beginning with two backslashes is assumed by default to be
an 'AF_PIPE' address rather than an 'AF_UNIX' address.
Authentication keys
When one uses Connection.recv(), the data received is automatically
unpickled. Unfortunately unpickling data from an untrusted source is a security
risk. Therefore Listener and Client() use the hmac module
to provide digest authentication.
An authentication key is a string which can be thought of as a password: once a
connection is established both ends will demand proof that the other knows the
authentication key. (Demonstrating that both ends are using the same key does
not involve sending the key over the connection.)
If authentication is requested but no authentication key is specified then the
return value of current_process().authkey is used (see
Process). This value will automatically be inherited by
any Process object that the current process creates.
This means that (by default) all processes of a multi-process program will share
a single authentication key which can be used when setting up connections
between themselves.
Suitable authentication keys can also be generated by using os.urandom().
Logging
Some support for logging is available. Note, however, that the logging
package does not use process shared locks so it is possible (depending on the
handler type) for messages from different processes to get mixed up.
multiprocessing.get_logger()
Returns the logger used by multiprocessing. If necessary, a new one
will be created.
When first created the logger has level logging.NOTSET and no
default handler. Messages sent to this logger will not by default propagate
to the root logger.
Note that on Windows child processes will only inherit the level of the
parent process’s logger – any other customization of the logger will not be
inherited.
multiprocessing.log_to_stderr()
This function performs a call to get_logger() but in addition to
returning the logger created by get_logger, it adds a handler which sends
output to sys.stderr using format
'[%(levelname)s/%(processName)s] %(message)s'.
Below is an example session with logging turned on:
>>> import multiprocessing, logging
>>> logger = multiprocessing.log_to_stderr()
>>> logger.setLevel(logging.INFO)
>>> logger.warning('doomed')
[WARNING/MainProcess] doomed
>>> m = multiprocessing.Manager()
[INFO/SyncManager-...] child process calling self.run()
[INFO/SyncManager-...] created temp directory /.../pymp-...
[INFO/SyncManager-...] manager serving at '/.../listener-...'
>>> del m
[INFO/MainProcess] sending shutdown message to manager
[INFO/SyncManager-...] manager exiting with exitcode 0
In addition to having these two logging functions, the multiprocessing module
also exposes two additional logging level attributes: SUBWARNING
and SUBDEBUG. The table below illustrates where these fit in the
normal level hierarchy.
Level         Numeric value
SUBWARNING    25
SUBDEBUG      5
For a full table of logging levels, see the logging module.
These additional logging levels are used primarily for certain debug messages
within the multiprocessing module. Below is the same example as above, except
with SUBDEBUG enabled:
>>> import multiprocessing, logging
>>> logger = multiprocessing.log_to_stderr()
>>> logger.setLevel(multiprocessing.SUBDEBUG)
>>> logger.warning('doomed')
[WARNING/MainProcess] doomed
>>> m = multiprocessing.Manager()
[INFO/SyncManager-...] child process calling self.run()
[INFO/SyncManager-...] created temp directory /.../pymp-...
[INFO/SyncManager-...] manager serving at '/.../pymp-djGBXN/listener-...'
>>> del m
[SUBDEBUG/MainProcess] finalizer calling ...
[INFO/MainProcess] sending shutdown message to manager
[DEBUG/SyncManager-...] manager received shutdown message
[SUBDEBUG/SyncManager-...] calling <Finalize object, callback=unlink, ...
[SUBDEBUG/SyncManager-...] finalizer calling <built-in function unlink> ...
[SUBDEBUG/SyncManager-...] calling <Finalize object, dead>
[SUBDEBUG/SyncManager-...] finalizer calling <function rmtree at 0x5aa730> ...
[INFO/SyncManager-...] manager exiting with exitcode 0
Programming guidelines
Avoid shared state
As far as possible one should try to avoid shifting large amounts of data
between processes.
It is probably best to stick to using queues or pipes for communication
between processes rather than using the lower level synchronization
primitives from the threading module.
Picklability
Ensure that the arguments to the methods of proxies are picklable.
Thread safety of proxies
Do not use a proxy object from more than one thread unless you protect it
with a lock.
(There is never a problem with different processes using the same proxy.)
Joining zombie processes
On Unix when a process finishes but has not been joined it becomes a zombie.
There should never be very many because each time a new process starts (or
active_children() is called) all completed processes which have not
yet been joined will be joined. Also calling a finished process’s
Process.is_alive() will join the process. Even so it is probably good
practice to explicitly join all the processes that you start.
Better to inherit than pickle/unpickle
On Windows many types from multiprocessing need to be picklable so
that child processes can use them. However, one should generally avoid
sending shared objects to other processes using pipes or queues. Instead
you should arrange the program so that a process which needs access to a
shared resource created elsewhere can inherit it from an ancestor process.
Avoid terminating processes
Using the Process.terminate() method to stop a process is liable to
cause any shared resources (such as locks, semaphores, pipes and queues)
currently being used by the process to become broken or unavailable to other
processes.
Therefore it is probably best to only consider using
Process.terminate() on processes which never use any shared resources.
Joining processes that use queues
Bear in mind that a process that has put items in a queue will wait before
terminating until all the buffered items are fed by the “feeder” thread to
the underlying pipe. (The child process can call the
Queue.cancel_join_thread() method of the queue to avoid this behaviour.)
This means that whenever you use a queue you need to make sure that all
items which have been put on the queue will eventually be removed before the
process is joined. Otherwise you cannot be sure that processes which have
put items on the queue will terminate. Remember also that non-daemonic
processes will automatically be joined.
An example which will deadlock is the following:
from multiprocessing import Process, Queue

def f(q):
    q.put('X' * 1000000)

if __name__ == '__main__':
    queue = Queue()
    p = Process(target=f, args=(queue,))
    p.start()
    p.join()                    # this deadlocks
    obj = queue.get()
A fix here would be to swap the last two lines round (or simply remove the
p.join() line).
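That is, a working version of the example above, obtained by swapping those two lines, would be:

from multiprocessing import Process, Queue

def f(q):
    q.put('X' * 1000000)

if __name__ == '__main__':
    queue = Queue()
    p = Process(target=f, args=(queue,))
    p.start()
    obj = queue.get()   # drain the queue first
    p.join()            # now the join cannot deadlock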
Explicitly pass resources to child processes
On Unix a child process can make use of a shared resource created in a
parent process using a global resource. However, it is better to pass the
object as an argument to the constructor for the child process.
Apart from making the code (potentially) compatible with Windows this also
ensures that as long as the child process is still alive the object will not
be garbage collected in the parent process. This might be important if some
resource is freed when the object is garbage collected in the parent
process.
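A sketch of the preferred style; the use of a lock here is purely illustrative:

from multiprocessing import Process, Lock

def f(l):
    with l:
        print('got the lock')    # the lock arrived as an argument

if __name__ == '__main__':
    lock = Lock()
    for i in range(3):
        # pass the lock explicitly rather than relying on a global
        Process(target=f, args=(lock,)).start()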
Beware of replacing sys.stdin with a “file like object”
multiprocessing originally unconditionally called
os.close(sys.stdin.fileno())
in the multiprocessing.Process._bootstrap() method — this resulted
in issues with processes-in-processes. This has been changed to:
sys.stdin.close()
sys.stdin = open(os.devnull)
Which solves the fundamental issue of processes colliding with each other
resulting in a bad file descriptor error, but introduces a potential danger
to applications which replace sys.stdin with a “file-like object”
with output buffering. This danger is that if multiple processes call
close() on this file-like object, it could result in the same
data being flushed to the object multiple times, resulting in corruption.
If you write a file-like object and implement your own caching, you can
make it fork-safe by storing the pid whenever you append to the cache,
and discarding the cache when the pid changes. For example:
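# a sketch of such a fork-safe cache property, following the description above
@property
def cache(self):
    pid = os.getpid()
    if pid != self._pid:
        self._pid = pid
        self._cache = []
    return self._cache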
Windows
Since Windows lacks os.fork() it has a few extra restrictions:
More picklability
Ensure that all arguments to Process.__init__() are picklable. This
means, in particular, that bound or unbound methods cannot be used directly
as the target argument on Windows — just define a function and use
that instead.
Also, if you subclass Process then make sure that instances will be
picklable when the Process.start() method is called.
Global variables
Bear in mind that if code run in a child process tries to access a global
variable, then the value it sees (if any) may not be the same as the value
in the parent process at the time that Process.start() was called.
However, global variables which are just module level constants cause no
problems.
Safe importing of main module
Make sure that the main module can be safely imported by a new Python
interpreter without causing unintended side effects (such as starting a new
process).
For example, under Windows running the following module would fail with a
RuntimeError:
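from multiprocessing import Process

def foo():
    print('hello')

p = Process(target=foo)
p.start()

Instead one should protect the “entry point” of the program by using if __name__ == '__main__': as follows:

from multiprocessing import Process, freeze_support

def foo():
    print('hello')

if __name__ == '__main__':
    freeze_support()
    p = Process(target=foo)
    p.start()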
Demonstration of how to create and use customized managers and proxies:
#
# This module shows how to use arbitrary callables with a subclass of
# `BaseManager`.
#
# Copyright (c) 2006-2008, R Oudkerk
# All rights reserved.
#

from multiprocessing import freeze_support
from multiprocessing.managers import BaseManager, BaseProxy
import operator

##

class Foo:
    def f(self):
        print('you called Foo.f()')
    def g(self):
        print('you called Foo.g()')
    def _h(self):
        print('you called Foo._h()')

# A simple generator function
def baz():
    for i in range(10):
        yield i*i

# Proxy type for generator objects
class GeneratorProxy(BaseProxy):
    _exposed_ = ('__next__',)
    def __iter__(self):
        return self
    def __next__(self):
        return self._callmethod('__next__')

# Function to return the operator module
def get_operator_module():
    return operator

##

class MyManager(BaseManager):
    pass

# register the Foo class; make `f()` and `g()` accessible via proxy
MyManager.register('Foo1', Foo)

# register the Foo class; make `g()` and `_h()` accessible via proxy
MyManager.register('Foo2', Foo, exposed=('g', '_h'))

# register the generator function baz; use `GeneratorProxy` to make proxies
MyManager.register('baz', baz, proxytype=GeneratorProxy)

# register get_operator_module(); make public functions accessible via proxy
MyManager.register('operator', get_operator_module)

##

def test():
    manager = MyManager()
    manager.start()

    print('-' * 20)

    f1 = manager.Foo1()
    f1.f()
    f1.g()
    assert not hasattr(f1, '_h')
    assert sorted(f1._exposed_) == sorted(['f', 'g'])

    print('-' * 20)

    f2 = manager.Foo2()
    f2.g()
    f2._h()
    assert not hasattr(f2, 'f')
    assert sorted(f2._exposed_) == sorted(['g', '_h'])

    print('-' * 20)

    it = manager.baz()
    for i in it:
        print('<%d>' % i, end=' ')
    print()

    print('-' * 20)

    op = manager.operator()
    print('op.add(23, 45) =', op.add(23, 45))
    print('op.pow(2, 94) =', op.pow(2, 94))
    # operator.getslice() and operator.repeat() no longer exist in
    # Python 3, so use getitem() and concat() here instead
    print('op.getitem([0, 1, 2], 2) =', op.getitem([0, 1, 2], 2))
    print('op.concat([1, 2], [3, 4]) =', op.concat([1, 2], [3, 4]))
    print('op._exposed_ =', op._exposed_)

##

if __name__ == '__main__':
    freeze_support()
    test()
Using Pool:
#
# A test of `multiprocessing.Pool` class
#
# Copyright (c) 2006-2008, R Oudkerk
# All rights reserved.
#
import multiprocessing
import time
import random
import sys

#
# Functions used by test code
#

def calculate(func, args):
    result = func(*args)
    return '%s says that %s%s = %s' % (
        multiprocessing.current_process().name,
        func.__name__, args, result
        )

def calculatestar(args):
    return calculate(*args)

def mul(a, b):
    time.sleep(0.5 * random.random())
    return a * b

def plus(a, b):
    time.sleep(0.5 * random.random())
    return a + b

def f(x):
    return 1.0 / (x - 5.0)

def pow3(x):
    return x ** 3

def noop(x):
    pass

#
# Test code
#

def test():
    print('cpu_count() = %d\n' % multiprocessing.cpu_count())

    #
    # Create pool
    #

    PROCESSES = 4
    print('Creating pool with %d processes\n' % PROCESSES)
    pool = multiprocessing.Pool(PROCESSES)
    print('pool = %s' % pool)
    print()

    #
    # Tests
    #

    TASKS = [(mul, (i, 7)) for i in range(10)] + \
            [(plus, (i, 8)) for i in range(10)]

    results = [pool.apply_async(calculate, t) for t in TASKS]
    imap_it = pool.imap(calculatestar, TASKS)
    imap_unordered_it = pool.imap_unordered(calculatestar, TASKS)

    print('Ordered results using pool.apply_async():')
    for r in results:
        print('\t', r.get())
    print()

    print('Ordered results using pool.imap():')
    for x in imap_it:
        print('\t', x)
    print()

    print('Unordered results using pool.imap_unordered():')
    for x in imap_unordered_it:
        print('\t', x)
    print()

    print('Ordered results using pool.map() --- will block till complete:')
    for x in pool.map(calculatestar, TASKS):
        print('\t', x)
    print()

    #
    # Simple benchmarks
    #

    N = 100000
    print('def pow3(x): return x**3')

    t = time.time()
    A = list(map(pow3, range(N)))
    print('\tmap(pow3, range(%d)):\n\t\t%s seconds' % \
          (N, time.time() - t))

    t = time.time()
    B = pool.map(pow3, range(N))
    print('\tpool.map(pow3, range(%d)):\n\t\t%s seconds' % \
          (N, time.time() - t))

    t = time.time()
    C = list(pool.imap(pow3, range(N), chunksize=N//8))
    print('\tlist(pool.imap(pow3, range(%d), chunksize=%d)):\n\t\t%s' \
          ' seconds' % (N, N//8, time.time() - t))

    assert A == B == C, (len(A), len(B), len(C))
    print()

    L = [None] * 1000000
    print('def noop(x): pass')
    print('L = [None] * 1000000')

    t = time.time()
    A = list(map(noop, L))
    print('\tmap(noop, L):\n\t\t%s seconds' % \
          (time.time() - t))

    t = time.time()
    B = pool.map(noop, L)
    print('\tpool.map(noop, L):\n\t\t%s seconds' % \
          (time.time() - t))

    t = time.time()
    C = list(pool.imap(noop, L, chunksize=len(L)//8))
    print('\tlist(pool.imap(noop, L, chunksize=%d)):\n\t\t%s seconds' % \
          (len(L)//8, time.time() - t))

    assert A == B == C, (len(A), len(B), len(C))
    print()
    del A, B, C, L

    #
    # Test error handling
    #

    print('Testing error handling:')

    try:
        print(pool.apply(f, (5,)))
    except ZeroDivisionError:
        print('\tGot ZeroDivisionError as expected from pool.apply()')
    else:
        raise AssertionError('expected ZeroDivisionError')

    try:
        print(pool.map(f, list(range(10))))
    except ZeroDivisionError:
        print('\tGot ZeroDivisionError as expected from pool.map()')
    else:
        raise AssertionError('expected ZeroDivisionError')

    try:
        print(list(pool.imap(f, list(range(10)))))
    except ZeroDivisionError:
        print('\tGot ZeroDivisionError as expected from list(pool.imap())')
    else:
        raise AssertionError('expected ZeroDivisionError')

    it = pool.imap(f, list(range(10)))
    for i in range(10):
        try:
            x = next(it)
        except ZeroDivisionError:
            if i == 5:
                pass
        except StopIteration:
            break
        else:
            if i == 5:
                raise AssertionError('expected ZeroDivisionError')
    assert i == 9
    print('\tGot ZeroDivisionError as expected from IMapIterator.next()')
    print()

    #
    # Testing timeouts
    #

    print('Testing ApplyResult.get() with timeout:', end=' ')
    res = pool.apply_async(calculate, TASKS[0])
    while 1:
        sys.stdout.flush()
        try:
            sys.stdout.write('\n\t%s' % res.get(0.02))
            break
        except multiprocessing.TimeoutError:
            sys.stdout.write('.')
    print()
    print()

    print('Testing IMapIterator.next() with timeout:', end=' ')
    it = pool.imap(calculatestar, TASKS)
    while 1:
        sys.stdout.flush()
        try:
            sys.stdout.write('\n\t%s' % it.next(0.02))
        except StopIteration:
            break
        except multiprocessing.TimeoutError:
            sys.stdout.write('.')
    print()
    print()

    #
    # Testing callback
    #

    print('Testing callback:')

    A = []
    B = [56, 0, 1, 8, 27, 64, 125, 216, 343, 512, 729]

    r = pool.apply_async(mul, (7, 8), callback=A.append)
    r.wait()

    r = pool.map_async(pow3, list(range(10)), callback=A.extend)
    r.wait()

    if A == B:
        print('\tcallbacks succeeded\n')
    else:
        print('\t*** callbacks failed\n\t\t%s != %s\n' % (A, B))

    #
    # Check there are no outstanding tasks
    #

    assert not pool._cache, 'cache = %r' % pool._cache

    #
    # Check close() methods
    #

    print('Testing close():')

    for worker in pool._pool:
        assert worker.is_alive()

    result = pool.apply_async(time.sleep, [0.5])
    pool.close()
    pool.join()

    assert result.get() is None

    for worker in pool._pool:
        assert not worker.is_alive()

    print('\tclose() succeeded\n')

    #
    # Check terminate() method
    #

    print('Testing terminate():')

    pool = multiprocessing.Pool(2)
    DELTA = 0.1
    ignore = pool.apply(pow3, [2])
    results = [pool.apply_async(time.sleep, [DELTA]) for i in range(100)]
    pool.terminate()
    pool.join()

    for worker in pool._pool:
        assert not worker.is_alive()

    print('\tterminate() succeeded\n')

    #
    # Check garbage collection
    #

    print('Testing garbage collection:')

    pool = multiprocessing.Pool(2)
    DELTA = 0.1
    processes = pool._pool
    ignore = pool.apply(pow3, [2])
    results = [pool.apply_async(time.sleep, [DELTA]) for i in range(100)]

    results = pool = None

    time.sleep(DELTA * 2)

    for worker in processes:
        assert not worker.is_alive()

    print('\tgarbage collection succeeded\n')


if __name__ == '__main__':
    multiprocessing.freeze_support()

    assert len(sys.argv) in (1, 2)

    if len(sys.argv) == 1 or sys.argv[1] == 'processes':
        print(' Using processes '.center(79, '-'))
    elif sys.argv[1] == 'threads':
        print(' Using threads '.center(79, '-'))
        import multiprocessing.dummy as multiprocessing
    else:
        print('Usage:\n\t%s [processes | threads]' % sys.argv[0])
        raise SystemExit(2)

    test()
Synchronization types like locks, conditions and queues:
#
# A test file for the `multiprocessing` package
#
# Copyright (c) 2006-2008, R Oudkerk
# All rights reserved.
#
import time
import sys
import random
from queue import Empty

import multiprocessing     # may get overwritten


#### TEST_VALUE

def value_func(running, mutex):
    random.seed()
    time.sleep(random.random()*4)

    mutex.acquire()
    print('\n\t\t\t' + str(multiprocessing.current_process()) + ' has finished')
    running.value -= 1
    mutex.release()

def test_value():
    TASKS = 10
    running = multiprocessing.Value('i', TASKS)
    mutex = multiprocessing.Lock()

    for i in range(TASKS):
        p = multiprocessing.Process(target=value_func, args=(running, mutex))
        p.start()

    while running.value > 0:
        time.sleep(0.08)
        mutex.acquire()
        print(running.value, end=' ')
        sys.stdout.flush()
        mutex.release()

    print()
    print('No more running processes')


#### TEST_QUEUE

def queue_func(queue):
    for i in range(30):
        time.sleep(0.5 * random.random())
        queue.put(i*i)
    queue.put('STOP')

def test_queue():
    q = multiprocessing.Queue()

    p = multiprocessing.Process(target=queue_func, args=(q,))
    p.start()

    o = None
    while o != 'STOP':
        try:
            o = q.get(timeout=0.3)
            print(o, end=' ')
            sys.stdout.flush()
        except Empty:
            print('TIMEOUT')

    print()


#### TEST_CONDITION

def condition_func(cond):
    cond.acquire()
    print('\t' + str(cond))
    time.sleep(2)
    print('\tchild is notifying')
    print('\t' + str(cond))
    cond.notify()
    cond.release()

def test_condition():
    cond = multiprocessing.Condition()

    p = multiprocessing.Process(target=condition_func, args=(cond,))
    print(cond)

    cond.acquire()
    print(cond)
    cond.acquire()
    print(cond)

    p.start()

    print('main is waiting')
    cond.wait()
    print('main has woken up')

    print(cond)
    cond.release()
    print(cond)
    cond.release()

    p.join()
    print(cond)


#### TEST_SEMAPHORE

def semaphore_func(sema, mutex, running):
    sema.acquire()

    mutex.acquire()
    running.value += 1
    print(running.value, 'tasks are running')
    mutex.release()

    random.seed()
    time.sleep(random.random()*2)

    mutex.acquire()
    running.value -= 1
    print('%s has finished' % multiprocessing.current_process())
    mutex.release()

    sema.release()

def test_semaphore():
    sema = multiprocessing.Semaphore(3)
    mutex = multiprocessing.RLock()
    running = multiprocessing.Value('i', 0)

    processes = [
        multiprocessing.Process(target=semaphore_func,
                                args=(sema, mutex, running))
        for i in range(10)
        ]

    for p in processes:
        p.start()

    for p in processes:
        p.join()


#### TEST_JOIN_TIMEOUT

def join_timeout_func():
    print('\tchild sleeping')
    time.sleep(5.5)
    print('\n\tchild terminating')

def test_join_timeout():
    p = multiprocessing.Process(target=join_timeout_func)
    p.start()

    print('waiting for process to finish')

    while 1:
        p.join(timeout=1)
        if not p.is_alive():
            break
        print('.', end=' ')
        sys.stdout.flush()


#### TEST_EVENT

def event_func(event):
    print('\t%r is waiting' % multiprocessing.current_process())
    event.wait()
    print('\t%r has woken up' % multiprocessing.current_process())

def test_event():
    event = multiprocessing.Event()

    processes = [multiprocessing.Process(target=event_func, args=(event,))
                 for i in range(5)]

    for p in processes:
        p.start()

    print('main is sleeping')
    time.sleep(2)

    print('main is setting event')
    event.set()

    for p in processes:
        p.join()


#### TEST_SHAREDVALUES

def sharedvalues_func(values, arrays, shared_values, shared_arrays):
    for i in range(len(values)):
        v = values[i][1]
        sv = shared_values[i].value
        assert v == sv

    for i in range(len(values)):
        a = arrays[i][1]
        sa = list(shared_arrays[i][:])
        assert a == sa

    print('Tests passed')

def test_sharedvalues():
    values = [
        ('i', 10),
        ('h', -2),
        ('d', 1.25)
        ]
    arrays = [
        ('i', list(range(100))),
        ('d', [0.25 * i for i in range(100)]),
        ('H', list(range(1000)))
        ]

    shared_values = [multiprocessing.Value(id, v) for id, v in values]
    shared_arrays = [multiprocessing.Array(id, a) for id, a in arrays]

    p = multiprocessing.Process(
        target=sharedvalues_func,
        args=(values, arrays, shared_values, shared_arrays)
        )
    p.start()
    p.join()

    assert p.exitcode == 0


####

def test(namespace=multiprocessing):
    global multiprocessing

    multiprocessing = namespace

    for func in [test_value, test_queue, test_condition,
                 test_semaphore, test_join_timeout, test_event,
                 test_sharedvalues]:
        print('\n\t######## %s\n' % func.__name__)
        func()

    ignore = multiprocessing.active_children()      # cleanup any old processes
    if hasattr(multiprocessing, '_debug_info'):
        info = multiprocessing._debug_info()
        if info:
            print(info)
            raise ValueError('there should be no positive refcounts left')


if __name__ == '__main__':
    multiprocessing.freeze_support()

    assert len(sys.argv) in (1, 2)

    if len(sys.argv) == 1 or sys.argv[1] == 'processes':
        print(' Using processes '.center(79, '-'))
        namespace = multiprocessing
    elif sys.argv[1] == 'manager':
        print(' Using processes and a manager '.center(79, '-'))
        namespace = multiprocessing.Manager()
        namespace.Process = multiprocessing.Process
        namespace.current_process = multiprocessing.current_process
        namespace.active_children = multiprocessing.active_children
    elif sys.argv[1] == 'threads':
        print(' Using threads '.center(79, '-'))
        import multiprocessing.dummy as namespace
    else:
        print('Usage:\n\t%s [processes | manager | threads]' % sys.argv[0])
        raise SystemExit(2)

    test(namespace)
An example showing how to use queues to feed tasks to a collection of worker
processes and collect the results:
#
# Simple example which uses a pool of workers to carry out some tasks.
#
# Notice that the results will probably not come out of the output
# queue in the same order as the corresponding tasks were put on the
# input queue.  If it is important to get the results back in the
# original order then consider using `Pool.map()` or `Pool.imap()`
# (which will save on the amount of code needed anyway).
#
# Copyright (c) 2006-2008, R Oudkerk
# All rights reserved.
#

import time
import random

from multiprocessing import Process, Queue, current_process, freeze_support

#
# Function run by worker processes
#

def worker(input, output):
    for func, args in iter(input.get, 'STOP'):
        result = calculate(func, args)
        output.put(result)

#
# Function used to calculate result
#

def calculate(func, args):
    result = func(*args)
    return '%s says that %s%s = %s' % \
        (current_process().name, func.__name__, args, result)

#
# Functions referenced by tasks
#

def mul(a, b):
    time.sleep(0.5 * random.random())
    return a * b

def plus(a, b):
    time.sleep(0.5 * random.random())
    return a + b

#
#
#

def test():
    NUMBER_OF_PROCESSES = 4
    TASKS1 = [(mul, (i, 7)) for i in range(20)]
    TASKS2 = [(plus, (i, 8)) for i in range(10)]

    # Create queues
    task_queue = Queue()
    done_queue = Queue()

    # Submit tasks
    for task in TASKS1:
        task_queue.put(task)

    # Start worker processes
    for i in range(NUMBER_OF_PROCESSES):
        Process(target=worker, args=(task_queue, done_queue)).start()

    # Get and print results
    print('Unordered results:')
    for i in range(len(TASKS1)):
        print('\t', done_queue.get())

    # Add more tasks using `put()`
    for task in TASKS2:
        task_queue.put(task)

    # Get and print some more results
    for i in range(len(TASKS2)):
        print('\t', done_queue.get())

    # Tell child processes to stop
    for i in range(NUMBER_OF_PROCESSES):
        task_queue.put('STOP')


if __name__ == '__main__':
    freeze_support()
    test()
An example of how a pool of worker processes can each run a
SimpleHTTPRequestHandler instance while sharing a single
listening socket.
#
# Example where a pool of http servers share a single listening socket
#
# On Windows this module depends on the ability to pickle a socket
# object so that the worker processes can inherit a copy of the server
# object.  (We import `multiprocessing.reduction` to enable this pickling.)
#
# Not sure if we should synchronize access to `socket.accept()` method by
# using a process-shared lock -- does not seem to be necessary.
#
# Copyright (c) 2006-2008, R Oudkerk
# All rights reserved.
#

import os
import sys

from multiprocessing import Process, current_process, freeze_support
from http.server import HTTPServer
from http.server import SimpleHTTPRequestHandler

if sys.platform == 'win32':
    import multiprocessing.reduction    # make sockets picklable/inheritable


def note(format, *args):
    sys.stderr.write('[%s]\t%s\n' % (current_process().name, format % args))


class RequestHandler(SimpleHTTPRequestHandler):
    # we override log_message() to show which process is handling the request
    def log_message(self, format, *args):
        note(format, *args)

def serve_forever(server):
    note('starting server')
    try:
        server.serve_forever()
    except KeyboardInterrupt:
        pass


def runpool(address, number_of_processes):
    # create a single server object -- children will each inherit a copy
    server = HTTPServer(address, RequestHandler)

    # create child processes to act as workers
    for i in range(number_of_processes-1):
        Process(target=serve_forever, args=(server,)).start()

    # main process also acts as a worker
    serve_forever(server)


def test():
    DIR = os.path.join(os.path.dirname(__file__), '..')
    ADDRESS = ('localhost', 8000)
    NUMBER_OF_PROCESSES = 4

    print('Serving at http://%s:%d using %d worker processes' % \
          (ADDRESS[0], ADDRESS[1], NUMBER_OF_PROCESSES))
    print('To exit press Ctrl-' + ['C', 'Break'][sys.platform == 'win32'])

    os.chdir(DIR)
    runpool(ADDRESS, NUMBER_OF_PROCESSES)


if __name__ == '__main__':
    freeze_support()
    test()
Some simple benchmarks comparing multiprocessing with threading:

#
# Simple benchmarks for the multiprocessing package
#
# Copyright (c) 2006-2008, R Oudkerk
# All rights reserved.
#

import time
import sys
import multiprocessing
import threading
import queue
import gc

if sys.platform == 'win32':
    _timer = time.clock
else:
    _timer = time.time

delta = 1


#### TEST_QUEUESPEED

def queuespeed_func(q, c, iterations):
    a = '0' * 256
    c.acquire()
    c.notify()
    c.release()

    for i in range(iterations):
        q.put(a)

    q.put('STOP')

def test_queuespeed(Process, q, c):
    elapsed = 0
    iterations = 1

    while elapsed < delta:
        iterations *= 2

        p = Process(target=queuespeed_func, args=(q, c, iterations))
        c.acquire()
        p.start()
        c.wait()
        c.release()

        result = None
        t = _timer()

        while result != 'STOP':
            result = q.get()

        elapsed = _timer() - t

        p.join()

    print(iterations, 'objects passed through the queue in',
          elapsed, 'seconds')
    print('average number/sec:', iterations / elapsed)


#### TEST_PIPESPEED

def pipe_func(c, cond, iterations):
    a = '0' * 256
    cond.acquire()
    cond.notify()
    cond.release()

    for i in range(iterations):
        c.send(a)

    c.send('STOP')

def test_pipespeed():
    c, d = multiprocessing.Pipe()
    cond = multiprocessing.Condition()
    elapsed = 0
    iterations = 1

    while elapsed < delta:
        iterations *= 2

        p = multiprocessing.Process(target=pipe_func,
                                    args=(d, cond, iterations))
        cond.acquire()
        p.start()
        cond.wait()
        cond.release()

        result = None
        t = _timer()

        while result != 'STOP':
            result = c.recv()

        elapsed = _timer() - t
        p.join()

    print(iterations, 'objects passed through connection in',
          elapsed, 'seconds')
    print('average number/sec:', iterations / elapsed)


#### TEST_SEQSPEED

def test_seqspeed(seq):
    elapsed = 0
    iterations = 1

    while elapsed < delta:
        iterations *= 2

        t = _timer()

        for i in range(iterations):
            a = seq[5]

        elapsed = _timer() - t

    print(iterations, 'iterations in', elapsed, 'seconds')
    print('average number/sec:', iterations / elapsed)


#### TEST_LOCK

def test_lockspeed(l):
    elapsed = 0
    iterations = 1

    while elapsed < delta:
        iterations *= 2

        t = _timer()

        for i in range(iterations):
            l.acquire()
            l.release()

        elapsed = _timer() - t

    print(iterations, 'iterations in', elapsed, 'seconds')
    print('average number/sec:', iterations / elapsed)


#### TEST_CONDITION

def conditionspeed_func(c, N):
    c.acquire()
    c.notify()

    for i in range(N):
        c.wait()
        c.notify()

    c.release()

def test_conditionspeed(Process, c):
    elapsed = 0
    iterations = 1

    while elapsed < delta:
        iterations *= 2

        c.acquire()
        p = Process(target=conditionspeed_func, args=(c, iterations))
        p.start()

        c.wait()

        t = _timer()

        for i in range(iterations):
            c.notify()
            c.wait()

        elapsed = _timer() - t

        c.release()
        p.join()

    print(iterations * 2, 'waits in', elapsed, 'seconds')
    print('average number/sec:', iterations * 2 / elapsed)

####

def test():
    manager = multiprocessing.Manager()

    gc.disable()

    print('\n\t######## testing Queue.Queue\n')
    test_queuespeed(threading.Thread, queue.Queue(),
                    threading.Condition())
    print('\n\t######## testing multiprocessing.Queue\n')
    test_queuespeed(multiprocessing.Process, multiprocessing.Queue(),
                    multiprocessing.Condition())
    print('\n\t######## testing Queue managed by server process\n')
    test_queuespeed(multiprocessing.Process, manager.Queue(),
                    manager.Condition())
    print('\n\t######## testing multiprocessing.Pipe\n')
    test_pipespeed()

    print()

    print('\n\t######## testing list\n')
    test_seqspeed(list(range(10)))
    print('\n\t######## testing list managed by server process\n')
    test_seqspeed(manager.list(list(range(10))))
    print('\n\t######## testing Array("i", ..., lock=False)\n')
    test_seqspeed(multiprocessing.Array('i', list(range(10)), lock=False))
    print('\n\t######## testing Array("i", ..., lock=True)\n')
    test_seqspeed(multiprocessing.Array('i', list(range(10)), lock=True))

    print()

    print('\n\t######## testing threading.Lock\n')
    test_lockspeed(threading.Lock())
    print('\n\t######## testing threading.RLock\n')
    test_lockspeed(threading.RLock())
    print('\n\t######## testing multiprocessing.Lock\n')
    test_lockspeed(multiprocessing.Lock())
    print('\n\t######## testing multiprocessing.RLock\n')
    test_lockspeed(multiprocessing.RLock())
    print('\n\t######## testing lock managed by server process\n')
    test_lockspeed(manager.Lock())
    print('\n\t######## testing rlock managed by server process\n')
    test_lockspeed(manager.RLock())

    print()

    print('\n\t######## testing threading.Condition\n')
    test_conditionspeed(threading.Thread, threading.Condition())
    print('\n\t######## testing multiprocessing.Condition\n')
    test_conditionspeed(multiprocessing.Process, multiprocessing.Condition())
    print('\n\t######## testing condition managed by a server process\n')
    test_conditionspeed(multiprocessing.Process, manager.Condition())

    gc.enable()

if __name__ == '__main__':
    multiprocessing.freeze_support()
    test()
The concurrent.futures module provides a high-level interface for
asynchronously executing callables.
The asynchronous execution can be performed with threads, using
ThreadPoolExecutor, or separate processes, using
ProcessPoolExecutor. Both implement the same interface, which is
defined by the abstract Executor class.
Equivalent to map(func, *iterables) except func is executed
asynchronously and several calls to func may be made concurrently. The
returned iterator raises a TimeoutError if __next__() is
called and the result isn’t available after timeout seconds from the
original call to Executor.map(). timeout can be an int or a
float. If timeout is not specified or None, there is no limit to
the wait time. If a call raises an exception, then that exception will
be raised when its value is retrieved from the iterator.
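For instance, a minimal sketch of calling map() on a thread pool (the pool
size and inputs here are arbitrary):

from concurrent.futures import ThreadPoolExecutor

# Compute 2**5, 3**5 and 4**5 concurrently; results arrive in input order.
with ThreadPoolExecutor(max_workers=3) as executor:
    for result in executor.map(pow, [2, 3, 4], [5, 5, 5]):
        print(result)        # 32, then 243, then 1024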
Signal the executor that it should free any resources that it is using
when the currently pending futures are done executing. Calls to
Executor.submit() and Executor.map() made after shutdown will
raise RuntimeError.
If wait is True then this method will not return until all the
pending futures are done executing and the resources associated with the
executor have been freed. If wait is False then this method will
return immediately and the resources associated with the executor will be
freed when all pending futures are done executing. Regardless of the
value of wait, the entire Python program will not exit until all
pending futures are done executing.
You can avoid having to call this method explicitly if you use the
with statement, which will shut down the Executor
(waiting as if Executor.shutdown() were called with wait set to
True):
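A minimal sketch (the file names are hypothetical):

import shutil
from concurrent.futures import ThreadPoolExecutor

# Exiting the with block shuts the pool down after the copies finish.
with ThreadPoolExecutor(max_workers=4) as e:
    e.submit(shutil.copy, 'src1.txt', 'dest1.txt')
    e.submit(shutil.copy, 'src2.txt', 'dest2.txt')
    e.submit(shutil.copy, 'src3.txt', 'dest3.txt')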
Deadlocks can occur when the callable associated with a Future waits on
the results of another Future. For example:
import time
def wait_on_b():
    time.sleep(5)
    print(b.result())   # b will never complete because it is waiting on a.
    return 5

def wait_on_a():
    time.sleep(5)
    print(a.result())   # a will never complete because it is waiting on b.
    return 6


executor = ThreadPoolExecutor(max_workers=2)
a = executor.submit(wait_on_b)
b = executor.submit(wait_on_a)
And:
def wait_on_future():
    f = executor.submit(pow, 5, 2)
    # This will never complete because there is only one worker thread and
    # it is executing this function.
    print(f.result())

executor = ThreadPoolExecutor(max_workers=1)
executor.submit(wait_on_future)
class concurrent.futures.ThreadPoolExecutor(max_workers)
An Executor subclass that uses a pool of at most max_workers
threads to execute calls asynchronously.
import concurrent.futures
import urllib.request

URLS = ['http://www.foxnews.com/',
        'http://www.cnn.com/',
        'http://europe.wsj.com/',
        'http://www.bbc.co.uk/',
        'http://some-made-up-domain.com/']

def load_url(url, timeout):
    return urllib.request.urlopen(url, timeout=timeout).read()

with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    future_to_url = dict((executor.submit(load_url, url, 60), url)
                         for url in URLS)

    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        if future.exception() is not None:
            print('%r generated an exception: %s' % (url,
                                                     future.exception()))
        else:
            print('%r page is %d bytes' % (url, len(future.result())))
class concurrent.futures.ProcessPoolExecutor(max_workers=None)
An Executor subclass that executes calls asynchronously using a pool
of at most max_workers processes. If max_workers is None or not
given, it will default to the number of processors on the machine.
import concurrent.futures
import math

PRIMES = [
    112272535095293,
    112582705942171,
    112272535095293,
    115280095190773,
    115797848077099,
    1099726899285419]

def is_prime(n):
    if n % 2 == 0:
        return False

    sqrt_n = int(math.floor(math.sqrt(n)))
    for i in range(3, sqrt_n + 1, 2):
        if n % i == 0:
            return False
    return True

def main():
    with concurrent.futures.ProcessPoolExecutor() as executor:
        for number, prime in zip(PRIMES, executor.map(is_prime, PRIMES)):
            print('%d is prime: %s' % (number, prime))

if __name__ == '__main__':
    main()
Encapsulates the asynchronous execution of a callable. Future
instances are created by Executor.submit() and should not be created
directly except for testing.
Attempt to cancel the call. If the call is currently being executed and
cannot be cancelled then the method will return False, otherwise the
call will be cancelled and the method will return True.
Return the value returned by the call. If the call hasn’t yet completed
then this method will wait up to timeout seconds. If the call hasn’t
completed in timeout seconds, then a TimeoutError will be
raised. timeout can be an int or float. If timeout is not specified
or None, there is no limit to the wait time.
If the future is cancelled before completing then CancelledError
will be raised.
If the call raised, this method will raise the same exception.
Return the exception raised by the call. If the call hasn’t yet
completed then this method will wait up to timeout seconds. If the
call hasn’t completed in timeout seconds, then a TimeoutError
will be raised. timeout can be an int or float. If timeout is not
specified or None, there is no limit to the wait time.
If the future is cancelled before completing then CancelledError
will be raised.
If the call completed without raising, None is returned.
Attaches the callable fn to the future. fn will be called, with the
future as its only argument, when the future is cancelled or finishes
running.
Added callables are called in the order that they were added and are
always called in a thread belonging to the process that added them. If
the callable raises an Exception subclass, it will be logged and
ignored. If the callable raises a BaseException subclass, the
behavior is undefined.
If the future has already completed or been cancelled, fn will be
called immediately.
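A minimal sketch of attaching a callback, assuming a small thread pool:

from concurrent.futures import ThreadPoolExecutor

def report(future):
    # Called with the finished future as its only argument.
    print('2**10 =', future.result())

with ThreadPoolExecutor(max_workers=1) as executor:
    f = executor.submit(pow, 2, 10)
    f.add_done_callback(report)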
The following Future methods are meant for use in unit tests and
Executor implementations.
This method should only be called by Executor implementations
before executing the work associated with the Future and by unit
tests.
If the method returns False then the Future was cancelled,
i.e. Future.cancel() was called and returned True. Any threads
waiting on the Future completing (i.e. through
as_completed() or wait()) will be woken up.
If the method returns True then the Future was not cancelled
and has been put in the running state, i.e. calls to
Future.running() will return True.
Wait for the Future instances (possibly created by different
Executor instances) given by fs to complete. Returns a named
2-tuple of sets. The first set, named done, contains the futures that
completed (finished or were cancelled) before the wait completed. The second
set, named not_done, contains uncompleted futures.
timeout can be used to control the maximum number of seconds to wait before
returning. timeout can be an int or float. If timeout is not specified
or None, there is no limit to the wait time.
return_when indicates when this function should return. It must be one of
the following constants:
FIRST_COMPLETED
    The function will return when any future finishes or is cancelled.
FIRST_EXCEPTION
    The function will return when any future finishes by raising an
    exception. If no future raises an exception then it is equivalent
    to ALL_COMPLETED.
ALL_COMPLETED
    The function will return when all futures finish or are cancelled.
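A rough sketch of wait() with these constants (the sleep durations are
arbitrary; the last future is still running when the timeout expires):

import time
import concurrent.futures

def slow(seconds):
    time.sleep(seconds)
    return seconds

with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    fs = [executor.submit(slow, s) for s in (0.1, 0.2, 5)]
    done, not_done = concurrent.futures.wait(
        fs, timeout=1, return_when=concurrent.futures.ALL_COMPLETED)
    print(len(done), 'done;', len(not_done), 'not done')   # 2 done; 1 not done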
Returns an iterator over the Future instances (possibly created by
different Executor instances) given by fs that yields futures as
they complete (finished or were cancelled). Any futures that completed
before as_completed() is called will be yielded first. The returned
iterator raises a TimeoutError if __next__() is called and the
result isn’t available after timeout seconds from the original call to
as_completed(). timeout can be an int or float. If timeout is not
specified or None, there is no limit to the wait time.
See also
PEP 3148 – futures - execute computations asynchronously
The proposal which described this feature for inclusion in the Python
standard library.
Memory-mapped file objects behave like both bytearray objects and
file objects. You can use mmap objects in most places
where bytearray objects are expected; for example, you can use the re
module to search through a memory-mapped file. You can also change a single
byte by doing obj[index] = 97, or change a subsequence by assigning to a
slice: obj[i1:i2] = b'...'. You can also read and write data starting at
the current file position, and seek() through the file to different positions.
A memory-mapped file is created by the mmap constructor, which is
different on Unix and on Windows. In either case you must provide a file
descriptor for a file opened for update. If you wish to map an existing Python
file object, use its fileno() method to obtain the correct value for the
fileno parameter. Otherwise, you can open the file using the
os.open() function, which returns a file descriptor directly (the file
still needs to be closed when done).
Note
If you want to create a memory-mapping for a writable, buffered file, you
should flush() the file first. This is necessary to ensure
that local modifications to the buffers are actually available to the
mapping.
For both the Unix and Windows versions of the constructor, access may be
specified as an optional keyword parameter. access accepts one of three
values: ACCESS_READ, ACCESS_WRITE, or ACCESS_COPY
to specify read-only, write-through or copy-on-write memory respectively.
access can be used on both Unix and Windows. If access is not specified,
Windows mmap returns a write-through mapping. The initial memory values for
all three access types are taken from the specified file. Assignment to an
ACCESS_READ memory map raises a TypeError exception.
Assignment to an ACCESS_WRITE memory map affects both memory and the
underlying file. Assignment to an ACCESS_COPY memory map affects
memory but does not update the underlying file.
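For instance, a short sketch of the copy-on-write behavior (the file name
is hypothetical):

import mmap

with open("sample.bin", "wb") as f:
    f.write(b"original")

with open("sample.bin", "r+b") as f:
    m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY)
    m[:8] = b"modified"      # visible through the mapping only
    m.close()

with open("sample.bin", "rb") as f:
    print(f.read())          # still b'original'; the file was not updated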
To map anonymous memory, -1 should be passed as the fileno along with the length.
class mmap.mmap(fileno, length, tagname=None, access=ACCESS_DEFAULT[, offset])
(Windows version) Maps length bytes from the file specified by the
file handle fileno, and creates a mmap object. If length is larger
than the current size of the file, the file is extended to contain length
bytes. If length is 0, the maximum length of the map is the current
size of the file, except that if the file is empty Windows raises an
exception (you cannot create an empty mapping on Windows).
tagname, if specified and not None, is a string giving a tag name for
the mapping. Windows allows you to have many different mappings against
the same file. If you specify the name of an existing tag, that tag is
opened, otherwise a new tag of this name is created. If this parameter is
omitted or None, the mapping is created without a name. Avoiding the
use of the tag parameter will assist in keeping your code portable between
Unix and Windows.
offset may be specified as a non-negative integer offset. mmap references
will be relative to the offset from the beginning of the file. offset
defaults to 0. offset must be a multiple of the ALLOCATIONGRANULARITY.
class mmap.mmap(fileno, length, flags=MAP_SHARED, prot=PROT_WRITE|PROT_READ, access=ACCESS_DEFAULT[, offset])
(Unix version) Maps length bytes from the file specified by the file
descriptor fileno, and returns a mmap object. If length is 0, the
maximum length of the map will be the current size of the file when
mmap is called.
flags specifies the nature of the mapping. MAP_PRIVATE creates a
private copy-on-write mapping, so changes to the contents of the mmap
object will be private to this process, and MAP_SHARED creates a
mapping that’s shared with all other processes mapping the same areas of
the file. The default value is MAP_SHARED.
prot, if specified, gives the desired memory protection; the two most
useful values are PROT_READ and PROT_WRITE, to specify
that the pages may be read or written. prot defaults to
PROT_READ|PROT_WRITE.
access may be specified in lieu of flags and prot as an optional
keyword parameter. It is an error to specify both flags, prot and
access. See the description of access above for information on how to
use this parameter.
offset may be specified as a non-negative integer offset. mmap references
will be relative to the offset from the beginning of the file. offset
defaults to 0. offset must be a multiple of the PAGESIZE or
ALLOCATIONGRANULARITY.
To ensure validity of the created memory mapping the file specified
by the descriptor fileno is internally automatically synchronized
with physical backing store on Mac OS X and OpenVMS.
import mmap

# write a simple example file
with open("hello.txt", "wb") as f:
    f.write(b"Hello Python!\n")

with open("hello.txt", "r+b") as f:
    # memory-map the file, size 0 means whole file
    map = mmap.mmap(f.fileno(), 0)
    # read content via standard file methods
    print(map.readline())  # prints b"Hello Python!\n"
    # read content via slice notation
    print(map[:5])  # prints b"Hello"
    # update content using slice notation;
    # note that new content must have same size
    map[6:] = b" world!\n"
    # ... and read again using standard file methods
    map.seek(0)
    print(map.readline())  # prints b"Hello world!\n"
    # close the map
    map.close()
mmap can also be used as a context manager in a with
statement:
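import mmap

# anonymous 13-byte map (fileno -1); closed automatically on exit
with mmap.mmap(-1, 13) as mm:
    mm.write(b"Hello world!")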
Returns the lowest index in the object where the subsequence sub is
found, such that sub is contained in the range [start, end].
Optional arguments start and end are interpreted as in slice notation.
Returns -1 on failure.
Flushes changes made to the in-memory copy of a file back to disk. Without
use of this call there is no guarantee that changes are written back before
the object is destroyed. If offset and size are specified, only
changes to the given range of bytes will be flushed to disk; otherwise, the
whole extent of the mapping is flushed.
(Windows version) A nonzero value returned indicates success; zero
indicates failure.
(Unix version) A zero value is returned to indicate success. An
exception is raised when the call failed.
Copy the count bytes starting at offset src to the destination index
dest. If the mmap was created with ACCESS_READ, then calls to
move will raise a TypeError exception.
Return a bytes containing up to num bytes starting from the
current file position; the file position is updated to point after the
bytes that were returned.
Resizes the map and the underlying file, if any. If the mmap was created
with ACCESS_READ or ACCESS_COPY, resizing the map will
raise a TypeError exception.
Returns the highest index in the object where the subsequence sub is
found, such that sub is contained in the range [start, end].
Optional arguments start and end are interpreted as in slice notation.
Returns -1 on failure.
Set the file’s current position. whence argument is optional and
defaults to os.SEEK_SET or 0 (absolute file positioning); other
values are os.SEEK_CUR or 1 (seek relative to the current
position) and os.SEEK_END or 2 (seek relative to the file’s end).
Write the bytes in bytes into memory at the current position of the
file pointer; the file position is updated to point after the bytes that
were written. If the mmap was created with ACCESS_READ, then
writing to it will raise a TypeError exception.
Write the integer byte into memory at the current
position of the file pointer; the file position is advanced by 1. If
the mmap was created with ACCESS_READ, then writing to it will
raise a TypeError exception.
The readline module defines a number of functions to facilitate
completion and reading/writing of history files from the Python interpreter.
This module can be used directly or via the rlcompleter module. Settings
made using this module affect the behaviour of both the interpreter’s
interactive prompt and the prompts offered by the built-in input()
function.
Note
On MacOS X the readline module can be implemented using
the libedit library instead of GNU readline.
The configuration file for libedit is different from that
of GNU readline. If you programmatically load configuration strings
you can check for the text “libedit” in readline.__doc__
to differentiate between GNU readline and libedit.
The readline module defines the following functions:
Set the number of lines to save in the history file. write_history_file()
uses this value to truncate the history file when saving. Negative values imply
unlimited history file size.
Return the number of lines currently in the history. (This is different from
get_history_length(), which returns the maximum number of lines that will
be written to a history file.)
Set or remove the startup_hook function. If function is specified, it will be
used as the new startup_hook function; if omitted or None, any hook function
already installed is removed. The startup_hook function is called with no
arguments just before readline prints the first prompt.
Set or remove the pre_input_hook function. If function is specified, it will
be used as the new pre_input_hook function; if omitted or None, any hook
function already installed is removed. The pre_input_hook function is called
with no arguments after the first prompt has been printed and just before
readline starts reading input characters.
Set or remove the completer function. If function is specified, it will be
used as the new completer function; if omitted or None, any completer
function already installed is removed. The completer function is called as
function(text, state), for state in 0, 1, 2, ..., until it
returns a non-string value. It should return the next possible completion
starting with text.
Set or remove the completion display function. If function is
specified, it will be used as the new completion display function;
if omitted or None, any completion display function already
installed is removed. The completion display function is called as
function(substitution, [matches], longest_match_length) once
each time matches need to be displayed.
The following example demonstrates how to use the readline module’s
history reading and writing functions to automatically load and save a history
file named .pyhist from the user’s home directory. The code below would
normally be executed automatically during interactive sessions from the user’s
PYTHONSTARTUP file.
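A minimal version of such a startup file (IOError is raised on the first
run, when no history file exists yet):

import atexit
import os
import readline

histfile = os.path.join(os.path.expanduser("~"), ".pyhist")
try:
    readline.read_history_file(histfile)
except IOError:
    pass                     # no history file yet
atexit.register(readline.write_history_file, histfile)
del os, histfile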
The rlcompleter module defines a completion function suitable for the
readline module by completing valid Python identifiers and keywords.
When this module is imported on a Unix platform with the readline module
available, an instance of the Completer class is automatically created
and its complete() method is set as the readline completer.
The rlcompleter module is designed for use with Python’s interactive
mode. A user can add the following lines to his or her initialization file
(identified by the PYTHONSTARTUP environment variable) to get
automatic Tab completion:
try:
    import readline
except ImportError:
    print("Module readline not available.")
else:
    import rlcompleter
    readline.parse_and_bind("tab: complete")
On platforms without readline, the Completer class defined by
this module can still be used for custom purposes.
If called for text that doesn’t include a period character ('.'), it will
complete from names currently defined in __main__, builtins and
keywords (as defined by the keyword module).
If called for a dotted name, it will try to evaluate anything without obvious
side-effects (functions will not be evaluated, but it can generate calls to
__getattr__()) up to the last part, and find matches for the rest via the
dir() function. Any exception raised during the evaluation of the
expression is caught, silenced and None is returned.
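A rough sketch of such direct use:

import rlcompleter

completer = rlcompleter.Completer()
# Cycle through candidate completions for the prefix 'pri'.
state = 0
while True:
    match = completer.complete('pri', state)
    if match is None:
        break
    print(match)             # e.g. 'print('
    state += 1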
This module provides low-level primitives for working with multiple threads
(also called light-weight processes or tasks) — multiple threads of
control sharing their global data space. For synchronization, simple locks
(also called mutexes or binary semaphores) are provided.
The threading module provides an easier to use and higher-level
threading API built on top of this module.
The module is optional. It is supported on Windows, Linux, SGI IRIX, Solaris
2.x, as well as on systems that have a POSIX thread (a.k.a. “pthread”)
implementation. For systems lacking the _thread module, the
_dummy_thread module is available. It duplicates this module’s interface
and can be used as a drop-in replacement.
Start a new thread and return its identifier. The thread executes the function
function with the argument list args (which must be a tuple). The optional
kwargs argument specifies a dictionary of keyword arguments. When the function
returns, the thread silently exits. When the function terminates with an
unhandled exception, a stack trace is printed and then the thread exits (but
other threads continue to run).
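A minimal sketch:

import _thread
import time

def heartbeat(interval, count):
    for i in range(count):
        print('beat', i, 'from thread', _thread.get_ident())
        time.sleep(interval)

_thread.start_new_thread(heartbeat, (0.1, 3))
time.sleep(1)                # crude: let the worker finish before main exits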
Return the ‘thread identifier’ of the current thread. This is a nonzero
integer. Its value has no direct meaning; it is intended as a magic cookie to
be used e.g. to index a dictionary of thread-specific data. Thread identifiers
may be recycled when a thread exits and another thread is created.
Return the thread stack size used when creating new threads. The optional
size argument specifies the stack size to be used for subsequently created
threads, and must be 0 (use platform or configured default) or a positive
integer value of at least 32,768 (32kB). If changing the thread stack size is
unsupported, a ThreadError is raised. If the specified stack size is
invalid, a ValueError is raised and the stack size is unmodified. 32kB
is currently the minimum supported stack size value to guarantee sufficient
stack space for the interpreter itself. Note that some platforms may have
particular restrictions on values for the stack size, such as requiring a
minimum stack size > 32kB or requiring allocation in multiples of the system
memory page size - platform documentation should be referred to for more
information (4kB pages are common; using multiples of 4096 for the stack size is
the suggested approach in the absence of more specific information).
Availability: Windows, systems with POSIX threads.
Without any optional argument, this method acquires the lock unconditionally, if
necessary waiting until it is released by another thread (only one thread at a
time can acquire a lock — that’s their reason for existence).
If the integer waitflag argument is present, the action depends on its
value: if it is zero, the lock is only acquired if it can be acquired
immediately without waiting, while if it is nonzero, the lock is acquired
unconditionally as above.
If the floating-point timeout argument is present and positive, it
specifies the maximum wait time in seconds before returning. A negative
timeout argument specifies an unbounded wait. You cannot specify
a timeout if waitflag is zero.
The return value is True if the lock is acquired successfully,
False if not.
Changed in version 3.2: The timeout parameter is new.
Changed in version 3.2: Lock acquires can now be interrupted by signals on POSIX.
Return the status of the lock: True if it has been acquired by some thread,
False if not.
In addition to these methods, lock objects can also be used via the
with statement, e.g.:
import _thread

a_lock = _thread.allocate_lock()

with a_lock:
    print("a_lock is locked while this executes")
Caveats:
Threads interact strangely with interrupts: the KeyboardInterrupt
exception will be received by an arbitrary thread. (When the signal
module is available, interrupts always go to the main thread.)
Not all built-in functions that may block waiting for I/O allow other threads
to run. (The most popular ones (time.sleep(), file.read(),
select.select()) work as expected.)
It is not possible to interrupt the acquire() method on a lock — the
KeyboardInterrupt exception will happen after the lock has been acquired.
When the main thread exits, it is system defined whether the other threads
survive. On most systems, they are killed without executing
try ... finally clauses or executing object
destructors.
When the main thread exits, it does not do any of its usual cleanup (except
that try ... finally clauses are honored), and the
standard I/O files are not flushed.
Be careful to not use this module where deadlock might occur from a thread being
created that blocks waiting for another thread to be created. This often occurs
with blocking I/O.
The modules described in this chapter provide mechanisms for different processes
to communicate.
Some modules only work for two processes that are on the same machine, e.g.
signal and subprocess. Other modules support networking protocols
that two or more processes can use to communicate across machines.
The subprocess module allows you to spawn new processes, connect to their
input/output/error pipes, and obtain their return codes. This module intends to
replace several other, older modules and functions, such as:
os.system
os.spawn*
Information about how the subprocess module can be used to replace these
modules and functions can be found in the following sections.
args should be a string, or a sequence of program arguments. The program
to execute is normally the first item in the args sequence or the string if
a string is given, but can be explicitly set by using the executable
argument. When executable is given, the first item in the args sequence
is still treated by most programs as the command name, which can then be
different from the actual executable name. On Unix, it becomes the display
name for the executing program in utilities such as ps.
On Unix, with shell=False (default): In this case, the Popen class uses
os.execvp()-like behavior to execute the child program.
args should normally be a
sequence. If a string is specified for args, it will be used as the name
or path of the program to execute; this will only work if the program is
being given no arguments.
Note
shlex.split() can be useful when determining the correct
tokenization for args, especially in complex cases:
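>>> import shlex, subprocess
>>> command_line = input()
/bin/vikings -input eggs.txt -output "spam spam.txt" -cmd "echo '$MONEY'"
>>> args = shlex.split(command_line)
>>> print(args)
['/bin/vikings', '-input', 'eggs.txt', '-output', 'spam spam.txt', '-cmd', "echo '$MONEY'"]
>>> p = subprocess.Popen(args) # Success!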
Note in particular that options (such as -input) and arguments (such
as eggs.txt) that are separated by whitespace in the shell go in separate
list elements, while arguments that need quoting or backslash escaping when
used in the shell (such as filenames containing spaces or the echo command
shown above) are single list elements.
On Unix, with shell=True: If args is a string, it specifies the command
string to execute through the shell. This means that the string must be
formatted exactly as it would be when typed at the shell prompt. This
includes, for example, quoting or backslash escaping filenames with spaces in
them. If args is a sequence, the first item specifies the command string, and
any additional items will be treated as additional arguments to the shell
itself. That is to say, Popen does the equivalent of:
Popen(['/bin/sh', '-c', args[0], args[1], ...])
Warning
Executing shell commands that incorporate unsanitized input from an
untrusted source makes a program vulnerable to shell injection,
a serious security flaw which can result in arbitrary command execution.
For this reason, the use of shell=True is strongly discouraged in cases
where the command string is constructed from external input:
>>> from subprocess import call
>>> filename = input("What file would you like to display?\n")
What file would you like to display?
non_existent; rm -rf / #
>>> call("cat " + filename, shell=True) # Uh-oh. This will end badly...
shell=False does not suffer from this vulnerability; the above Note may be
helpful in getting code using shell=False to work.
On Windows: the Popen class uses CreateProcess() to execute the
child program, which operates on strings. If args is a sequence, it will
be converted to a string in a manner described in
Converting an argument sequence to a string on Windows.
bufsize, if given, has the same meaning as the corresponding argument to the
built-in open() function: 0 means unbuffered, 1 means line
buffered, any other positive value means use a buffer of (approximately) that
size. A negative bufsize means to use the system default, which usually means
fully buffered. The default value for bufsize is 0 (unbuffered).
Note
If you experience performance issues, it is recommended that you try to
enable buffering by setting bufsize to either -1 or a large enough
positive value (such as 4096).
The executable argument specifies the program to execute. It is very seldom
needed: Usually, the program to execute is defined by the args argument. If
shell=True, the executable argument specifies which shell to use. On Unix,
the default shell is /bin/sh. On Windows, the default shell is
specified by the COMSPEC environment variable. The only reason you
would need to specify shell=True on Windows is where the command you
wish to execute is actually built into the shell, e.g. dir or copy.
You don’t need shell=True to run a batch file, nor to run a console-based
executable.
stdin, stdout and stderr specify the executed program's standard input,
standard output and standard error file handles, respectively. Valid values
are PIPE, an existing file descriptor (a positive integer), an
existing file object, and None. PIPE indicates that a
new pipe to the child should be created. With None, no redirection will
occur; the child’s file handles will be inherited from the parent. Additionally,
stderr can be STDOUT, which indicates that the stderr data from the
applications should be captured into the same file handle as for stdout.
If preexec_fn is set to a callable object, this object will be called in the
child process just before the child is executed.
(Unix only)
Warning
The preexec_fn parameter is not safe to use in the presence of threads
in your application. The child process could deadlock before exec is
called.
If you must use it, keep it trivial! Minimize the number of libraries
you call into.
Note
If you need to modify the environment for the child use the env
parameter rather than doing it in a preexec_fn.
The start_new_session parameter can take the place of a previously
common use of preexec_fn to call os.setsid() in the child.
If close_fds is true, all file descriptors except 0, 1 and
2 will be closed before the child process is executed. (Unix only).
The default varies by platform: Always true on Unix. On Windows it is
true when stdin/stdout/stderr are None, false otherwise.
On Windows, if close_fds is true then no handles will be inherited by the
child process. Note that on Windows, you cannot set close_fds to true and
also redirect the standard handles by setting stdin, stdout or stderr.
Changed in version 3.2: The default for close_fds was changed from False to
what is described above.
pass_fds is an optional sequence of file descriptors to keep open
between the parent and child. Providing any pass_fds forces
close_fds to be True. (Unix only)
New in version 3.2: The pass_fds parameter was added.
If cwd is not None, the child’s current directory will be changed to cwd
before it is executed. Note that this directory is not considered when
searching the executable, so you can’t specify the program’s path relative to
cwd.
If restore_signals is True (the default) all signals that Python has set to
SIG_IGN are restored to SIG_DFL in the child process before the exec.
Currently this includes the SIGPIPE, SIGXFZ and SIGXFSZ signals.
(Unix only)
Changed in version 3.2: restore_signals was added.
If start_new_session is True the setsid() system call will be made in the
child process prior to the execution of the subprocess. (Unix only)
Changed in version 3.2: start_new_session was added.
If env is not None, it must be a mapping that defines the environment
variables for the new process; these are used instead of the default
behavior of inheriting the current process’ environment.
Note
If specified, env must provide any variables required for the program to
execute. On Windows, in order to run a side-by-side assembly the
specified env must include a valid SystemRoot.
If universal_newlines is True, the file objects stdout and stderr are
opened as text files, but lines may be terminated by any of '\n', the Unix
end-of-line convention, '\r', the old Macintosh convention or '\r\n', the
Windows convention. All of these external representations are seen as '\n'
by the Python program.
Note
This feature is only available if Python is built with universal newline
support (the default). Also, the newlines attribute of the file objects
stdout, stdin and stderr is not updated by the
communicate() method.
Run command with arguments. Wait for command to complete, then return the
returncode attribute.
The arguments are the same as for the Popen constructor. Example:
>>> retcode = subprocess.call(["ls", "-l"])
Warning
Like Popen.wait(), this will deadlock when using
stdout=PIPE and/or stderr=PIPE and the child process
generates enough output to a pipe such that it blocks waiting
for the OS pipe buffer to accept more data.
Run command with arguments. Wait for command to complete. If the exit code was
zero then return, otherwise raise CalledProcessError. The
CalledProcessError object will have the return code in the
returncode attribute.
The arguments are the same as for the Popen constructor. Example:
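>>> subprocess.check_call(["ls", "-l"])
0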
Run command with arguments and return its output as a byte string.
If the exit code was non-zero it raises a CalledProcessError. The
CalledProcessError object will have the return code in the
returncode
attribute and output in the output attribute.
The arguments are the same as for the Popen constructor. Example:
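>>> subprocess.check_output(["echo", "Hello World!"])
b'Hello World!\n'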
The stdout argument is not allowed as it is used internally.
To capture standard error in the result, use stderr=subprocess.STDOUT:
>>> subprocess.check_output(
...     ["/bin/sh", "-c", "ls non_existent_file; exit 0"],
...     stderr=subprocess.STDOUT)
b'ls: non_existent_file: No such file or directory\n'
Return (status, output) of executing cmd in a shell.
Execute the string cmd in a shell with os.popen() and return a 2-tuple
(status, output). cmd is actually run as { cmd ; } 2>&1, so that the
returned output will contain output or error messages. A trailing newline is
stripped from the output. The exit status for the command can be interpreted
according to the rules for the C function wait(). Example:
>>> subprocess.getstatusoutput('ls /bin/ls')
(0, '/bin/ls')
>>> subprocess.getstatusoutput('cat /bin/junk')
(256, 'cat: /bin/junk: No such file or directory')
>>> subprocess.getstatusoutput('/bin/junk')
(256, 'sh: /bin/junk: not found')
Exceptions raised in the child process, before the new program has started to
execute, will be re-raised in the parent. Additionally, the exception object
will have one extra attribute called child_traceback, which is a string
containing traceback information from the child’s point of view.
The most common exception raised is OSError. This occurs, for example,
when trying to execute a non-existent file. Applications should prepare for
OSError exceptions.
A ValueError will be raised if Popen is called with invalid
arguments.
check_call() will raise CalledProcessError, if the called process returns
a non-zero return code.
Unlike some other popen functions, this implementation will never call /bin/sh
implicitly. This means that all characters, including shell metacharacters, can
safely be passed to child processes.
Wait for child process to terminate. Set and return returncode
attribute.
Warning
This will deadlock when using stdout=PIPE and/or
stderr=PIPE and the child process generates enough output to
a pipe such that it blocks waiting for the OS pipe buffer to
accept more data. Use communicate() to avoid that.
Interact with process: Send data to stdin. Read data from stdout and stderr,
until end-of-file is reached. Wait for process to terminate. The optional
input argument should be a byte string to be sent to the child process, or
None, if no data should be sent to the child.
communicate() returns a tuple (stdoutdata, stderrdata).
Note that if you want to send data to the process’s stdin, you need to create
the Popen object with stdin=PIPE. Similarly, to get anything other than
None in the result tuple, you need to give stdout=PIPE and/or
stderr=PIPE too.
Note
The data read is buffered in memory, so do not use this method if the data
size is large or unlimited.
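A short sketch of typical pipe use, with the Unix cat utility as the child:

from subprocess import Popen, PIPE

p = Popen(['cat'], stdin=PIPE, stdout=PIPE)
out, err = p.communicate(input=b'hello, child\n')
print(out)                   # b'hello, child\n'; err is None (stderr not piped)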
On Windows, SIGTERM is an alias for terminate(). CTRL_C_EVENT and
CTRL_BREAK_EVENT can be sent to processes started with a creationflags
parameter which includes CREATE_NEW_PROCESS_GROUP.
Kills the child. On Posix OSs the function sends SIGKILL to the child.
On Windows kill() is an alias for terminate().
The following attributes are also available:
Warning
Use communicate() rather than .stdin.write,
.stdout.read or .stderr.read to avoid
deadlocks due to any of the other OS pipe buffers filling up and blocking the
child process.
If dwFlags specifies STARTF_USESTDHANDLES, this attribute
is the standard input handle for the process. If
STARTF_USESTDHANDLES is not specified, the default for standard
input is the keyboard buffer.
If dwFlags specifies STARTF_USESTDHANDLES, this attribute
is the standard output handle for the process. Otherwise, this attribute
is ignored and the default for standard output is the console window’s
buffer.
If dwFlags specifies STARTF_USESTDHANDLES, this attribute
is the standard error handle for the process. Otherwise, this attribute is
ignored and the default for standard error is the console window’s buffer.
If dwFlags specifies STARTF_USESHOWWINDOW, this attribute
can be any of the values that can be specified in the nCmdShow
parameter for the
ShowWindow
function, except for SW_SHOWDEFAULT. Otherwise, this attribute is
ignored.
SW_HIDE is provided for this attribute. It is used when
Popen is called with shell=True.
pipe = os.popen(cmd, 'w')
...
rc = pipe.close()
if rc is not None and rc >> 8:
print("There were some errors")
==>
process = Popen(cmd, stdin=PIPE)
...
process.stdin.close()
if process.wait() != 0:
print("There were some errors")
the capturestderr argument is replaced with the stderr argument.
stdin=PIPE and stdout=PIPE must be specified.
popen2 closes all file descriptors by default, but you have to specify
close_fds=True with Popen to guarantee this behavior on
all platforms or past Python versions.
Converting an argument sequence to a string on Windows
On Windows, an args sequence is converted to a string that can be parsed
using the following rules (which correspond to the rules used by the MS C
runtime):
1. Arguments are delimited by white space, which is either a
   space or a tab.
2. A string surrounded by double quotation marks is
   interpreted as a single argument, regardless of white space
   contained within. A quoted string can be embedded in an
   argument.
3. A double quotation mark preceded by a backslash is
   interpreted as a literal double quotation mark.
4. Backslashes are interpreted literally, unless they
   immediately precede a double quotation mark.
5. If backslashes immediately precede a double quotation mark,
   every pair of backslashes is interpreted as a literal
   backslash. If the number of backslashes is odd, the last
   backslash escapes the next double quotation mark as
   described in rule 3.
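These rules can be observed through subprocess.list2cmdline(), the
undocumented helper that performs this conversion (shown here purely for
illustration):

from subprocess import list2cmdline

# Whitespace forces quoting; an embedded quote gets a backslash;
# backslashes are literal unless they precede a quote.
print(list2cmdline(['ab', 'a b', 'a"b', 'a\\b']))
# ab "a b" a\"b a\b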
This module provides access to the BSD socket interface. It is available on
all modern Unix systems, Windows, MacOS, OS/2, and probably additional
platforms.
Note
Some behavior may be platform dependent, since calls are made to the operating
system socket APIs.
The Python interface is a straightforward transliteration of the Unix system
call and library interface for sockets to Python’s object-oriented style: the
socket() function returns a socket object whose methods implement
the various socket system calls. Parameter types are somewhat higher-level than
in the C interface: as with read() and write() operations on Python
files, buffer allocation on receive operations is automatic, and buffer length
is implicit on send operations.
Depending on the system and the build options, various socket families
are supported by this module.
Socket addresses are represented as follows:
A single string is used for the AF_UNIX address family.
A pair (host, port) is used for the AF_INET address family,
where host is a string representing either a hostname in Internet domain
notation like 'daring.cwi.nl' or an IPv4 address like '100.50.200.5',
and port is an integral port number.
For the AF_INET6 address family, a four-tuple (host, port, flowinfo,
scopeid) is used, where flowinfo and scopeid represent the sin6_flowinfo
and sin6_scope_id members of struct sockaddr_in6 in C. For
socket module methods, flowinfo and scopeid can be omitted just for
backward compatibility. Note, however, omission of scopeid can cause problems
in manipulating scoped IPv6 addresses.
AF_NETLINK sockets are represented as pairs (pid, groups).
Linux-only support for TIPC is available using the AF_TIPC
address family. TIPC is an open, non-IP based networked protocol designed
for use in clustered computer environments. Addresses are represented by a
tuple, and the fields depend on the address type. The general tuple form is
(addr_type, v1, v2, v3[, scope]), where:
addr_type is one of TIPC_ADDR_NAMESEQ, TIPC_ADDR_NAME, or
TIPC_ADDR_ID.
scope is one of TIPC_ZONE_SCOPE, TIPC_CLUSTER_SCOPE, and
TIPC_NODE_SCOPE.
If addr_type is TIPC_ADDR_NAME, then v1 is the server type, v2 is
the port identifier, and v3 should be 0.
If addr_type is TIPC_ADDR_NAMESEQ, then v1 is the server type, v2
is the lower port number, and v3 is the upper port number.
If addr_type is TIPC_ADDR_ID, then v1 is the node, v2 is the
reference, and v3 should be set to 0.
Certain other address families (AF_BLUETOOTH, AF_PACKET)
support specific representations.
For IPv4 addresses, two special forms are accepted instead of a host address:
the empty string represents INADDR_ANY, and the string
'<broadcast>' represents INADDR_BROADCAST. This behavior is not
compatible with IPv6, therefore, you may want to avoid these if you intend
to support IPv6 with your Python programs.
If you use a hostname in the host portion of IPv4/v6 socket address, the
program may show a nondeterministic behavior, as Python uses the first address
returned from the DNS resolution. The socket address will be resolved
differently into an actual IPv4/v6 address, depending on the results from DNS
resolution and/or the host configuration. For deterministic behavior use a
numeric address in host portion.
All errors raise exceptions. The normal exceptions for invalid argument types
and out-of-memory conditions can be raised; errors related to socket or address
semantics raise socket.error or one of its subclasses.
Non-blocking mode is supported through setblocking(). A
generalization of this based on timeouts is supported through
settimeout().
A subclass of IOError, this exception is raised for socket-related
errors. It is recommended that you inspect its errno attribute to
discriminate between different kinds of errors.
See also
The errno module contains symbolic names for the error codes
defined by the underlying operating system.
A subclass of socket.error, this exception is raised for
address-related errors, i.e. for functions that use h_errno in the POSIX
C API, including gethostbyname_ex() and gethostbyaddr().
The accompanying value is a pair (h_errno, string) representing an
error returned by a library call. h_errno is a numeric value, while
string represents the description of h_errno, as returned by the
hstrerror() C function.
A subclass of socket.error, this exception is raised for
address-related errors by getaddrinfo() and getnameinfo().
The accompanying value is a pair (error, string) representing an error
returned by a library call. string represents the description of
error, as returned by the gai_strerror() C function. The
numeric error value will match one of the EAI_* constants
defined in this module.
A subclass of socket.error, this exception is raised when a timeout
occurs on a socket which has had timeouts enabled via a prior call to
settimeout() (or implicitly through
setdefaulttimeout()). The accompanying value is a string
whose value is currently always “timed out”.
These constants represent the address (and protocol) families, used for the
first argument to socket(). If the AF_UNIX constant is not
defined then this protocol is unsupported. More constants may be available
depending on the system.
These constants represent the socket types, used for the second argument to
socket(). More constants may be available depending on the system.
(Only SOCK_STREAM and SOCK_DGRAM appear to be generally
useful.)
These two constants, if defined, can be combined with the socket types and
allow you to set some flags atomically (thus avoiding possible race
conditions and the need for separate calls).
Many constants of these forms, documented in the Unix documentation on sockets
and/or the IP protocol, are also defined in the socket module. They are
generally used in arguments to the setsockopt() and getsockopt()
methods of socket objects. In most cases, only those symbols that are defined
in the Unix header files are defined; for a few symbols, default values are
provided.
SIO_*
RCVALL_*
Constants for Windows’ WSAIoctl(). The constants are used as arguments to the
ioctl() method of socket objects.
TIPC_*
TIPC related constants, matching the ones exported by the C socket API. See
the TIPC documentation for more information.
Convenience function. Connect to address (a 2-tuple (host, port)),
and return the socket object. Passing the optional timeout parameter will
set the timeout on the socket instance before attempting to connect. If no
timeout is supplied, the global default timeout setting returned by
getdefaulttimeout() is used.
If supplied, source_address must be a 2-tuple (host, port) for the
socket to bind to as its source address before connecting. If host or port
are '' or 0 respectively, the OS default behavior will be used.
Changed in version 3.2: source_address was added.
Changed in version 3.2: support for the with statement was added.
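A short sketch of typical use (the host and request are illustrative):

import socket

with socket.create_connection(('www.python.org', 80), timeout=10) as s:
    s.sendall(b'HEAD / HTTP/1.0\r\nHost: www.python.org\r\n\r\n')
    print(s.recv(128))       # first bytes of the HTTP response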
Translate the host/port argument into a sequence of 5-tuples that contain
all the necessary arguments for creating a socket connected to that service.
host is a domain name, a string representation of an IPv4/v6 address
or None. port is a string service name such as 'http', a numeric
port number or None. By passing None as the value of host
and port, you can pass NULL to the underlying C API.
The family, type and proto arguments can be optionally specified
in order to narrow the list of addresses returned. Passing zero as a
value for each of these arguments selects the full range of results.
The flags argument can be one or several of the AI_* constants,
and will influence how results are computed and returned.
For example, AI_NUMERICHOST will disable domain name resolution
and will raise an error if host is a domain name.
The function returns a list of 5-tuples with the following structure:
(family, type, proto, canonname, sockaddr)
In these tuples, family, type, proto are all integers and are
meant to be passed to the socket() function. canonname will be
a string representing the canonical name of the host if
AI_CANONNAME is part of the flags argument; else canonname
will be empty. sockaddr is a tuple describing a socket address, whose
format depends on the returned family (an (address, port) 2-tuple for
AF_INET, an (address, port, flowinfo, scopeid) 4-tuple for
AF_INET6), and is meant to be passed to the socket.connect()
method.
The following example fetches address information for a hypothetical TCP
connection to www.python.org on port 80 (results may differ on your
system if IPv6 isn’t enabled):
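>>> socket.getaddrinfo("www.python.org", 80, proto=socket.SOL_TCP)
[(2, 1, 6, '', ('82.94.164.162', 80)),
 (10, 1, 6, '', ('2001:888:2000:d::a2', 80, 0, 0))]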
Return a fully qualified domain name for name. If name is omitted or empty,
it is interpreted as the local host. To find the fully qualified name, the
hostname returned by gethostbyaddr() is checked, followed by aliases for the
host, if available. The first name which includes a period is selected. In
case no fully qualified domain name is available, the hostname as returned by
gethostname() is returned.
Translate a host name to IPv4 address format. The IPv4 address is returned as a
string, such as '100.50.200.5'. If the host name is an IPv4 address itself
it is returned unchanged. See gethostbyname_ex() for a more complete
interface. gethostbyname() does not support IPv6 name resolution, and
getaddrinfo() should be used instead for IPv4/v6 dual stack support.
Translate a host name to IPv4 address format, extended interface. Return a
triple (hostname, aliaslist, ipaddrlist) where hostname is the primary
host name responding to the given ip_address, aliaslist is a (possibly
empty) list of alternative host names for the same address, and ipaddrlist is
a list of IPv4 addresses for the same interface on the same host (often but not
always a single address). gethostbyname_ex() does not support IPv6 name
resolution, and getaddrinfo() should be used instead for IPv4/v6 dual
stack support.
Return a string containing the hostname of the machine where the Python
interpreter is currently executing.
If you want to know the current machine’s IP address, you may want to use
gethostbyname(gethostname()). This operation assumes that there is a
valid address-to-host mapping for the host, and the assumption does not
always hold.
Note: gethostname() doesn’t always return the fully qualified domain
name; use getfqdn() (see above).
Return a triple (hostname, aliaslist, ipaddrlist) where hostname is the
primary host name responding to the given ip_address, aliaslist is a
(possibly empty) list of alternative host names for the same address, and
ipaddrlist is a list of IPv4/v6 addresses for the same interface on the same
host (most likely containing only a single address). To find the fully qualified
domain name, use the function getfqdn(). gethostbyaddr() supports
both IPv4 and IPv6.
Translate a socket address sockaddr into a 2-tuple (host, port). Depending
on the settings of flags, the result can contain a fully-qualified domain name
or numeric address representation in host. Similarly, port can contain a
string port name or a numeric port number.
Translate an Internet protocol name (for example, 'icmp') to a constant
suitable for passing as the (optional) third argument to the socket()
function. This is usually only needed for sockets opened in “raw” mode
(SOCK_RAW); for the normal socket modes, the correct protocol is chosen
automatically if the protocol is omitted or zero.
Translate an Internet service name and protocol name to a port number for that
service. The optional protocol name, if given, should be 'tcp' or
'udp', otherwise any protocol will match.
Translate an Internet port number and protocol name to a service name for that
service. The optional protocol name, if given, should be 'tcp' or
'udp', otherwise any protocol will match.
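For example:

>>> import socket
>>> socket.getservbyname('http', 'tcp')
80
>>> socket.getservbyport(25, 'tcp')
'smtp'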
Create a new socket using the given address family, socket type and protocol
number. The address family should be AF_INET (the default),
AF_INET6 or AF_UNIX. The socket type should be
SOCK_STREAM (the default), SOCK_DGRAM or perhaps one of the
other SOCK_ constants. The protocol number is usually zero and may be
omitted in that case.
Build a pair of connected socket objects using the given address family, socket
type, and protocol number. Address family, socket type, and protocol number are
as for the socket() function above. The default family is AF_UNIX
if defined on the platform; otherwise, the default is AF_INET.
Availability: Unix.
Changed in version 3.2: The returned socket objects now support the whole
socket API, rather than a subset.
Duplicate the file descriptor fd (an integer as returned by a file object’s
fileno() method) and build a socket object from the result. Address
family, socket type and protocol number are as for the socket() function
above. The file descriptor should refer to a socket, but this is not checked —
subsequent operations on the object may fail if the file descriptor is invalid.
This function is rarely needed, but can be used to get or set socket options on
a socket passed to a program as standard input or output (such as a server
started by the Unix inet daemon). The socket is assumed to be in blocking mode.
Convert 32-bit positive integers from network to host byte order. On machines
where the host byte order is the same as network byte order, this is a no-op;
otherwise, it performs a 4-byte swap operation.
Convert 16-bit positive integers from network to host byte order. On machines
where the host byte order is the same as network byte order, this is a no-op;
otherwise, it performs a 2-byte swap operation.
Convert 32-bit positive integers from host to network byte order. On machines
where the host byte order is the same as network byte order, this is a no-op;
otherwise, it performs a 4-byte swap operation.
Convert 16-bit positive integers from host to network byte order. On machines
where the host byte order is the same as network byte order, this is a no-op;
otherwise, it performs a 2-byte swap operation.
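For illustration, a sketch of the 16-bit and 32-bit conversions; the swapped output shown assumes a little-endian host, and on a big-endian host the values would be returned unchanged:
>>> import socket
>>> hex(socket.htons(0x1234))
'0x3412'
>>> hex(socket.htonl(0x12345678))
'0x78563412'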
Convert an IPv4 address from dotted-quad string format (for example,
‘123.45.67.89’) to 32-bit packed binary format, as a bytes object four characters in
length. This is useful when conversing with a program that uses the standard C
library and needs objects of type struct in_addr, which is the C type
for the 32-bit packed binary this function returns.
inet_aton() also accepts strings with fewer than three dots; see the
Unix manual page inet(3) for details.
If the IPv4 address string passed to this function is invalid,
socket.error will be raised. Note that exactly what is valid depends on
the underlying C implementation of inet_aton().
inet_aton() does not support IPv6, and inet_pton() should be used
instead for IPv4/v6 dual stack support.
Convert a 32-bit packed IPv4 address (a bytes object four characters in
length) to its standard dotted-quad string representation (for example,
‘123.45.67.89’). This is useful when conversing with a program that uses the
standard C library and needs objects of type struct in_addr, which
is the C type for the 32-bit packed binary data this function takes as an
argument.
If the byte sequence passed to this function is not exactly 4 bytes in
length, socket.error will be raised. inet_ntoa() does not
support IPv6, and inet_ntop() should be used instead for IPv4/v6 dual
stack support.
Convert an IP address from its family-specific string format to a packed,
binary format. inet_pton() is useful when a library or network protocol
calls for an object of type struct in_addr (similar to
inet_aton()) or struct in6_addr.
Supported values for address_family are currently AF_INET and
AF_INET6. If the IP address string ip_string is invalid,
socket.error will be raised. Note that exactly what is valid depends on
both the value of address_family and the underlying implementation of
inet_pton().
Convert a packed IP address (a bytes object of some number of characters) to its
standard, family-specific string representation (for example, '7.10.0.5' or
'5aef:2b::8'). inet_ntop() is useful when a library or network protocol
returns an object of type struct in_addr (similar to inet_ntoa())
or struct in6_addr.
Supported values for address_family are currently AF_INET and
AF_INET6. If the string packed_ip is not the correct length for the
specified address family, ValueError will be raised. A
socket.error is raised for errors from the call to inet_ntop().
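A minimal round-trip sketch, assuming a platform where inet_pton() and inet_ntop() are available (they are not present on all systems):
>>> import socket
>>> packed = socket.inet_pton(socket.AF_INET6, '5aef:2b::8')
>>> socket.inet_ntop(socket.AF_INET6, packed)
'5aef:2b::8'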
Return the default timeout in seconds (float) for new socket objects. A value
of None indicates that new socket objects have no timeout. When the socket
module is first imported, the default is None.
Set the default timeout in seconds (float) for new socket objects. When
the socket module is first imported, the default is None. See
settimeout() for possible values and their respective
meanings.
Accept a connection. The socket must be bound to an address and listening for
connections. The return value is a pair (conn, address) where conn is a
new socket object usable to send and receive data on the connection, and
address is the address bound to the socket on the other end of the connection.
Close the socket. All future operations on the socket object will fail. The
remote end will receive no more data (after queued data is flushed). Sockets are
automatically closed when they are garbage-collected.
Note
close() releases the resource associated with a connection but
does not necessarily close the connection immediately. If you want
to close the connection in a timely fashion, call shutdown()
before close().
Like connect(address), but return an error indicator instead of raising an
exception for errors returned by the C-level connect() call (other
problems, such as “host not found,” can still raise exceptions). The error
indicator is 0 if the operation succeeded, otherwise the value of the
errno variable. This is useful to support, for example, asynchronous
connects.
Put the socket object into closed state without actually closing the
underlying file descriptor. The file descriptor is returned, and can
be reused for other purposes.
Return the socket’s file descriptor (a small integer). This is useful with
select.select().
Under Windows the small integer returned by this method cannot be used where a
file descriptor can be used (such as os.fdopen()). Unix does not have
this limitation.
Return the remote address to which the socket is connected. This is useful to
find out the port number of a remote IPv4/v6 socket, for instance. (The format
of the address returned depends on the address family — see above.) On some
systems this function is not supported.
Return the socket’s own address. This is useful to find out the port number of
an IPv4/v6 socket, for instance. (The format of the address returned depends on
the address family — see above.)
Return the value of the given socket option (see the Unix man page
getsockopt(2)). The needed symbolic constants (SO_* etc.)
are defined in this module. If buflen is absent, an integer option is assumed
and its integer value is returned by the function. If buflen is present, it
specifies the maximum length of the buffer used to receive the option in, and
this buffer is returned as a bytes object. It is up to the caller to decode the
contents of the buffer (see the optional built-in module struct for a way
to decode C structures encoded as byte strings).
Return the timeout in seconds (float) associated with socket operations,
or None if no timeout is set. This reflects the last call to
setblocking() or settimeout().
Listen for connections made to the socket. The backlog argument specifies the
maximum number of queued connections and should be at least 0; the maximum value
is system-dependent (usually 5), the minimum value is forced to 0.
Return a file object associated with the socket. The exact returned
type depends on the arguments given to makefile(). These arguments are
interpreted the same way as by the built-in open() function.
Closing the file object won’t close the socket unless there are no remaining
references to the socket. The socket must be in blocking mode; it can have
a timeout, but the file object’s internal buffer may end up in an inconsistent
state if a timeout occurs.
Note
On Windows, the file-like object created by makefile() cannot be
used where a file object with a file descriptor is expected, such as the
stream arguments of subprocess.Popen().
Receive data from the socket. The return value is a bytes object representing the
data received. The maximum amount of data to be received at once is specified
by bufsize. See the Unix manual page recv(2) for the meaning of
the optional argument flags; it defaults to zero.
Note
For best match with hardware and network realities, the value of bufsize
should be a relatively small power of 2, for example, 4096.
Receive data from the socket. The return value is a pair (bytes, address)
where bytes is a bytes object representing the data received and address is the
address of the socket sending the data. See the Unix manual page
recv(2) for the meaning of the optional argument flags; it defaults
to zero. (The format of address depends on the address family — see above.)
Receive data from the socket, writing it into buffer instead of creating a
new bytestring. The return value is a pair (nbytes, address) where nbytes is
the number of bytes received and address is the address of the socket sending
the data. See the Unix manual page recv(2) for the meaning of the
optional argument flags; it defaults to zero. (The format of address
depends on the address family — see above.)
Receive up to nbytes bytes from the socket, storing the data into a buffer
rather than creating a new bytestring. If nbytes is not specified (or 0),
receive up to the size available in the given buffer. Returns the number of
bytes received. See the Unix manual page recv(2) for the meaning
of the optional argument flags; it defaults to zero.
Send data to the socket. The socket must be connected to a remote socket. The
optional flags argument has the same meaning as for recv() above.
Returns the number of bytes sent. Applications are responsible for checking that
all data has been sent; if only some of the data was transmitted, the
application needs to attempt delivery of the remaining data.
Send data to the socket. The socket must be connected to a remote socket. The
optional flags argument has the same meaning as for recv() above.
Unlike send(), this method continues to send data from bytes until
either all data has been sent or an error occurs. None is returned on
success. On error, an exception is raised, and there is no way to determine how
much data, if any, was successfully sent.
Send data to the socket. The socket should not be connected to a remote socket,
since the destination socket is specified by address. The optional flags
argument has the same meaning as for recv() above. Return the number of
bytes sent. (The format of address depends on the address family — see
above.)
Set a timeout on blocking socket operations. The value argument can be a
nonnegative floating point number expressing seconds, or None.
If a non-zero value is given, subsequent socket operations will raise a
timeout exception if the timeout period value has elapsed before
the operation has completed. If zero is given, the socket is put in
non-blocking mode. If None is given, the socket is put in blocking mode.
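A short sketch of the three resulting modes (5.0 is an arbitrary example value):
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.settimeout(None)   # blocking mode (the default for new sockets)
s.settimeout(0.0)    # non-blocking mode
s.settimeout(5.0)    # timeout mode: blocking calls raise socket.timeout after 5 seconds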
Set the value of the given socket option (see the Unix manual page
setsockopt(2)). The needed symbolic constants are defined in the
socket module (SO_* etc.). The value can be an integer or a
bytes object representing a buffer. In the latter case it is up to the caller to
ensure that the bytestring contains the proper bits (see the optional built-in
module struct for a way to encode C structures as bytestrings).
Shut down one or both halves of the connection. If how is SHUT_RD,
further receives are disallowed. If how is SHUT_WR, further sends
are disallowed. If how is SHUT_RDWR, further sends and receives are
disallowed. Depending on the platform, shutting down one half of the connection
can also close the opposite half (e.g. on Mac OS X, shutdown(SHUT_WR) does
not allow further reads on the other end of the connection).
Note that there are no methods read() or write(); use
recv() and send() without flags argument instead.
Socket objects also have these (read-only) attributes that correspond to the
values given to the socket constructor.
A socket object can be in one of three modes: blocking, non-blocking, or
timeout. Sockets are by default always created in blocking mode, but this
can be changed by calling setdefaulttimeout().
In blocking mode, operations block until complete or the system returns
an error (such as connection timed out).
In non-blocking mode, operations fail (with an error that is unfortunately
system-dependent) if they cannot be completed immediately: functions from the
select module can be used to find out when (and whether) a socket is available
for reading or writing.
In timeout mode, operations fail if they cannot be completed within the
timeout specified for the socket (they raise a timeout exception)
or if the system returns an error.
Note
At the operating system level, sockets in timeout mode are internally set
in non-blocking mode. Also, the blocking and timeout modes are shared between
file descriptors and socket objects that refer to the same network endpoint.
This implementation detail can have visible consequences if e.g. you decide
to use the fileno() of a socket.
The connect() operation is also subject to the timeout
setting, and in general it is recommended to call settimeout()
before calling connect() or pass a timeout parameter to
create_connection(). However, the system network stack may also
return a connection timeout error of its own regardless of any Python socket
timeout setting.
If getdefaulttimeout() is not None, sockets returned by
the accept() method inherit that timeout. Otherwise, the
behaviour depends on settings of the listening socket:
if the listening socket is in blocking mode or in timeout mode,
the socket returned by accept() is in blocking mode;
if the listening socket is in non-blocking mode, whether the socket
returned by accept() is in blocking or non-blocking mode
is operating system-dependent. If you want to ensure cross-platform
behaviour, it is recommended you manually override this setting.
Here are four minimal example programs using the TCP/IP protocol: a server that
echoes all data that it receives back (servicing only one client), and a client
using it. Note that a server must perform the sequence socket(),
bind(), listen(), accept() (possibly
repeating the accept() to service more than one client), while a
client only needs the sequence socket(), connect(). Also
note that the server does not send()/recv() on the
socket it is listening on but on the new socket returned by
accept().
The first two examples support IPv4 only.
# Echo server program
import socket

HOST = ''                 # Symbolic name meaning all available interfaces
PORT = 50007              # Arbitrary non-privileged port
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind((HOST, PORT))
s.listen(1)
conn, addr = s.accept()
print('Connected by', addr)
while True:
    data = conn.recv(1024)
    if not data: break
    conn.send(data)
conn.close()
# Echo client program
import socket

HOST = 'daring.cwi.nl'    # The remote host
PORT = 50007              # The same port as used by the server
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((HOST, PORT))
s.send(b'Hello, world')
data = s.recv(1024)
s.close()
print('Received', repr(data))
The next two examples are identical to the above two, but support both IPv4 and
IPv6. The server side will listen to the first address family available (it
should listen to both instead). On most IPv6-ready systems, IPv6 will take
precedence and the server may not accept IPv4 traffic. The client side will try
to connect to all addresses returned as a result of the name resolution, and
send traffic to the first one it connects to successfully.
# Echo server program
import socket
import sys

HOST = None               # Symbolic name meaning all available interfaces
PORT = 50007              # Arbitrary non-privileged port
s = None
for res in socket.getaddrinfo(HOST, PORT, socket.AF_UNSPEC,
                              socket.SOCK_STREAM, 0, socket.AI_PASSIVE):
    af, socktype, proto, canonname, sa = res
    try:
        s = socket.socket(af, socktype, proto)
    except socket.error as msg:
        s = None
        continue
    try:
        s.bind(sa)
        s.listen(1)
    except socket.error as msg:
        s.close()
        s = None
        continue
    break
if s is None:
    print('could not open socket')
    sys.exit(1)
conn, addr = s.accept()
print('Connected by', addr)
while True:
    data = conn.recv(1024)
    if not data: break
    conn.send(data)
conn.close()
# Echo client program
import socket
import sys

HOST = 'daring.cwi.nl'    # The remote host
PORT = 50007              # The same port as used by the server
s = None
for res in socket.getaddrinfo(HOST, PORT, socket.AF_UNSPEC, socket.SOCK_STREAM):
    af, socktype, proto, canonname, sa = res
    try:
        s = socket.socket(af, socktype, proto)
    except socket.error as msg:
        s = None
        continue
    try:
        s.connect(sa)
    except socket.error as msg:
        s.close()
        s = None
        continue
    break
if s is None:
    print('could not open socket')
    sys.exit(1)
s.send(b'Hello, world')
data = s.recv(1024)
s.close()
print('Received', repr(data))
The last example shows how to write a very simple network sniffer with raw
sockets on Windows. The example requires administrator privileges to modify
the interface:
import socket

# the public network interface
HOST = socket.gethostbyname(socket.gethostname())

# create a raw socket and bind it to the public interface
s = socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_IP)
s.bind((HOST, 0))

# Include IP headers
s.setsockopt(socket.IPPROTO_IP, socket.IP_HDRINCL, 1)

# receive all packets
s.ioctl(socket.SIO_RCVALL, socket.RCVALL_ON)

# receive a packet
print(s.recvfrom(65565))

# disable promiscuous mode
s.ioctl(socket.SIO_RCVALL, socket.RCVALL_OFF)
Running an example several times with too small a delay between executions could
lead to this error:
socket.error: [Errno 98] Address already in use
This is because the previous execution has left the socket in a TIME_WAIT
state, and it can’t be immediately reused.
To prevent this, set the socket.SO_REUSEADDR flag:
the SO_REUSEADDR flag tells the kernel to reuse a local socket in
TIME_WAIT state, without waiting for its natural timeout to expire.
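For instance, in the echo server above the flag could be set just before bind(); this sketch reuses the HOST and PORT names from that example:
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
s.bind((HOST, PORT))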
See also
For an introduction to socket programming (in C), see the following papers:
An Introductory 4.3BSD Interprocess Communication Tutorial, by Stuart Sechrest
An Advanced 4.3BSD Interprocess Communication Tutorial, by Samuel J. Leffler et
al.,
both in the UNIX Programmer’s Manual, Supplementary Documents 1 (sections
PS1:7 and PS1:8). The platform-specific reference material for the various
socket-related system calls are also a valuable source of information on the
details of socket semantics. For Unix, refer to the manual pages; for Windows,
see the WinSock (or Winsock 2) specification. For IPv6-ready APIs, readers may
want to refer to RFC 3493 titled Basic Socket Interface Extensions for IPv6.
This module provides access to Transport Layer Security (often known as “Secure
Sockets Layer”) encryption and peer authentication facilities for network
sockets, both client-side and server-side. This module uses the OpenSSL
library. It is available on all modern Unix systems, Windows, Mac OS X, and
probably additional platforms, as long as OpenSSL is installed on that platform.
Note
Some behavior may be platform dependent, since calls are made to the
operating system socket APIs. The installed version of OpenSSL may also
cause variations in behavior.
This section documents the objects and functions in the ssl module; for more
general information about TLS, SSL, and certificates, the reader is referred to
the documents in the “See Also” section at the bottom.
This module provides a class, ssl.SSLSocket, which is derived from the
socket.socket type, and provides a socket-like wrapper that also
encrypts and decrypts the data going over the socket with SSL. It supports
additional methods such as getpeercert(), which retrieves the
certificate of the other side of the connection, and cipher(), which
retrieves the cipher being used for the secure connection.
For more sophisticated applications, the ssl.SSLContext class
helps manage settings and certificates, which can then be inherited
by SSL sockets created through the SSLContext.wrap_socket() method.
Raised to signal an error from the underlying SSL implementation
(currently provided by the OpenSSL library). This signifies some
problem in the higher-level encryption and authentication layer that’s
superimposed on the underlying network connection. This error
is a subtype of socket.error, which in turn is a subtype of
IOError. The error code and message of SSLError instances
are provided by the OpenSSL library.
The following function allows for standalone socket creation. Starting from
Python 3.2, it can be more flexible to use SSLContext.wrap_socket()
instead.
Takes an instance sock of socket.socket, and returns an instance
of ssl.SSLSocket, a subtype of socket.socket, which wraps
the underlying socket in an SSL context. For client-side sockets, the
context construction is lazy; if the underlying socket isn’t connected yet,
the context construction will be performed after connect() is called on
the socket. For server-side sockets, if the socket has no remote peer, it is
assumed to be a listening socket, and the server-side SSL wrapping is
automatically performed on client connections accepted via the accept()
method. wrap_socket() may raise SSLError.
The keyfile and certfile parameters specify optional files which
contain a certificate to be used to identify the local side of the
connection. See the discussion of Certificates for more
information on how the certificate is stored in the certfile.
The parameter server_side is a boolean which identifies whether
server-side or client-side behavior is desired from this socket.
The parameter cert_reqs specifies whether a certificate is required from
the other side of the connection, and whether it will be validated if
provided. It must be one of the three values CERT_NONE
(certificates ignored), CERT_OPTIONAL (not required, but validated
if provided), or CERT_REQUIRED (required and validated). If the
value of this parameter is not CERT_NONE, then the ca_certs
parameter must point to a file of CA certificates.
The ca_certs file contains a set of concatenated “certification
authority” certificates, which are used to validate certificates passed from
the other end of the connection. See the discussion of
Certificates for more information about how to arrange the
certificates in this file.
The parameter ssl_version specifies which version of the SSL protocol to
use. Typically, the server chooses a particular protocol version, and the
client must adapt to the server’s choice. Most of the versions are not
interoperable with the other versions. If not specified, for client-side
operation, the default SSL version is SSLv3; for server-side operation,
SSLv23. These version selections provide the most compatibility with other
versions.
Here’s a table showing which versions in a client (down the side) can connect
to which versions in a server (along the top):
client / server    SSLv2    SSLv3    SSLv23    TLSv1
SSLv2              yes      no       yes       no
SSLv3              yes      yes      yes       no
SSLv23             yes      no       yes       no
TLSv1              no       no       yes       yes
Note
Which connections succeed will vary depending on the version of
OpenSSL. For instance, in some older versions of OpenSSL (such
as 0.9.7l on OS X 10.4), an SSLv2 client could not connect to an
SSLv23 server. Another example: beginning with OpenSSL 1.0.0,
an SSLv23 client will not actually attempt SSLv2 connections
unless you explicitly enable SSLv2 ciphers; for example, you
might specify "ALL" or "SSLv2" as the ciphers parameter
to enable them.
The ciphers parameter sets the available ciphers for this SSL object.
It should be a string in the OpenSSL cipher list format.
The parameter do_handshake_on_connect specifies whether to do the SSL
handshake automatically after doing a socket.connect(), or whether the
application program will call it explicitly, by invoking the
SSLSocket.do_handshake() method. Calling
SSLSocket.do_handshake() explicitly gives the program control over the
blocking behavior of the socket I/O involved in the handshake.
The parameter suppress_ragged_eofs specifies how the
SSLSocket.recv() method should signal unexpected EOF from the other end
of the connection. If specified as True (the default), it returns a
normal EOF (an empty bytes object) in response to unexpected EOF errors
raised from the underlying socket; if False, it will raise the
exceptions back to the caller.
Changed in version 3.2: New optional argument ciphers.
Returns True if the SSL pseudo-random number generator has been seeded with
‘enough’ randomness, and False otherwise. You can use ssl.RAND_egd()
and ssl.RAND_add() to increase the randomness of the pseudo-random
number generator.
If you are running an entropy-gathering daemon (EGD) somewhere, and path
is the pathname of a socket connection open to it, this will read 256 bytes
of randomness from the socket, and add it to the SSL pseudo-random number
generator to increase the security of generated secret keys. This is
typically only necessary on systems without better sources of randomness.
Mixes the given bytes into the SSL pseudo-random number generator. The
parameter entropy (a float) is a lower bound on the entropy contained in
the given bytes (so you can always use 0.0). See RFC 1750 for more
information on sources of entropy.
Verify that cert (in decoded format as returned by
SSLSocket.getpeercert()) matches the given hostname. The rules
applied are those for checking the identity of HTTPS servers as outlined
in RFC 2818, except that IP addresses are not currently supported.
In addition to HTTPS, this function should be suitable for checking the
identity of servers in various SSL-based protocols such as FTPS, IMAPS,
POPS and others.
CertificateError is raised on failure. On success, the function
returns nothing:
>>> cert = {'subject': ((('commonName', 'example.com'),),)}
>>> ssl.match_hostname(cert, "example.com")
>>> ssl.match_hostname(cert, "example.org")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/py3k/Lib/ssl.py", line 130, in match_hostname
ssl.CertificateError: hostname 'example.org' doesn't match 'example.com'
Returns a floating-point value containing a normal seconds-after-the-epoch
time value, given the time-string representing the “notBefore” or “notAfter”
date from a certificate.
Given the address addr of an SSL-protected server, as a (hostname,
port-number) pair, fetches the server’s certificate, and returns it as a
PEM-encoded string. If ssl_version is specified, uses that version of
the SSL protocol to attempt to connect to the server. If ca_certs is
specified, it should be a file containing a list of root certificates, the
same format as used for the same parameter in wrap_socket(). The call
will attempt to validate the server certificate against that set of root
certificates, and will fail if the validation attempt fails.
Possible value for SSLContext.verify_mode, or the cert_reqs
parameter to wrap_socket(). In this mode (the default), no
certificates will be required from the other side of the socket connection.
If a certificate is received from the other end, no attempt to validate it
is made.
Possible value for SSLContext.verify_mode, or the cert_reqs
parameter to wrap_socket(). In this mode no certificates will be
required from the other side of the socket connection; but if they
are provided, validation will be attempted and an SSLError
will be raised on failure.
Possible value for SSLContext.verify_mode, or the cert_reqs
parameter to wrap_socket(). In this mode, certificates are
required from the other side of the socket connection; an SSLError
will be raised if no certificate is provided, or if its validation fails.
Selects SSL version 2 or 3 as the channel encryption protocol. This is a
setting to use with servers for maximum compatibility with the other end of
an SSL connection, but it may cause the specific ciphers chosen for the
encryption to be of fairly low quality.
Selects TLS version 1 as the channel encryption protocol. This is the most
modern version, and probably the best choice for maximum protection, if both
sides can speak it.
Prevents an SSLv2 connection. This option is only applicable in
conjunction with PROTOCOL_SSLv23. It prevents the peers from
choosing SSLv2 as the protocol version.
Prevents an SSLv3 connection. This option is only applicable in
conjunction with PROTOCOL_SSLv23. It prevents the peers from
choosing SSLv3 as the protocol version.
Prevents a TLSv1 connection. This option is only applicable in
conjunction with PROTOCOL_SSLv23. It prevents the peers from
choosing TLSv1 as the protocol version.
Whether the OpenSSL library has built-in support for the Server Name
Indication extension to the SSLv3 and TLSv1 protocols (as defined in
RFC 4366). When true, you can use the server_hostname argument to
SSLContext.wrap_socket().
However, since the SSL (and TLS) protocol has its own framing atop
of TCP, the SSL sockets abstraction can, in certain respects, diverge from
the specification of normal, OS-level sockets. See especially the
notes on non-blocking sockets.
SSL sockets also have the following additional methods and attributes:
If there is no certificate for the peer on the other end of the connection,
returns None.
If the parameter binary_form is False, and a certificate was
received from the peer, this method returns a dict instance. If the
certificate was not validated, the dict is empty. If the certificate was
validated, it returns a dict with the keys subject (the principal for
which the certificate was issued), and notAfter (the time after which the
certificate should not be trusted). If a certificate contains an instance
of the Subject Alternative Name extension (see RFC 3280), there will
also be a subjectAltName key in the dictionary.
The “subject” field is a tuple containing the sequence of relative
distinguished names (RDNs) given in the certificate’s data structure for the
principal, and each RDN is a sequence of name-value pairs:
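An illustrative (hypothetical) value, showing the nesting of RDN tuples:
{'notBefore': 'Feb 16 16:54:50 2008 GMT',
 'notAfter': 'Feb 16 16:54:50 2013 GMT',
 'subject': ((('countryName', 'US'),),
             (('stateOrProvinceName', 'Delaware'),),
             (('organizationName', 'Example Corp.'),),
             (('commonName', 'example.com'),))}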
If the binary_form parameter is True, and a certificate was
provided, this method returns the DER-encoded form of the entire certificate
as a sequence of bytes, or None if the peer did not provide a
certificate. This return value is independent of validation; if validation
was required (CERT_OPTIONAL or CERT_REQUIRED), it will have
been validated, but if CERT_NONE was used to establish the
connection, the certificate, if present, will not have been validated.
Changed in version 3.2: The returned dictionary includes additional items such as issuer and notBefore.
Returns a three-value tuple containing the name of the cipher being used, the
version of the SSL protocol that defines its use, and the number of secret
bits being used. If no connection has been established, returns None.
Performs the SSL shutdown handshake, which removes the TLS layer from the
underlying socket, and returns the underlying socket object. This can be
used to go from encrypted operation over a connection to unencrypted. The
returned socket should always be used for further communication with the
other side of the connection, rather than the original socket.
The SSLContext object this SSL socket is tied to. If the SSL
socket was created using the top-level wrap_socket() function
(rather than SSLContext.wrap_socket()), this is a custom context
object created for this SSL socket.
An SSL context holds various data longer-lived than single SSL connections,
such as SSL configuration options, certificate(s) and private key(s).
It also manages a cache of SSL sessions for server-side sockets, in order
to speed up repeated connections from the same clients.
Create a new SSL context. You must pass protocol which must be one
of the PROTOCOL_* constants defined in this module.
PROTOCOL_SSLv23 is recommended for maximum interoperability.
SSLContext objects have the following methods and attributes:
Load a private key and the corresponding certificate. The certfile
string must be the path to a single file in PEM format containing the
certificate as well as any number of CA certificates needed to establish
the certificate’s authenticity. The keyfile string, if present, must
point to a file containing the private key. Otherwise the private
key will be taken from certfile as well. See the discussion of
Certificates for more information on how the certificate
is stored in the certfile.
An SSLError is raised if the private key doesn’t
match with the certificate.
Load a set of “certification authority” (CA) certificates used to validate
other peers’ certificates when verify_mode is other than
CERT_NONE. At least one of cafile or capath must be specified.
The cafile string, if present, is the path to a file of concatenated
CA certificates in PEM format. See the discussion of
Certificates for more information about how to arrange the
certificates in this file.
The capath string, if present, is
the path to a directory containing several CA certificates in PEM format,
following an OpenSSL specific layout.
Load a set of default “certification authority” (CA) certificates from
a filesystem path defined when building the OpenSSL library. Unfortunately,
there’s no easy way to know whether this method succeeds: no error is
returned if no certificates are to be found. When the OpenSSL library is
provided as part of the operating system, though, it is likely to be
configured properly.
Set the available ciphers for sockets created with this context.
It should be a string in the OpenSSL cipher list format.
If no cipher can be selected (because compile-time options or other
configuration forbids use of all the specified ciphers), an
SSLError will be raised.
Note
When connected, the SSLSocket.cipher() method of SSL sockets will
give the currently selected cipher.
Wrap an existing Python socket sock and return an SSLSocket
object. The SSL socket is tied to the context, its settings and
certificates. The parameters server_side, do_handshake_on_connect
and suppress_ragged_eofs have the same meaning as in the top-level
wrap_socket() function.
On client connections, the optional parameter server_hostname specifies
the hostname of the service which we are connecting to. This allows a
single server to host multiple SSL-based services with distinct certificates,
quite similarly to HTTP virtual hosts. Specifying server_hostname
will raise a ValueError if the OpenSSL library doesn’t have support
for it (that is, if HAS_SNI is False). Specifying
server_hostname will also raise a ValueError if server_side
is true.
Get statistics about the SSL sessions created or managed by this context.
A dictionary is returned which maps the names of each piece of information to their
numeric values. For example, here is the total number of hits and misses
in the session cache since the context was created:
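A minimal sketch, assuming a context bound to the name context (the exact set of keys comes from OpenSSL):
>>> stats = context.session_stats()
>>> stats['hits'], stats['misses']
(0, 0)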
An integer representing the set of SSL options enabled on this context.
The default value is OP_ALL, but you can specify other options
such as OP_NO_SSLv2 by ORing them together.
Note
With versions of OpenSSL older than 0.9.8m, it is only possible
to set options, not to clear them. Attempting to clear an option
(by resetting the corresponding bits) will raise a ValueError.
Whether to try to verify other peers’ certificates and how to behave
if verification fails. This attribute must be one of
CERT_NONE, CERT_OPTIONAL or CERT_REQUIRED.
Certificates in general are part of a public-key / private-key system. In this
system, each principal, (which may be a machine, or a person, or an
organization) is assigned a unique two-part encryption key. One part of the key
is public, and is called the public key; the other part is kept secret, and is
called the private key. The two parts are related, in that if you encrypt a
message with one of the parts, you can decrypt it with the other part, and
only with the other part.
A certificate contains information about two principals. It contains the name
of a subject, and the subject’s public key. It also contains a statement by a
second principal, the issuer, that the subject is who he claims to be, and
that this is indeed the subject’s public key. The issuer’s statement is signed
with the issuer’s private key, which only the issuer knows. However, anyone can
verify the issuer’s statement by finding the issuer’s public key, decrypting the
statement with it, and comparing it to the other information in the certificate.
The certificate also contains information about the time period over which it is
valid. This is expressed as two fields, called “notBefore” and “notAfter”.
In the Python use of certificates, a client or server can use a certificate to
prove who they are. The other side of a network connection can also be required
to produce a certificate, and that certificate can be validated to the
satisfaction of the client or server that requires such validation. The
connection attempt can be set to raise an exception if the validation fails.
Validation is done automatically, by the underlying OpenSSL framework; the
application need not concern itself with its mechanics. But the application
does usually need to provide sets of certificates to allow this process to take
place.
Python uses files to contain certificates. They should be formatted as “PEM”
(see RFC 1422), which is a base-64 encoded form wrapped with a header line
and a footer line:
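For a certificate, the wrapping looks like this (the base-64 body is elided):
-----BEGIN CERTIFICATE-----
... (certificate in base64 PEM encoding) ...
-----END CERTIFICATE-----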
The Python files which contain certificates can contain a sequence of
certificates, sometimes called a certificate chain. This chain should start
with the specific certificate for the principal who “is” the client or server,
and then the certificate for the issuer of that certificate, and then the
certificate for the issuer of that certificate, and so on up the chain till
you get to a certificate which is self-signed, that is, a certificate which
has the same subject and issuer, sometimes called a root certificate. The
certificates should just be concatenated together in the certificate file. For
example, suppose we had a three certificate chain, from our server certificate
to the certificate of the certification authority that signed our server
certificate, to the root certificate of the agency which issued the
certification authority’s certificate:
-----BEGIN CERTIFICATE-----
... (certificate for your server)...
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
... (the certificate for the CA)...
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
... (the root certificate for the CA's issuer)...
-----END CERTIFICATE-----
If you are going to require validation of the other side of the connection’s
certificate, you need to provide a “CA certs” file, filled with the certificate
chains for each issuer you are willing to trust. Again, this file just contains
these chains concatenated together. For validation, Python will use the first
chain it finds in the file which matches. Some “standard” root certificates are
available from various certification authorities: CACert.org, Thawte, Verisign, Positive SSL
(used by python.org), Equifax and GeoTrust.
In general, if you are using SSL3 or TLS1, you don’t need to put the full chain
in your “CA certs” file; you only need the root certificates, and the remote
peer is supposed to furnish the other certificates necessary to chain from its
certificate to a root certificate. See RFC 4158 for more discussion of the
way in which certification chains can be built.
Often the private key is stored in the same file as the certificate; in this
case, only the certfile parameter to SSLContext.load_cert_chain()
and wrap_socket() needs to be passed. If the private key is stored
with the certificate, it should come before the first certificate in
the certificate chain:
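Sketched out, such a combined file looks like this (bodies elided):
-----BEGIN RSA PRIVATE KEY-----
... (private key in base64 encoding) ...
-----END RSA PRIVATE KEY-----
-----BEGIN CERTIFICATE-----
... (certificate in base64 PEM encoding) ...
-----END CERTIFICATE-----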
If you are going to create a server that provides SSL-encrypted connection
services, you will need to acquire a certificate for that service. There are
many ways of acquiring appropriate certificates, such as buying one from a
certification authority. Another common practice is to generate a self-signed
certificate. The simplest way to do this is with the OpenSSL package, using
something like the following:
% openssl req -new -x509 -days 365 -nodes -out cert.pem -keyout cert.pem
Generating a 1024 bit RSA private key
.......++++++
.............................++++++
writing new private key to 'cert.pem'
-----
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [AU]:US
State or Province Name (full name) [Some-State]:MyState
Locality Name (eg, city) []:Some City
Organization Name (eg, company) [Internet Widgits Pty Ltd]:My Organization, Inc.
Organizational Unit Name (eg, section) []:My Group
Common Name (eg, YOUR name) []:myserver.mygroup.myorganization.com
Email Address []:ops@myserver.mygroup.myorganization.com
%
The disadvantage of a self-signed certificate is that it is its own root
certificate, and no one else will have it in their cache of known (and trusted)
root certificates.
This example connects to an SSL server and prints the server’s certificate:
import socket, ssl, pprint

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# require a certificate from the server
ssl_sock = ssl.wrap_socket(s,
                           ca_certs="/etc/ca_certs_file",
                           cert_reqs=ssl.CERT_REQUIRED)

ssl_sock.connect(('www.verisign.com', 443))

pprint.pprint(ssl_sock.getpeercert())

# note that closing the SSLSocket will also close the underlying socket
ssl_sock.close()
As of October 6, 2010, the certificate printed by this program looks like
this:
{'notAfter': 'May 25 23:59:59 2012 GMT',
 'subject': ((('1.3.6.1.4.1.311.60.2.1.3', 'US'),),
             (('1.3.6.1.4.1.311.60.2.1.2', 'Delaware'),),
             (('businessCategory', 'V1.0, Clause 5.(b)'),),
             (('serialNumber', '2497886'),),
             (('countryName', 'US'),),
             (('postalCode', '94043'),),
             (('stateOrProvinceName', 'California'),),
             (('localityName', 'Mountain View'),),
             (('streetAddress', '487 East Middlefield Road'),),
             (('organizationName', 'VeriSign, Inc.'),),
             (('organizationalUnitName', ' Production Security Services'),),
             (('commonName', 'www.verisign.com'),))}
This other example first creates an SSL context, instructs it to verify
certificates sent by peers, and feeds it a set of recognized certificate
authorities (CA):
(it is assumed your operating system places a bundle of all CA certificates
in /etc/ssl/certs/ca-bundle.crt; if not, you’ll get an error and have
to adjust the location)
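A minimal sketch of such a context setup (the bundle path is the assumed location mentioned above):
>>> import ssl
>>> context = ssl.SSLContext(ssl.PROTOCOL_SSLv23)
>>> context.verify_mode = ssl.CERT_REQUIRED
>>> context.load_verify_locations("/etc/ssl/certs/ca-bundle.crt")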
When you use the context to connect to a server, CERT_REQUIRED
validates the server certificate: it ensures that the server certificate
was signed with one of the CA certificates, and checks the signature for
correctness:
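A sketch of such a connection ("example.org" is an illustrative host name):
>>> import socket
>>> conn = context.wrap_socket(socket.socket(socket.AF_INET))
>>> conn.connect(("example.org", 443))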
For server operation, typically you’ll need to have a server certificate, and
private key, each in a file. You’ll first create a context holding the key
and the certificate, so that clients can check your authenticity. Then
you’ll open a socket, bind it to a port, call listen() on it, and start
waiting for clients to connect:
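A minimal server-side sketch; the file names and address are placeholders:
import socket, ssl

context = ssl.SSLContext(ssl.PROTOCOL_SSLv23)
context.load_cert_chain(certfile="mycertfile", keyfile="mykeyfile")

bindsocket = socket.socket()
bindsocket.bind(('myaddr.mydomain.com', 10023))
bindsocket.listen(5)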
When a client connects, you’ll call accept() on the socket to get the
new socket from the other end, and use the context’s SSLContext.wrap_socket()
method to create a server-side SSL socket for the connection:
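Continuing the sketch above:
newsocket, fromaddr = bindsocket.accept()
connstream = context.wrap_socket(newsocket, server_side=True)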
Then you’ll read data from the connstream and do something with it till you
are finished with the client (or the client is finished with you):
def deal_with_client(connstream):
    data = connstream.recv(1024)
    # empty data means the client is finished with us
    while data:
        if not do_something(connstream, data):
            # we'll assume do_something returns False
            # when we're finished with client
            break
        data = connstream.recv(1024)
    # finished with client
And go back to listening for new client connections (of course, a real server
would probably handle each client connection in a separate thread, or put
the sockets in non-blocking mode and use an event loop).
When working with non-blocking sockets, there are several things you need
to be aware of:
Calling select() tells you that the OS-level socket can be
read from (or written to), but it does not imply that there is sufficient
data at the upper SSL layer. For example, only part of an SSL frame might
have arrived. Therefore, you must be ready to handle SSLSocket.recv()
and SSLSocket.send() failures, and retry after another call to
select().
(of course, similar provisions apply when using other primitives such as
poll())
The SSL handshake itself will be non-blocking: the
SSLSocket.do_handshake() method has to be retried until it returns
successfully. Here is a synopsis using select() to wait for
the socket’s readiness:
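A sketch of such a retry loop, assuming sock is a non-blocking SSL socket:
import select, ssl

while True:
    try:
        sock.do_handshake()
        break
    except ssl.SSLError as err:
        if err.args[0] == ssl.SSL_ERROR_WANT_READ:
            # wait until the socket is readable, then retry the handshake
            select.select([sock], [], [])
        elif err.args[0] == ssl.SSL_ERROR_WANT_WRITE:
            # wait until the socket is writable, then retry the handshake
            select.select([], [sock], [])
        else:
            raise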
CERT_NONE is the default. Since it does not authenticate the other
peer, it can be insecure, especially in client mode where most of the time you
would like to ensure the authenticity of the server you’re talking to.
Therefore, when in client mode, it is highly recommended to use
CERT_REQUIRED. However, it is in itself not sufficient; you also
have to check that the server certificate, which can be obtained by calling
SSLSocket.getpeercert(), matches the desired service. For many
protocols and applications, the service can be identified by the hostname;
in this case, the match_hostname() function can be used.
In server mode, if you want to authenticate your clients using the SSL layer
(rather than using a higher-level authentication mechanism), you’ll also have
to specify CERT_REQUIRED and similarly check the client certificate.
Note
In client mode, CERT_OPTIONAL and CERT_REQUIRED are
equivalent unless anonymous ciphers are enabled (they are disabled
by default).
SSL version 2 is considered insecure and is therefore dangerous to use. If
you want maximum compatibility between clients and servers, it is recommended
to use PROTOCOL_SSLv23 as the protocol version and then disable
SSLv2 explicitly using the SSLContext.options attribute:
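A short sketch:
import ssl

context = ssl.SSLContext(ssl.PROTOCOL_SSLv23)
context.options |= ssl.OP_NO_SSLv2
Sockets created from this context will then refuse SSLv2 while still negotiating SSLv3 or TLSv1 with the peer.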
This module provides mechanisms to use signal handlers in Python. Some general
rules for working with signals and their handlers:
A handler for a particular signal, once set, remains installed until it is
explicitly reset (Python emulates the BSD style interface regardless of the
underlying implementation), with the exception of the handler for
SIGCHLD, which follows the underlying implementation.
There is no way to “block” signals temporarily from critical sections (since
this is not supported by all Unix flavors).
Although Python signal handlers are called asynchronously as far as the Python
user is concerned, they can only occur between the “atomic” instructions of the
Python interpreter. This means that signals arriving during long calculations
implemented purely in C (such as regular expression matches on large bodies of
text) may be delayed for an arbitrary amount of time.
When a signal arrives during an I/O operation, it is possible that the I/O
operation raises an exception after the signal handler returns. This is
dependent on the underlying Unix system’s semantics regarding interrupted system
calls.
Because the C signal handler always returns, it makes little sense to catch
synchronous errors like SIGFPE or SIGSEGV.
Python installs a small number of signal handlers by default: SIGPIPE
is ignored (so write errors on pipes and sockets can be reported as ordinary
Python exceptions) and SIGINT is translated into a
KeyboardInterrupt exception. All of these can be overridden.
Some care must be taken if both signals and threads are used in the same
program. The fundamental thing to remember in using signals and threads
simultaneously is: always perform signal() operations in the main thread
of execution. Any thread can perform an alarm(), getsignal(),
pause(), setitimer() or getitimer(); only the main thread
can set a new signal handler, and the main thread will be the only one to
receive signals (this is enforced by the Python signal module, even
if the underlying thread implementation supports sending signals to
individual threads). This means that signals can’t be used as a means of
inter-thread communication. Use locks instead.
This is one of two standard signal handling options; it will simply perform
the default function for the signal. For example, on most systems the
default action for SIGQUIT is to dump core and exit, while the
default action for SIGCHLD is to simply ignore it.
This is another standard signal handler, which will simply ignore the given
signal.
SIG*
All the signal numbers are defined symbolically. For example, the hangup signal
is defined as signal.SIGHUP; the variable names are identical to the
names used in C programs, as found in <signal.h>. The Unix man page for
signal() lists the existing signals (on some systems this is
signal(2), on others the list is in signal(7)). Note that
not all systems define the same set of signal names; only those names defined by
the system are defined by this module.
Decrements interval timer both when the process executes and when the
system is executing on behalf of the process. Coupled with ITIMER_VIRTUAL,
this timer is usually used to profile the time spent by the application
in user and kernel space. SIGPROF is delivered upon expiration.
Raised to signal an error from the underlying setitimer() or
getitimer() implementation. Expect this error if an invalid
interval timer or a negative time is passed to setitimer().
This error is a subtype of IOError.
The signal module defines the following functions:
If time is non-zero, this function requests that a SIGALRM signal be
sent to the process in time seconds. Any previously scheduled alarm is
canceled (only one alarm can be scheduled at any time). The returned value is
then the number of seconds before any previously set alarm was to have been
delivered. If time is zero, no alarm is scheduled, and any scheduled alarm is
canceled. If the return value is zero, no alarm is currently scheduled. (See
the Unix man page alarm(2).) Availability: Unix.
Return the current signal handler for the signal signalnum. The returned value
may be a callable Python object, or one of the special values
signal.SIG_IGN, signal.SIG_DFL or None. Here,
signal.SIG_IGN means that the signal was previously ignored,
signal.SIG_DFL means that the default way of handling the signal was
previously in use, and None means that the previous signal handler was not
installed from Python.
Cause the process to sleep until a signal is received; the appropriate handler
will then be called. Returns nothing. Not on Windows. (See the Unix man page
signal(2).)
Sets given interval timer (one of signal.ITIMER_REAL,
signal.ITIMER_VIRTUAL or signal.ITIMER_PROF) specified
by which to fire after seconds (float is accepted, different from
alarm()) and after that every interval seconds. The interval
timer specified by which can be cleared by setting seconds to zero.
When an interval timer fires, a signal is sent to the process.
The signal sent is dependent on the timer being used;
signal.ITIMER_REAL will deliver SIGALRM,
signal.ITIMER_VIRTUAL sends SIGVTALRM,
and signal.ITIMER_PROF will deliver SIGPROF.
The old values are returned as a tuple: (delay, interval).
Attempting to pass an invalid interval timer will cause an
ItimerError. Availability: Unix.
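For illustration, a sketch that delivers SIGALRM after half a second and then every half second thereafter (the handler shown is a hypothetical example):
import signal

def handler(signum, frame):
    print('timer fired')

signal.signal(signal.SIGALRM, handler)

# returns the (delay, interval) of any previously set timer
old_delay, old_interval = signal.setitimer(signal.ITIMER_REAL, 0.5, 0.5)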
Set the wakeup fd to fd. When a signal is received, a '\0' byte is
written to the fd. This can be used by a library to wake up a poll or select
call, allowing the signal to be fully processed.
The old wakeup fd is returned. fd must be non-blocking. It is up to the
library to remove any bytes before calling poll or select again.
When threads are enabled, this function can only be called from the main thread;
attempting to call it from other threads will cause a ValueError
exception to be raised.
Change system call restart behaviour: if flag is False, system
calls will be restarted when interrupted by signal signalnum, otherwise
system calls will be interrupted. Returns nothing. Availability: Unix (see
the man page siginterrupt(3) for further information).
Note that installing a signal handler with signal() will reset the
restart behaviour to interruptible by implicitly calling
siginterrupt() with a true flag value for the given signal.
Set the handler for signal signalnum to the function handler. handler can
be a callable Python object taking two arguments (see below), or one of the
special values signal.SIG_IGN or signal.SIG_DFL. The previous
signal handler will be returned (see the description of getsignal()
above). (See the Unix man page signal(2).)
When threads are enabled, this function can only be called from the main thread;
attempting to call it from other threads will cause a ValueError
exception to be raised.
The handler is called with two arguments: the signal number and the current
stack frame (None or a frame object; for a description of frame objects,
see the description in the type hierarchy or see the
attribute descriptions in the inspect module).
On Windows, signal() can only be called with SIGABRT,
SIGFPE, SIGILL, SIGINT, SIGSEGV, or
SIGTERM. A ValueError will be raised in any other case.
Here is a minimal example program. It uses the alarm() function to limit
the time spent waiting to open a file; this is useful if the file is for a
serial device that may not be turned on, which would normally cause the
os.open() to hang indefinitely. The solution is to set a 5-second alarm
before opening the file; if the operation takes too long, the alarm signal will
be sent, and the handler raises an exception.
import signal, os

def handler(signum, frame):
    print('Signal handler called with signal', signum)
    raise IOError("Couldn't open device!")

# Set the signal handler and a 5-second alarm
signal.signal(signal.SIGALRM, handler)
signal.alarm(5)

# This open() may hang indefinitely
fd = os.open('/dev/ttyS0', os.O_RDWR)

signal.alarm(0)          # Disable the alarm
This module provides the basic infrastructure for writing asynchronous socket
service clients and servers.
There are only two ways to have a program on a single processor do “more than
one thing at a time.” Multi-threaded programming is the simplest and most
popular way to do it, but there is another very different technique, that lets
you have nearly all the advantages of multi-threading, without actually using
multiple threads. It’s really only practical if your program is largely I/O
bound. If your program is processor bound, then pre-emptive scheduled threads
are probably what you really need. Network servers are rarely processor
bound, however.
If your operating system supports the select() system call in its I/O
library (and nearly all do), then you can use it to juggle multiple
communication channels at once; doing other work while your I/O is taking
place in the “background.” Although this strategy can seem strange and
complex, especially at first, it is in many ways easier to understand and
control than multi-threaded programming. The asyncore module solves
many of the difficult problems for you, making the task of building
sophisticated high-performance network servers and clients a snap. For
“conversational” applications and protocols the companion asynchat
module is invaluable.
The basic idea behind both modules is to create one or more network
channels, instances of class asyncore.dispatcher and
asynchat.async_chat. Creating the channels adds them to a global
map, used by the loop() function if you do not provide it with your own
map.
Once the initial channels are created, calling the loop() function
activates channel service, which continues until the last channel (including
any that have been added to the map during asynchronous service) is closed.
Enter a polling loop that terminates after count passes or all open
channels have been closed. All arguments are optional. The count
parameter defaults to None, resulting in the loop terminating only when all
channels have been closed. The timeout argument sets the timeout
parameter for the appropriate select() or poll() call, measured
in seconds; the default is 30 seconds. The use_poll parameter, if true,
indicates that poll() should be used in preference to select()
(the default is False).
The map parameter is a dictionary whose items are the channels to watch.
As channels are closed they are deleted from their map. If map is
omitted, a global map is used. Channels (instances of
asyncore.dispatcher, asynchat.async_chat and subclasses
thereof) can freely be mixed in the map.
The dispatcher class is a thin wrapper around a low-level socket
object. To make it more useful, it has a few methods for event-handling
which are called from the asynchronous loop. Otherwise, it can be treated
as a normal non-blocking socket object.
The firing of low-level events at certain times or in certain connection
states tells the asynchronous loop that certain higher-level events have
taken place. For example, if we have asked for a socket to connect to
another host, we know that the connection has been made when the socket
becomes writable for the first time (at this point you know that you may
write to it with the expectation of success). The implied higher-level
events are:
Event
Description
handle_connect()
Implied by the first read or write
event
handle_close()
Implied by a read event with no data
available
handle_accepted()
Implied by a read event on a listening
socket
During asynchronous processing, each mapped channel’s readable() and
writable() methods are used to determine whether the channel’s socket
should be added to the list of channels select()ed or
poll()ed for read and write events.
Thus, the set of channel events is larger than the basic socket events. The
full set of methods that can be overridden in your subclass follows:
Called when the asynchronous loop detects that a writable socket can be
written. Often this method will implement the necessary buffering for
performance. For example:
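A sketch of such buffered writing, assuming the subclass keeps pending output in self.buffer (a bytes object):
def handle_write(self):
    sent = self.send(self.buffer)
    # keep whatever could not be sent this time around
    self.buffer = self.buffer[sent:]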
handle_connect() is called when the active opener’s socket actually makes
a connection. It might send a “welcome” banner, or initiate a protocol
negotiation with the remote endpoint, for example.
handle_accept() is called on listening channels (passive openers) when a
connection can be established with a new remote endpoint that has issued a
connect() call for the local endpoint. Deprecated in version 3.2; use
handle_accepted() instead.
handle_accepted(sock, addr) is called on listening channels (passive
openers) when a connection has been established with a new remote endpoint
that has issued a connect() call for the local endpoint. sock is a new
socket object usable to send and receive data on the connection, and addr
is the address bound to the socket on the other end of the connection.
readable() is called each time around the asynchronous loop to determine
whether a channel’s socket should be added to the list on which read events
can occur. The default method simply returns True, indicating that by
default, all channels will be interested in read events.
writable() is called each time around the asynchronous loop to determine
whether a channel’s socket should be added to the list on which write
events can occur. The default method simply returns True, indicating that
by default, all channels will be interested in write events.
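For example, a channel that buffers its output (as in the handle_write()
sketch above, with the illustrative self.buffer attribute) might override
writable() so that it is polled for write events only while output is
actually pending:

def writable(self):
    # ask the loop for write events only while there is data to send
    return len(self.buffer) > 0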
In addition, each channel delegates or extends many of the socket methods.
Most of these are nearly identical to their socket partners.
create_socket() is identical to the creation of a normal socket, and
will use the same options for creation. Refer to the socket documentation
for information on creating sockets.
listen(backlog) listens for connections made to the socket. The backlog
argument specifies the maximum number of queued connections and should be
at least 1; the maximum value is system-dependent (usually 5).
bind(address) binds the socket to address. The socket must not already
be bound. (The format of address depends on the address family; refer to
the socket documentation for more information.) To mark the socket as
re-usable (setting the SO_REUSEADDR option), call the dispatcher object’s
set_reuse_addr() method.
accept() accepts a connection. The socket must be bound to an address
and listening for connections. The return value can be either None or a
pair (conn, address) where conn is a new socket object usable to send and
receive data on the connection, and address is the address bound to the
socket on the other end of the connection.
When None is returned, it means the connection didn’t take place, in which
case the server should just ignore this event and keep listening for
further incoming connections.
close() closes the socket. All future operations on the socket object
will fail. The remote end-point will receive no more data (after queued
data is flushed). Sockets are automatically closed when they are
garbage-collected.
A file_dispatcher takes a file descriptor or file object along with an
optional map argument and wraps it for use with the poll() or
loop() functions. If provided a file object or anything with a
fileno() method, that method will be called and its result passed to the
file_wrapper constructor. Availability: UNIX.
A file_wrapper takes an integer file descriptor and calls os.dup() to
duplicate the handle so that the original handle may be closed independently
of the file_wrapper. This class implements sufficient methods to emulate a
socket for use by the file_dispatcher class. Availability: UNIX.
Here is a basic echo server that uses the dispatcher class to accept
connections and dispatches the incoming connections to a handler:
import asyncore
import socket

class EchoHandler(asyncore.dispatcher_with_send):

    def handle_read(self):
        data = self.recv(8192)
        if data:
            self.send(data)

class EchoServer(asyncore.dispatcher):

    def __init__(self, host, port):
        asyncore.dispatcher.__init__(self)
        self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
        self.set_reuse_addr()
        self.bind((host, port))
        self.listen(5)

    def handle_accepted(self, sock, addr):
        print('Incoming connection from %s' % repr(addr))
        handler = EchoHandler(sock)

server = EchoServer('localhost', 8080)
asyncore.loop()
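For illustration only, a matching client can be sketched along the same
lines; the EchoClient name, the buffering scheme, and the address below are
assumptions for this example, not part of the module:

import asyncore
import socket

class EchoClient(asyncore.dispatcher):

    def __init__(self, host, port, message):
        asyncore.dispatcher.__init__(self)
        self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
        self.connect((host, port))
        self.buffer = message

    def handle_connect(self):
        pass

    def writable(self):
        # poll for write events only while data remains to be sent
        return len(self.buffer) > 0

    def handle_write(self):
        sent = self.send(self.buffer)
        self.buffer = self.buffer[sent:]

    def handle_read(self):
        print('Received back: %r' % self.recv(8192))
        self.close()

client = EchoClient('localhost', 8080, b'Hello, world')
asyncore.loop()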
This module builds on the asyncore infrastructure, simplifying
asynchronous clients and servers and making it easier to handle protocols
whose elements are terminated by arbitrary strings, or are of variable length.
asynchat defines the abstract class async_chat that you
subclass, providing implementations of the collect_incoming_data() and
found_terminator() methods. It uses the same asynchronous loop as
asyncore, and the two types of channel, asyncore.dispatcher
and asynchat.async_chat, can freely be mixed in the channel map.
Typically an asyncore.dispatcher server channel generates new
asynchat.async_chat channel objects as it receives incoming
connection requests.
Like asyncore.dispatcher, async_chat defines a set of
events that are generated by an analysis of socket conditions after a
select() call. Once the polling loop has been started the
async_chat object’s methods are called by the event-processing
framework with no action on the part of the programmer.
Two class attributes can be modified, to improve performance, or possibly
even to conserve memory: ac_in_buffer_size, the asynchronous input buffer
size, and ac_out_buffer_size, the asynchronous output buffer size (each
defaults to 4096).
Unlike asyncore.dispatcher, async_chat allows you to
define a first-in-first-out queue (fifo) of producers. A producer need
have only one method, more(), which should return data to be
transmitted on the channel.
The producer indicates exhaustion (i.e. that it contains no more data) by
having its more() method return an empty bytes object. At this point the
async_chat object removes the producer from the fifo and starts
using the next producer, if any. When the producer fifo is empty the
handle_write() method does nothing. You use the channel object’s
set_terminator() method to describe how to recognize the end of, or
an important breakpoint in, an incoming transmission from the remote
endpoint.
To build a functioning async_chat subclass your input methods
collect_incoming_data() and found_terminator() must handle the
data that the channel receives asynchronously. The methods are described
below.
collect_incoming_data(data) is called whenever data arrives on the
channel. The default method, which must be overridden, raises a
NotImplementedError exception.
found_terminator() is called when the incoming data stream matches the
termination condition set by set_terminator(). The default method, which
must be overridden, raises a NotImplementedError exception. The buffered
input data should be available via an instance attribute.
push(data) pushes data onto the channel’s fifo to ensure its
transmission. This is all you need to do to have the channel write the data
out to the network, although it is possible to use your own producers in
more complex schemes to implement encryption and chunking, for example.
push_with_producer(producer) takes a producer object and adds it to the
producer fifo associated with the channel. When all currently-pushed
producers have been exhausted the channel will consume this producer’s data
by calling its more() method and send the data to the remote endpoint.
set_terminator(term) sets the terminating condition to be recognized on
the channel. term may be any of three types of value, corresponding to
three different ways to handle incoming protocol data:
string: found_terminator() will be called when the string is found in
the input stream
integer: found_terminator() will be called when the indicated number of
characters have been received
None: the channel continues to collect data forever
Note that any data following the terminator will be available for reading
by the channel after found_terminator() is called.
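Putting collect_incoming_data(), found_terminator() and
set_terminator() together, a minimal line-oriented channel might look like
the following sketch (the LineReader name and the printing are illustrative
only):

import asynchat

class LineReader(asynchat.async_chat):

    def __init__(self, sock):
        asynchat.async_chat.__init__(self, sock=sock)
        self.set_terminator(b'\r\n')   # found_terminator() fires at each CRLF
        self.received = []

    def collect_incoming_data(self, data):
        # buffer whatever arrives between terminators
        self.received.append(data)

    def found_terminator(self):
        line = b''.join(self.received)
        self.received = []
        print('line: %r' % line)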
A fifo holding data which has been pushed by the application but
not yet popped for writing to the channel. A fifo is a list used
to hold data and/or producers until they are required. If the list
argument is provided then it should contain producers or data items to be
written to the channel.
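For illustration, a producer can be as simple as the following sketch,
which serves a bytes payload in fixed-size chunks (the ChunkedProducer name
and chunk size are arbitrary):

class ChunkedProducer:

    def __init__(self, data, chunk_size=512):
        self.data = data
        self.chunk_size = chunk_size

    def more(self):
        # return the next chunk; an empty result signals exhaustion
        chunk = self.data[:self.chunk_size]
        self.data = self.data[self.chunk_size:]
        return chunk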
The following partial example shows how HTTP requests can be read with
async_chat. A web server might create an
http_request_handler object for each incoming client connection.
Notice that initially the channel terminator is set to match the blank line at
the end of the HTTP headers, and a flag indicates that the headers are being
read.
Once the headers have been read, if the request is of type POST (indicating
that further data are present in the input stream) then the
Content-Length: header is used to set a numeric terminator to read the
right amount of data from the channel.
The handle_request() method is called once all relevant input has been
marshalled, after setting the channel terminator to None to ensure that
any extraneous data sent by the web client are ignored.
class http_request_handler(asynchat.async_chat):

    def __init__(self, sock, addr, sessions, log):
        asynchat.async_chat.__init__(self, sock=sock)
        self.addr = addr
        self.sessions = sessions
        self.ibuffer = []
        self.obuffer = b""
        self.set_terminator(b"\r\n\r\n")
        self.reading_headers = True
        self.handling = False
        self.cgi_data = None
        self.log = log

    def collect_incoming_data(self, data):
        """Buffer the data"""
        self.ibuffer.append(data)

    def found_terminator(self):
        if self.reading_headers:
            self.reading_headers = False
            self.parse_headers(b"".join(self.ibuffer))
            self.ibuffer = []
            if self.op.upper() == b"POST":
                clen = self.headers.getheader("content-length")
                self.set_terminator(int(clen))
            else:
                self.handling = True
                self.set_terminator(None)
                self.handle_request()
        elif not self.handling:
            self.set_terminator(None)  # browsers sometimes over-send
            self.cgi_data = parse(self.headers, b"".join(self.ibuffer))
            self.handling = True
            self.ibuffer = []
            self.handle_request()
The email package is a library for managing email messages, including
MIME and other RFC 2822-based message documents. It is specifically not
designed to do any sending of email messages to SMTP (RFC 2821), NNTP, or
other servers; those are functions of modules such as smtplib and
nntplib. The email package attempts to be as RFC-compliant as
possible, supporting in addition to RFC 2822, such MIME-related RFCs as
RFC 2045, RFC 2046, RFC 2047, and RFC 2231.
The primary distinguishing feature of the email package is that it splits
the parsing and generating of email messages from the internal object model
representation of email. Applications using the email package deal
primarily with objects; you can add sub-objects to messages, remove sub-objects
from messages, completely re-arrange the contents, etc. There is a separate
parser and a separate generator which handle the transformation from flat text
to the object model, and then back to flat text again. There are also handy
subclasses for some common MIME object types, and a few miscellaneous utilities
that help with such common tasks as extracting and parsing message field values,
creating RFC-compliant dates, etc.
The following sections describe the functionality of the email package.
The ordering follows a progression that should be common in applications: an
email message is read as flat text from a file or other source, the text is
parsed to produce the object structure of the email message, this structure is
manipulated, and finally, the object tree is rendered back into flat text.
It is perfectly feasible to create the object structure out of whole cloth —
i.e. completely from scratch. From there, a similar progression can be taken as
above.
Also included are detailed specifications of all the classes and modules that
the email package provides, the exception classes you might encounter
while using the email package, some auxiliary utilities, and a few
examples. For users of the older mimelib package, or previous versions
of the email package, a section on differences and porting is provided.
The central class in the email package is the Message class,
imported from the email.message module. It is the base class for the
email object model. Message provides the core functionality for
setting and querying header fields, and for accessing message bodies.
Conceptually, a Message object consists of headers and payloads.
Headers are RFC 2822 style field names and values where the field name and
value are separated by a colon. The colon is not part of either the field name
or the field value.
Headers are stored and returned in case-preserving form but are matched
case-insensitively. There may also be a single envelope header, also known as
the Unix-From header or the From_ header. The payload is either a string
in the case of simple message objects or a list of Message objects for
MIME container documents (e.g. multipart/* and
message/rfc822).
Message objects provide a mapping style interface for accessing the
message headers, and an explicit interface for accessing both the headers
and the payload. The class also provides convenience methods for generating
a flat text representation of the message object tree, for accessing
commonly used header parameters, and for recursively walking over the
object tree.
The as_string() method returns the entire message flattened as a string.
When optional unixfrom is True, the envelope header is included in the
returned string. unixfrom defaults to False. Flattening the message may trigger
changes to the Message if defaults need to be filled in to
complete the transformation to a string (for example, MIME boundaries may
be generated or modified).
Note that this method is provided as a convenience and may not always
format the message the way you want. For example, by default it does
not do the mangling of lines that begin with From that is
required by the unix mbox format. For more flexibility, instantiate a
Generator instance and use its flatten()
method directly. For example:
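from io import StringIO
from email.generator import Generator

# msg is assumed to be an existing Message object
fp = StringIO()
g = Generator(fp, mangle_from_=True, maxheaderlen=60)
g.flatten(msg)
text = fp.getvalue()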
is_multipart() returns True if the message’s payload is a list of
sub-Message objects, otherwise it returns False. When is_multipart()
returns False, the payload should be a string object.
attach(payload) adds the given payload to the current payload, which must
be None or a list of Message objects before the call. After the call, the
payload will always be a list of Message objects. If you want to
set the payload to a scalar object (e.g. a string), use
set_payload() instead.
get_payload(i=None, decode=False) returns the current payload, which will
be a list of Message objects when is_multipart() is True, or a
string when is_multipart() is False. If the payload is a list
and you mutate the list object, you modify the message’s payload in place.
With optional argument i, get_payload() will return the i-th
element of the payload, counting from zero, if is_multipart() is
True. An IndexError will be raised if i is less than 0 or
greater than or equal to the number of items in the payload. If the
payload is a string (i.e. is_multipart() is False) and i is
given, a TypeError is raised.
Optional decode is a flag indicating whether the payload should be
decoded or not, according to the Content-Transfer-Encoding
header. When True and the message is not a multipart, the payload will
be decoded if this header’s value is quoted-printable or base64.
If some other encoding is used, or Content-Transfer-Encoding
header is missing, or if the payload has bogus base64 data, the payload is
returned as-is (undecoded). In all cases the returned value is binary
data. If the message is a multipart and the decode flag is True,
then None is returned.
When decode is False (the default) the body is returned as a string
without decoding the Content-Transfer-Encoding. However,
for a Content-Transfer-Encoding of 8bit, an attempt is made
to decode the original bytes using the charset specified by the
Content-Type header, using the replace error handler.
If no charset is specified, or if the charset given is not
recognized by the email package, the body is decoded using the default
ASCII charset.
set_payload(payload, charset=None) sets the entire message object’s
payload to payload. It is the client’s responsibility to ensure the payload
invariants. Optional charset sets the message’s default character set; see
set_charset() for details.
set_charset(charset) sets the character set of the payload to charset,
which can either be a Charset instance (see email.charset), a
string naming a character set, or None. If it is a string, it will
be converted to a Charset instance. If charset
is None, the charset parameter will be removed from the
Content-Type header (the message will not be otherwise
modified). Anything else will generate a TypeError.
If there is no existing MIME-Version header one will be
added. If there is no existing Content-Type header, one
will be added with a value of text/plain. Whether the
Content-Type header already exists or not, its charset
parameter will be set to charset.output_charset. If
charset.input_charset and charset.output_charset differ, the payload
will be re-encoded to the output_charset. If there is no existing
Content-Transfer-Encoding header, then the payload will be
transfer-encoded, if needed, using the specified
Charset, and a header with the appropriate value
will be added. If a Content-Transfer-Encoding header
already exists, the payload is assumed to already be correctly encoded
using that Content-Transfer-Encoding and is not modified.
get_charset() returns the Charset instance associated with the
message’s payload.
The following methods implement a mapping-like interface for accessing the
message’s RFC 2822 headers. Note that there are some semantic differences
between these methods and a normal mapping (i.e. dictionary) interface. For
example, in a dictionary there are no duplicate keys, but here there may be
duplicate message headers. Also, in dictionaries there is no guaranteed
order to the keys returned by keys(), but in a Message object,
headers are always returned in the order they appeared in the original
message, or were added to the message later. Any header deleted and then
re-added is always appended to the end of the header list.
These semantic differences are intentional and are biased toward maximal
convenience.
Note that in all cases, any envelope header present in the message is not
included in the mapping interface.
In a model generated from bytes, any header values that (in contravention of
the RFCs) contain non-ASCII bytes will, when retrieved through this
interface, be represented as Header objects with
a charset of unknown-8bit.
__contains__(name) returns true if the message object has a field named
name. Matching is done case-insensitively and name should not include the
trailing colon. Used for the in operator, e.g.:
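# myMessage is assumed to be an existing Message instance
if 'message-id' in myMessage:
    print('Message-ID:', myMessage['message-id'])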
__getitem__(name) returns the value of the named header field. name
should not include the colon field separator. If the header is missing,
None is returned; a KeyError is never raised.
Note that if the named field appears more than once in the message’s
headers, exactly which of those field values will be returned is
undefined. Use the get_all() method to get the values of all the
extant named headers.
__setitem__(name, val) adds a header to the message with field name name
and value val. The field is appended to the end of the message’s existing
fields.
Note that this does not overwrite or delete any existing header with the same
name. If you want to ensure that the new header is the only one present in the
message with field name name, delete the field first, e.g.:
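# msg is assumed to be an existing Message instance
del msg['subject']
msg['subject'] = 'Python roolz!'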
get(name, failobj=None) returns the value of the named header field.
This is identical to __getitem__() except that optional failobj is
returned if the named header is missing (defaults to None).
add_header(_name, _value, **_params) provides extended header setting.
This method is similar to __setitem__() except that additional header
parameters can be provided as keyword arguments. _name is the header field
to add and _value is the primary value for the header.
For each item in the keyword argument dictionary _params, the key is
taken as the parameter name, with underscores converted to dashes (since
dashes are illegal in Python identifiers). Normally, the parameter will
be added as key="value" unless the value is None, in which case
only the key will be added. If the value contains non-ASCII characters,
it can be specified as a three tuple in the format
(CHARSET,LANGUAGE,VALUE), where CHARSET is a string naming the
charset to be used to encode the value, LANGUAGE can usually be set
to None or the empty string (see RFC 2231 for other possibilities),
and VALUE is the string value containing non-ASCII code points. If
a three tuple is not passed and the value contains non-ASCII characters,
it is automatically encoded in RFC 2231 format using a CHARSET
of utf-8 and a LANGUAGE of None.
replace_header(_name, _value) replaces a header: the first header found
in the message that matches _name is replaced, retaining header order and
field name case. If no matching header was found, a KeyError is raised.
The get_content_type() method returns the message’s content type. The
returned string is coerced to lower case, in the form maintype/subtype.
If there was no
Content-Type header in the message the default type as given
by get_default_type() will be returned. Since according to
RFC 2045, messages always have a default type, get_content_type()
will always return a value.
RFC 2045 defines a message’s default type to be text/plain
unless it appears inside a multipart/digest container, in
which case it would be message/rfc822. If the
Content-Type header has an invalid type specification,
RFC 2045 mandates that the default type be text/plain.
The get_default_type() method returns the default content type. Most
messages have a default content type of text/plain, except for messages
that are subparts of
multipart/digest containers. Such subparts have a default
content type of message/rfc822.
set_default_type(ctype) sets the default content type. ctype should
either be text/plain or message/rfc822, although this is not
enforced. The default content type is not stored in the
Content-Type header.
The get_params() method returns the message’s Content-Type parameters, as a list.
The elements of the returned list are 2-tuples of key/value pairs, as
split on the '=' sign. The left hand side of the '=' is the key,
while the right hand side is the value. If there is no '=' sign in
the parameter the value is the empty string, otherwise the value is as
described in get_param() and is unquoted if optional unquote is
True (the default).
Optional failobj is the object to return if there is no
Content-Type header. Optional header is the header to
search instead of Content-Type.
The get_param() method returns the value of the Content-Type header’s
parameter param as a string. If the message has no Content-Type
header or if there is no such parameter, then failobj is returned
(defaults to None).
Optional header if given, specifies the message header to use instead of
Content-Type.
Parameter keys are always compared case insensitively. The return value
can either be a string, or a 3-tuple if the parameter was RFC 2231
encoded. When it’s a 3-tuple, the elements of the value are of the form
(CHARSET,LANGUAGE,VALUE). Note that both CHARSET and
LANGUAGE can be None, in which case you should consider VALUE
to be encoded in the us-ascii charset. You can usually ignore
LANGUAGE.
If your application doesn’t care whether the parameter was encoded as in
RFC 2231, you can collapse the parameter value by calling
email.utils.collapse_rfc2231_value(), passing in the return value
from get_param(). This will return a suitably decoded Unicode
string when the value is a tuple, or the original string unquoted if it
isn’t. For example:
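import email.utils

# msg is assumed to be an existing Message; 'foo' is a hypothetical parameter
rawparam = msg.get_param('foo')
param = email.utils.collapse_rfc2231_value(rawparam)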
The set_param() method sets a parameter in the Content-Type header. If
the parameter already exists in the header, its value will be replaced with
value. If the Content-Type header has not yet been defined
for this message, it will be set to text/plain and the new
parameter value will be appended as per RFC 2045.
Optional header specifies an alternative header to
Content-Type, and all parameters will be quoted as necessary
unless optional requote is False (the default is True).
If optional charset is specified, the parameter will be encoded
according to RFC 2231. Optional language specifies the RFC 2231
language, defaulting to the empty string. Both charset and language
should be strings.
The del_param() method removes the given parameter completely from the
Content-Type header. The header will be re-written in place without the
parameter or
its value. All values will be quoted as necessary unless requote is
False (the default is True). Optional header specifies an
alternative to Content-Type.
The set_type() method sets the main type and subtype for the
Content-Type header. type must be a string in the form
maintype/subtype, otherwise a ValueError is raised.
This method replaces the Content-Type header, keeping all
the parameters in place. If requote is False, this leaves the
existing header’s quoting as is, otherwise the parameters will be quoted
(the default).
An alternative header can be specified in the header argument. When the
Content-Type header is set a MIME-Version
header is also added.
The get_filename() method returns the value of the filename parameter
of the Content-Disposition header of the message. If the header
does not have a filename parameter, this method falls back to looking
for the name parameter on the Content-Type header. If
neither is found, or the header is missing, then failobj is returned.
The returned string will always be unquoted as per
email.utils.unquote().
The get_boundary() method returns the value of the boundary parameter
of the Content-Type header of the message, or failobj if either
the header is missing, or has no boundary parameter. The returned
string will always be unquoted as per email.utils.unquote().
Set the boundary parameter of the Content-Type header to
boundary. set_boundary() will always quote boundary if
necessary. A HeaderParseError is raised if the message object has
no Content-Type header.
Note that using this method is subtly different from deleting the old
Content-Type header and adding a new one with the new
boundary via add_header(), because set_boundary() preserves
the order of the Content-Type header in the list of
headers. However, it does not preserve any continuation lines which may
have been present in the original Content-Type header.
The get_content_charset() method returns the charset parameter of the
Content-Type header, coerced to lower case. If there is no Content-Type header, or if
that header has no charset parameter, failobj is returned.
Note that this method differs from get_charset() which returns the
Charset instance for the default encoding of the message body.
The get_charsets() method returns a list containing the character set
names in the message. If the message is a multipart, then the list will contain one element
for each subpart in the payload, otherwise, it will be a list of length 1.
Each item in the list will be a string which is the value of the
charset parameter in the Content-Type header for the
represented subpart. However, if the subpart has no
Content-Type header, no charset parameter, or is not of
the text main MIME type, then that item in the returned list
will be failobj.
The walk() method is an all-purpose generator which can be used to
iterate over all the parts and subparts of a message object tree, in
depth-first traversal order. You will typically use walk() as the
iterator in a for loop; each iteration returns the next subpart.
Here’s an example that prints the MIME type of every part of a multipart
message structure:
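# msg is assumed to be a parsed multipart Message
for part in msg.walk():
    print(part.get_content_type())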
The format of a MIME document allows for some text between the blank line
following the headers, and the first multipart boundary string. Normally,
this text is never visible in a MIME-aware mail reader because it falls
outside the standard MIME armor. However, when viewing the raw text of
the message, or when viewing the message in a non-MIME aware reader, this
text can become visible.
The preamble attribute contains this leading extra-armor text for MIME
documents. When the Parser discovers some text
after the headers but before the first boundary string, it assigns this
text to the message’s preamble attribute. When the
Generator is writing out the plain text
representation of a MIME message, and it finds the
message has a preamble attribute, it will write this text in the area
between the headers and the first boundary. See email.parser and
email.generator for details.
Note that if the message object has no preamble, the preamble attribute
will be None.
The epilogue attribute acts the same way as the preamble attribute,
except that it contains text that appears between the last boundary and
the end of the message.
You do not need to set the epilogue to the empty string in order for the
Generator to print a newline at the end of the file.
The defects attribute contains a list of all the problems found when
parsing this message. See email.errors for a detailed description
of the possible parsing defects.
Message object structures can be created in one of two ways: they can be created
from whole cloth by instantiating Message objects and
stringing them together via attach() and set_payload() calls, or they
can be created by parsing a flat text representation of the email message.
The email package provides a standard parser that understands most email
document structures, including MIME documents. You can pass the parser a string
or a file object, and the parser will return to you the root
Message instance of the object structure. For simple,
non-MIME messages the payload of this root object will likely be a string
containing the text of the message. For MIME messages, the root object will
return True from its is_multipart() method, and the subparts can be
accessed via the get_payload() and walk() methods.
There are actually two parser interfaces available for use, the classic
Parser API and the incremental FeedParser API. The classic
Parser API is fine if you have the entire text of the message in memory
as a string, or if the entire message lives in a file on the file system.
FeedParser is more appropriate for when you’re reading the message from
a stream which might block waiting for more input (e.g. reading an email message
from a socket). The FeedParser can consume and parse the message
incrementally, and only returns the root object when you close the parser [1].
Note that the parser can be extended in limited ways, and of course you can
implement your own parser completely from scratch. There is no magical
connection between the email package’s bundled parser and the
Message class, so your custom parser can create message
object trees any way it finds necessary.
The FeedParser, imported from the email.feedparser module,
provides an API that is conducive to incremental parsing of email messages, such
as would be necessary when reading the text of an email message from a source
that can block (e.g. a socket). The FeedParser can of course be used
to parse an email message fully contained in a string or a file, but the classic
Parser API may be more convenient for such use cases. The semantics
and results of the two parser APIs are identical.
The FeedParser’s API is simple; you create an instance, feed it a bunch
of text until there’s no more to feed it, then close the parser to retrieve the
root message object. The FeedParser is extremely accurate when parsing
standards-compliant messages, and it does a very good job of parsing
non-compliant messages, providing information about how a message was deemed
broken. It will populate a message object’s defects attribute with a list of
any problems it found in a message. See the email.errors module for the
list of defects that it can find.
class email.parser.FeedParser(_factory=email.message.Message)
Create a FeedParser instance. Optional _factory is a no-argument
callable that will be called whenever a new message object is needed. It
defaults to the email.message.Message class.
feed(data) feeds the FeedParser some more data. data should be a
string containing one or more lines. The lines can be partial and the
FeedParser will stitch such partial lines together properly. The lines
in the string can have any of the three common line endings: carriage
return, newline, or carriage return and newline (they can even be mixed).
close() completes the parsing of all previously fed data and returns the
root message object. It is undefined what happens if you feed more data to
a closed FeedParser.
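A short sketch of the incremental pattern (the chunk boundaries below are
arbitrary; in practice the data would arrive from a socket or similar
source):

from email.feedparser import FeedParser

parser = FeedParser()
parser.feed('Subject: test\r\n')       # header data may arrive a piece at a time
parser.feed('\r\nThis is the body.\r\n')
msg = parser.close()                   # returns the root Message object
print(msg['subject'])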
class email.parser.BytesFeedParser(_factory=email.message.Message)
Works exactly like FeedParser except that the input to the
feed() method must be bytes, not str.
The Parser class, imported from the email.parser module,
provides an API that can be used to parse a message when the complete contents
of the message are available in a string or file. The email.parser
module also provides a second class, called HeaderParser which can be
used if you’re only interested in the headers of the message.
HeaderParser can be much faster in these situations, since it does not
attempt to parse the message body, instead setting the payload to the raw body
as a string. HeaderParser has the same API as the Parser
class.
class email.parser.Parser(_class=email.message.Message, strict=None)
The constructor for the Parser class takes an optional argument
_class. This must be a callable factory (such as a function or a class), and
it is used whenever a sub-message object needs to be created. It defaults to
Message (see email.message). The factory will
be called without arguments.
The optional strict flag is ignored.
Deprecated since version 2.4: Because the Parser class is a backward compatible API wrapper
around the new-in-Python 2.4 FeedParser, all parsing is
effectively non-strict. You should simply stop passing a strict flag to
the Parser constructor.
parse(fp, headersonly=False) reads all the data from the file-like
object fp, parses the resulting text, and returns the root message object.
fp must support both the readline() and the read() methods on
file-like objects.
The text contained in fp must be formatted as a block of RFC 2822
style headers and header continuation lines, optionally preceded by an
envelope header. The header block is terminated either by the end of the
data or by a blank line. Following the header block is the body of the
message (which may contain MIME-encoded subparts).
Optional headersonly is as with the parse() method.
parsestr(text, headersonly=False) is similar to the parse() method,
except it takes a string object instead of a file-like object. Calling this
method on a string is exactly equivalent to wrapping text in a
StringIO instance first and calling parse().
Optional headersonly is a flag specifying whether to stop parsing after
reading the headers or not. The default is False, meaning it parses
the entire contents of the file.
class email.parser.BytesParser(_class=email.message.Message, strict=None)
This class is exactly parallel to Parser, but handles bytes input.
The _class and strict arguments are interpreted in the same way as for
the Parser constructor. strict is supported only to make porting
code easier; it is deprecated.
parse(fp, headersonly=False) reads all the data from the binary
file-like object fp, parses the resulting bytes, and returns the message
object. fp must support both the readline() and the read()
methods on file-like objects.
The bytes contained in fp must be formatted as a block of RFC 2822
style headers and header continuation lines, optionally preceded by an
envelope header. The header block is terminated either by the end of the
data or by a blank line. Following the header block is the body of the
message (which may contain MIME-encoded subparts, including subparts
with a Content-Transfer-Encoding of 8bit).
Optional headersonly is a flag specifying whether to stop parsing after
reading the headers or not. The default is False, meaning it parses
the entire contents of the file.
parsebytes(text, headersonly=False) is similar to the parse() method,
except it takes a byte string object instead of a file-like object. Calling
this method on a byte string is
exactly equivalent to wrapping text in a BytesIO instance
first and calling parse().
Optional headersonly is as with the parse() method.
New in version 3.2.
Since creating a message object structure from a string or a file object is such
a common task, four functions are provided as a convenience. They are available
in the top-level email package namespace.
email.message_from_string(s) returns a message object structure from a
string. This is exactly equivalent to Parser().parsestr(s). Optional
_class and strict are interpreted as with the Parser class constructor.
email.message_from_bytes(s) returns a message object structure from a
byte string. This is exactly equivalent to BytesParser().parsebytes(s).
Optional _class and strict are interpreted as with the Parser class
constructor.
email.message_from_file(fp) returns a message object structure tree from
an open file object. This is exactly equivalent to Parser().parse(fp).
Optional _class and strict are interpreted as with the Parser class
constructor.
email.message_from_binary_file(fp) returns a message object structure
tree from an open binary file object. This is exactly equivalent to
BytesParser().parse(fp). Optional _class and strict are interpreted as
with the Parser class constructor.
New in version 3.2.
Here’s an example of how you might use this at an interactive Python prompt:
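>>> import email
>>> msg = email.message_from_string(myString)   # myString holds the message text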
Most non-multipart type messages are parsed as a single message
object with a string payload. These objects will return False for
is_multipart(). Their get_payload() method will return a string
object.
All multipart type messages will be parsed as a container message
object with a list of sub-message objects for their payload. The outer
container message will return True for is_multipart() and their
get_payload() method will return the list of Message
subparts.
Most messages with a content type of message/* (e.g.
message/delivery-status and message/rfc822) will also be
parsed as a container object containing a list payload of length 1. Their
is_multipart() method will return True. The single element in the
list payload will be a sub-message object.
Some non-standards compliant messages may not be internally consistent about
their multipart-edness. Such messages may have a
Content-Type header of type multipart, but their
is_multipart() method may return False. If such messages were parsed
with the FeedParser, they will have an instance of the
MultipartInvariantViolationDefect class in their defects attribute
list. See email.errors for details.
As of email package version 3.0, introduced in Python 2.4, the classic
Parser was re-implemented in terms of the FeedParser, so the
semantics and results are identical between the two parsers.
One of the most common tasks is to generate the flat text of the email message
represented by a message object structure. You will need to do this if you want
to send your message via the smtplib module or the nntplib module,
or print the message on the console. Taking a message object structure and
producing a flat text document is the job of the Generator class.
Again, as with the email.parser module, you aren’t limited to the
functionality of the bundled generator; you could write one from scratch
yourself. However the bundled generator knows how to generate most email in a
standards-compliant way, should handle MIME and non-MIME email messages just
fine, and is designed so that the transformation from flat text, to a message
structure via the Parser class, and back to flat text,
is idempotent (the input is identical to the output). On the other hand, using
the Generator on a Message constructed by a program may
result in changes to the Message object as defaults are
filled in.
Bytes output can be generated using the BytesGenerator class.
If the message object structure contains non-ASCII bytes, this generator’s
flatten() method will emit the original bytes. Parsing a
binary message and then flattening it with BytesGenerator should be
idempotent for standards compliant messages.
class email.generator.Generator(outfp, mangle_from_=True, maxheaderlen=78)
The constructor for the Generator class takes a file-like object
called outfp for an argument. outfp must support the write() method
and be usable as the output file for the print() function.
Optional mangle_from_ is a flag that, when True, puts a > character in
front of any line in the body that starts exactly as From, i.e. From
followed by a space at the beginning of the line. This is the only guaranteed
portable way to avoid having such lines be mistaken for a Unix mailbox format
envelope header separator (see WHY THE CONTENT-LENGTH FORMAT IS BAD for details). mangle_from_
defaults to True, but you might want to set this to False if you are not
writing Unix mailbox format files.
Optional maxheaderlen specifies the longest length for a non-continued header.
When a header line is longer than maxheaderlen (in characters, with tabs
expanded to 8 spaces), the header will be split as defined in the
Header class. Set to zero to disable header wrapping.
The default is 78, as recommended (but not required) by RFC 2822.
flatten(msg, unixfrom=False, linesep='\n') prints the textual
representation of the message object structure rooted at msg to the output
file specified when the Generator instance
was created. Subparts are visited depth-first and the resulting text will
be properly MIME encoded.
Optional unixfrom is a flag that forces the printing of the envelope
header delimiter before the first RFC 2822 header of the root message
object. If the root object has no envelope header, a standard one is
crafted. By default, this is set to False to inhibit the printing of
the envelope delimiter.
Note that for subparts, no envelope header is ever printed.
Optional linesep specifies the line separator character used to
terminate lines in the output. It defaults to \n because that is
the most useful value for Python application code (other library packages
expect \n separated lines). linesep='\r\n' can be used to
generate output with RFC-compliant line separators.
Messages parsed with a Bytes parser that have a
Content-Transfer-Encoding of 8bit will be converted to use a 7bit
Content-Transfer-Encoding. Non-ASCII bytes in the headers
will be RFC 2047 encoded with a charset of unknown-8bit.
Changed in version 3.2: Added support for re-encoding 8bit message bodies, and the linesep
argument.
write(s) writes the string s to the underlying file object, i.e. the
outfp passed to Generator’s constructor. This provides just enough
file-like API
for Generator instances to be used in the print() function.
As a convenience, see the Message methods
as_string() and str(aMessage), a.k.a.
__str__(), which simplify the generation of a
formatted string representation of a message object. For more detail, see
email.message.
class email.generator.BytesGenerator(outfp, mangle_from_=True, maxheaderlen=78)
The constructor for the BytesGenerator class takes a binary
file-like object called outfp for an argument. outfp must
support a write() method that accepts binary data.
Optional mangle_from_ is a flag that, when True, puts a >
character in front of any line in the body that starts exactly as From,
i.e. From followed by a space at the beginning of the line. This is the
only guaranteed portable way to avoid having such lines be mistaken for a
Unix mailbox format envelope header separator (see WHY THE CONTENT-LENGTH
FORMAT IS BAD for details).
mangle_from_ defaults to True, but you might want to set this to
False if you are not writing Unix mailbox format files.
Optional maxheaderlen specifies the longest length for a non-continued
header. When a header line is longer than maxheaderlen (in characters,
with tabs expanded to 8 spaces), the header will be split as defined in the
Header class. Set to zero to disable header
wrapping. The default is 78, as recommended (but not required) by
RFC 2822.
flatten(msg, unixfrom=False, linesep='\n') prints the textual
representation of the message object structure rooted at msg to the output
file specified when the BytesGenerator instance was created. Subparts are
visited depth-first and the resulting
text will be properly MIME encoded. If the input that created the msg
contained bytes with the high bit set and those bytes have not been
modified, they will be copied faithfully to the output, even if doing so
is not strictly RFC compliant. (To produce strictly RFC compliant
output, use the Generator class.)
Messages parsed with a Bytes parser that have a
Content-Transfer-Encoding of 8bit will be reconstructed
as 8bit if they have not been modified.
Optional unixfrom is a flag that forces the printing of the envelope
header delimiter before the first RFC 2822 header of the root message
object. If the root object has no envelope header, a standard one is
crafted. By default, this is set to False to inhibit the printing of
the envelope delimiter.
Note that for subparts, no envelope header is ever printed.
Optional linesep specifies the line separator character used to
terminate lines in the output. It defaults to \n because that is
the most useful value for Python application code (other library packages
expect \n separated lines). linesep='\r\n' can be used to
generate output with RFC-compliant line separators.
write(s) writes the string s to the underlying file object. s is encoded
using the ASCII codec and written to the outfp passed to the
BytesGenerator’s constructor. This
provides just enough file-like API for BytesGenerator instances
to be used in the print() function.
New in version 3.2.
The email.generator module also provides a derived class, called
DecodedGenerator which is like the Generator base class,
except that non-text parts are substituted with a format string
representing the part.
class email.generator.DecodedGenerator(outfp, mangle_from_=True, maxheaderlen=78, fmt=None)
This class, derived from Generator, walks through all the subparts of a
message. If the subpart is of main type text, then it prints the
decoded payload of the subpart. Optional mangle_from_ and maxheaderlen are
as with the Generator base class.
If the subpart is not of main type text, optional fmt is a format
string that is used instead of the message payload. fmt is expanded with
the following keywords, in %(keyword)s format:
type – Full MIME type of the non-text part
maintype – Main MIME type of the non-text part
subtype – Sub-MIME type of the non-text part
filename – Filename of the non-text part
description – Description associated with the non-text part
encoding – Content transfer encoding of the non-text part
The default value for fmt is None, meaning
[Non-text (%(type)s) part of message omitted, filename %(filename)s]
email.mime: Creating email and MIME objects from scratch
Ordinarily, you get a message object structure by passing a file or some text to
a parser, which parses the text and returns the root message object. However
you can also build a complete message structure from scratch, or even individual
Message objects by hand. In fact, you can also take an
existing structure and add new Message objects, move them
around, etc. This makes a very convenient interface for slicing-and-dicing MIME
messages.
You can create a new object structure by creating Message
instances, adding attachments and all the appropriate headers manually. For MIME
messages though, the email package provides some convenient subclasses to
make things easier.
Here are the classes:
class email.mime.base.MIMEBase(_maintype, _subtype, **_params)
Module: email.mime.base
This is the base class for all the MIME-specific subclasses of
Message. Ordinarily you won’t create instances
specifically of MIMEBase, although you could. MIMEBase
is provided primarily as a convenient base class for more specific
MIME-aware subclasses.
_maintype is the Content-Type major type (e.g. text
or image), and _subtype is the Content-Type minor
type (e.g. plain or gif). _params is a parameter
key/value dictionary and is passed directly to Message.add_header().
The MIMEBase class always adds a Content-Type header
(based on _maintype, _subtype, and _params), and a
MIME-Version header (always set to 1.0).
class email.mime.nonmultipart.MIMENonMultipart
Module: email.mime.nonmultipart
A subclass of MIMEBase, this is an intermediate base
class for MIME messages that are not multipart. The primary
purpose of this class is to prevent the use of the attach() method,
which only makes sense for multipart messages. If attach()
is called, a MultipartConversionError exception is raised.
class email.mime.multipart.MIMEMultipart(_subtype='mixed', boundary=None, _subparts=None, **_params)
Module: email.mime.multipart
A subclass of MIMEBase, this is an intermediate base
class for MIME messages that are multipart. Optional _subtype
defaults to mixed, but can be used to specify the subtype of the
message. A Content-Type header of multipart/_subtype
will be added to the message object. A MIME-Version header will
also be added.
Optional boundary is the multipart boundary string. When None (the
default), the boundary is calculated when needed (for example, when the
message is serialized).
_subparts is a sequence of initial subparts for the payload. It must be
possible to convert this sequence to a list. You can always attach new subparts
to the message by using the Message.attach() method.
Additional parameters for the Content-Type header are taken from
the keyword arguments, or passed into the _params argument, which is a keyword
dictionary.
class email.mime.application.MIMEApplication(_data, _subtype='octet-stream', _encoder=email.encoders.encode_base64, **_params)
Module: email.mime.application
A subclass of MIMENonMultipart, the
MIMEApplication class is used to represent MIME message objects of
major type application. _data is a string containing the raw
byte data. Optional _subtype specifies the MIME subtype and defaults to
octet-stream.
Optional _encoder is a callable (i.e. function) which will perform the actual
encoding of the data for transport. This callable takes one argument, which is
the MIMEApplication instance. It should use get_payload() and
set_payload() to change the payload to encoded form. It should also add
any Content-Transfer-Encoding or other headers to the message
object as necessary. The default encoding is base64. See the
email.encoders module for a list of the built-in encoders.
_params are passed straight through to the base class constructor.
class email.mime.audio.MIMEAudio(_audiodata, _subtype=None, _encoder=email.encoders.encode_base64, **_params)
Module: email.mime.audio
A subclass of MIMENonMultipart, the
MIMEAudio class is used to create MIME message objects of major type
audio. _audiodata is a string containing the raw audio data. If
this data can be decoded by the standard Python module sndhdr, then the
subtype will be automatically included in the Content-Type header.
Otherwise you can explicitly specify the audio subtype via the _subtype
parameter. If the minor type could not be guessed and _subtype was not given,
then TypeError is raised.
Optional _encoder is a callable (i.e. function) which will perform the actual
encoding of the audio data for transport. This callable takes one argument,
which is the MIMEAudio instance. It should use get_payload() and
set_payload() to change the payload to encoded form. It should also add
any Content-Transfer-Encoding or other headers to the message
object as necessary. The default encoding is base64. See the
email.encoders module for a list of the built-in encoders.
_params are passed straight through to the base class constructor.
class email.mime.image.MIMEImage(_imagedata, _subtype=None, _encoder=email.encoders.encode_base64, **_params)
Module: email.mime.image
A subclass of MIMENonMultipart, the
MIMEImage class is used to create MIME message objects of major type
image. _imagedata is a string containing the raw image data. If
this data can be decoded by the standard Python module imghdr, then the
subtype will be automatically included in the Content-Type header.
Otherwise you can explicitly specify the image subtype via the _subtype
parameter. If the minor type could not be guessed and _subtype was not given,
then TypeError is raised.
Optional _encoder is a callable (i.e. function) which will perform the actual
encoding of the image data for transport. This callable takes one argument,
which is the MIMEImage instance. It should use get_payload() and
set_payload() to change the payload to encoded form. It should also add
any Content-Transfer-Encoding or other headers to the message
object as necessary. The default encoding is base64. See the
email.encoders module for a list of the built-in encoders.
_params are passed straight through to the MIMEBase
constructor.
class email.mime.message.MIMEMessage(_msg, _subtype='rfc822')
Module: email.mime.message
A subclass of MIMENonMultipart, the
MIMEMessage class is used to create MIME objects of main type
message. _msg is used as the payload, and must be an instance
of class Message (or a subclass thereof), otherwise
a TypeError is raised.
Optional _subtype sets the subtype of the message; it defaults to
rfc822.
class email.mime.text.MIMEText(_text, _subtype='plain', _charset='us-ascii')
Module: email.mime.text
A subclass of MIMENonMultipart, the
MIMEText class is used to create MIME objects of major type
text. _text is the string for the payload. _subtype is the
minor type and defaults to plain. _charset is the character
set of the text and is passed as a parameter to the
MIMENonMultipart constructor; it defaults
to us-ascii. No guessing or encoding is performed on the text data.
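For instance, a simple multipart message can be assembled from these
classes as in the following sketch (the addresses and text are
placeholders):

from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

msg = MIMEMultipart()
msg['Subject'] = 'Status report'
msg['From'] = 'author@example.com'
msg['To'] = 'recipient@example.com'
msg.attach(MIMEText('All systems nominal.', 'plain'))
print(msg.as_string())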
RFC 2822 is the base standard that describes the format of email messages.
It derives from the older RFC 822 standard which came into widespread use at
a time when most email was composed of ASCII characters only. RFC 2822 is a
specification written assuming email contains only 7-bit ASCII characters.
Of course, as email has been deployed worldwide, it has become
internationalized, such that language specific character sets can now be used in
email messages. The base standard still requires email messages to be
transferred using only 7-bit ASCII characters, so a slew of RFCs have been
written describing how to encode email containing non-ASCII characters into
RFC 2822-compliant format. These RFCs include RFC 2045, RFC 2046,
RFC 2047, and RFC 2231. The email package supports these standards
in its email.header and email.charset modules.
If you want to include non-ASCII characters in your email headers, say in the
Subject or To fields, you should use the
Header class and assign the field in the Message
object to an instance of Header instead of using a string for the header
value. Import the Header class from the email.header module.
For example:
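>>> from email.message import Message
>>> from email.header import Header
>>> msg = Message()
>>> h = Header('p\xf6stal', 'iso-8859-1')
>>> msg['Subject'] = h
>>> print(msg.as_string())
Subject: =?iso-8859-1?q?p=F6stal?=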
Notice how we wanted the Subject field to contain a non-ASCII
character. We did this by creating a Header instance and passing in
the character set that the byte string was encoded in. When the subsequent
Message instance was flattened, the Subject
field was properly RFC 2047 encoded. MIME-aware mail readers would show this
header using the embedded ISO-8859-1 character.
class email.header.Header(s=None, charset=None, maxlinelen=None, header_name=None, continuation_ws=' ', errors='strict')
Create a MIME-compliant header that can contain strings in different character
sets.
Optional s is the initial header value. If None (the default), the
initial header value is not set. You can later append to the header with
append() method calls. s may be an instance of bytes or
str, but see the append() documentation for semantics.
Optional charset serves two purposes: it has the same meaning as the charset
argument to the append() method. It also sets the default character set
for all subsequent append() calls that omit the charset argument. If
charset is not provided in the constructor (the default), the us-ascii
character set is used both as s’s initial charset and as the default for
subsequent append() calls.
The maximum line length can be specified explicitly via maxlinelen. For
splitting the first line to a shorter value (to account for the field header
which isn’t included in s, e.g. Subject) pass in the name of the
field in header_name. The default maxlinelen is 76, and the default value
for header_name is None, meaning it is not taken into account for the
first line of a long, split header.
Optional continuation_ws must be RFC 2822-compliant folding
whitespace, and is usually either a space or a hard tab character. This
character will be prepended to continuation lines. continuation_ws
defaults to a single space character.
Optional errors is passed straight through to the append() method.
The append(s, charset=None, errors='strict') method appends the string
s to the header. Optional charset, if given, should be a
Charset instance (see email.charset) or the name of a character
set, which will be converted to a Charset instance. A value of
None (the default) means that the charset given in the constructor is
used.
s may be an instance of bytes or str. If it is an
instance of bytes, then charset is the encoding of that byte
string, and a UnicodeError will be raised if the string cannot be
decoded with that character set.
If s is an instance of str, then charset is a hint specifying
the character set of the characters in the string.
In either case, when producing an RFC 2822-compliant header using
RFC 2047 rules, the string will be encoded using the output codec of
the charset. If the string cannot be encoded using the output codec, a
UnicodeError will be raised.
Optional errors is passed as the errors argument to the decode call
if s is a byte string.
The encode() method encodes a message header into an RFC-compliant
format, possibly wrapping long lines and encapsulating non-ASCII parts in
base64 or quoted-printable encodings.
Optional splitchars is a string containing characters which should be
given extra weight by the splitting algorithm during normal header
wrapping. This is in very rough support of RFC 2822’s ‘higher level
syntactic breaks’: split points preceded by a splitchar are preferred
during line splitting, with the characters preferred in the order in
which they appear in the string. Space and tab may be included in the
string to indicate whether preference should be given to one over the
other as a split point when other split chars do not appear in the line
being split. Splitchars does not affect RFC 2047 encoded lines.
maxlinelen, if given, overrides the instance’s value for the maximum
line length.
linesep specifies the characters used to separate the lines of the
folded header. It defaults to the most useful value for Python
application code (\n), but \r\n can be specified in order
to produce headers with RFC-compliant line separators.
Changed in version 3.2: Added the linesep argument.
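As a sketch, folding a long header for the wire might look like:

from email.header import Header

h = Header('a rather long subject line that will certainly need folding',
           header_name='Subject')
folded = h.encode(maxlinelen=40, linesep='\r\n')   # RFC-compliant line endings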
The Header class also provides a number of methods to support
standard operators and built-in functions.
Returns an approximation of the Header as a string, using an
unlimited line length. All pieces are converted to unicode using the
specified encoding and joined together appropriately. Any pieces with a
charset of unknown-8bit are decoded as ASCII using the replace
error handler.
Changed in version 3.2: Added handling for the unknown-8bit charset.
Decode a message header value without converting the character set. The header
value is in header.
This function returns a list of (decoded_string, charset) pairs containing
each of the decoded parts of the header. charset is None for non-encoded
parts of the header, otherwise a lower case string containing the name of the
character set specified in the encoded string.
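For example:

>>> from email.header import decode_header
>>> decode_header('=?iso-8859-1?q?p=F6stal?=')
[(b'p\xf6stal', 'iso-8859-1')]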
decode_header() takes a header value string and returns a sequence of
pairs of the format (decoded_string, charset) where charset is the name of
the character set.
This function takes one such sequence of pairs and returns a
Header instance. Optional maxlinelen, header_name, and
continuation_ws are as in the Header constructor.
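A round trip through decode_header() and make_header() (a sketch):

>>> from email.header import decode_header, make_header
>>> h = make_header(decode_header('=?iso-8859-1?q?p=F6stal?='))
>>> str(h)
'pöstal'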
This module provides a class Charset for representing character sets
and character set conversions in email messages, as well as a character set
registry and several convenience methods for manipulating this registry.
Instances of Charset are used in several other modules within the
email package.
class email.charset.Charset(input_charset=DEFAULT_CHARSET)
Map character sets to their email properties.
This class provides information about the requirements imposed on email for a
specific character set. It also provides convenience routines for converting
between character sets, given the availability of the applicable codecs. Given
a character set, it will do its best to provide information on how to use that
character set in an email message in an RFC-compliant way.
Certain character sets must be encoded with quoted-printable or base64 when used
in email headers or bodies. Certain character sets must be converted outright,
and are not allowed in email.
Optional input_charset is as described below; it is always coerced to lower
case. After being alias normalized it is also used as a lookup into the
registry of character sets to find out the header encoding, body encoding, and
output conversion codec to be used for the character set. For example, if
input_charset is iso-8859-1, then headers and bodies will be encoded using
quoted-printable and no output conversion codec is necessary. If
input_charset is euc-jp, then headers will be encoded with base64, bodies
will not be encoded, but output text will be converted from the euc-jp
character set to the iso-2022-jp character set.
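A sketch of looking up a character set’s email properties:

>>> from email import charset
>>> c = charset.Charset('latin_1')     # a common alias
>>> str(c)                             # normalized to the official email name
'iso-8859-1'
>>> c.header_encoding == charset.QP    # headers use quoted-printable
True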
Charset instances have the following data attributes:
The initial character set specified. Common aliases are converted to
their official email names (e.g. latin_1 is converted to
iso-8859-1). Defaults to 7-bit us-ascii.
If the character set must be encoded before it can be used in an email
header, this attribute will be set to Charset.QP (for
quoted-printable), Charset.BASE64 (for base64 encoding), or
Charset.SHORTEST for the shortest of QP or BASE64 encoding. Otherwise,
it will be None.
Same as header_encoding, but describes the encoding for the mail
message’s body, which indeed may be different than the header encoding.
Charset.SHORTEST is not allowed for body_encoding.
Some character sets must be converted before they can be used in email
headers or bodies. If the input_charset is one of them, this attribute
will contain the name of the character set output will be converted to.
Otherwise, it will be None.
The name of the Python codec used to convert Unicode to the
output_charset. If no conversion codec is necessary, this attribute
will have the same value as the input_codec.
Charset instances also have the following methods:
Return the content transfer encoding used for body encoding.
This is either the string quoted-printable or base64 depending on
the encoding used, or it is a function, in which case you should call the
function with a single argument, the Message object being encoded. The
function should then set the Content-Transfer-Encoding
header itself to whatever is appropriate.
Returns the string quoted-printable if body_encoding is QP,
returns the string base64 if body_encoding is BASE64, and
returns the string 7bit otherwise.
Header-encode a string by converting it first to bytes.
This is similar to header_encode() except that the string is fit
into maximum line lengths as given by the argument maxlengths, which
must be an iterator: each element returned from this iterator will provide
the next maximum line length.
charset is the input character set, and must be the canonical name of a
character set.
Optional header_enc and body_enc are either Charset.QP for
quoted-printable, Charset.BASE64 for base64 encoding,
Charset.SHORTEST for the shortest of quoted-printable or base64 encoding,
or None for no encoding. SHORTEST is only valid for
header_enc. The default is None for no encoding.
Optional output_charset is the character set that the output should be in.
Conversions will proceed from input charset, to Unicode, to the output charset
when the method Charset.convert() is called. The default is to output in
the same character set as the input.
Both input_charset and output_charset must have Unicode codec entries in the
module’s character set-to-codec mapping; use add_codec() to add codecs the
module does not know about. See the codecs module’s documentation for
more information.
The global character set registry is kept in the module global dictionary
CHARSETS.
Add a codec that maps characters in the given character set to and from Unicode.
charset is the canonical name of a character set. codecname is the name of a
Python codec, as appropriate for the second argument to the str’s
decode() method.
When creating Message objects from scratch, you often
need to encode the payloads for transport through compliant mail servers. This
is especially true for image/* and text/* type messages
containing binary data.
The email package provides some convenient encodings in its
encoders module. These encoders are actually used by the
MIMEAudio and MIMEImage
class constructors to provide default encodings. All encoder functions take
exactly one argument, the message object to encode. They usually extract the
payload, encode it, and reset the payload to this newly encoded value. They
should also set the Content-Transfer-Encoding header as appropriate.
Encodes the payload into quoted-printable form and sets the
Content-Transfer-Encoding header to quoted-printable [1].
This is a good encoding to use when most of your payload is normal printable
data, but contains a few unprintable characters.
Encodes the payload into base64 form and sets the
Content-Transfer-Encoding header to base64. This is a good
encoding to use when most of your payload is unprintable data since it is a more
compact form than quoted-printable. The drawback of base64 encoding is that it
renders the text non-human readable.
This doesn’t actually modify the message’s payload, but it does set the
Content-Transfer-Encoding header to either 7bit or 8bit as
appropriate, based on the payload data.
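For instance, a sketch of applying the base64 encoder by hand:

from email import encoders
from email.mime.base import MIMEBase

part = MIMEBase('application', 'octet-stream')
part.set_payload(b'\x00\x01 some binary bytes')
encoders.encode_base64(part)   # payload is replaced by its base64 text form
# part['Content-Transfer-Encoding'] is now 'base64'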
This is the base class for all exceptions that the email package can
raise. It is derived from the standard Exception class and defines no
additional methods.
Raised under some error conditions when parsing the RFC 2822 headers of a
message, this class is derived from MessageParseError. It can be raised
from the Parser.parse() or Parser.parsestr() methods.
Situations where it can be raised include finding an envelope header after the
first RFC 2822 header of the message, finding a continuation line before the
first RFC 2822 header is found, or finding a line in the headers which is
neither a header nor a continuation line.
Raised under some error conditions when parsing the RFC 2822 headers of a
message, this class is derived from MessageParseError. It can be raised
from the Parser.parse() or Parser.parsestr() methods.
Situations where it can be raised include not being able to find the starting or
terminating boundary in a multipart/* message when strict parsing
is used.
Raised when a payload is added to a Message object using
add_payload(), but the payload is already a scalar and the message’s
Content-Type main type is not either multipart or
missing. MultipartConversionError multiply inherits from
MessageError and the built-in TypeError.
Since Message.add_payload() is deprecated, this exception is rarely raised
in practice. However the exception may also be raised if the attach()
method is called on an instance of a class derived from
MIMENonMultipart (e.g.
MIMEImage).
Here’s the list of the defects that the FeedParser
can find while parsing messages. Note that the defects are added to the message
where the problem was found, so for example, if a message nested inside a
multipart/alternative had a malformed header, that nested message
object would have a defect, but the containing messages would not.
All defect classes are subclassed from email.errors.MessageDefect, but
this class is not an exception!
NoBoundaryInMultipartDefect – A message claimed to be a multipart,
but had no boundary parameter.
StartBoundaryNotFoundDefect – The start boundary claimed in the
Content-Type header was never found.
FirstHeaderLineIsContinuationDefect – The message had a continuation
line as its first header line.
MisplacedEnvelopeHeaderDefect – A “Unix From” header was found in the
middle of a header block.
MalformedHeaderDefect – A header was found that was missing a colon,
or was otherwise malformed.
MultipartInvariantViolationDefect – A message claimed to be a
multipart, but no subparts were found. Note that when a message has
this defect, its is_multipart() method may return False even though its
content type claims to be multipart.
Return a new string which is an unquoted version of str. If str ends and
begins with double quotes, they are stripped off. Likewise if str ends and
begins with angle brackets, they are stripped off.
Parse address – which should be the value of some address-containing field such
as To or Cc – into its constituent realname and
email address parts. Returns a tuple of that information, unless the parse
fails, in which case a 2-tuple of ('', '') is returned.
The inverse of parseaddr(), this takes a 2-tuple of the form (realname, email_address) and returns the string value suitable for a To or
Cc header. If the first element of pair is false, then the
second element is returned unmodified.
This method returns a list of 2-tuples of the form returned by parseaddr().
fieldvalues is a sequence of header field values as might be returned by
Message.get_all(). Here’s a simple example that gets all the recipients
of a message:
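# msg is assumed to be an existing email.message.Message instance
from email.utils import getaddresses

tos = msg.get_all('to', [])
ccs = msg.get_all('cc', [])
resent_tos = msg.get_all('resent-to', [])
resent_ccs = msg.get_all('resent-cc', [])
all_recipients = getaddresses(tos + ccs + resent_tos + resent_ccs)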
Attempts to parse a date according to the rules in RFC 2822. However, some
mailers don’t follow that format as specified, so parsedate() tries to
guess correctly in such cases. date is a string containing an RFC 2822
date, such as "Mon,20Nov199519:12:08-0500". If it succeeds in parsing
the date, parsedate() returns a 9-tuple that can be passed directly to
time.mktime(); otherwise None will be returned. Note that indexes 6,
7, and 8 of the result tuple are not usable.
Performs the same function as parsedate(), but returns either None or
a 10-tuple; the first 9 elements make up a tuple that can be passed directly to
time.mktime(), and the tenth is the offset of the date’s timezone from UTC
(which is the official term for Greenwich Mean Time) [1]. If the input string
has no timezone, the last element of the tuple returned is None. Note that
indexes 6, 7, and 8 of the result tuple are not usable.
Turn a 10-tuple as returned by parsedate_tz() into a UTC timestamp. If
the timezone item in the tuple is None, assume local time. Minor
deficiency: mktime_tz() interprets the first 8 elements of tuple as a
local time and then compensates for the timezone difference. This may yield a
slight error around changes in daylight savings time, though not worth worrying
about for common use.
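For example (a sketch; the timestamp shown is the computed UTC value):

>>> from email.utils import parsedate_tz, mktime_tz
>>> t = parsedate_tz('Mon, 20 Nov 1995 19:12:08 -0500')
>>> t
(1995, 11, 20, 19, 12, 8, 0, 1, -1, -18000)
>>> mktime_tz(t)
816912728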
Optional timeval if given is a floating point time value as accepted by
time.gmtime() and time.localtime(), otherwise the current time is
used.
Optional localtime is a flag that when True, interprets timeval, and
returns a date relative to the local timezone instead of UTC, properly taking
daylight savings time into account. The default is False meaning UTC is
used.
Optional usegmt is a flag that when True, outputs a date string with the
timezone as an ascii string GMT, rather than a numeric -0000. This is
needed for some protocols (such as HTTP). This only applies when localtime is
False. The default is False.
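For example, formatting the epoch gives a deterministic result:

>>> from email.utils import formatdate
>>> formatdate(0, usegmt=True)
'Thu, 01 Jan 1970 00:00:00 GMT'
>>> formatdate(0)
'Thu, 01 Jan 1970 00:00:00 -0000'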
Returns a string suitable for an RFC 2822-compliant
Message-ID header. Optional idstring if given, is a string
used to strengthen the uniqueness of the message id. Optional domain if
given provides the portion of the msgid after the ‘@’. The default is the
local hostname. It is not normally necessary to override this default, but
may be useful in certain cases, such as constructing a distributed system that
uses a consistent domain name across multiple hosts.
Encode the string s according to RFC 2231. Optional charset and
language, if given, are the character set name and language name to use. If
neither is given, s is returned as-is. If charset is given but language
is not, the string is encoded using the empty string for language.
When a header parameter is encoded in RFC 2231 format,
Message.get_param() may return a 3-tuple containing the character set,
language, and value. collapse_rfc2231_value() turns this into a unicode
string. Optional errors is passed to the errors argument of str’s
encode() method; it defaults to 'replace'. Optional
fallback_charset specifies the character set to use if the one in the
RFC 2231 header is not known by Python; it defaults to 'us-ascii'.
For convenience, if the value passed to collapse_rfc2231_value() is not
a tuple, it should be a string and it is returned unquoted.
Note that the sign of the timezone offset is the opposite of the sign of the
time.timezone variable for the same timezone; the latter variable follows
the POSIX standard while this module follows RFC 2822.
Iterating over a message object tree is fairly easy with the
Message.walk() method. The email.iterators module provides some
useful higher level iterations over message object trees.
This iterates over all the payloads in all the subparts of msg, returning the
string payloads line-by-line. It skips over all the subpart headers, and it
skips over any subpart with a payload that isn’t a Python string. This is
somewhat equivalent to reading the flat text representation of the message from
a file using readline(), skipping over all the intervening headers.
Optional decode is passed through to Message.get_payload().
This iterates over all the subparts of msg, returning only those subparts that
match the MIME type specified by maintype and subtype.
Note that subtype is optional; if omitted, then subpart MIME type matching is
done only with the main type. maintype is optional too; it defaults to
text.
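A sketch of both iterators, assuming msg is an existing message object:

from email.iterators import body_line_iterator, typed_subpart_iterator

# msg is an existing email.message.Message instance
for line in body_line_iterator(msg):
    pass                                   # each flat body line in turn

for part in typed_subpart_iterator(msg, 'text', 'plain'):
    print(part.get_payload())              # only the text/plain subparts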
Optional fp is a file-like object to print the output to. It must be
suitable for Python’s print() function. level is used internally.
include_default, if true, prints the default type as well.
Here are a few examples of how to use the email package to read, write,
and send simple email messages, as well as more complex MIME messages.
First, let’s see how to create and send a simple text message:
# Import smtplib for the actual sending function
import smtplib

# Import the email modules we'll need
from email.mime.text import MIMEText

# Open a plain text file for reading.  For this example, assume that
# the text file contains only ASCII characters.
fp = open(textfile)
# Create a text/plain message
msg = MIMEText(fp.read())
fp.close()

# me == the sender's email address
# you == the recipient's email address
msg['Subject'] = 'The contents of %s' % textfile
msg['From'] = me
msg['To'] = you

# Send the message via our own SMTP server.
s = smtplib.SMTP('localhost')
s.send_message(msg)
s.quit()
And parsing RFC822 headers can easily be done by the parse(filename) or
parsestr(message_as_string) methods of the Parser() class:
# Import the email modules we'll need
from email.parser import Parser

# If the e-mail headers are in a file, uncomment this line:
#headers = Parser().parse(open(messagefile, 'r'))

# Or for parsing headers in a string, use:
headers = Parser().parsestr('From: <user@example.com>\n'
        'To: <someone_else@example.com>\n'
        'Subject: Test message\n'
        '\n'
        'Body would go here\n')

# Now the header items can be accessed as a dictionary:
print('To: %s' % headers['to'])
print('From: %s' % headers['from'])
print('Subject: %s' % headers['subject'])
Here’s an example of how to send a MIME message containing a bunch of family
pictures that may be residing in a directory:
# Import smtplib for the actual sending function
import smtplib

# Here are the email package modules we'll need
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart

COMMASPACE = ', '

# Create the container (outer) email message.
msg = MIMEMultipart()
msg['Subject'] = 'Our family reunion'
# me == the sender's email address
# family = the list of all recipients' email addresses
msg['From'] = me
msg['To'] = COMMASPACE.join(family)
msg.preamble = 'Our family reunion'

# Assume we know that the image files are all in PNG format
for file in pngfiles:
    # Open the files in binary mode.  Let the MIMEImage class automatically
    # guess the specific image type.
    fp = open(file, 'rb')
    img = MIMEImage(fp.read())
    fp.close()
    msg.attach(img)

# Send the email via our own SMTP server.
s = smtplib.SMTP('localhost')
s.send_message(msg)
s.quit()
Here’s an example of how to send the entire contents of a directory as an email
message: [1]
#!/usr/bin/env python3"""Send the contents of a directory as a MIME message."""importosimportsysimportsmtplib# For guessing MIME type based on file name extensionimportmimetypesfromoptparseimportOptionParserfromemailimportencodersfromemail.messageimportMessagefromemail.mime.audioimportMIMEAudiofromemail.mime.baseimportMIMEBasefromemail.mime.imageimportMIMEImagefromemail.mime.multipartimportMIMEMultipartfromemail.mime.textimportMIMETextCOMMASPACE=', 'defmain():parser=OptionParser(usage="""\Send the contents of a directory as a MIME message.Usage: %prog [options]Unless the -o option is given, the email is sent by forwarding to your localSMTP server, which then does the normal delivery process. Your local machinemust be running an SMTP server.""")parser.add_option('-d','--directory',type='string',action='store',help="""Mail the contents of the specified directory, otherwise use the current directory. Only the regular files in the directory are sent, and we don't recurse to subdirectories.""")parser.add_option('-o','--output',type='string',action='store',metavar='FILE',help="""Print the composed message to FILE instead of sending the message to the SMTP server.""")parser.add_option('-s','--sender',type='string',action='store',metavar='SENDER',help='The value of the From: header (required)')parser.add_option('-r','--recipient',type='string',action='append',metavar='RECIPIENT',default=[],dest='recipients',help='A To: header value (at least one required)')opts,args=parser.parse_args()ifnotopts.senderornotopts.recipients:parser.print_help()sys.exit(1)directory=opts.directoryifnotdirectory:directory='.'# Create the enclosing (outer) messageouter=MIMEMultipart()outer['Subject']='Contents of directory %s'%os.path.abspath(directory)outer['To']=COMMASPACE.join(opts.recipients)outer['From']=opts.senderouter.preamble='You will not see this in a MIME-aware mail reader.\n'forfilenameinos.listdir(directory):path=os.path.join(directory,filename)ifnotos.path.isfile(path):continue# Guess the content type based on the file's extension. Encoding# will be ignored, although we should check for simple things like# gzip'd or compressed files.ctype,encoding=mimetypes.guess_type(path)ifctypeisNoneorencodingisnotNone:# No guess could be made, or the file is encoded (compressed), so# use a generic bag-of-bits type.ctype='application/octet-stream'maintype,subtype=ctype.split('/',1)ifmaintype=='text':fp=open(path)# Note: we should handle calculating the charsetmsg=MIMEText(fp.read(),_subtype=subtype)fp.close()elifmaintype=='image':fp=open(path,'rb')msg=MIMEImage(fp.read(),_subtype=subtype)fp.close()elifmaintype=='audio':fp=open(path,'rb')msg=MIMEAudio(fp.read(),_subtype=subtype)fp.close()else:fp=open(path,'rb')msg=MIMEBase(maintype,subtype)msg.set_payload(fp.read())fp.close()# Encode the payload using Base64encoders.encode_base64(msg)# Set the filename parametermsg.add_header('Content-Disposition','attachment',filename=filename)outer.attach(msg)# Now send or store the messagecomposed=outer.as_string()ifopts.output:fp=open(opts.output,'w')fp.write(composed)fp.close()else:s=smtplib.SMTP('localhost')s.sendmail(opts.sender,opts.recipients,composed)s.quit()if__name__=='__main__':main()
Here’s an example of how to unpack a MIME message like the one
above, into a directory of files:
#!/usr/bin/env python3"""Unpack a MIME message into a directory of files."""importosimportsysimportemailimporterrnoimportmimetypesfromoptparseimportOptionParserdefmain():parser=OptionParser(usage="""\Unpack a MIME message into a directory of files.Usage: %prog [options] msgfile""")parser.add_option('-d','--directory',type='string',action='store',help="""Unpack the MIME message into the named directory, which will be created if it doesn't already exist.""")opts,args=parser.parse_args()ifnotopts.directory:parser.print_help()sys.exit(1)try:msgfile=args[0]exceptIndexError:parser.print_help()sys.exit(1)try:os.mkdir(opts.directory)exceptOSErrorase:# Ignore directory exists errorife.errno!=errno.EEXIST:raisefp=open(msgfile)msg=email.message_from_file(fp)fp.close()counter=1forpartinmsg.walk():# multipart/* are just containersifpart.get_content_maintype()=='multipart':continue# Applications should really sanitize the given filename so that an# email message can't be used to overwrite important filesfilename=part.get_filename()ifnotfilename:ext=mimetypes.guess_extension(part.get_content_type())ifnotext:# Use a generic bag-of-bits extensionext='.bin'filename='part-%03d%s'%(counter,ext)counter+=1fp=open(os.path.join(opts.directory,filename),'wb')fp.write(part.get_payload(decode=True))fp.close()if__name__=='__main__':main()
Here’s an example of how to create an HTML message with an alternative plain
text version: [2]
#!/usr/bin/env python3

import smtplib

from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# me == my email address
# you == recipient's email address
me = "my@email.com"
you = "your@email.com"

# Create message container - the correct MIME type is multipart/alternative.
msg = MIMEMultipart('alternative')
msg['Subject'] = "Link"
msg['From'] = me
msg['To'] = you

# Create the body of the message (a plain-text and an HTML version).
text = "Hi!\nHow are you?\nHere is the link you wanted:\nhttp://www.python.org"
html = """\
<html>
  <head></head>
  <body>
    <p>Hi!<br>
       How are you?<br>
       Here is the <a href="http://www.python.org">link</a> you wanted.
    </p>
  </body>
</html>
"""

# Record the MIME types of both parts - text/plain and text/html.
part1 = MIMEText(text, 'plain')
part2 = MIMEText(html, 'html')

# Attach parts into message container.
# According to RFC 2046, the last part of a multipart message, in this case
# the HTML message, is best and preferred.
msg.attach(part1)
msg.attach(part2)

# Send the message via local SMTP server.
s = smtplib.SMTP('localhost')
# sendmail function takes 3 arguments: sender's address, recipient's address
# and message to send - here it is sent as one string.
s.sendmail(me, you, msg.as_string())
s.quit()
This table describes the release history of the email package, corresponding to
the version of Python that the package was released with. For purposes of this
document, when you see a note about change or added versions, these refer to the
Python version the change was made in, not the email package version. This
table also describes the Python compatibility of each version of the package.
email version   distributed with                compatible with
1.x             Python 2.2.0 to Python 2.2.1    no longer supported
2.5             Python 2.2.2+ and Python 2.3    Python 2.1 to 2.5
3.0             Python 2.4                      Python 2.3 to 2.5
4.0             Python 2.5                      Python 2.3 to 2.5
5.0             Python 3.0 and Python 3.1       Python 3.0 to 3.2
5.1             Python 3.2                      Python 3.0 to 3.2
Here are the major differences between email version 5.1 and
version 5.0:
It is once again possible to parse messages containing non-ASCII bytes,
and to reproduce such messages if the data containing the non-ASCII
bytes is not modified.
Given bytes input to the model, get_payload()
will by default decode a message body that has a
Content-Transfer-Encoding of 8bit using the charset
specified in the MIME headers and return the resulting string.
Given bytes input to the model, Generator will
convert message bodies that have a Content-Transfer-Encoding of
8bit to instead have a 7bit Content-Transfer-Encoding.
New class BytesGenerator produces bytes
as output, preserving any unchanged non-ASCII data that was
present in the input used to build the model, including message bodies
with a Content-Transfer-Encoding of 8bit.
Here are the major differences between email version 5.0 and version 4:
All operations are on unicode strings. Text inputs must be strings,
text outputs are strings. Outputs are limited to the ASCII character
set and so can be encoded to ASCII for transmission. Inputs are also
limited to ASCII; this is an acknowledged limitation of email 5.0 and
means it can only be used to parse email that is 7bit clean.
Here are the major differences between email version 4 and version 3:
All modules have been renamed according to PEP 8 standards. For example,
the version 3 module email.Message was renamed to email.message in
version 4.
A new subpackage email.mime was added and all the version 3
email.MIME* modules were renamed and situated into the email.mime
subpackage. For example, the version 3 module email.MIMEText was renamed
to email.mime.text.
Note that the version 3 names will continue to work until Python 2.6.
The email.mime.application module was added, which contains the
MIMEApplication class.
Methods that were deprecated in version 3 have been removed. These include
Generator.__call__(), Message.get_type(),
Message.get_main_type(), Message.get_subtype().
Fixes have been added for RFC 2231 support which can change some of the
return types for Message.get_param() and friends. Under some
circumstances, values which used to return a 3-tuple now return simple strings
(specifically, if all extended parameter segments were unencoded, there is no
language and charset designation expected, so the return type is now a simple
string). Also, %-decoding used to be done for both encoded and unencoded
segments; this decoding is now done only for encoded segments.
Here are the major differences between email version 3 and version 2:
The FeedParser class was introduced, and the Parser class
was implemented in terms of the FeedParser. All parsing therefore is
non-strict, and parsing will make a best effort never to raise an exception.
Problems found while parsing messages are stored in the message’s defects
attribute.
All aspects of the API which raised DeprecationWarnings in version 2
have been removed. These include the _encoder argument to the
MIMEText constructor, the Message.add_payload() method, the
Utils.dump_address_pair() function, and the functions Utils.decode()
and Utils.encode().
New DeprecationWarnings have been added to:
Generator.__call__(), Message.get_type(),
Message.get_main_type(), Message.get_subtype(), and the strict
argument to the Parser class. These are expected to be removed in
future versions.
Support for Pythons earlier than 2.3 has been removed.
Here are the differences between email version 2 and version 1:
The email.Header and email.Charset modules have been added.
The pickle format for Message instances has changed. Since this was
never (and still isn’t) formally defined, this isn’t considered a backward
incompatibility. However if your application pickles and unpickles
Message instances, be aware that in email version 2,
Message instances now have private variables _charset and
_default_type.
Several methods in the Message class have been deprecated, or their
signatures changed. Also, many new methods have been added. See the
documentation for the Message class for details. The changes should be
completely backward compatible.
The object structure has changed in the face of message/rfc822
content types. In email version 1, such a type would be represented by a
scalar payload, i.e. the container message’s is_multipart() returned
False, and get_payload() was not a list object but a single Message
instance.
This structure was inconsistent with the rest of the package, so the object
representation for message/rfc822 content types was changed. In
email version 2, the container does return True from
is_multipart(), and get_payload() returns a list containing a single
Message item.
Note that this is one place that backward compatibility could not be completely
maintained. However, if you’re already testing the return type of
get_payload(), you should be fine. You just need to make sure your code
doesn’t do a set_payload() with a Message instance on a container
with a content type of message/rfc822.
The Parser constructor’s strict argument was added, and its
parse() and parsestr() methods grew a headersonly argument. The
strict flag was also added to functions email.message_from_file() and
email.message_from_string().
Generator.__call__() is deprecated; use Generator.flatten()
instead. The Generator class has also grown the clone() method.
The DecodedGenerator class in the email.Generator module was
added.
The intermediate base classes MIMENonMultipart and
MIMEMultipart have been added, and interposed in the class hierarchy
for most of the other MIME-related derived classes.
The _encoder argument to the MIMEText constructor has been
deprecated. Encoding now happens implicitly based on the _charset argument.
The following functions in the email.Utils module have been deprecated:
dump_address_pairs(), decode(), and encode(). The following
functions have been added to the module: make_msgid(),
decode_rfc2231(), encode_rfc2231(), and decode_params().
The non-public function email.Iterators._structure() was added.
The email package was originally prototyped as a separate library called
mimelib. Changes have been made so that method names
are more consistent, and some methods or modules have either been added or
removed. The semantics of some of the methods have also changed. For the most
part, any functionality available in mimelib is still available in the
email package, albeit often in a different way. Backward compatibility
between the mimelib package and the email package was not a
priority.
Here is a brief description of the differences between the mimelib and
the email packages, along with hints on how to port your applications.
Of course, the most visible difference between the two packages is that the
package name has been changed to email. In addition, the top-level
package has the following differences:
The method ismultipart() was renamed to is_multipart().
The get_payload() method has grown a decode optional argument.
The method getall() was renamed to get_all().
The method addheader() was renamed to add_header().
The method gettype() was renamed to get_type().
The method getmaintype() was renamed to get_main_type().
The method getsubtype() was renamed to get_subtype().
The method getparams() was renamed to get_params(). Also, whereas
getparams() returned a list of strings, get_params() returns a list
of 2-tuples, effectively the key/value pairs of the parameters, split on the
'=' sign.
The method getparam() was renamed to get_param().
The method getcharsets() was renamed to get_charsets().
The method getfilename() was renamed to get_filename().
The method getboundary() was renamed to get_boundary().
The method setboundary() was renamed to set_boundary().
The method getdecodedpayload() was removed. To get similar
functionality, pass the value 1 to the decode flag of the get_payload()
method.
The method getpayloadastext() was removed. Similar functionality is
supported by the DecodedGenerator class in the email.generator
module.
The method getbodyastext() was removed. You can get similar
functionality by creating an iterator with typed_subpart_iterator() in the
email.iterators module.
The Parser class has no differences in its public interface. It does
have some additional smarts to recognize message/delivery-status
type messages, which it represents as a Message instance containing
separate Message subparts for each header block in the delivery status
notification [1].
The Generator class has no differences in its public interface. There
is a new class in the email.generator module though, called
DecodedGenerator which provides most of the functionality previously
available in the Message.getpayloadastext() method.
The following modules and classes have been changed:
The MIMEBase class constructor arguments _major and _minor have
changed to _maintype and _subtype respectively.
The Image class/module has been renamed to MIMEImage. The _minor
argument has been renamed to _subtype.
The Text class/module has been renamed to MIMEText. The _minor
argument has been renamed to _subtype.
The MessageRFC822 class/module has been renamed to MIMEMessage. Note
that an earlier version of mimelib called this class/module RFC822,
but that clashed with the Python standard library module rfc822 on some
case-insensitive file systems.
Also, the MIMEMessage class now represents any kind of MIME message
with main type message. It takes an optional argument _subtype
which is used to set the MIME subtype. _subtype defaults to
rfc822.
mimelib provided some utility functions in its address and
date modules. All of these functions have been moved to the
email.utils module.
The MsgReader class/module has been removed. Its functionality is most
closely supported in the body_line_iterator() function in the
email.iterators module.
Serialize obj as a JSON formatted stream to fp (a .write()-supporting
file-like object).
If skipkeys is True (default: False), then dict keys that are not
of a basic type (str, int, float, bool,
None) will be skipped instead of raising a TypeError.
The json module always produces str objects, not
bytes objects. Therefore, fp.write() must support str
input.
If check_circular is False (default: True), then the circular
reference check for container types will be skipped and a circular reference
will result in an OverflowError (or worse).
If allow_nan is False (default: True), then it will be a
ValueError to serialize out of range float values (nan,
inf, -inf) in strict compliance of the JSON specification, instead of
using the JavaScript equivalents (NaN, Infinity, -Infinity).
If indent is a non-negative integer or string, then JSON array elements and
object members will be pretty-printed with that indent level. An indent level
of 0, negative, or "" will only insert newlines. None (the default)
selects the most compact representation. Using a positive integer indent
indents that many spaces per level. If indent is a string (such as '\t'),
that string is used to indent each level.
If separators is an (item_separator, dict_separator) tuple, then it
will be used instead of the default (', ', ': ') separators. (',', ':') is the most compact JSON representation.
default(obj) is a function that should return a serializable version of
obj or raise TypeError. The default simply raises TypeError.
To use a custom JSONEncoder subclass (e.g. one that overrides the
default() method to serialize additional types), specify it with the
cls kwarg; otherwise JSONEncoder is used.
Serialize obj to a JSON formatted str. The arguments have the
same meaning as in dump().
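For example:

>>> import json
>>> json.dumps(['foo', {'bar': ('baz', None, 1.0, 2)}])
'["foo", {"bar": ["baz", null, 1.0, 2]}]'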
Note
Unlike pickle and marshal, JSON is not a framed protocol,
so trying to serialize multiple objects with repeated calls to
dump() using the same fp will result in an invalid JSON file.
Deserialize fp (a .read()-supporting file-like object containing a JSON
document) to a Python object.
object_hook is an optional function that will be called with the result of
any object literal decoded (a dict). The return value of
object_hook will be used instead of the dict. This feature can be used
to implement custom decoders (e.g. JSON-RPC class hinting).
object_pairs_hook is an optional function that will be called with the
result of any object literal decoded with an ordered list of pairs. The
return value of object_pairs_hook will be used instead of the
dict. This feature can be used to implement custom decoders that
rely on the order that the key and value pairs are decoded (for example,
collections.OrderedDict() will remember the order of insertion). If
object_hook is also defined, the object_pairs_hook takes priority.
Changed in version 3.1: Added support for object_pairs_hook.
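For example, preserving key order while decoding:

>>> import json
>>> from collections import OrderedDict
>>> json.loads('{"b": 1, "a": 2}', object_pairs_hook=OrderedDict)
OrderedDict([('b', 1), ('a', 2)])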
parse_float, if specified, will be called with the string of every JSON
float to be decoded. By default, this is equivalent to float(num_str).
This can be used to use another datatype or parser for JSON floats
(e.g. decimal.Decimal).
parse_int, if specified, will be called with the string of every JSON int
to be decoded. By default, this is equivalent to int(num_str). This can
be used to use another datatype or parser for JSON integers
(e.g. float).
parse_constant, if specified, will be called with one of the following
strings: '-Infinity', 'Infinity', 'NaN', 'null', 'true',
'false'. This can be used to raise an exception if invalid JSON numbers
are encountered.
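For example, decoding JSON numbers into other Python types:

>>> import json
>>> from decimal import Decimal
>>> json.loads('1.1', parse_float=Decimal)
Decimal('1.1')
>>> json.loads('7', parse_int=float)
7.0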
To use a custom JSONDecoder subclass, specify it with the cls
kwarg; otherwise JSONDecoder is used. Additional keyword arguments
will be passed to the constructor of the class.
class json.JSONDecoder(object_hook=None, parse_float=None, parse_int=None, parse_constant=None, strict=True, object_pairs_hook=None)
Simple JSON decoder.
Performs the following translations in decoding by default:
JSON            Python
object          dict
array           list
string          str
number (int)    int
number (real)   float
true            True
false           False
null            None
It also understands NaN, Infinity, and -Infinity as their
corresponding float values, which is outside the JSON spec.
object_hook, if specified, will be called with the result of every JSON
object decoded and its return value will be used in place of the given
dict. This can be used to provide custom deserializations (e.g. to
support JSON-RPC class hinting).
object_pairs_hook, if specified will be called with the result of every
JSON object decoded with an ordered list of pairs. The return value of
object_pairs_hook will be used instead of the dict. This
feature can be used to implement custom decoders that rely on the order
that the key and value pairs are decoded (for example,
collections.OrderedDict() will remember the order of insertion). If
object_hook is also defined, the object_pairs_hook takes priority.
Changed in version 3.1: Added support for object_pairs_hook.
parse_float, if specified, will be called with the string of every JSON
float to be decoded. By default, this is equivalent to float(num_str).
This can be used to use another datatype or parser for JSON floats
(e.g. decimal.Decimal).
parse_int, if specified, will be called with the string of every JSON int
to be decoded. By default, this is equivalent to int(num_str). This can
be used to use another datatype or parser for JSON integers
(e.g. float).
parse_constant, if specified, will be called with one of the following
strings: '-Infinity', 'Infinity', 'NaN', 'null', 'true',
'false'. This can be used to raise an exception if invalid JSON numbers
are encountered.
If strict is False (True is the default), then control characters
will be allowed inside strings. Control characters in this context are
those with character codes in the 0-31 range, including '\t' (tab),
'\n', '\r' and '\0'.
Decode a JSON document from s (a str beginning with a
JSON document) and return a 2-tuple of the Python representation
and the index in s where the document ended.
This can be used to decode a JSON document from a string that may have
extraneous data at the end.
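A sketch:

>>> import json
>>> json.JSONDecoder().raw_decode('{"key": "value"} and trailing junk')
({'key': 'value'}, 16)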
class json.JSONEncoder(skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)
Extensible JSON encoder for Python data structures.
Supports the following objects and types by default:
Python          JSON
dict            object
list, tuple     array
str             string
int, float      number
True            true
False           false
None            null
To extend this to recognize other objects, subclass and implement a
default() method that returns a serializable object
for o if possible; otherwise it should call the superclass implementation
(to raise TypeError).
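For example, a subclass that also serializes complex numbers:

import json

class ComplexEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, complex):
            return [obj.real, obj.imag]
        # Let the base class default method raise the TypeError
        return json.JSONEncoder.default(self, obj)

# json.dumps(2 + 1j, cls=ComplexEncoder) -> '[2.0, 1.0]'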
If skipkeys is False (the default), then it is a TypeError to
attempt encoding of keys that are not str, int, float or None. If
skipkeys is True, such items are simply skipped.
If ensure_ascii is True (the default), the output is guaranteed to
have all incoming non-ASCII characters escaped. If ensure_ascii is
False, these characters will be output as-is.
If check_circular is True (the default), then lists, dicts, and custom
encoded objects will be checked for circular references during encoding to
prevent an infinite recursion (which would cause an OverflowError).
Otherwise, no such check takes place.
If allow_nan is True (the default), then NaN, Infinity, and
-Infinity will be encoded as such. This behavior is not JSON
specification compliant, but is consistent with most JavaScript based
encoders and decoders. Otherwise, it will be a ValueError to encode
such floats.
If sort_keys is True (default False), then the output of dictionaries
will be sorted by key; this is useful for regression tests to ensure that
JSON serializations can be compared on a day-to-day basis.
If indent is a non-negative integer (it is None by default), then JSON
array elements and object members will be pretty-printed with that indent
level. An indent level of 0 will only insert newlines. None is the most
compact representation.
If specified, separators should be an (item_separator, key_separator)
tuple. The default is (', ', ': '). To get the most compact JSON
representation, you should specify (',', ':') to eliminate whitespace.
If specified, default is a function that gets called for objects that can’t
otherwise be serialized. It should return a JSON encodable version of the
object or raise a TypeError.
Mailcap files are used to configure how MIME-aware applications such as mail
readers and Web browsers react to files with different MIME types. (The name
“mailcap” is derived from the phrase “mail capability”.) For example, a mailcap
file might contain a line like video/mpeg; xmpeg %s. Then, if the user
encounters an email message or Web document with the MIME type
video/mpeg, %s will be replaced by a filename (usually one
belonging to a temporary file) and the xmpeg program can be
automatically started to view the file.
The mailcap format is documented in RFC 1524, “A User Agent Configuration
Mechanism For Multimedia Mail Format Information,” but is not an Internet
standard. However, mailcap files are supported on most Unix systems.
Return a 2-tuple; the first element is a string containing the command line to
be executed (which can be passed to os.system()), and the second element
is the mailcap entry for a given MIME type. If no matching MIME type can be
found, (None,None) is returned.
key is the name of the field desired, which represents the type of activity to
be performed; the default value is ‘view’, since in the most common case you
simply want to view the body of the MIME-typed data. Other possible values
might be ‘compose’ and ‘edit’, if you wanted to create a new body of the given
MIME type or alter the existing body data. See RFC 1524 for a complete list
of these fields.
filename is the filename to be substituted for %s in the command line; the
default value is '/dev/null' which is almost certainly not what you want, so
usually you’ll override it by specifying a filename.
plist can be a list containing named parameters; the default value is simply
an empty list. Each entry in the list must be a string containing the parameter
name, an equals sign ('='), and the parameter’s value. Mailcap entries can
contain named parameters like %{foo}, which will be replaced by the value
of the parameter named ‘foo’. For example, if the command line showpartial %{id} %{number} %{total} was in a mailcap file, and plist was set to
['id=1', 'number=2', 'total=3'], the resulting command line would be
'showpartial 1 2 3'.
In a mailcap file, the “test” field can optionally be specified to test some
external condition (such as the machine architecture, or the window system in
use) to determine whether or not the mailcap line applies. findmatch()
will automatically check such conditions and skip the entry if the check fails.
Returns a dictionary mapping MIME types to a list of mailcap file entries. This
dictionary must be passed to the findmatch() function. An entry is stored
as a list of dictionaries, but it shouldn’t be necessary to know the details of
this representation.
The information is derived from all of the mailcap files found on the system.
Settings in the user’s mailcap file $HOME/.mailcap will override
settings in the system mailcap files /etc/mailcap,
/usr/etc/mailcap, and /usr/local/etc/mailcap.
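A sketch of the lookup; the result shown assumes a mailcap entry like the video/mpeg line above, and will vary with the mailcap files on the system:

>>> import mailcap
>>> d = mailcap.getcaps()
>>> mailcap.findmatch(d, 'video/mpeg', filename='tmp1223')
('xmpeg tmp1223', {'view': 'xmpeg %s'})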
mailbox — Manipulate mailboxes in various formats
This module defines two classes, Mailbox and Message, for
accessing and manipulating on-disk mailboxes and the messages they contain.
Mailbox offers a dictionary-like mapping from keys to messages.
Message extends the email.Message module’s Message
class with format-specific state and behavior. Supported mailbox formats are
Maildir, mbox, MH, Babyl, and MMDF.
The Mailbox class defines an interface and is not intended to be
instantiated. Instead, format-specific subclasses should inherit from
Mailbox and your code should instantiate a particular subclass.
The Mailbox interface is dictionary-like, with small keys
corresponding to messages. Keys are issued by the Mailbox instance
with which they will be used and are only meaningful to that Mailbox
instance. A key continues to identify a message even if the corresponding
message is modified, such as by replacing it with another message.
Messages may be added to a Mailbox instance using the set-like
method add() and removed using a del statement or the set-like
methods remove() and discard().
Mailbox interface semantics differ from dictionary semantics in some
noteworthy ways. Each time a message is requested, a new representation
(typically a Message instance) is generated based upon the current
state of the mailbox. Similarly, when a message is added to a
Mailbox instance, the provided message representation’s contents are
copied. In neither case is a reference to the message representation kept by
the Mailbox instance.
The default Mailbox iterator iterates over message representations,
not keys as the default dictionary iterator does. Moreover, modification of a
mailbox during iteration is safe and well-defined. Messages added to the
mailbox after an iterator is created will not be seen by the
iterator. Messages removed from the mailbox before the iterator yields them
will be silently skipped, though using a key from an iterator may result in a
KeyError exception if the corresponding message is subsequently
removed.
Warning
Be very cautious when modifying mailboxes that might be simultaneously
changed by some other process. The safest mailbox format to use for such
tasks is Maildir; try to avoid using single-file formats such as mbox for
concurrent writing. If you’re modifying a mailbox, you must lock it by
calling the lock() and unlock() methods before reading any
messages in the file or making any changes by adding or deleting a
message. Failing to lock the mailbox runs the risk of losing messages or
corrupting the entire mailbox.
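A sketch of the safe pattern, using a hypothetical mbox path:

import mailbox

box = mailbox.mbox('family.mbox')          # hypothetical path
box.lock()
try:
    key = box.add('From: me\n\nhello\n')   # a minimal message as a string
    box.flush()
finally:
    box.unlock()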
Add message to the mailbox and return the key that has been assigned to
it.
Parameter message may be a Message instance, an
email.Message.Message instance, a string, a byte string, or a
file-like object (which should be open in binary mode). If message is
an instance of the
appropriate format-specific Message subclass (e.g., if it’s an
mboxMessage instance and this is an mbox instance), its
format-specific information is used. Otherwise, reasonable defaults for
format-specific information are used.
Delete the message corresponding to key from the mailbox.
If no such message exists, a KeyError exception is raised if the
method was called as remove() or __delitem__() but no
exception is raised if the method was called as discard(). The
behavior of discard() may be preferred if the underlying mailbox
format supports concurrent modification by other processes.
Replace the message corresponding to key with message. Raise a
KeyError exception if no message already corresponds to key.
As with add(), parameter message may be a Message
instance, an email.Message.Message instance, a string, a byte
string, or a file-like object (which should be open in binary mode). If
message is an
instance of the appropriate format-specific Message subclass
(e.g., if it’s an mboxMessage instance and this is an
mbox instance), its format-specific information is
used. Otherwise, the format-specific information of the message that
currently corresponds to key is left unchanged.
Return an iterator over representations of all messages if called as
itervalues() or __iter__() or return a list of such
representations if called as values(). The messages are represented
as instances of the appropriate format-specific Message subclass
unless a custom message factory was specified when the Mailbox
instance was initialized.
Note
The behavior of __iter__() is unlike that of dictionaries, which
iterate over keys.
Return an iterator over (key, message) pairs, where key is a key and
message is a message representation, if called as iteritems() or
return a list of such pairs if called as items(). The messages are
represented as instances of the appropriate format-specific
Message subclass unless a custom message factory was specified
when the Mailbox instance was initialized.
Return a representation of the message corresponding to key. If no such
message exists, default is returned if the method was called as
get() and a KeyError exception is raised if the method was
called as __getitem__(). The message is represented as an instance
of the appropriate format-specific Message subclass unless a
custom message factory was specified when the Mailbox instance
was initialized.
Return a representation of the message corresponding to key as an
instance of the appropriate format-specific Message subclass, or
raise a KeyError exception if no such message exists.
Return a string representation of the message corresponding to key, or
raise a KeyError exception if no such message exists. The
message is processed through email.message.Message to
convert it to a 7bit clean representation.
Return a file-like representation of the message corresponding to key,
or raise a KeyError exception if no such message exists. The
file-like object behaves as if open in binary mode. This file should be
closed once it is no longer needed.
Changed in version 3.2: The file object really is a binary file; previously it was incorrectly
returned in text mode. Also, the file-like object now supports the
context manager protocol: you can use a with statement to
automatically close it.
Note
Unlike other representations of messages, file-like representations are
not necessarily independent of the Mailbox instance that
created them or of the underlying mailbox. More specific documentation
is provided by each subclass.
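Given the 3.2 context-manager support mentioned above, a sketch using a hypothetical mbox path:

import mailbox

box = mailbox.mbox('family.mbox')          # hypothetical path
key = box.add('From: me\n\nhello\n')
with box.get_file(key) as fp:              # binary file-like object
    data = fp.read()                       # raw bytes of the stored message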
Return a representation of the message corresponding to key and delete
the message. If no such message exists, return default. The message is
represented as an instance of the appropriate format-specific
Message subclass unless a custom message factory was specified
when the Mailbox instance was initialized.
Return an arbitrary (key, message) pair, where key is a key and
message is a message representation, and delete the corresponding
message. If the mailbox is empty, raise a KeyError exception. The
message is represented as an instance of the appropriate format-specific
Message subclass unless a custom message factory was specified
when the Mailbox instance was initialized.
Parameter arg should be a key-to-message mapping or an iterable of
(key, message) pairs. Updates the mailbox so that, for each given
key and message, the message corresponding to key is set to
message as if by using __setitem__(). As with __setitem__(),
each key must already correspond to a message in the mailbox or else a
KeyError exception will be raised, so in general it is incorrect
for arg to be a Mailbox instance.
Note
Unlike with dictionaries, keyword arguments are not supported.
Write any pending changes to the filesystem. For some Mailbox
subclasses, changes are always written immediately and flush() does
nothing, but you should still make a habit of calling this method.
Acquire an exclusive advisory lock on the mailbox so that other processes
know not to modify it. An ExternalClashError is raised if the lock
is not available. The particular locking mechanisms used depend upon the
mailbox format. You should always lock the mailbox before making any
modifications to its contents.
class mailbox.Maildir(dirname, factory=None, create=True)
A subclass of Mailbox for mailboxes in Maildir format. Parameter
factory is a callable object that accepts a file-like message representation
(which behaves as if opened in binary mode) and returns a custom representation.
If factory is None, MaildirMessage is used as the default message
representation. If create is True, the mailbox is created if it does not
exist.
It is for historical reasons that dirname is named as such rather than path.
Maildir is a directory-based mailbox format invented for the qmail mail
transfer agent and now widely supported by other programs. Messages in a
Maildir mailbox are stored in separate files within a common directory
structure. This design allows Maildir mailboxes to be accessed and modified
by multiple unrelated programs without data corruption, so file locking is
unnecessary.
Maildir mailboxes contain three subdirectories, namely: tmp,
new, and cur. Messages are created momentarily in the
tmp subdirectory and then moved to the new subdirectory to
finalize delivery. A mail user agent may subsequently move the message to the
cur subdirectory and store information about the state of the message
in a special “info” section appended to its file name.
Folders of the style introduced by the Courier mail transfer agent are also
supported. Any subdirectory of the main mailbox is considered a folder if
'.' is the first character in its name. Folder names are represented by
Maildir without the leading '.'. Each folder is itself a Maildir
mailbox but should not contain other folders. Instead, a logical nesting is
indicated using '.' to delimit levels, e.g., “Archived.2005.07”.
Note
The Maildir specification requires the use of a colon (':') in certain
message file names. However, some operating systems do not permit this
character in file names. If you wish to use a Maildir-like format on such
an operating system, you should specify another character to use
instead. The exclamation point ('!') is a popular choice. For
example:
import mailbox
mailbox.Maildir.colon = '!'
The colon attribute may also be set on a per-instance basis.
Maildir instances have all of the methods of Mailbox in
addition to the following:
Delete the folder whose name is folder. If the folder contains any
messages, a NotEmptyError exception will be raised and the folder
will not be deleted.
Delete temporary files from the mailbox that have not been accessed in the
last 36 hours. The Maildir specification says that mail-reading programs
should do this occasionally.
Some Mailbox methods implemented by Maildir deserve special
remarks:
These methods generate unique file names based upon the current process
ID. When using multiple threads, undetected name clashes may occur and
cause corruption of the mailbox unless threads are coordinated to avoid
using these methods to manipulate the same mailbox simultaneously.
class mailbox.mbox(path, factory=None, create=True)¶
A subclass of Mailbox for mailboxes in mbox format. Parameter factory
is a callable object that accepts a file-like message representation (which
behaves as if opened in binary mode) and returns a custom representation. If
factory is None, mboxMessage is used as the default message
representation. If create is True, the mailbox is created if it does not
exist.
The mbox format is the classic format for storing mail on Unix systems. All
messages in an mbox mailbox are stored in a single file with the beginning of
each message indicated by a line whose first five characters are “From ”.
Several variations of the mbox format exist to address perceived shortcomings in
the original. In the interest of compatibility, mbox implements the
original format, which is sometimes referred to as mboxo. This means that
the Content-Length header, if present, is ignored and that any
occurrences of “From ” at the beginning of a line in a message body are
transformed to “>From ” when storing the message, although occurrences of “>From
” are not transformed to “From ” when reading the message.
Some Mailbox methods implemented by mbox deserve special
remarks:
class mailbox.MH(path, factory=None, create=True)¶
A subclass of Mailbox for mailboxes in MH format. Parameter factory
is a callable object that accepts a file-like message representation (which
behaves as if opened in binary mode) and returns a custom representation. If
factory is None, MHMessage is used as the default message
representation. If create is True, the mailbox is created if it does not
exist.
MH is a directory-based mailbox format invented for the MH Message Handling
System, a mail user agent. Each message in an MH mailbox resides in its own
file. An MH mailbox may contain other MH mailboxes (called folders) in
addition to messages. Folders may be nested indefinitely. MH mailboxes also
support sequences, which are named lists used to logically group
messages without moving them to sub-folders. Sequences are defined in a file
called .mh_sequences in each folder.
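A short sketch of reading and rewriting sequences, assuming an MH mailbox at ~/Mail/inbox (the path and sequence contents are illustrative):
import mailbox

inbox = mailbox.MH('~/Mail/inbox')
sequences = inbox.get_sequences()    # e.g. {'unseen': [1, 2], 'flagged': [2]}
sequences.setdefault('flagged', [])
inbox.set_sequences(sequences)       # rewrites the .mh_sequences file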
The MH class manipulates MH mailboxes, but it does not attempt to
emulate all of mh's behaviors. In particular, it does not modify
and is not affected by the context or .mh_profile files that
are used by mh to store its state and configuration.
MH instances have all of the methods of Mailbox in addition
to the following:
Delete the folder whose name is folder. If the folder contains any
messages, a NotEmptyError exception will be raised and the folder
will not be deleted.
Three locking mechanisms are used—dot locking and, if available, the
flock() and lockf() system calls. For MH mailboxes, locking
the mailbox means locking the .mh_sequences file and, only for the
duration of any operations that affect them, locking individual message
files.
class mailbox.Babyl(path, factory=None, create=True)¶
A subclass of Mailbox for mailboxes in Babyl format. Parameter
factory is a callable object that accepts a file-like message representation
(which behaves as if opened in binary mode) and returns a custom representation.
If factory is None, BabylMessage is used as the default message
representation. If create is True, the mailbox is created if it does not
exist.
Babyl is a single-file mailbox format used by the Rmail mail user agent
included with Emacs. The beginning of a message is indicated by a line
containing the two characters Control-Underscore ('\037') and Control-L
('\014'). The end of a message is indicated by the start of the next
message or, in the case of the last message, a line containing a
Control-Underscore ('\037') character.
Messages in a Babyl mailbox have two sets of headers, original headers and
so-called visible headers. Visible headers are typically a subset of the
original headers that have been reformatted or abridged to be more
attractive. Each message in a Babyl mailbox also has an accompanying list of
labels, or short strings that record extra information about the
message, and a list of all user-defined labels found in the mailbox is kept
in the Babyl options section.
Babyl instances have all of the methods of Mailbox in
addition to the following:
Return a list of the names of all user-defined labels used in the mailbox.
Note
The actual messages are inspected to determine which labels exist in
the mailbox rather than consulting the list of labels in the Babyl
options section, but the Babyl section is updated whenever the mailbox
is modified.
Some Mailbox methods implemented by Babyl deserve special
remarks:
In Babyl mailboxes, the headers of a message are not stored contiguously
with the body of the message. To generate a file-like representation, the
headers and body are copied together into an io.BytesIO instance,
which has an API identical to that of a
file. As a result, the file-like object is truly independent of the
underlying mailbox but does not save memory compared to a string
representation.
class mailbox.MMDF(path, factory=None, create=True)¶
A subclass of Mailbox for mailboxes in MMDF format. Parameter factory
is a callable object that accepts a file-like message representation (which
behaves as if opened in binary mode) and returns a custom representation. If
factory is None, MMDFMessage is used as the default message
representation. If create is True, the mailbox is created if it does not
exist.
MMDF is a single-file mailbox format invented for the Multichannel Memorandum
Distribution Facility, a mail transfer agent. Each message is in the same
form as an mbox message but is bracketed before and after by lines containing
four Control-A ('\001') characters. As with the mbox format, the
beginning of each message is indicated by a line whose first five characters
are “From ”, but additional occurrences of “From ” are not transformed to
“>From ” when storing messages because the extra message separator lines
prevent mistaking such occurrences for the starts of subsequent messages.
Some Mailbox methods implemented by MMDF deserve special
remarks:
A subclass of the email.message module's Message. Subclasses of
mailbox.Message add mailbox-format-specific state and behavior.
If message is omitted, the new instance is created in a default, empty state.
If message is an email.message.Message instance, its contents are
copied; furthermore, any format-specific information is converted insofar as
possible if message is a Message instance. If message is a string,
a byte string,
or a file, it should contain an RFC 2822-compliant message, which is read
and parsed. Files should be open in binary mode, but text mode files
are accepted for backward compatibility.
The format-specific state and behaviors offered by subclasses vary, but in
general it is only the properties that are not specific to a particular
mailbox that are supported (although presumably the properties are specific
to a particular mailbox format). For example, file offsets for single-file
mailbox formats and file names for directory-based mailbox formats are not
retained, because they are only applicable to the original mailbox. But state
such as whether a message has been read by the user or marked as important is
retained, because it applies to the message itself.
There is no requirement that Message instances be used to represent
messages retrieved using Mailbox instances. In some situations, the
time and memory required to generate Message representations might
not be acceptable. For such situations, Mailbox instances also
offer string and file-like representations, and a custom message factory may
be specified when a Mailbox instance is initialized.
A message with Maildir-specific behaviors. Parameter message has the same
meaning as with the Message constructor.
Typically, a mail user agent application moves all of the messages in the
new subdirectory to the cur subdirectory after the first time
the user opens and closes the mailbox, recording that the messages are old
whether or not they’ve actually been read. Each message in cur has an
“info” section added to its file name to store information about its state.
(Some mail readers may also add an “info” section to messages in
new.) The “info” section may take one of two forms: it may contain
“2,” followed by a list of standardized flags (e.g., “2,FR”) or it may
contain “1,” followed by so-called experimental information. Standard flags
for Maildir messages are as follows:
Flag  Meaning   Explanation
D     Draft     Under composition
F     Flagged   Marked as important
P     Passed    Forwarded, resent, or bounced
R     Replied   Replied to
S     Seen      Read
T     Trashed   Marked for subsequent deletion
Return either “new” (if the message should be stored in the new
subdirectory) or “cur” (if the message should be stored in the cur
subdirectory).
Note
A message is typically moved from new to cur after its
mailbox has been accessed, whether or not the message has been
read. A message msg has been read if "S" in msg.get_flags() is
True.
Return a string specifying the flags that are currently set. If the
message complies with the standard Maildir format, the result is the
concatenation in alphabetical order of zero or one occurrence of each of
'D', 'F', 'P', 'R', 'S', and 'T'. The empty string
is returned if no flags are set or if “info” contains experimental
semantics.
Set the flag(s) specified by flag without changing other flags. To add
more than one flag at a time, flag may be a string of more than one
character. The current “info” is overwritten whether or not it contains
experimental information rather than flags.
Unset the flag(s) specified by flag without changing other flags. To
remove more than one flag at a time, flag may be a string of more than
one character. If “info” contains experimental information rather than
flags, the current “info” is not modified.
Return a string containing the “info” for a message. This is useful for
accessing and modifying “info” that is experimental (i.e., not a list of
flags).
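A short sketch of flag manipulation (the message content is illustrative):
import mailbox

msg = mailbox.MaildirMessage('From: me@example.com\n\nBody\n')
msg.set_subdir('cur')     # the message has been seen by the mail reader
msg.add_flag('S')         # mark as read ("Seen")
msg.add_flag('FR')        # several flags may be added at once
print(msg.get_flags())    # 'FRS' -- alphabetical order
print(msg.get_info())     # '2,FRS'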
When a MaildirMessage instance is created based upon an
mboxMessage or MMDFMessage instance, the Status
and X-Status headers are omitted and the following conversions
take place:
A message with mbox-specific behaviors. Parameter message has the same meaning
as with the Message constructor.
Messages in an mbox mailbox are stored together in a single file. The
sender’s envelope address and the time of delivery are typically stored in a
line beginning with “From ” that is used to indicate the start of a message,
though there is considerable variation in the exact format of this data among
mbox implementations. Flags that indicate the state of the message, such as
whether it has been read or marked as important, are typically stored in
Status and X-Status headers.
Conventional flags for mbox messages are as follows:
Flag  Meaning   Explanation
R     Read      Read
O     Old       Previously detected by MUA
D     Deleted   Marked for subsequent deletion
F     Flagged   Marked as important
A     Answered  Replied to
The “R” and “O” flags are stored in the Status header, and the
“D”, “F”, and “A” flags are stored in the X-Status header. The
flags and headers typically appear in the order mentioned.
mboxMessage instances offer the following methods:
Return a string representing the “From ” line that marks the start of the
message in an mbox mailbox. The leading “From ” and the trailing newline
are excluded.
Set the “From ” line to from_, which should be specified without a
leading “From ” or trailing newline. For convenience, time_ may be
specified and will be formatted appropriately and appended to from_. If
time_ is specified, it should be a struct_time instance, a
tuple suitable for passing to time.strftime(), or True (to use
time.gmtime()).
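For example, the following minimal sketch sets the envelope sender and appends the current time (the sender name is illustrative):
import mailbox

msg = mailbox.mboxMessage()
msg.set_from('MAILER-DAEMON', True)   # True means use time.gmtime()
print(msg.get_from())                 # e.g. 'MAILER-DAEMON Sat Jul 23 15:35:34 2011'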
Return a string specifying the flags that are currently set. If the
message complies with the conventional format, the result is the
concatenation in the following order of zero or one occurrence of each of
'R', 'O', 'D', 'F', and 'A'.
Set the flags specified by flags and unset all others. Parameter flags
should be the concatenation in any order of zero or more occurrences of
each of 'R', 'O', 'D', 'F', and 'A'.
Unset the flag(s) specified by flag without changing other flags. To
remove more than one flag at a time, flag may be a string of more than
one character.
When an mboxMessage instance is created based upon a
MaildirMessage instance, a “From ” line is generated based upon the
MaildirMessage instance’s delivery date, and the following conversions
take place:
A message with MH-specific behaviors. Parameter message has the same meaning
as with the Message constructor.
MH messages do not support marks or flags in the traditional sense, but they
do support sequences, which are logical groupings of arbitrary messages. Some
mail reading programs (although not the standard mh and
nmh) use sequences in much the same way flags are used with other
formats, as follows:
When an MHMessage instance is created based upon an
mboxMessage or MMDFMessage instance, the Status
and X-Status headers are omitted and the following conversions
take place:
A message with Babyl-specific behaviors. Parameter message has the same
meaning as with the Message constructor.
Certain message labels, called attributes, are defined by convention
to have special meanings. The attributes are as follows:
Label      Explanation
unseen     Not read, but previously detected by MUA
deleted    Marked for subsequent deletion
filed      Copied to another file or mailbox
answered   Replied to
forwarded  Forwarded
edited     Modified by the user
resent     Resent
By default, Rmail displays only visible headers. The BabylMessage
class, though, uses the original headers because they are more
complete. Visible headers may be accessed explicitly if desired.
BabylMessage instances offer the following methods:
Set the message’s visible headers to be the same as the headers in
message. Parameter visible should be a Message instance, an
email.message.Message instance, a string, or a file-like object
(which should be open in text mode).
When a BabylMessage instance’s original headers are modified, the
visible headers are not automatically modified to correspond. This method
updates the visible headers as follows: each visible header with a
corresponding original header is set to the value of the original header,
each visible header without a corresponding original header is removed,
and any of Date, From, Reply-To,
To, CC, and Subject that are
present in the original headers but not the visible headers are added to
the visible headers.
When a BabylMessage instance is created based upon a
MaildirMessage instance, the following conversions take place:
When a BabylMessage instance is created based upon an
mboxMessage or MMDFMessage instance, the Status
and X-Status headers are omitted and the following conversions
take place:
A message with MMDF-specific behaviors. Parameter message has the same meaning
as with the Message constructor.
As with message in an mbox mailbox, MMDF messages are stored with the
sender’s address and the delivery date in an initial line beginning with
“From ”. Likewise, flags that indicate the state of the message are
typically stored in Status and X-Status headers.
Conventional flags for MMDF messages are identical to those of mbox message
and are as follows:
Flag  Meaning   Explanation
R     Read      Read
O     Old       Previously detected by MUA
D     Deleted   Marked for subsequent deletion
F     Flagged   Marked as important
A     Answered  Replied to
The “R” and “O” flags are stored in the Status header, and the
“D”, “F”, and “A” flags are stored in the X-Status header. The
flags and headers typically appear in the order mentioned.
MMDFMessage instances offer the following methods, which are
identical to those offered by mboxMessage:
Return a string representing the “From ” line that marks the start of the
message in an mbox mailbox. The leading “From ” and the trailing newline
are excluded.
Set the “From ” line to from_, which should be specified without a
leading “From ” or trailing newline. For convenience, time_ may be
specified and will be formatted appropriately and appended to from_. If
time_ is specified, it should be a struct_time instance, a
tuple suitable for passing to time.strftime(), or True (to use
time.gmtime()).
Return a string specifying the flags that are currently set. If the
message complies with the conventional format, the result is the
concatenation in the following order of zero or one occurrence of each of
'R', 'O', 'D', 'F', and 'A'.
Set the flags specified by flags and unset all others. Parameter flags
should be the concatenation in any order of zero or more occurrences of
each of 'R', 'O', 'D', 'F', and 'A'.
Unset the flag(s) specified by flag without changing other flags. To
remove more than one flag at a time, flag may be a string of more than
one character.
When an MMDFMessage instance is created based upon a
MaildirMessage instance, a “From ” line is generated based upon the
MaildirMessage instance’s delivery date, and the following conversions
take place:
Raised when a mailbox is expected but is not found, such as when instantiating a
Mailbox subclass with a path that does not exist (and with the create
parameter set to False), or when opening a folder that does not exist.
Raised when some mailbox-related condition beyond the control of the program
causes it to be unable to proceed, such as failing to acquire a lock that
another program already holds, or when a uniquely generated file name
already exists.
A simple example of printing the subjects of all messages in a mailbox that seem
interesting:
import mailbox
for message in mailbox.mbox('~/mbox'):
    subject = message['subject']       # Could possibly be None.
    if subject and 'python' in subject.lower():
        print(subject)
To copy all mail from a Babyl mailbox to an MH mailbox, converting all of the
format-specific information that can be converted:
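A minimal sketch, assuming the Babyl file lives at ~/RMAIL and the MH mailbox at ~/Mail:
import mailbox

destination = mailbox.MH('~/Mail')
destination.lock()
for message in mailbox.Babyl('~/RMAIL'):
    destination.add(mailbox.MHMessage(message))   # converts labels where possible
destination.flush()
destination.unlock()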
This example sorts mail from several mailing lists into different mailboxes,
being careful to avoid mail corruption due to concurrent modification by other
programs, mail loss due to interruption of the program, or premature termination
due to malformed messages in the mailbox:
import mailbox
import email.errors

list_names = ('python-list', 'python-dev', 'python-bugs')
boxes = {name: mailbox.mbox('~/email/%s' % name) for name in list_names}
inbox = mailbox.Maildir('~/Maildir', factory=None)

for key in inbox.iterkeys():
    try:
        message = inbox[key]
    except email.errors.MessageParseError:
        continue                # The message is malformed. Just leave it.

    for name in list_names:
        list_id = message['list-id']
        if list_id and name in list_id:
            # Get mailbox to use
            box = boxes[name]

            # Write copy to disk before removing original.
            # If there's a crash, you might duplicate a message, but
            # that's better than losing a message completely.
            box.lock()
            box.add(message)
            box.flush()
            box.unlock()

            # Remove original message
            inbox.lock()
            inbox.discard(key)
            inbox.flush()
            inbox.unlock()
            break               # Found destination, so stop looking.

for box in boxes.values():
    box.close()
The mimetypes module converts between a filename or URL and the MIME type
associated with the filename extension. Conversions are provided from filename
to MIME type and from MIME type to filename extension; encodings are not
supported for the latter conversion.
The module provides one class and a number of convenience functions. The
functions are the normal interface to this module, but some applications may be
interested in the class as well.
The functions described below provide the primary interface for this module. If
the module has not been initialized, they will call init() if they rely on
the information init() sets up.
Guess the type of a file based on its filename or URL, given by filename. The
return value is a tuple (type, encoding) where type is None if the
type can’t be guessed (missing or unknown suffix) or a string of the form
'type/subtype', usable for a MIME content-type header.
encoding is None for no encoding or the name of the program used to encode
(e.g. compress or gzip). The encoding is suitable for use
as a Content-Encoding header, not as a
Content-Transfer-Encoding header. The mappings are table driven.
Encoding suffixes are case sensitive; type suffixes are first tried case
sensitively, then case insensitively.
Optional strict is a flag specifying whether the list of known MIME types
is limited to only the official types registered with IANA.
When strict is true (the default), only the IANA types are supported; when
strict is false, some additional non-standard but commonly used MIME types
are also recognized.
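A doctest-style sketch (results can vary slightly with the local type map):
>>> import mimetypes
>>> mimetypes.guess_type('archive.tar.gz')
('application/x-tar', 'gzip')
>>> mimetypes.guess_type('photo.jpeg')
('image/jpeg', None)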
Guess the extensions for a file based on its MIME type, given by type. The
return value is a list of strings giving all possible filename extensions,
including the leading dot ('.'). The extensions are not guaranteed to have
been associated with any particular data stream, but would be mapped to the MIME
type type by guess_type().
Optional strict has the same meaning as with the guess_type() function.
Guess the extension for a file based on its MIME type, given by type. The
return value is a string giving a filename extension, including the leading dot
('.'). The extension is not guaranteed to have been associated with any
particular data stream, but would be mapped to the MIME type type by
guess_type(). If no extension can be guessed for type, None is
returned.
Optional strict has the same meaning as with the guess_type() function.
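A brief sketch of both functions; the exact extensions and their order depend on the local type map:
import mimetypes

print(mimetypes.guess_all_extensions('image/jpeg'))  # e.g. ['.jpe', '.jpeg', '.jpg']
print(mimetypes.guess_extension('image/jpeg'))       # e.g. '.jpe'
print(mimetypes.guess_extension('no/such-type'))     # None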
Some additional functions and data items are available for controlling the
behavior of the module.
Initialize the internal data structures. If given, files must be a sequence
of file names which should be used to augment the default type map. If omitted,
the file names to use are taken from knownfiles; on Windows, the
current registry settings are loaded. Each file named in files or
knownfiles takes precedence over those named before it. Calling
init() repeatedly is allowed.
Changed in version 3.2: Previously, Windows registry settings were ignored.
Load the type map given in the file filename, if it exists. The type map is
returned as a dictionary mapping filename extensions, including the leading dot
('.'), to strings of the form 'type/subtype'. If the file filename
does not exist or cannot be read, None is returned.
Add a mapping from the mimetype type to the extension ext. When the
extension is already known, the new type will replace the old one. When the type
is already known the extension will be added to the list of known extensions.
When strict is True (the default), the mapping will be added to the official MIME
types, otherwise to the non-standard ones.
List of type map file names commonly installed. These files are typically named
mime.types and are installed in different locations by different
packages.
Dictionary mapping suffixes to suffixes. This is used to allow recognition of
encoded files for which the encoding and the type are indicated by the same
extension. For example, the .tgz extension is mapped to .tar.gz
to allow the encoding and type to be recognized separately.
Dictionary mapping filename extensions to non-standard, but commonly found MIME
types.
The MimeTypes class may be useful for applications which may want more
than one MIME-type database:
class mimetypes.MimeTypes(filenames=(), strict=True)¶
This class represents a MIME-types database. By default, it provides access to
the same database as the rest of this module. The initial database is a copy of
that provided by the module, and may be extended by loading additional
mime.types-style files into the database using the read() or
readfp() methods. The mapping dictionaries may also be cleared before
loading additional data if the default data is not desired.
The optional filenames parameter can be used to cause additional files to be
loaded “on top” of the default database.
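A minimal sketch of a private database that leaves the module-level maps untouched (the type and extension are hypothetical):
import mimetypes

db = mimetypes.MimeTypes()
db.add_type('application/x-example', '.exm')   # hypothetical mapping
print(db.guess_type('data.exm'))               # ('application/x-example', None)
print(mimetypes.guess_type('data.exm'))        # the global map is unaffected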
Dictionary mapping suffixes to suffixes. This is used to allow recognition of
encoded files for which the encoding and the type are indicated by the same
extension. For example, the .tgz extension is mapped to .tar.gz
to allow the encoding and type to be recognized separately. This is initially a
copy of the global suffix_map defined in the module.
Dictionary mapping filename extensions to non-standard, but commonly found MIME
types. This is initially a copy of the global common_types defined in the
module.
Load MIME type information from the Windows registry. Availability: Windows.
New in version 3.2.
base64 — RFC 3548: Base16, Base32, Base64 Data Encodings¶
This module provides data encoding and decoding as specified in RFC 3548.
This standard defines the Base16, Base32, and Base64 algorithms for encoding
and decoding arbitrary binary strings into ASCII-only byte strings that can be
safely sent by email, used as parts of URLs, or included as part of an HTTP
POST request. The encoding algorithm is not the same as the
uuencode program.
There are two interfaces provided by this module. The modern interface
supports encoding and decoding ASCII byte string objects using all three
alphabets. The legacy interface provides for encoding and decoding to and from
file-like objects as well as byte strings, but only using the Base64 standard
alphabet.
s is the byte string to encode. Optional altchars must be a string of at least
length 2 (additional characters are ignored) which specifies an alternative
alphabet for the + and / characters. This allows an application to e.g.
generate URL or filesystem safe Base64 strings. The default is None, for
which the standard Base64 alphabet is used.
s is the byte string to decode. Optional altchars must be a string of
at least length 2 (additional characters are ignored) which specifies the
alternative alphabet used instead of the + and / characters.
The decoded string is returned. A binascii.Error is raised if s is
incorrectly padded.
If validate is False (the default), non-base64-alphabet characters are
discarded prior to the padding check. If validate is True,
non-base64-alphabet characters in the input result in a
binascii.Error.
Encode byte string s using a URL-safe alphabet, which substitutes - instead of
+ and _ instead of / in the standard Base64 alphabet. The result
can still contain =.
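A doctest-style sketch contrasting the two alphabets:
>>> import base64
>>> base64.b64encode(b'\xfb\xef\xbe')
b'++++'
>>> base64.urlsafe_b64encode(b'\xfb\xef\xbe')
b'----'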
s is the byte string to decode. Optional casefold is a flag specifying
whether a lowercase alphabet is acceptable as input. For security purposes,
the default is False.
RFC 3548 allows for optional mapping of the digit 0 (zero) to the letter O
(oh), and for optional mapping of the digit 1 (one) to either the letter I (eye)
or letter L (el). The optional argument map01 when not None, specifies
which letter the digit 1 should be mapped to (when map01 is not None, the
digit 0 is always mapped to the letter O). For security purposes the default is
None, so that 0 and 1 are not allowed in the input.
The decoded byte string is returned. A TypeError is raised if s is
incorrectly padded or if there are non-alphabet characters present in the
string.
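A doctest-style example of the Base32 round trip:
>>> import base64
>>> base64.b32encode(b'foo')
b'MZXW6==='
>>> base64.b32decode(b'MZXW6===')
b'foo'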
s is the byte string to decode. Optional casefold is a flag specifying whether a
lowercase alphabet is acceptable as input. For security purposes, the default
is False.
The decoded byte string is returned. A TypeError is raised if s is
incorrectly padded or if there are non-alphabet characters present in the
string.
Decode the contents of the binary input file and write the resulting binary
data to the output file. input and output must be file objects. input will be read until input.read() returns an empty
bytes object.
Decode the byte string s, which must contain one or more lines of base64
encoded data, and return a byte string containing the resulting binary data.
decodestring is a deprecated alias.
Encode the contents of the binary input file and write the resulting base64
encoded data to the output file. input and output must be file
objects. input will be read until input.read() returns
an empty bytes object. encode() returns the encoded data plus a trailing
newline character (b'\n').
Encode the byte string s, which can contain arbitrary binary data, and
return a byte string containing one or more lines of base64-encoded data,
always including an extra trailing newline (b'\n').
encodestring is a deprecated alias.
An example usage of the module:
>>> import base64
>>> encoded = base64.b64encode(b'data to be encoded')
>>> encoded
b'ZGF0YSB0byBiZSBlbmNvZGVk'
>>> data = base64.b64decode(encoded)
>>> data
b'data to be encoded'
Convert a binary file with filename input to binhex file output. The
output parameter can either be a filename or a file-like object (any object
supporting a write() and close() method).
Decode a binhex file input. input may be a filename or a file-like object
supporting read() and close() methods. The resulting file is written
to a file named output, unless the argument is None in which case the
output filename is read from the binhex file.
Exception raised when something can’t be encoded using the binhex format (for
example, a filename is too long to fit in the filename field), or when input is
not properly encoded binhex data.
The binascii module contains a number of methods to convert between
binary and various ASCII-encoded binary representations. Normally, you will not
use these functions directly but use wrapper modules like uu,
base64, or binhex instead. The binascii module contains
low-level functions written in C for greater speed that are used by the
higher-level modules.
Note
Encoding and decoding functions do not accept Unicode strings. Only bytestring
and bytearray objects can be processed.
The binascii module defines the following functions:
Convert a single line of uuencoded data back to binary and return the binary
data. Lines normally contain 45 (binary) bytes, except for the last line. Line
data may be followed by whitespace.
Convert binary data to a line of ASCII characters; the return value is the
converted line, including a newline char. The length of data should be at most
45.
Convert binary data to a line of ASCII characters in base64 coding. The return
value is the converted line, including a newline char. The length of data
should be at most 57 to adhere to the base64 standard.
Convert a block of quoted-printable data back to binary and return the binary
data. More than one line may be passed at a time. If the optional argument
header is present and true, underscores will be decoded as spaces.
Changed in version 3.2: Accept only bytestring or bytearray objects as input.
Convert binary data to a line(s) of ASCII characters in quoted-printable
encoding. The return value is the converted line(s). If the optional argument
quotetabs is present and true, all tabs and spaces will be encoded. If the
optional argument istext is present and true, newlines are not encoded but
trailing whitespace will be encoded. If the optional argument header is
present and true, spaces will be encoded as underscores per RFC1522. If the
optional argument header is present and false, newline characters will be
encoded as well; otherwise linefeed conversion might corrupt the binary data
stream.
Convert binhex4 formatted ASCII data to binary, without doing RLE-decompression.
The string should contain a complete number of binary bytes, or (in case of the
last portion of the binhex4 data) have the remaining bits zero.
Perform RLE-decompression on the data, as per the binhex4 standard. The
algorithm uses 0x90 after a byte as a repeat indicator, followed by a count.
A count of 0 specifies a byte value of 0x90. The routine returns the
decompressed data, unless the input data ends in an orphaned repeat indicator,
in which case the Incomplete exception is raised.
Changed in version 3.2: Accept only bytestring or bytearray objects as input.
Perform hexbin4 binary-to-ASCII translation and return the resulting string. The
argument should already be RLE-coded, and have a length divisible by 3 (except
possibly the last fragment).
Compute CRC-32, the 32-bit checksum of data, starting with an initial crc. This
is consistent with the ZIP file checksum. Since the algorithm is designed for
use as a checksum algorithm, it is not suitable for use as a general hash
algorithm. Use as follows:
print(binascii.crc32(b"hello world"))
# Or, in two pieces:
crc = binascii.crc32(b"hello")
crc = binascii.crc32(b" world", crc) & 0xffffffff
print('crc32 = {:#010x}'.format(crc))
Note
To generate the same numeric value across all Python versions and
platforms use crc32(data) & 0xffffffff. If you are only using
the checksum in packed binary format this is not necessary as the
return value is the correct 32-bit binary representation
regardless of sign.
Return the hexadecimal representation of the binary data. Every byte of
data is converted into the corresponding 2-digit hex representation. The
resulting string is therefore twice as long as the length of data.
Return the binary data represented by the hexadecimal string hexstr. This
function is the inverse of b2a_hex(). hexstr must contain an even number
of hexadecimal digits (which can be upper or lower case), otherwise a
TypeError is raised.
Changed in version 3.2: Accept only bytestring or bytearray objects as input.
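A doctest-style example of the round trip:
>>> import binascii
>>> binascii.b2a_hex(b'\xb9\x01\xef')
b'b901ef'
>>> binascii.a2b_hex(b'b901ef')
b'\xb9\x01\xef'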
This module performs quoted-printable transport encoding and decoding, as
defined in RFC 1521: “MIME (Multipurpose Internet Mail Extensions) Part One:
Mechanisms for Specifying and Describing the Format of Internet Message Bodies”.
The quoted-printable encoding is designed for data where there are relatively
few nonprintable characters; the base64 encoding scheme available via the
base64 module is more compact if there are many such characters, as when
sending a graphics file.
Decode the contents of the input file and write the resulting decoded binary
data to the output file. input and output must be file objects. input will be read until input.readline() returns an
empty string. If the optional argument header is present and true, underscore
will be decoded as space. This is used to decode “Q”-encoded headers as
described in RFC 1522: “MIME (Multipurpose Internet Mail Extensions)
Part Two: Message Header Extensions for Non-ASCII Text”.
Encode the contents of the input file and write the resulting quoted-printable
data to the output file. input and output must be file objects. input will be read until input.readline() returns an
empty string. quotetabs is a flag which controls whether to encode embedded
spaces and tabs; when true it encodes such embedded whitespace, and when
false it leaves them unencoded. Note that spaces and tabs appearing at the
end of lines are always encoded, as per RFC 1521. header is a flag
which controls if spaces are encoded as underscores as per RFC 1522.
Like encode(), except that it accepts a source string and returns the
corresponding encoded string. quotetabs and header are optional
(defaulting to False), and are passed straight through to encode().
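A doctest-style sketch of the string interface (the byte values are illustrative):
>>> import quopri
>>> quopri.encodestring(b'caf\xe9 au lait')
b'caf=E9 au lait'
>>> quopri.decodestring(b'caf=E9 au lait')
b'caf\xe9 au lait'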
This module encodes and decodes files in uuencode format, allowing arbitrary
binary data to be transferred over ASCII-only connections. Wherever a file
argument is expected, the methods accept a file-like object. For backwards
compatibility, a string containing a pathname is also accepted, and the
corresponding file will be opened for reading and writing; the pathname '-'
is understood to mean the standard input or output. However, this interface is
deprecated; it’s better for the caller to open the file itself, and be sure
that, when required, the mode is 'rb' or 'wb' on Windows.
This code was contributed by Lance Ellinghouse, and modified by Jack Jansen.
Uuencode file in_file into file out_file. The uuencoded file will have
the header specifying name and mode as the defaults for the results of
decoding the file. The default defaults are taken from in_file, or '-'
and 0o666 respectively.
This call decodes uuencoded file in_file placing the result on file
out_file. If out_file is a pathname, mode is used to set the permission
bits if the file must be created. Defaults for out_file and mode are taken
from the uuencode header. However, if the file specified in the header already
exists, a uu.Error is raised.
decode() may print a warning to standard error if the input was produced
by an incorrect uuencoder and Python could recover from that error. Setting
quiet to a true value silences this warning.
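A round-trip sketch using explicit binary file objects, since the pathname interface is deprecated; the file names are illustrative and photo.bin is assumed to exist:
import uu

with open('photo.bin', 'rb') as in_file, open('photo.uu', 'wb') as out_file:
    uu.encode(in_file, out_file, name='photo.bin', mode=0o644)

with open('photo.uu', 'rb') as in_file, open('photo.copy', 'wb') as out_file:
    uu.decode(in_file, out_file)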
Subclass of Exception, this can be raised by uu.decode() under
various situations, such as described above, but also including a badly
formatted header, or truncated input file.
Python supports a variety of modules to work with various forms of structured
data markup. This includes modules to work with the Standard Generalized Markup
Language (SGML) and the Hypertext Markup Language (HTML), and several interfaces
for working with the Extensible Markup Language (XML).
It is important to note that modules in the xml package require that
there be at least one SAX-compliant XML parser available. The Expat parser is
included with Python, so the xml.parsers.expat module will always be
available.
The documentation for the xml.dom and xml.sax packages is the
definition of the Python bindings for the DOM and SAX interfaces.
Convert the characters &, < and > in string s to HTML-safe
sequences. Use this if you need to display text that might contain such
characters in HTML. If the optional flag quote is true, the characters
(") and (') are also translated; this helps for inclusion in an HTML
attribute value delimited by quotes, as in <ahref="...">.
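A doctest-style sketch, assuming the function described here is html.escape():
>>> import html
>>> html.escape('<a href="test">link & text</a>')
'&lt;a href=&quot;test&quot;&gt;link &amp; text&lt;/a&gt;'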
Create a parser instance. If strict is True (the default), invalid
HTML results in HTMLParseError exceptions [1]. If
strict is False, the parser uses heuristics to make a best guess at
the intention of any invalid HTML it encounters, similar to the way most
browsers do.
An HTMLParser instance is fed HTML data and calls handler functions when tags
begin and end. The HTMLParser class is meant to be overridden by the
user to provide a desired behavior.
This parser does not check that end tags match start tags or call the end-tag
handler for elements which are closed implicitly by closing an outer element.
Exception raised by the HTMLParser class when it encounters an error
while parsing. This exception provides three attributes: msg is a brief
message explaining the error, lineno is the number of the line on which
the broken construct was detected, and offset is the number of
characters into the line at which the construct starts.
Feed some text to the parser. It is processed insofar as it consists of
complete elements; incomplete data is buffered until more data is fed or
close() is called.
Force processing of all buffered data as if it were followed by an end-of-file
mark. This method may be redefined by a derived class to define additional
processing at the end of the input, but the redefined version should always call
the HTMLParser base class method close().
Return the text of the most recently opened start tag. This should not normally
be needed for structured processing, but may be useful in dealing with HTML “as
deployed” or for re-generating input with minimal changes (whitespace between
attributes can be preserved, etc.).
This method is called to handle the start of a tag. It is intended to be
overridden by a derived class; the base class implementation does nothing.
The tag argument is the name of the tag converted to lower case. The attrs
argument is a list of (name, value) pairs containing the attributes found
inside the tag's <> brackets. The name is translated to lower case,
quotes in the value are removed, and character and entity references
are replaced. For instance, for the tag <A HREF="http://www.cwi.nl/">,
this method would be called as
handle_starttag('a', [('href', 'http://www.cwi.nl/')]).
All entity references from html.entities are replaced in the attribute
values.
Similar to handle_starttag(), but called when the parser encounters an
XHTML-style empty tag (<a.../>). This method may be overridden by
subclasses which require this particular lexical information; the default
implementation simply calls handle_starttag() and handle_endtag().
This method is called to handle the end tag of an element. It is intended to be
overridden by a derived class; the base class implementation does nothing. The
tag argument is the name of the tag converted to lower case.
This method is called to process a character reference of the form &#ref;.
It is intended to be overridden by a derived class; the base class
implementation does nothing.
This method is called to process a general entity reference of the form
&name; where name is an general entity reference. It is intended to be
overridden by a derived class; the base class implementation does nothing.
This method is called when a comment is encountered. The comment argument is
a string containing the text between the <!-- and --> delimiters, but not
the delimiters themselves. For example, the comment <!--text--> will cause
this method to be called with the argument 'text'. It is intended to be
overridden by a derived class; the base class implementation does nothing.
Method called when an SGML doctype declaration is read by the parser.
The decl parameter will be the entire contents of the declaration inside
the <!...> markup. It is intended to be overridden by a derived class;
the base class implementation does nothing.
Method called when an unrecognized SGML declaration is read by the parser.
The data parameter will be the entire contents of the declaration inside
the <!...> markup. It is sometimes useful to be overridden by a
derived class; the base class implementation raises an HTMLParseError.
Method called when a processing instruction is encountered. The data
parameter will contain the entire processing instruction. For example, for the
processing instruction <?proc color='red'>, this method would be called as
handle_pi("proc color='red'"). It is intended to be overridden by a derived
class; the base class implementation does nothing.
Note
The HTMLParser class uses the SGML syntactic rules for processing
instructions. An XHTML processing instruction using the trailing '?' will
cause the '?' to be included in data.
As a basic example, below is a simple HTML parser that uses the
HTMLParser class to print out tags as they are encountered:
>>> from html.parser import HTMLParser
>>>
>>> class MyHTMLParser(HTMLParser):
...     def handle_starttag(self, tag, attrs):
...         print("Encountered a {} start tag".format(tag))
...     def handle_endtag(self, tag):
...         print("Encountered a {} end tag".format(tag))
...
>>> page = """<html><h1>Title</h1><p>I'm a paragraph!</p></html>"""
>>>
>>> myparser = MyHTMLParser()
>>> myparser.feed(page)
Encountered a html start tag
Encountered a h1 start tag
Encountered a h1 end tag
Encountered a p start tag
Encountered a p end tag
Encountered a html end tag
For backward compatibility reasons strict mode does not raise
exceptions for all non-compliant HTML. That is, some invalid HTML
is tolerated even in strict mode.
This module defines three dictionaries, name2codepoint, codepoint2name,
and entitydefs. entitydefs is used to provide the entitydefs
attribute of the html.parser.HTMLParser class. The definition provided
here contains all the entities defined by XHTML 1.0 that can be handled using
simple textual substitution in the Latin-1 character set (ISO-8859-1).
The xml.parsers.expat module is a Python interface to the Expat
non-validating XML parser. The module provides a single extension type,
xmlparser, that represents the current state of an XML parser. After
an xmlparser object has been created, various attributes of the object
can be set to handler functions. When an XML document is then fed to the
parser, the handler functions are called for the character data and markup in
the XML document.
This module uses the pyexpat module to provide access to the Expat
parser. Direct use of the pyexpat module is deprecated.
This module provides one exception and one type object:
Creates and returns a new xmlparser object. encoding, if specified,
must be a string naming the encoding used by the XML data. Expat doesn’t
support as many encodings as Python does, and its repertoire of encodings can’t
be extended; it supports UTF-8, UTF-16, ISO-8859-1 (Latin1), and ASCII. If
encoding is given it will override the implicit or explicit encoding of the
document.
Expat can optionally do XML namespace processing for you, enabled by providing a
value for namespace_separator. The value must be a one-character string; a
ValueError will be raised if the string has an illegal length (None
is considered the same as omission). When namespace processing is enabled,
element type names and attribute names that belong to a namespace will be
expanded. The element name passed to the element handlers
StartElementHandler and EndElementHandler will be the
concatenation of the namespace URI, the namespace separator character, and the
local part of the name. If the namespace separator is a zero byte (chr(0))
then the namespace URI and the local part will be concatenated without any
separator.
For example, if namespace_separator is set to a space character (' ') and
the following document is parsed:
Parses the contents of the string data, calling the appropriate handler
functions to process the parsed data. isfinal must be true on the final call
to this method. data can be the empty string at any time.
Sets the base to be used for resolving relative URIs in system identifiers in
declarations. Resolving relative identifiers is left to the application: this
value will be passed through as the base argument to the
ExternalEntityRefHandler(), NotationDeclHandler(), and
UnparsedEntityDeclHandler() functions.
Returns the input data that generated the current event as a string. The data is
in the encoding of the entity which contains the text. When called while an
event handler is not active, the return value is None.
Create a “child” parser which can be used to parse an external parsed entity
referred to by content parsed by the parent parser. The context parameter
should be the string passed to the ExternalEntityRefHandler() handler
function, described below. The child parser is created with the
ordered_attributes and specified_attributes set to the values of
this parser.
Control parsing of parameter entities (including the external DTD subset).
Possible flag values are XML_PARAM_ENTITY_PARSING_NEVER,
XML_PARAM_ENTITY_PARSING_UNLESS_STANDALONE and
XML_PARAM_ENTITY_PARSING_ALWAYS. Return true if setting the flag
was successful.
Passing a false value for flag will cancel a previous call that passed a true
value, but otherwise has no effect.
This method can only be called before the Parse() or ParseFile()
methods are called; calling it after either of those have been called causes
ExpatError to be raised with the code attribute set to
errors.codes[errors.XML_ERROR_CANT_CHANGE_FEATURE_ONCE_PARSING].
The size of the buffer used when buffer_text is true.
A new buffer size can be set by assigning a new integer value
to this attribute.
When the size is changed, the buffer will be flushed.
Setting this to true causes the xmlparser object to buffer textual
content returned by Expat to avoid multiple calls to the
CharacterDataHandler() callback whenever possible. This can improve
performance substantially since Expat normally breaks character data into chunks
at every line ending. This attribute is false by default, and may be changed at
any time.
If buffer_text is enabled, the number of bytes stored in the buffer.
These bytes represent UTF-8 encoded text. This attribute has no meaningful
interpretation when buffer_text is false.
Setting this attribute to a non-zero integer causes the attributes to be
reported as a list rather than a dictionary. The attributes are presented in
the order found in the document text. For each attribute, two list entries are
presented: the attribute name and the attribute value. (Older versions of this
module also used this format.) By default, this attribute is false; it may be
changed at any time.
If set to a non-zero integer, the parser will report only those attributes which
were specified in the document instance and not those which were derived from
attribute declarations. Applications which set this need to be especially
careful to use what additional information is available from the declarations as
needed to comply with the standards for the behavior of XML processors. By
default, this attribute is false; it may be changed at any time.
The following attributes contain values relating to the most recent error
encountered by an xmlparser object, and will only have correct values
once a call to Parse() or ParseFile() has raised an
xml.parsers.expat.ExpatError exception.
Numeric code specifying the problem. This value can be passed to the
ErrorString() function, or compared to one of the constants defined in the
errors object.
The following attributes contain values relating to the current parse location
in an xmlparser object. During a callback reporting a parse event they
indicate the location of the first of the sequence of characters that generated
the event. When called outside of a callback, the position indicated will be
just past the last parse event (regardless of whether there was an associated
callback).
Here is the list of handlers that can be set. To set a handler on an
xmlparser object o, use o.handlername = func. handlername must
be taken from the following list, and func must be a callable object accepting
the correct number of arguments. The arguments are all strings, unless
otherwise stated.
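As an illustration, here is a minimal sketch that attaches three handlers and parses a small document (the document text is illustrative):
import xml.parsers.expat

def start_element(name, attrs):
    print('Start element:', name, attrs)

def end_element(name):
    print('End element:', name)

def char_data(data):
    print('Character data:', repr(data))

p = xml.parsers.expat.ParserCreate()
p.StartElementHandler = start_element
p.EndElementHandler = end_element
p.CharacterDataHandler = char_data

p.Parse("""<?xml version="1.0"?>
<parent id="top"><child1 name="paul">Text goes here</child1>
<child2 name="fred">More text</child2>
</parent>""", True)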
Called when the XML declaration is parsed. The XML declaration is the
(optional) declaration of the applicable version of the XML recommendation, the
encoding of the document text, and an optional “standalone” declaration.
version and encoding will be strings, and standalone will be 1 if the
document is declared standalone, 0 if it is declared not to be standalone,
or -1 if the standalone clause was omitted. This is only available with
Expat version 1.95.0 or newer.
Called when Expat begins parsing the document type declaration
(<!DOCTYPE ...). The doctypeName is provided exactly as presented. The systemId and
publicId parameters give the system and public identifiers if specified, or
None if omitted. has_internal_subset will be true if the document
contains an internal document declaration subset. This requires Expat version
1.2 or newer.
Called for each declared attribute for an element type. If an attribute list
declaration declares three attributes, this handler is called three times, once
for each attribute. elname is the name of the element to which the
declaration applies and attname is the name of the attribute declared. The
attribute type is a string passed as type; the possible values are
'CDATA', 'ID', 'IDREF', ... default gives the default value for
the attribute used when the attribute is not specified by the document instance,
or None if there is no default value (#IMPLIED values). If the
attribute is required to be given in the document instance, required will be
true. This requires Expat version 1.95.0 or newer.
Called for the start of every element. name is a string containing the
element name, and attributes is a dictionary mapping attribute names to their
values.
Called for character data. This will be called for normal character data, CDATA
marked content, and ignorable whitespace. Applications which must distinguish
these cases can use the StartCdataSectionHandler,
EndCdataSectionHandler, and ElementDeclHandler callbacks to
collect the required information.
Called for unparsed (NDATA) entity declarations. This is only present for
version 1.2 of the Expat library; for more recent versions, use
EntityDeclHandler instead. (The underlying function in the Expat
library has been declared obsolete.)
Called for all entity declarations. For parameter and internal entities,
value will be a string giving the declared contents of the entity; this will
be None for external entities. The notationName parameter will be
None for parsed entities, and the name of the notation for unparsed
entities. is_parameter_entity will be true if the entity is a parameter entity
or false for general entities (most applications only need to be concerned with
general entities). This is only available starting with version 1.95.0 of the
Expat library.
Called for notation declarations. notationName, base, systemId, and
publicId are strings if given. If the public identifier is omitted,
publicId will be None.
Called when an element contains a namespace declaration. Namespace declarations
are processed before the StartElementHandler is called for the element
on which declarations are placed.
Called when the closing tag is reached for an element that contained a
namespace declaration. This is called once for each namespace declaration on
the element in the reverse of the order for which the
StartNamespaceDeclHandler was called to indicate the start of each
namespace declaration’s scope. Calls to this handler are made after the
corresponding EndElementHandler for the end of the element.
Called at the start of a CDATA section. This and EndCdataSectionHandler
are needed to be able to identify the syntactical start and end for CDATA
sections.
Called for any characters in the XML document for which no applicable handler
has been specified. This means characters that are part of a construct which
could be reported, but for which no handler has been supplied.
This is the same as the DefaultHandler(), but doesn’t inhibit expansion
of internal entities. The entity reference will not be passed to the default
handler.
Called if the XML document hasn’t been declared as being a standalone document.
This happens when there is an external subset or a reference to a parameter
entity, but the XML declaration does not set standalone to yes in an XML
declaration. If this handler returns 0, then the parser will raise an
XML_ERROR_NOT_STANDALONE error. If this handler is not set, no
exception is raised by the parser for this condition.
Called for references to external entities. base is the current base, as set
by a previous call to SetBase(). The public and system identifiers,
systemId and publicId, are strings if given; if the public identifier is not
given, publicId will be None. The context value is opaque and should
only be used as described below.
For external entities to be parsed, this handler must be implemented. It is
responsible for creating the sub-parser using
ExternalEntityParserCreate(context), initializing it with the appropriate
callbacks, and parsing the entity. This handler should return an integer; if it
returns 0, the parser will raise an
XML_ERROR_EXTERNAL_ENTITY_HANDLING error, otherwise parsing will
continue.
If this handler is not provided, external entities are reported by the
DefaultHandler callback, if provided.
Content models are described using nested tuples. Each tuple contains four
values: the type, the quantifier, the name, and a tuple of children. Children
are simply additional content model descriptions.
The values of the first two fields are constants defined in the
xml.parsers.expat.model module. These constants can be collected in two
groups: the model type group and the quantifier group.
The constants in the model type group are:
xml.parsers.expat.model.XML_CTYPE_ANY
The element named by the model name was declared to have a content model of
ANY.
xml.parsers.expat.model.XML_CTYPE_CHOICE
The named element allows a choice from a number of options; this is used for
content models such as (A|B|C).
xml.parsers.expat.model.XML_CTYPE_EMPTY
Elements which are declared to be EMPTY have this model type.
xml.parsers.expat.model.XML_CTYPE_MIXED
xml.parsers.expat.model.XML_CTYPE_NAME
xml.parsers.expat.model.XML_CTYPE_SEQ
Models which represent a series of models which follow one after the other are
indicated with this model type. This is used for models such as (A,B,C).
The constants in the quantifier group are:
xml.parsers.expat.model.XML_CQUANT_NONE
No modifier is given, so it can appear exactly once, as for A.
xml.parsers.expat.model.XML_CQUANT_OPT
The model is optional: it can appear once or not at all, as for A?.
xml.parsers.expat.model.XML_CQUANT_PLUS
The model must occur one or more times (like A+).
xml.parsers.expat.model.XML_CQUANT_REP
The model must occur zero or more times, as for A*.
The following constants are provided in the xml.parsers.expat.errors
module. These constants are useful in interpreting some of the attributes of
the ExpatError exception objects raised when an error has occurred.
For backwards compatibility reasons, the value of each constant is the error
message itself rather than the numeric error code. To test an ExpatError
against a particular constant, compare the exception's code attribute with
errors.codes[errors.XML_ERROR_CONSTANT_NAME].
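For example (a minimal sketch), a deliberately malformed document can be used
to observe the code attribute:

import xml.parsers.expat
from xml.parsers.expat import errors

try:
    xml.parsers.expat.ParserCreate().Parse(b"", True)   # no content at all
except xml.parsers.expat.ExpatError as err:
    if err.code == errors.codes[errors.XML_ERROR_NO_ELEMENTS]:
        print("document contained no elements")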
The parser determined that the document was not “standalone” though it declared
itself to be in the XML declaration, and the NotStandaloneHandler was
set and returned 0.
An operation was requested that requires DTD support to be compiled in, but
Expat was configured without DTD support. This should never be reported by a
standard build of the xml.parsers.expat module.
A behavioral change was requested after parsing started that can only be changed
before parsing has started. This is (currently) only raised by
UseForeignDTD().
The requested operation was made on a parser which was finished parsing input,
but isn’t allowed. This includes attempts to provide additional input or to
stop the parser.
The Document Object Model, or “DOM,” is a cross-language API from the World Wide
Web Consortium (W3C) for accessing and modifying XML documents. A DOM
implementation presents an XML document as a tree structure, or allows client
code to build such a structure from scratch. It then gives access to the
structure through a set of objects which provide well-known interfaces.
The DOM is extremely useful for random-access applications. SAX only allows you
a view of one bit of the document at a time. If you are looking at one SAX
element, you have no access to another. If you are looking at a text node, you
have no access to a containing element. When you write a SAX application, you
need to keep track of your program’s position in the document somewhere in your
own code. SAX does not do it for you. Also, if you need to look ahead in the
XML document, you are just out of luck.
Some applications are simply impossible in an event driven model with no access
to a tree. Of course you could build some sort of tree yourself in SAX events,
but the DOM allows you to avoid writing that code. The DOM is a standard tree
representation for XML data.
The Document Object Model is being defined by the W3C in stages, or “levels” in
their terminology. The Python mapping of the API is substantially based on the
DOM Level 2 recommendation.
DOM applications typically start by parsing some XML into a DOM. How this is
accomplished is not covered at all by DOM Level 1, and Level 2 provides only
limited improvements: There is a DOMImplementation object class which
provides access to Document creation methods, but no way to access an
XML reader/parser/Document builder in an implementation-independent way. There
is also no well-defined way to access these methods without an existing
Document object. In Python, each DOM implementation will provide a
function getDOMImplementation(). DOM Level 3 adds a Load/Store
specification, which defines an interface to the reader, but this is not yet
available in the Python standard library.
Once you have a DOM document object, you can access the parts of your XML
document through its properties and methods. These properties are defined in
the DOM specification; this portion of the reference manual describes the
interpretation of the specification in Python.
The specification provided by the W3C defines the DOM API for Java, ECMAScript,
and OMG IDL. The Python mapping defined here is based in large part on the IDL
version of the specification, but strict compliance is not required (though
implementations are free to support the strict mapping from IDL). See section
Conformance for a detailed discussion of mapping requirements.
Register the factory function with the name name. The factory function
should return an object which implements the DOMImplementation
interface. The factory function can return the same object every time, or a new
one for each call, as appropriate for the specific implementation (e.g. if that
implementation supports some customization).
Return a suitable DOM implementation. The name is either well-known, the
module name of a DOM implementation, or None. If it is not None, imports
the corresponding module and returns a DOMImplementation object if the
import succeeds. If no name is given, and if the environment variable
PYTHON_DOM is set, this variable is used to find the implementation.
If name is not given, this examines the available implementations to find one
with the required feature set. If no implementation can be found, raise an
ImportError. The features list must be a sequence of (feature, version)
pairs which are passed to the hasFeature() method on available
DOMImplementation objects.
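A minimal sketch, assuming the bundled minidom implementation is available and
advertises DOM Core 2.0 support:

import xml.dom

# Any implementation claiming DOM Core 2.0 support will do here.
impl = xml.dom.getDOMImplementation(features=[("core", "2.0")])
print(impl.hasFeature("core", "2.0"))   # True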
The value used to indicate that no namespace is associated with a node in the
DOM. This is typically found as the namespaceURI of a node, or used as
the namespaceURI parameter to a namespaces-specific method.
In addition, xml.dom contains a base Node class and the DOM
exception classes. The Node class provided by this module does not
implement any of the methods or attributes defined by the DOM specification;
concrete DOM implementations must provide those. The Node class
provided as part of this module does provide the constants used for the
nodeType attribute on concrete Node objects; they are located
within the class rather than at the module level to conform with the DOM
specifications.
The definitive documentation for the DOM is the DOM specification from the W3C.
Note that DOM attributes may also be manipulated as nodes instead of as simple
strings. It is fairly rare that you must do this, however, so this usage is not
yet documented.
The DOMImplementation interface provides a way for applications to
determine the availability of particular features in the DOM they are using.
DOM Level 2 added the ability to create new Document and
DocumentType objects using the DOMImplementation as well.
Return a new Document object (the root of the DOM), with a child
Element object having the given namespaceUri and qualifiedName. The
doctype must be a DocumentType object created by
createDocumentType(), or None. In the Python DOM API, the first two
arguments can also be None in order to indicate that no Element
child is to be created.
Return a new DocumentType object that encapsulates the given
qualifiedName, publicId, and systemId strings, representing the
information contained in an XML document type declaration.
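For illustration, a sketch using the minidom implementation (the document and
doctype names here are arbitrary):

from xml.dom.minidom import getDOMImplementation

impl = getDOMImplementation()
doctype = impl.createDocumentType("slideshow", None, None)
doc = impl.createDocument(None, "slideshow", doctype)
print(doc.documentElement.tagName)   # slideshow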
An integer representing the node type. Symbolic constants for the types are on
the Node object: ELEMENT_NODE, ATTRIBUTE_NODE,
TEXT_NODE, CDATA_SECTION_NODE, ENTITY_NODE,
PROCESSING_INSTRUCTION_NODE, COMMENT_NODE,
DOCUMENT_NODE, DOCUMENT_TYPE_NODE, NOTATION_NODE.
This is a read-only attribute.
The parent of the current node, or None for the document node. The value is
always a Node object or None. For Element nodes, this
will be the parent element, except for the root element, in which case it will
be the Document object. For Attr nodes, this is always
None. This is a read-only attribute.
The node that immediately precedes this one with the same parent. For
instance the element with an end-tag that comes just before the self
element’s start-tag. Of course, XML documents are made up of more than just
elements so the previous sibling could be text, a comment, or something else.
If this node is the first child of the parent, this attribute will be
None. This is a read-only attribute.
The node that immediately follows this one with the same parent. See also
previousSibling. If this is the last child of the parent, this
attribute will be None. This is a read-only attribute.
This has a different meaning for each node type; see the DOM specification for
details. You can always get the information you would get here from another
property such as the tagName property for elements or the name
property for attributes. For all node types, the value of this attribute will be
either a string or None. This is a read-only attribute.
This has a different meaning for each node type; see the DOM specification for
details. The situation is similar to that with nodeName. The value is
a string or None.
Returns true if other refers to the same node as this node. This is especially
useful for DOM implementations which use any sort of proxy architecture (because
more than one object can refer to the same node).
Note
This is based on a proposed DOM Level 3 API which is still in the “working
draft” stage, but this particular interface appears uncontroversial. Changes
from the W3C will not necessarily affect this method in the Python DOM interface
(though any new W3C API for this would also be supported).
Insert a new child node before an existing child. It must be the case that
refChild is a child of this node; if not, ValueError is raised.
newChild is returned. If refChild is None, it inserts newChild at the
end of the children’s list.
Remove a child node. oldChild must be a child of this node; if not,
ValueError is raised. oldChild is returned on success. If oldChild
will not be used further, its unlink() method should be called.
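A short sketch of both methods on a minidom tree (the element names here are
arbitrary):

from xml.dom.minidom import parseString

doc = parseString("<doc><a/><b/></doc>")
root = doc.documentElement
a, b = root.childNodes

c = doc.createElement("c")
root.insertBefore(c, b)        # <doc><a/><c/><b/></doc>
old = root.removeChild(a)      # <doc><c/><b/></doc>
old.unlink()                   # old is no longer needed
print(root.toxml())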
Join adjacent text nodes so that all stretches of text are stored as single
Text instances. This simplifies processing text from a DOM tree for
many applications.
A NodeList represents a sequence of nodes. These objects are used in
two ways in the DOM Core recommendation: an Element object provides
one as its list of child nodes, and the getElementsByTagName() and
getElementsByTagNameNS() methods of Node return objects with this
interface to represent query results.
The DOM Level 2 recommendation defines one method and one attribute for these
objects:
Return the i‘th item from the sequence, if there is one, or None. The
index i is not allowed to be less than zero or greater than or equal to the
length of the sequence.
In addition, the Python DOM interface requires that some additional support is
provided to allow NodeList objects to be used as Python sequences. All
NodeList implementations must include support for __len__() and
__getitem__(); this allows iteration over the NodeList in
for statements and proper support for the len() built-in
function.
If a DOM implementation supports modification of the document, the
NodeList implementation must also support the __setitem__() and
__delitem__() methods.
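For example, a sketch of the sequence protocol on a minidom NodeList:

from xml.dom.minidom import parseString

doc = parseString("<doc><item/><item/></doc>")
items = doc.getElementsByTagName("item")   # a NodeList
print(len(items))          # 2, via __len__()
print(items[0].tagName)    # 'item', via __getitem__()
for node in items:         # iteration also works
    print(node.nodeName)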
Information about the notations and entities declared by a document (including
the external subset if the parser uses it and can provide the information) is
available from a DocumentType object. The DocumentType for a
document is available from the Document object’s doctype
attribute; if there is no DOCTYPE declaration for the document, the
document’s doctype attribute will be set to None instead of an
instance of this interface.
DocumentType is a specialization of Node, and adds the
following attributes:
A string giving the complete internal subset from the document. This does not
include the brackets which enclose the subset. If the document has no internal
subset, this should be None.
This is a NamedNodeMap giving the definitions of external entities.
For entity names defined more than once, only the first definition is provided
(others are ignored as required by the XML recommendation). This may be
None if the information is not provided by the parser, or if no entities are
defined.
This is a NamedNodeMap giving the definitions of notations. For
notation names defined more than once, only the first definition is provided
(others are ignored as required by the XML recommendation). This may be
None if the information is not provided by the parser, or if no notations
are defined.
A Document represents an entire XML document, including its constituent
elements, attributes, processing instructions, comments etc. Remember that it
inherits properties from Node.
Create and return a new element node. The element is not inserted into the
document when it is created. You need to explicitly insert it with one of the
other methods such as insertBefore() or appendChild().
Create and return a new element with a namespace. The tagName may have a
prefix. The element is not inserted into the document when it is created. You
need to explicitly insert it with one of the other methods such as
insertBefore() or appendChild().
Create and return a text node containing the data passed as a parameter. As
with the other creation methods, this one does not insert the node into the
tree.
Create and return a comment node containing the data passed as a parameter. As
with the other creation methods, this one does not insert the node into the
tree.
Create and return a processing instruction node containing the target and
data passed as parameters. As with the other creation methods, this one does
not insert the node into the tree.
Create and return an attribute node. This method does not associate the
attribute node with any particular element. You must use
setAttributeNode() on the appropriate Element object to use the
newly created attribute instance.
Create and return an attribute node with a namespace. The tagName may have a
prefix. This method does not associate the attribute node with any particular
element. You must use setAttributeNode() on the appropriate
Element object to use the newly created attribute instance.
Search for all descendants (direct children, children’s children, etc.) with a
particular namespace URI and localname. The localname is the part of the
qualified name after the prefix.
Return the value of the attribute named by name as a string. If no such
attribute exists, an empty string is returned, as if the attribute had no value.
Return the value of the attribute named by namespaceURI and localName as a
string. If no such attribute exists, an empty string is returned, as if the
attribute had no value.
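A small sketch of the empty-string behaviour:

from xml.dom.minidom import parseString

elem = parseString('<a href="http://example.org/"/>').documentElement
print(elem.getAttribute("href"))    # 'http://example.org/'
print(elem.getAttribute("title"))   # '' (attribute does not exist)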
Add a new attribute node to the element, replacing an existing attribute if
necessary when the name attribute matches. If a replacement occurs, the
old attribute node will be returned. If newAttr is already in use,
InuseAttributeErr will be raised.
Add a new attribute node to the element, replacing an existing attribute if
necessary when the namespaceURI and localName attributes match.
If a replacement occurs, the old attribute node will be returned. If newAttr
is already in use, InuseAttributeErr will be raised.
Return an attribute with a particular index. The order you get the attributes
in is arbitrary but will be consistent for the life of a DOM. Each item is an
attribute node. Get its value with the value attribute.
There are also experimental methods that give this class more mapping behavior.
You can use them or you can use the standardized getAttribute*() family
of methods on the Element objects.
The Text interface represents text in the XML document. If the parser
and DOM implementation support the DOM’s XML extension, portions of the text
enclosed in CDATA marked sections are stored in CDATASection objects.
These two interfaces are identical, but provide different values for the
nodeType attribute.
These interfaces extend the Node interface. They cannot have child
nodes.
The use of a CDATASection node does not indicate that the node
represents a complete CDATA marked section, only that the content of the node
was part of a CDATA section. A single CDATA section may be represented by more
than one node in the document tree. There is no way to determine whether two
adjacent CDATASection nodes represent different CDATA marked sections.
The DOM Level 2 recommendation defines a single exception, DOMException,
and a number of constants that allow applications to determine what sort of
error occurred. DOMException instances carry a code attribute
that provides the appropriate value for the specific exception.
The Python DOM interface provides the constants, but also expands the set of
exceptions so that a specific exception exists for each of the exception codes
defined by the DOM. The implementations must raise the appropriate specific
exception, each of which carries the appropriate value for the code
attribute.
Raised when a specified range of text does not fit into a string. This is not
known to be used in the Python DOM implementations, but may be received from DOM
implementations not written in Python.
This exception is raised when a string parameter contains a character that is
not permitted in the context it’s being used in by the XML 1.0 recommendation.
For example, attempting to create an Element node with a space in the
element type name will cause this error to be raised.
If an attempt is made to change any object in a way that is not permitted with
regard to the Namespaces in XML
recommendation, this exception is raised.
Exception when a node does not exist in the referenced context. For example,
NamedNodeMap.removeNamedItem() will raise this if the node passed in does
not exist in the map.
Raised when a node is inserted in a different document than it currently belongs
to, and the implementation does not support migrating the node from one document
to the other.
The exception codes defined in the DOM recommendation map to the exceptions
described above according to this table:
This section describes the conformance requirements and relationships between
the Python DOM API, the W3C DOM recommendations, and the OMG IDL mapping for
Python.
The mapping from OMG IDL to Python defines accessor functions for IDL
attribute declarations in much the way the Java mapping does.
Mapping the IDL declarations

readonly attribute string someValue;
         attribute string anotherValue;
yields three accessor functions: a “get” method for someValue
(_get_someValue()), and “get” and “set” methods for anotherValue
(_get_anotherValue() and _set_anotherValue()). The mapping, in
particular, does not require that the IDL attributes are accessible as normal
Python attributes: object.someValue is not required to work, and may
raise an AttributeError.
The Python DOM API, however, does require that normal attribute access work.
This means that the typical surrogates generated by Python IDL compilers are not
likely to work, and wrapper objects may be needed on the client if the DOM
objects are accessed via CORBA. While this does require some additional
consideration for CORBA DOM clients, the implementers with experience using DOM
over CORBA from Python do not consider this a problem. Attributes that are
declared readonly may not restrict write access in all DOM
implementations.
In the Python DOM API, accessor functions are not required. If provided, they
should take the form defined by the Python IDL mapping, but these methods are
considered unnecessary since the attributes are accessible directly from Python.
“Set” accessors should never be provided for readonly attributes.
The IDL definitions do not fully embody the requirements of the W3C DOM API,
such as the notion of certain objects, such as the return value of
getElementsByTagName(), being “live”. The Python DOM API does not require
implementations to enforce such requirements.
xml.dom.minidom is a light-weight implementation of the Document Object
Model interface. It is intended to be simpler than the full DOM and also
significantly smaller.
DOM applications typically start by parsing some XML into a DOM. With
xml.dom.minidom, this is done through the parse functions:
from xml.dom.minidom import parse, parseString

dom1 = parse('c:\\temp\\mydata.xml')    # parse an XML file by name

datasource = open('c:\\temp\\mydata.xml')
dom2 = parse(datasource)                # parse an open file

dom3 = parseString('<myxml>Some data<empty/> some more data</myxml>')
The parse() function can take either a filename or an open file object.
Return a Document from the given input. filename_or_file may be
either a file name, or a file-like object. parser, if given, must be a SAX2
parser object. This function will change the document handler of the parser and
activate namespace support; other parser configuration (like setting an entity
resolver) must have been done in advance.
If you have XML in a string, you can use the parseString() function
instead:
Return a Document that represents the string. This method creates a
StringIO object for the string and passes that on to parse().
Both functions return a Document object representing the content of the
document.
What the parse() and parseString() functions do is connect an XML
parser with a “DOM builder” that can accept parse events from any SAX parser and
convert them into a DOM tree. The names of these functions are perhaps misleading,
but are easy to grasp when learning the interfaces. The parsing of the document
will be completed before these functions return; it’s simply that these
functions do not provide a parser implementation themselves.
You can also create a Document by calling a method on a “DOM
Implementation” object. You can get this object either by calling the
getDOMImplementation() function in the xml.dom package or the
xml.dom.minidom module. Once you have a Document, you
can add child nodes to it to populate the DOM:
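For instance, a minimal sketch using the minidom implementation:

from xml.dom.minidom import getDOMImplementation

impl = getDOMImplementation()
newdoc = impl.createDocument(None, "some_tag", None)
top_element = newdoc.documentElement
text = newdoc.createTextNode('Some textual content.')
top_element.appendChild(text)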
Once you have a DOM document object, you can access the parts of your XML
document through its properties and methods. These properties are defined in
the DOM specification. The main property of the document object is the
documentElement property. It gives you the main element in the XML
document: the one that holds all others. Here is an example program:
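One possible version (a sketch reusing parseString from the example above):

dom3 = parseString("<myxml>Some data</myxml>")
assert dom3.documentElement.tagName == "myxml"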
When you are finished with a DOM tree, you may optionally call the
unlink() method to encourage early cleanup of the now-unneeded
objects. unlink() is an xml.dom.minidom-specific
extension to the DOM API that renders the node and its descendants
essentially useless. Otherwise, Python’s garbage collector will
eventually take care of the objects in the tree.
The definition of the DOM API for Python is given as part of the xml.dom
module documentation. This section lists the differences between the API and
xml.dom.minidom.
Break internal references within the DOM so that it will be garbage collected on
versions of Python without cyclic GC. Even when cyclic GC is available, using
this can make large amounts of memory available sooner, so calling this on DOM
objects as soon as they are no longer needed is good practice. This only needs
to be called on the Document object, but may be called on child nodes
to discard children of that node.
You can avoid calling this method explicitly by using the with
statement. The following code will automatically unlink dom when the
with block is exited:
with xml.dom.minidom.parse(datasource) as dom:
    ...  # Work with dom.
Write XML to the writer object. The writer should have a write() method
which matches that of the file object interface. The indent parameter is the
indentation of the current node. The addindent parameter is the incremental
indentation to use for subnodes of the current one. The newl parameter
specifies the string to use to terminate newlines.
For the Document node, an additional keyword argument encoding can
be used to specify the encoding field of the XML header.
Return a string or byte string containing the XML represented by
the DOM node.
With an explicit encoding argument, the result is a byte
string in the specified encoding. It is recommended that you
always specify an encoding; you may use any encoding you like, but
an argument of “utf-8” is the most common choice, avoiding
UnicodeError exceptions in case of unrepresentable text
data.
With no encoding argument, the result is a Unicode string, and the
XML declaration in the resulting string does not specify an
encoding. Encoding this string in an encoding other than UTF-8 is
likely incorrect, since UTF-8 is the default encoding of XML.
Return a pretty-printed version of the document. indent specifies the
indentation string and defaults to a tabulator; newl specifies the string
emitted at the end of each line and defaults to \n.
The encoding argument behaves like the corresponding argument of
toxml().
This example program is a fairly realistic example of a simple program. In this
particular case, we do not take much advantage of the flexibility of the DOM.
import xml.dom.minidom

document = """\
<slideshow>
<title>Demo slideshow</title>
<slide><title>Slide title</title>
<point>This is a demo</point>
<point>Of a program for processing slides</point>
</slide>

<slide><title>Another demo slide</title>
<point>It is important</point>
<point>To have more than</point>
<point>one slide</point>
</slide>
</slideshow>
"""

dom = xml.dom.minidom.parseString(document)

def getText(nodelist):
    rc = []
    for node in nodelist:
        if node.nodeType == node.TEXT_NODE:
            rc.append(node.data)
    return ''.join(rc)

def handleSlideshow(slideshow):
    print("<html>")
    handleSlideshowTitle(slideshow.getElementsByTagName("title")[0])
    slides = slideshow.getElementsByTagName("slide")
    handleToc(slides)
    handleSlides(slides)
    print("</html>")

def handleSlides(slides):
    for slide in slides:
        handleSlide(slide)

def handleSlide(slide):
    handleSlideTitle(slide.getElementsByTagName("title")[0])
    handlePoints(slide.getElementsByTagName("point"))

def handleSlideshowTitle(title):
    print("<title>%s</title>" % getText(title.childNodes))

def handleSlideTitle(title):
    print("<h2>%s</h2>" % getText(title.childNodes))

def handlePoints(points):
    print("<ul>")
    for point in points:
        handlePoint(point)
    print("</ul>")

def handlePoint(point):
    print("<li>%s</li>" % getText(point.childNodes))

def handleToc(slides):
    for slide in slides:
        title = slide.getElementsByTagName("title")[0]
        print("<p>%s</p>" % getText(title.childNodes))

handleSlideshow(dom)
The xml.dom.minidom module is essentially a DOM 1.0-compatible DOM with
some DOM 2 features (primarily namespace features).
Usage of the DOM interface in Python is straightforward. The following mapping
rules apply:
Interfaces are accessed through instance objects. Applications should not
instantiate the classes themselves; they should use the creator functions
available on the Document object. Derived interfaces support all
operations (and attributes) from the base interfaces, plus any new operations.
Operations are used as methods. Since the DOM uses only in
parameters, the arguments are passed in normal order (from left to right).
There are no optional arguments. void operations return None.
IDL attributes map to instance attributes. For compatibility with the OMG IDL
language mapping for Python, an attribute foo can also be accessed through
accessor methods _get_foo() and _set_foo(). readonly
attributes must not be changed; this is not enforced at runtime.
The types short int, unsigned int, unsigned long long, and
boolean all map to Python integer objects.
The type DOMString maps to Python strings. xml.dom.minidom supports
either bytes or strings, but will normally produce strings.
Values of type DOMString may also be None where allowed to have the IDL
null value by the DOM specification from the W3C.
const declarations map to variables in their respective scope (e.g.
xml.dom.minidom.Node.PROCESSING_INSTRUCTION_NODE); they must not be changed.
NodeList objects are implemented using Python’s built-in list type.
These objects provide the interface defined in the DOM specification, but with
earlier versions of Python they do not support the official API. They are,
however, much more “Pythonic” than the interface defined in the W3C
recommendations.
The following interfaces have no implementation in xml.dom.minidom:
DOMTimeStamp
DocumentType
DOMImplementation
CharacterData
CDATASection
Notation
Entity
EntityReference
DocumentFragment
Most of these reflect information in the XML document that is not of general
utility to most DOM users.
The xml.sax package provides a number of modules which implement the
Simple API for XML (SAX) interface for Python. The package itself provides the
SAX exceptions and the convenience functions which will be most used by users of
the SAX API.
Create and return a SAX XMLReader object. The first parser found will
be used. If parser_list is provided, it must be a sequence of strings which
name modules that have a function named create_parser(). Modules listed
in parser_list will be used before modules in the default list of parsers.
Create a SAX parser and use it to parse a document. The document, passed in as
filename_or_stream, can be a filename or a file object. The handler
parameter needs to be a SAX ContentHandler instance. If
error_handler is given, it must be a SAX ErrorHandler instance; if
omitted, SAXParseException will be raised on all errors. There is no
return value; all work must be done by the handler passed in.
Similar to parse(), but parses from a buffer string received as a
parameter.
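A minimal sketch of parseString() driving a ContentHandler subclass
(TitlePrinter is an illustrative name):

import xml.sax

class TitlePrinter(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self._in_title = False
    def startElement(self, name, attrs):
        self._in_title = (name == "title")
    def characters(self, content):
        if self._in_title:
            print(content)
    def endElement(self, name):
        self._in_title = False

xml.sax.parseString(b"<doc><title>Hello</title></doc>", TitlePrinter())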
A typical SAX application uses three kinds of objects: readers, handlers and
input sources. “Reader” in this context is another term for parser, i.e. some
piece of code that reads the bytes or characters from the input source, and
produces a sequence of events. The events then get distributed to the handler
objects, i.e. the reader invokes a method on the handler. A SAX application
must therefore obtain a reader object, create or open the input sources, create
the handlers, and connect these objects all together. As the final step of
preparation, the reader is called to parse the input. During parsing, methods on
the handler objects are called based on structural and syntactic events from the
input data.
For these objects, only the interfaces are relevant; they are normally not
instantiated by the application itself. Since Python does not have an explicit
notion of interface, they are formally introduced as classes, but applications
may use implementations which do not inherit from the provided classes. The
InputSource, Locator, Attributes,
AttributesNS, and XMLReader interfaces are defined in the
module xml.sax.xmlreader. The handler interfaces are defined in
xml.sax.handler. For convenience, InputSource (which is often
instantiated directly) and the handler classes are also available from
xml.sax. These interfaces are described below.
In addition to these classes, xml.sax provides the following exception
classes.
Encapsulate an XML error or warning. This class can contain basic error or
warning information from either the XML parser or the application: it can be
subclassed to provide additional functionality or to add localization. Note
that although the handlers defined in the ErrorHandler interface
receive instances of this exception, it is not required to actually raise the
exception — it is also useful as a container for information.
When instantiated, msg should be a human-readable description of the error.
The optional exception parameter, if given, should be None or an exception
that was caught by the parsing code and is being passed along as information.
This is the base class for the other SAX exception classes.
Subclass of SAXException raised on parse errors. Instances of this class
are passed to the methods of the SAX ErrorHandler interface to provide
information about the parse error. This class supports the SAX Locator
interface as well as the SAXException interface.
Subclass of SAXException raised when a SAX XMLReader is
confronted with an unrecognized feature or property. SAX applications and
extensions may use this class for similar purposes.
Subclass of SAXException raised when a SAX XMLReader is asked to
enable a feature that is not supported, or to set a property to a value that the
implementation does not support. SAX applications and extensions may use this
class for similar purposes.
This site is the focal point for the definition of the SAX API. It provides a
Java implementation and online documentation. Links to implementations and
historical information are also available.
The SAX API defines four kinds of handlers: content handlers, DTD handlers,
error handlers, and entity resolvers. Applications normally only need to
implement those interfaces whose events they are interested in; they can
implement the interfaces in a single object or in multiple objects. Handler
implementations should inherit from the base classes provided in the module
xml.sax.handler, so that all methods get default implementations.
This is the main callback interface in SAX, and the one most important to
applications. The order of events in this interface mirrors the order of the
information in the document.
Basic interface for resolving entities. If you create an object implementing
this interface, then register the object with your Parser, the parser will call
the method in your object to resolve all external entities.
Interface used by the parser to present error and warning messages to the
application. The methods of this object control whether errors are immediately
converted to exceptions or are handled in some other way.
In addition to these classes, xml.sax.handler provides symbolic constants
for the feature and property names.
Users are expected to subclass ContentHandler to support their
application. The following methods are called by the parser on the appropriate
events in the input document:
Called by the parser to give the application a locator for locating the origin
of document events.
SAX parsers are strongly encouraged (though not absolutely required) to supply a
locator: if a parser does so, it must supply the locator to the application by
invoking this method before invoking any of the other methods in the
DocumentHandler interface.
The locator allows the application to determine the end position of any
document-related event, even if the parser is not reporting an error. Typically,
the application will use this information for reporting its own errors (such as
character content that does not match an application’s business rules). The
information returned by the locator is probably not sufficient for use with a
search engine.
Note that the locator will return correct information only during the invocation
of the events in this interface. The application should not attempt to use it at
any other time.
The SAX parser will invoke this method only once, and it will be the last method
invoked during the parse. The parser shall not invoke this method until it has
either abandoned parsing (because of an unrecoverable error) or reached the end
of input.
Begin the scope of a prefix-URI Namespace mapping.
The information from this event is not necessary for normal Namespace
processing: the SAX XML reader will automatically replace prefixes for element
and attribute names when the feature_namespaces feature is enabled (the
default).
There are cases, however, when applications need to use prefixes in character
data or in attribute values, where they cannot safely be expanded automatically;
the startPrefixMapping() and endPrefixMapping() events supply the
information to the application to expand prefixes in those contexts itself, if
necessary.
Signals the start of an element in non-namespace mode.
The name parameter contains the raw XML 1.0 name of the element type as a
string and the attrs parameter holds an object of the Attributes
interface (see The Attributes Interface) containing the attributes of
the element. The object passed as attrs may be re-used by the parser; holding
on to a reference to it is not a reliable way to keep a copy of the attributes.
To keep a copy of the attributes, use the copy() method of the attrs
object.
Signals the start of an element in namespace mode.
The name parameter contains the name of the element type as a (uri, localname) tuple, the qname parameter contains the raw XML 1.0 name used in
the source document, and the attrs parameter holds an instance of the
AttributesNS interface (see The AttributesNS Interface)
containing the attributes of the element. If no namespace is associated with
the element, the uri component of name will be None. The object passed
as attrs may be re-used by the parser; holding on to a reference to it is not
a reliable way to keep a copy of the attributes. To keep a copy of the
attributes, use the copy() method of the attrs object.
Parsers may set the qname parameter to None, unless the
feature_namespace_prefixes feature is activated.
The Parser will call this method to report each chunk of character data. SAX
parsers may return all contiguous character data in a single chunk, or they may
split it into several chunks; however, all of the characters in any single event
must come from the same external entity so that the Locator provides useful
information.
content may be a string or bytes instance; the expat reader module
always produces strings.
Note
The earlier SAX 1 interface provided by the Python XML Special Interest Group
used a more Java-like interface for this method. Since most parsers used from
Python did not take advantage of the older interface, the simpler signature was
chosen to replace it. To convert old code to the new interface, use content
instead of slicing content with the old offset and length parameters.
Receive notification of ignorable whitespace in element content.
Validating Parsers must use this method to report each chunk of ignorable
whitespace (see the W3C XML 1.0 recommendation, section 2.10): non-validating
parsers may also use this method if they are capable of parsing and using
content models.
SAX parsers may return all contiguous whitespace in a single chunk, or they may
split it into several chunks; however, all of the characters in any single event
must come from the same external entity, so that the Locator provides useful
information.
The Parser will invoke this method once for each processing instruction found:
note that processing instructions may occur before or after the main document
element.
A SAX parser should never report an XML declaration (XML 1.0, section 2.8) or a
text declaration (XML 1.0, section 4.3.1) using this method.
The Parser will invoke this method once for each entity skipped. Non-validating
processors may skip entities if they have not seen the declarations (because,
for example, the entity was declared in an external DTD subset). All processors
may skip external entities, depending on the values of the
feature_external_ges and the feature_external_pes properties.
Resolve the system identifier of an entity and return either the system
identifier to read from as a string, or an InputSource to read from. The default
implementation returns systemId.
Objects with this interface are used to receive error and warning information
from the XMLReader. If you create an object that implements this
interface, then register the object with your XMLReader, the parser
will call the methods in your object to report all warnings and errors. There
are three levels of errors available: warnings, (possibly) recoverable errors,
and unrecoverable errors. All methods take a SAXParseException as the
only parameter. Errors and warnings may be converted to an exception by raising
the passed-in exception object.
Called when the parser encounters a recoverable error. If this method does not
raise an exception, parsing may continue, but further document information
should not be expected by the application. Allowing the parser to continue may
allow additional errors to be discovered in the input document.
Called when the parser presents minor warning information to the application.
Parsing is expected to continue when this method returns, and document
information will continue to be passed to the application. Raising an exception
in this method will cause parsing to end.
The module xml.sax.saxutils contains a number of classes and functions
that are commonly useful when creating SAX applications, either in direct use,
or as base classes.
Escape '&', '<', and '>' in a string of data.
You can escape other strings of data by passing a dictionary as the optional
entities parameter. The keys and values must all be strings; each key will be
replaced with its corresponding value. The characters '&', '<' and
'>' are always escaped, even if entities is provided.
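For example, a short sketch of both forms:

from xml.sax.saxutils import escape

print(escape("<tag> & more"))                  # &lt;tag&gt; &amp; more
print(escape('say "hi"', {'"': "&quot;"}))     # say &quot;hi&quot;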
Unescape '&', '<', and '>' in a string of data.
You can unescape other strings of data by passing a dictionary as the optional
entities parameter. The keys and values must all be strings; each key will be
replaced with its corresponding value. '&', '<', and '>'
are always unescaped, even if entities is provided.
Similar to escape(), but also prepares data to be used as an
attribute value. The return value is a quoted version of data with any
additional required replacements. quoteattr() will select a quote
character based on the content of data, attempting to avoid encoding any
quote characters in the string. If both single- and double-quote characters
are already in data, the double-quote characters will be encoded and data
will be wrapped in double-quotes. The resulting string can be used directly
as an attribute value:
>>> print("<element attr=%s>" % quoteattr("ab ' cd \" ef"))
<element attr="ab ' cd &quot; ef">
This function is useful when generating attribute values for HTML or any SGML
using the reference concrete syntax.
class xml.sax.saxutils.XMLGenerator(out=None, encoding='iso-8859-1', short_empty_elements=False)
This class implements the ContentHandler interface by writing SAX
events back into an XML document. In other words, using an XMLGenerator
as the content handler will reproduce the original document being parsed. out
should be a file-like object which will default to sys.stdout. encoding is
the encoding of the output stream which defaults to 'iso-8859-1'.
short_empty_elements controls the formatting of elements that contain no
content: if False (the default) they are emitted as a pair of start/end
tags, if set to True they are emitted as a single self-closed tag.
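A sketch of round-tripping a document through an XMLGenerator writing to an
in-memory text stream:

import io
import xml.sax
from xml.sax.saxutils import XMLGenerator

out = io.StringIO()
xml.sax.parseString(b"<doc><empty/></doc>", XMLGenerator(out))
print(out.getvalue())   # the reconstructed document, with an XML declaration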
This class is designed to sit between an XMLReader and the client
application’s event handlers. By default, it does nothing but pass requests up
to the reader and events on to the handlers unmodified, but subclasses can
override specific methods to modify the event stream or the configuration
requests as they pass through.
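A rough sketch of a filter that lowercases element names on their way
downstream (LowercaseNames and data.xml are illustrative names):

import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator

class LowercaseNames(XMLFilterBase):
    # Rewrites element names before they reach the downstream handler.
    def startElement(self, name, attrs):
        super().startElement(name.lower(), attrs)
    def endElement(self, name):
        super().endElement(name.lower())

reader = LowercaseNames(xml.sax.make_parser())
reader.setContentHandler(XMLGenerator())
reader.parse("data.xml")   # hypothetical input file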
This function takes an input source and an optional base URL and returns a fully
resolved InputSource object ready for reading. The input source can be
given as a string, a file-like object, or an InputSource object;
parsers will use this function to implement the polymorphic source argument to
their parse() method.
SAX parsers implement the XMLReader interface. They are implemented in
a Python module, which must provide a function create_parser(). This
function is invoked by xml.sax.make_parser() with no arguments to create
a new parser object.
In some cases, it is desirable not to parse an input source at once, but to feed
chunks of the document as they become available. Note that the reader will normally
not read the entire file, but read it in chunks as well; still parse()
won’t return until the entire document is processed. So these interfaces should
be used if the blocking behaviour of parse() is not desirable.
When the parser is instantiated it is ready to begin accepting data from the
feed method immediately. After parsing has been finished with a call to close,
the reset method must be called to make the parser ready to accept new data,
either from feed or using the parse method.
Note that these methods must not be called during parsing, that is, after
parse has been called and before it returns.
By default, the class also implements the parse method of the XMLReader
interface using the feed, close and reset methods of the IncrementalParser
interface as a convenience to SAX 2.0 driver writers.
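A minimal sketch of the incremental interface with the default expat reader:

import xml.sax

parser = xml.sax.make_parser()   # the default expat reader is incremental
parser.setContentHandler(xml.sax.ContentHandler())
for chunk in (b"<doc>", b"<item/>", b"</doc>"):
    parser.feed(chunk)           # hand over data as it arrives
parser.close()                   # final well-formedness checks
parser.reset()                   # ready for the next document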
Interface for associating a SAX event with a document location. A locator object
will return valid results only during calls to DocumentHandler methods; at any
other time, the results are unpredictable. If information is not available,
methods may return None.
class xml.sax.xmlreader.InputSource(system_id=None)
Encapsulation of the information needed by the XMLReader to read
entities.
This class may include information about the public identifier, system
identifier, byte stream (possibly with character encoding information) and/or
the character stream of an entity.
Applications will create objects of this class for use in the
XMLReader.parse() method and for returning from
EntityResolver.resolveEntity.
An InputSource belongs to the application, the XMLReader is
not allowed to modify InputSource objects passed to it from the
application, although it may make copies and modify those.
This is an implementation of the Attributes interface (see section
The Attributes Interface). This is a dictionary-like object which
represents the element attributes in a startElement() call. In addition
to the most useful dictionary operations, it supports a number of other
methods as described by the interface. Objects of this class should be
instantiated by readers; attrs must be a dictionary-like object containing
a mapping from attribute names to attribute values.
class xml.sax.xmlreader.AttributesNSImpl(attrs, qnames)
Namespace-aware variant of AttributesImpl, which will be passed to
startElementNS(). It is derived from AttributesImpl, but
understands attribute names as two-tuples of namespaceURI and
localname. In addition, it provides a number of methods expecting qualified
names as they appear in the original document. This class implements the
AttributesNS interface (see section The AttributesNS Interface).
Process an input source, producing SAX events. The source object can be a
system identifier (a string identifying the input source – typically a file
name or an URL), a file-like object, or an InputSource object. When
parse() returns, the input is completely processed, and the parser object
can be discarded or reset. As a limitation, the current implementation only
accepts byte streams; processing of character streams is for further study.
Set the current EntityResolver. If no EntityResolver is set,
attempts to resolve an external entity will result in opening the system
identifier for the entity, and fail if it is not available.
Allow an application to set the locale for errors and warnings.
SAX parsers are not required to provide localization for errors and warnings; if
they cannot support the requested locale, however, they must raise a SAX
exception. Applications may request a locale change in the middle of a parse.
Return the current setting for feature featurename. If the feature is not
recognized, SAXNotRecognizedException is raised. The well-known
featurenames are listed in the module xml.sax.handler.
Set the featurename to value. If the feature is not recognized,
SAXNotRecognizedException is raised. If the feature or its setting is not
supported by the parser, SAXNotSupportedException is raised.
Return the current setting for property propertyname. If the property is not
recognized, a SAXNotRecognizedException is raised. The well-known
propertynames are listed in the module xml.sax.handler.
Set the propertyname to value. If the property is not recognized,
SAXNotRecognizedException is raised. If the property or its setting is
not supported by the parser, SAXNotSupportedException is raised.
Assume the end of the document. That will check well-formedness conditions that
can be checked only at the end, invoke handlers, and may clean up resources
allocated during parsing.
This method is called after close has been called to reset the parser so that it
is ready to parse new documents. The results of calling parse or feed after
close without calling reset are undefined.
Set the byte stream (a Python file-like object which does not perform
byte-to-character conversion) for this input source.
The SAX parser will ignore this if there is also a character stream specified,
but it will use a byte stream in preference to opening a URI connection itself.
If the application knows the character encoding of the byte stream, it should
set it with the setEncoding method.
Set the character stream for this input source. (The stream must be a
text-mode file-like object, i.e. one that reads and returns strings.)
If there is a character stream specified, the SAX parser will ignore any byte
stream and will not attempt to open a URI connection to the system identifier.
Attributes objects implement a portion of the mapping protocol,
including the methods copy(), get(), __contains__(),
items(), keys(), and values(). The following methods
are also provided:
This interface is a subtype of the Attributes interface (see section
The Attributes Interface). All methods supported by that interface are also
available on AttributesNS objects.
The Element type is a flexible container object, designed to store
hierarchical data structures in memory. The type can be described as a cross
between a list and a dictionary.
Each element has a number of properties associated with it:
a tag which is a string identifying what kind of data this element represents
(the element type, in other words).
a number of attributes, stored in a Python dictionary.
a text string.
an optional tail string.
a number of child elements, stored in a Python sequence
To create an element instance, use the Element constructor or the
SubElement() factory function.
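A short sketch of building and serializing a small tree:

from xml.etree.ElementTree import Element, SubElement, tostring

root = Element("root")
child = SubElement(root, "child", attrib={"name": "a"})
child.text = "hello"
child.tail = "\n"
print(tostring(root))   # b'<root><child name="a">hello</child>\n</root>'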
The ElementTree class can be used to wrap an element structure, and
convert it from and to XML.
A C implementation of this API is available as xml.etree.cElementTree.
See http://effbot.org/zone/element-index.htm for tutorials and links to other
docs. Fredrik Lundh’s page is also the location of the development version of
the xml.etree.ElementTree.
Changed in version 3.2: The ElementTree API is updated to 1.3. For more information, see
Introducing ElementTree 1.3.
Comment element factory. This factory function creates a special element
that will be serialized as an XML comment by the standard serializer. The
comment string can be either a bytestring or a Unicode string. text is a
string containing the comment string. Returns an element instance
representing a comment.
Parses an XML document from a sequence of string fragments. sequence is a
list or other sequence containing XML data fragments. parser is an
optional parser instance. If not given, the standard XMLParser
parser is used. Returns an Element instance.
Parses an XML section into an element tree incrementally, and reports what’s
going on to the user. source is a filename or file object containing
XML data. events is a list of events to report back. If omitted, only “end”
events are reported. parser is an optional parser instance. If not
given, the standard XMLParser parser is used. Returns an
iterator providing (event, elem) pairs.
Note
iterparse() only guarantees that it has seen the “>”
character of a starting tag when it emits a “start” event, so the
attributes are defined, but the contents of the text and tail attributes
are undefined at that point. The same applies to the element children;
they may or may not be present.
If you need a fully populated element, look for “end” events instead.
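For example, a sketch iterating over the default “end” events:

import io
from xml.etree.ElementTree import iterparse

source = io.BytesIO(b"<root><item>1</item><item>2</item></root>")
for event, elem in iterparse(source):   # "end" events only, by default
    print(event, elem.tag, elem.text)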
Parses an XML section into an element tree. source is a filename or file
object containing XML data. parser is an optional parser instance. If
not given, the standard XMLParser parser is used. Returns an
ElementTree instance.
PI element factory. This factory function creates a special element that
will be serialized as an XML processing instruction. target is a string
containing the PI target. text is a string containing the PI contents, if
given. Returns an element instance, representing a processing instruction.
Registers a namespace prefix. The registry is global, and any existing
mapping for either the given prefix or the namespace URI will be removed.
prefix is a namespace prefix. uri is a namespace URI. Tags and
attributes in this namespace will be serialized with the given prefix, if at
all possible.
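A minimal sketch (the prefix and URI here are arbitrary):

import xml.etree.ElementTree as ET

ET.register_namespace("ex", "http://example.org/ns")
elem = ET.Element("{http://example.org/ns}item")
print(ET.tostring(elem))   # e.g. b'<ex:item xmlns:ex="http://example.org/ns" />'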
Subelement factory. This function creates an element instance, and appends
it to an existing element.
The element name, attribute names, and attribute values can be either
bytestrings or Unicode strings. parent is the parent element. tag is
the subelement name. attrib is an optional dictionary, containing element
attributes. extra contains additional attributes, given as keyword
arguments. Returns an element instance.
Generates a string representation of an XML element, including all
subelements. element is an Element instance. encoding is
the output encoding (default is US-ASCII). Use encoding="unicode" to
generate a Unicode string. method is either "xml",
"html" or "text" (default is "xml"). Returns an (optionally)
encoded string containing the XML data.
Generates a string representation of an XML element, including all
subelements. element is an Element instance. encoding is
the output encoding (default is US-ASCII). Use encoding="unicode" to
generate a Unicode string. method is either "xml",
"html" or "text" (default is "xml"). Returns a list of
(optionally) encoded strings containing the XML data. It does not guarantee
any specific sequence, except that "".join(tostringlist(element)) == tostring(element).
Parses an XML section from a string constant. This function can be used to
embed “XML literals” in Python code. text is a string containing XML
data. parser is an optional parser instance. If not given, the standard
XMLParser parser is used. Returns an Element instance.
Parses an XML section from a string constant, and also returns a dictionary
which maps from element id:s to elements. text is a string containing XML
data. parser is an optional parser instance. If not given, the standard
XMLParser parser is used. Returns a tuple containing an
Element instance and a dictionary.
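For example, a short sketch:

from xml.etree.ElementTree import XMLID

root, ids = XMLID('<doc><p id="intro">hi</p></doc>')
print(ids["intro"].text)   # 'hi'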
class xml.etree.ElementTree.Element(tag, attrib={}, **extra)
Element class. This class defines the Element interface, and provides a
reference implementation of this interface.
The element name, attribute names, and attribute values can be either
bytestrings or Unicode strings. tag is the element name. attrib is
an optional dictionary, containing element attributes. extra contains
additional attributes, given as keyword arguments.
The text attribute can be used to hold additional data associated with
the element. As the name implies this attribute is usually a string but
may be any application-specific object. If the element is created from
an XML file the attribute will contain any text found between the element
tags.
The tail attribute can be used to hold additional data associated with
the element. This attribute is usually a string but may be any
application-specific object. If the element is created from an XML file
the attribute will contain any text found after the element’s end tag and
before the next tag.
A dictionary containing the element’s attributes. Note that while the
attrib value is always a real mutable Python dictionary, an ElementTree
implementation may choose to use another internal representation, and
create the dictionary only if someone asks for it. To take advantage of
such implementations, use the dictionary methods below whenever possible.
The following dictionary-like methods work on the element attributes.
Finds text for the first subelement matching match. match may be
a tag name or path. Returns the text content of the first matching
element, or default if no element was found. Note that if the matching
element has no text content an empty string is returned.
Creates a tree iterator with the current element as the root.
The iterator iterates over this element and all elements below it, in
document (depth first) order. If tag is not None or '*', only
elements whose tag equals tag are returned from the iterator. If the
tree structure is modified during iteration, the result is undefined.
Removes subelement from the element. Unlike the find* methods this
method compares elements based on the instance identity, not on tag value
or contents.
Caution: Elements with no subelements will test as False. This behavior
will change in future versions. Use a specific len(elem) or elem is None
test instead.
element = root.find('foo')

if not element:  # careful!
    print("element not found, or element has no subelements")

if element is None:
    print("element not found")
Replaces the root element for this tree. This discards the current
contents of the tree, and replaces it with the given element. Use with
care. element is an element instance.
Finds the first toplevel element matching match. match may be a tag
name or path. Same as getroot().find(match). Returns the first matching
element, or None if no element was found.
Finds all matching subelements, by tag name or path. Same as
getroot().findall(match). match may be a tag name or path. Returns a
list containing all matching elements, in document order.
Finds the element text for the first toplevel element with given tag.
Same as getroot().findtext(match). match may be a tag name or path.
default is the value to return if the element was not found. Returns
the text content of the first matching element, or the default value if no
element was found. Note that if the element is found, but has no text
content, this method returns an empty string.
Creates and returns a tree iterator for the root element. The iterator
loops over all elements in this tree, in section order. tag is the tag
to look for (default is to return all elements).
Finds all matching subelements, by tag name or path. Same as
getroot().iterfind(match). Returns an iterable yielding all matching
elements in document order.
Loads an external XML section into this element tree. source is a file
name or file object. parser is an optional parser instance.
If not given, the standard XMLParser parser is used. Returns the section
root element.
Writes the element tree to a file, as XML. file is a file name, or a
file object opened for writing. encoding[1] is the output encoding
(default is US-ASCII). Use encoding="unicode" to write a Unicode string.
xml_declaration controls if an XML declaration
should be added to the file. Use False for never, True for always, None
for only if not US-ASCII or UTF-8 or Unicode (default is None). method is
either "xml", "html" or "text" (default is "xml").
Returns an (optionally) encoded string.
This is the XML file that is going to be manipulated:
<html>
<head>
<title>Example page</title>
</head>
<body>
<p>Moved to <a href="http://example.org/">example.org</a>
or <a href="http://example.com/">example.com</a>.</p>
</body>
</html>
Example of changing the attribute “target” of every link in first paragraph:
>>> from xml.etree.ElementTree import ElementTree
>>> tree = ElementTree()
>>> tree.parse("index.xhtml")
<Element 'html' at 0xb77e6fac>
>>> p = tree.find("body/p")     # Finds first occurrence of tag p in body
>>> p
<Element 'p' at 0xb77ec26c>
>>> links = list(p.iter("a"))   # Returns list of all links
>>> links
[<Element 'a' at 0xb77ec2ac>, <Element 'a' at 0xb77ec1cc>]
>>> for i in links:             # Iterates through all found links
...     i.attrib["target"] = "blank"
>>> tree.write("output.xhtml")
class xml.etree.ElementTree.QName(text_or_uri, tag=None)
QName wrapper. This can be used to wrap a QName attribute value, in order
to get proper namespace handling on output. text_or_uri is a string
containing the QName value, in the form {uri}local, or, if the tag argument
is given, the URI part of a QName. If tag is given, the first argument is
interpreted as a URI, and this argument is interpreted as a local name.
QName instances are opaque.
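For illustration, a hedged sketch (the namespace URI here is made up);
serializing an element whose tag is a QName lets ElementTree emit a generated
namespace prefix on output:

from xml.etree.ElementTree import Element, QName, tostring

tag = QName('http://example.com/ns', 'item')   # URI part plus local name
elem = Element(tag)
print(tostring(elem))   # e.g. b'<ns0:item xmlns:ns0="http://example.com/ns" />'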
class xml.etree.ElementTree.TreeBuilder(element_factory=None)
Generic element structure builder. This builder converts a sequence of
start, data, and end method calls to a well-formed element structure. You
can use this class to build an element structure using a custom XML parser,
or a parser for some other XML-like format. The element_factory is called
to create new Element instances when given.
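A minimal sketch of driving a TreeBuilder by hand with start/data/end calls,
as a custom parser would:

from xml.etree.ElementTree import TreeBuilder, tostring

builder = TreeBuilder()
builder.start('root', {})        # open the <root> element
builder.data('payload')          # add character data
builder.end('root')              # close it again
root = builder.close()           # returns the toplevel Element
print(tostring(root))            # b'<root>payload</root>'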
Handles a doctype declaration. name is the doctype name. pubid is
the public identifier. system is the system identifier. This method
does not exist on the default TreeBuilder class.
class xml.etree.ElementTree.XMLParser(html=0, target=None, encoding=None)
Element structure builder for XML source data, based on the expat
parser. html are predefined HTML entities. This flag is not supported by
the current implementation. target is the target object. If omitted, the
builder uses an instance of the standard TreeBuilder class. encoding[1]
is optional. If given, the value overrides the encoding specified in the
XML file.
XMLParser.feed() calls target’s start() method
for each opening tag, its end() method for each closing tag,
and data is processed by method data(). XMLParser.close()
calls target’s method close().
XMLParser can be used not only for building a tree structure.
This is an example of counting the maximum depth of an XML file:
>>> from xml.etree.ElementTree import XMLParser
>>> class MaxDepth:                     # The target object of the parser
...     maxDepth = 0
...     depth = 0
...     def start(self, tag, attrib):   # Called for each opening tag.
...         self.depth += 1
...         if self.depth > self.maxDepth:
...             self.maxDepth = self.depth
...     def end(self, tag):             # Called for each closing tag.
...         self.depth -= 1
...     def data(self, data):
...         pass                        # We do not need to do anything with data.
...     def close(self):                # Called when all data has been parsed.
...         return self.maxDepth
...
>>> target = MaxDepth()
>>> parser = XMLParser(target=target)
>>> exampleXml = """
... <a>
...   <b>
...   </b>
...   <b>
...     <c>
...       <d>
...       </d>
...     </c>
...   </b>
... </a>"""
>>> parser.feed(exampleXml)
>>> parser.close()
4
The modules described in this chapter implement Internet protocols and support
for related technology. They are all implemented in Python. Most of these
modules require the presence of the system-dependent module socket, which
is currently supported on most popular platforms. Here is an overview:
The webbrowser module provides a high-level interface to allow displaying
Web-based documents to users. Under most circumstances, simply calling the
open() function from this module will do the right thing.
Under Unix, graphical browsers are preferred under X11, but text-mode browsers
will be used if graphical browsers are not available or an X11 display isn’t
available. If text-mode browsers are used, the calling process will block until
the user exits the browser.
If the environment variable BROWSER exists, it is interpreted to
override the platform default list of browsers, as an os.pathsep-separated
list of browsers to try in order. When the value of a list part contains the
string %s, then it is interpreted as a literal browser command line to be
used with the argument URL substituted for %s; if the part does not contain
%s, it is simply interpreted as the name of the browser to launch. [1]
For non-Unix platforms, or when a remote browser is available on Unix, the
controlling process will not wait for the user to finish with the browser, but
allow the remote browser to maintain its own windows on the display. If remote
browsers are not available on Unix, the controlling process will launch a new
browser and wait.
The script webbrowser can be used as a command-line interface for the
module. It accepts a URL as the argument. It accepts the following optional
parameters: -n opens the URL in a new browser window, if possible;
-t opens the URL in a new browser page (“tab”). The options are,
naturally, mutually exclusive.
Display url using the default browser. If new is 0, the url is opened
in the same browser window if possible. If new is 1, a new browser window
is opened if possible. If new is 2, a new browser page (“tab”) is opened
if possible. If autoraise is True, the window is raised if possible
(note that under many window managers this will occur regardless of the
setting of this variable).
Note that on some platforms, trying to open a filename using this function
may work and start the operating system’s associated program. However, this
is neither supported nor portable.
Return a controller object for the browser type using. If using is
None, return a controller for a default browser appropriate to the
caller’s environment.
Register the browser type name. Once a browser type is registered, the
get() function can return a controller for that browser type. If
instance is not provided, or is None, constructor will be called without
parameters to create an instance when needed. If instance is provided,
constructor will never be called, and may be None.
This entry point is only useful if you plan to either set the BROWSER
variable or call get() with a nonempty argument matching the name of a
handler you declare.
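As a hedged sketch (the browser path and the name 'my-browser' are
hypothetical), registering an instance and then retrieving it via get()
might look like:

import webbrowser

# Pass an instance directly, so the constructor argument may be None.
webbrowser.register('my-browser', None,
                    webbrowser.GenericBrowser('/usr/local/bin/mybrowser'))

controller = webbrowser.get('my-browser')
controller.open('http://www.python.org/')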
A number of browser types are predefined. This table gives the type names that
may be passed to the get() function and the corresponding instantiations
for the controller classes, all defined in this module.
Type Name            Class Name                       Notes
'mozilla'            Mozilla('mozilla')
'firefox'            Mozilla('mozilla')
'netscape'           Mozilla('netscape')
'galeon'             Galeon('galeon')
'epiphany'           Galeon('epiphany')
'skipstone'          BackgroundBrowser('skipstone')
'kfmclient'          Konqueror()                      (1)
'konqueror'          Konqueror()                      (1)
'kfm'                Konqueror()                      (1)
'mosaic'             BackgroundBrowser('mosaic')
'opera'              Opera()
'grail'              Grail()
'links'              GenericBrowser('links')
'elinks'             Elinks('elinks')
'lynx'               GenericBrowser('lynx')
'w3m'                GenericBrowser('w3m')
'windows-default'    WindowsDefault                   (2)
'internet-config'    InternetConfig                   (3)
'macosx'             MacOSX('default')                (4)
Notes:
(1) “Konqueror” is the file manager for the KDE desktop environment for Unix, and
only makes sense to use if KDE is running. Some way of reliably detecting KDE
would be nice; the KDEDIR variable is not sufficient. Note also that
the name “kfm” is used even when using the konqueror command with KDE
2 — the implementation selects the best strategy for running Konqueror.
(2) Only on Windows platforms.
(3) Only on Mac OS platforms; requires the standard MacPython ic module.
(4) Only on Mac OS X platform.
Here are some simple examples:
import webbrowser

url = 'http://www.python.org/'

# Open URL in a new tab, if a browser window is already open.
webbrowser.open_new_tab(url + 'doc/')

# Open URL in new window, raising the window if possible.
webbrowser.open_new(url)
Display url using the browser handled by this controller. If new is 1, a new
browser window is opened if possible. If new is 2, a new browser page (“tab”)
is opened if possible.
A CGI script is invoked by an HTTP server, usually to process user input
submitted through an HTML <FORM> or <ISINDEX> element.
Most often, CGI scripts live in the server’s special cgi-bin directory.
The HTTP server places all sorts of information about the request (such as the
client’s hostname, the requested URL, the query string, and lots of other
goodies) in the script’s shell environment, executes the script, and sends the
script’s output back to the client.
The script’s input is connected to the client too, and sometimes the form data
is read this way; at other times the form data is passed via the “query string”
part of the URL. This module is intended to take care of the different cases
and provide a simpler interface to the Python script. It also provides a number
of utilities that help in debugging scripts, and the latest addition is support
for file uploads from a form (if your browser supports it).
The output of a CGI script should consist of two sections, separated by a blank
line. The first section contains a number of headers, telling the client what
kind of data is following. Python code to generate a minimal header section
looks like this:
print("Content-Type: text/html")# HTML is followingprint()# blank line, end of headers
The second section is usually HTML, which allows the client software to display
nicely formatted text with header, in-line images, etc. Here’s Python code that
prints a simple piece of HTML:
print("<TITLE>CGI script output</TITLE>")print("<H1>This is my first CGI script</H1>")print("Hello, world!")
When you write a new script, consider adding these lines:
import cgitb
cgitb.enable()
This activates a special exception handler that will display detailed reports in
the Web browser if any errors occur. If you’d rather not show the guts of your
program to users of your script, you can have the reports saved to files
instead, with code like this:
import cgitb
cgitb.enable(display=0, logdir="/tmp")
It’s very helpful to use this feature during script development. The reports
produced by cgitb provide information that can save you a lot of time in
tracking down bugs. You can always remove the cgitb line later when you
have tested your script and are confident that it works correctly.
To get at submitted form data, use the FieldStorage class. Instantiate
it exactly once, without arguments. This reads the form contents from standard
input or the environment (depending on the value of various environment
variables set according to the CGI standard). Since it may consume standard
input, it should be instantiated only once.
The FieldStorage instance can be indexed like a Python dictionary.
It allows membership testing with the in operator, and also supports
the standard dictionary method keys() and the built-in function
len(). Form fields containing empty strings are ignored and do not appear
in the dictionary; to keep such values, provide a true value for the optional
keep_blank_values keyword parameter when creating the FieldStorage
instance.
For instance, the following code (which assumes that the
Content-Type header and blank line have already been printed)
checks that the fields name and addr are both set to a non-empty
string:
form = cgi.FieldStorage()
if "name" not in form or "addr" not in form:
    print("<H1>Error</H1>")
    print("Please fill in the name and addr fields.")
    return

print("<p>name:", form["name"].value)
print("<p>addr:", form["addr"].value)
...further form processing here...
Here the fields, accessed through form[key], are themselves instances of
FieldStorage (or MiniFieldStorage, depending on the form
encoding). The value attribute of the instance yields the string value
of the field. The getvalue() method returns this string value directly;
it also accepts an optional second argument as a default to return if the
requested key is not present.
If the submitted form data contains more than one field with the same name, the
object retrieved by form[key] is not a FieldStorage or
MiniFieldStorage instance but a list of such instances. Similarly, in
this situation, form.getvalue(key) would return a list of strings. If you
expect this possibility (when your HTML form contains multiple fields with the
same name), use the getlist() function, which always returns a list of
values (so that you do not need to special-case the single item case). For
example, this code concatenates any number of username fields, separated by
commas:
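A short sketch of that pattern (the field name username is assumed from the
surrounding text):

value = form.getlist("username")
usernames = ",".join(value)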
If a field represents an uploaded file, accessing the value via the
value attribute or the getvalue() method reads the entire file in
memory as a string. This may not be what you want. You can test for an uploaded
file by testing either the filename attribute or the file
attribute. You can then read the data at leisure from the file
attribute:
fileitem=form["userfile"]iffileitem.file:# It's an uploaded file; count lineslinecount=0whileTrue:line=fileitem.file.readline()ifnotline:breaklinecount=linecount+1
If an error is encountered when obtaining the contents of an uploaded file
(for example, when the user interrupts the form submission by clicking on
a Back or Cancel button) the done attribute of the object for the
field will be set to the value -1.
The file upload draft standard entertains the possibility of uploading multiple
files from one field (using a recursive multipart/* encoding).
When this occurs, the item will be a dictionary-like FieldStorage item.
This can be determined by testing its type attribute, which should be
multipart/form-data (or perhaps another MIME type matching
multipart/*). In this case, it can be iterated over recursively
just like the top-level form object.
When a form is submitted in the “old” format (as the query string or as a single
data part of type application/x-www-form-urlencoded), the items will
actually be instances of the class MiniFieldStorage. In this case, the
list, file, and filename attributes are always None.
A form submitted via POST that also has a query string will contain both
FieldStorage and MiniFieldStorage items.
The previous section explains how to read CGI form data using the
FieldStorage class. This section describes a higher level interface
which was added to this class to allow one to do it in a more readable and
intuitive way. The interface doesn’t make the techniques described in previous
sections obsolete — they are still useful to process file uploads efficiently,
for example.
The interface consists of two simple methods. Using the methods you can process
form data in a generic way, without needing to worry whether one or more
values were posted under a given name.
In the previous section, you learned to write the following code anytime you
expected a user to post more than one value under one name:

item = form.getvalue("item")
if isinstance(item, list):
    pass  # The user is requesting more than one item.
else:
    pass  # The user is requesting only one item.
This situation is common for example when a form contains a group of multiple
checkboxes with the same name:
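For instance, form markup along these lines (a hypothetical fragment; the
field name matches the item examples nearby) posts several values under one
name:

<input type="checkbox" name="item" value="1" />
<input type="checkbox" name="item" value="2" />
<input type="checkbox" name="item" value="3" />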
In most situations, however, there’s only one form control with a particular
name in a form and then you expect and need only one value associated with this
name. So you write a script containing for example this code:
user = form.getvalue("user").upper()
The problem with the code is that you should never expect that a client will
provide valid input to your scripts. For example, if a curious user appends
another user=foo pair to the query string, then the script would crash,
because in this situation the getvalue("user") method call returns a list
instead of a string. Calling the upper() method on a list is not valid
(since lists do not have a method of this name) and results in an
AttributeError exception.
Therefore, the appropriate way to read form data values was to always use the
code which checks whether the obtained value is a single value or a list of
values. That’s annoying and leads to less readable scripts.
A more convenient approach is to use the methods getfirst() and
getlist() provided by this higher level interface.
This method always returns only one value associated with form field name.
The method returns only the first value when more than one value was posted
under that name. Please note that the order in which the values are received
may vary from browser to browser and should not be counted on. [1] If no such
form field or value exists then the method returns the value specified by the
optional parameter default. This parameter defaults to None if not
specified.
This method always returns a list of values associated with form field name.
The method returns an empty list if no such form field or value exists for
name. It returns a list consisting of one item if only one such value exists.
Using these methods you can write nice compact code:
import cgi

form = cgi.FieldStorage()
user = form.getfirst("user", "").upper()    # This way it's safe.
for item in form.getlist("item"):
    do_something(item)
Parse a query in the environment or from a file (the file defaults to
sys.stdin). The keep_blank_values and strict_parsing parameters are
passed to urllib.parse.parse_qs() unchanged.
Parse input of type multipart/form-data (for file uploads).
Arguments are fp for the input file and pdict for a dictionary containing
other parameters in the Content-Type header.
Returns a dictionary just like urllib.parse.parse_qs(): keys are the field names, each
value is a list of values for that field. This is easy to use but not much good
if you are expecting megabytes to be uploaded — in that case, use the
FieldStorage class instead which is much more flexible.
Note that this does not parse nested multipart parts — use
FieldStorage for that.
Convert the characters '&', '<' and '>' in string s to HTML-safe
sequences. Use this if you need to display text that might contain such
characters in HTML. If the optional flag quote is true, the quotation mark
character (") is also translated; this helps for inclusion in an HTML
attribute value delimited by double quotes, as in <a href="...">. Note
that single quotes are never translated.
Deprecated since version 3.2: This function is unsafe because quote is false by
default, and is therefore deprecated. Use html.escape() instead.
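For the recommended replacement, a one-line sketch:

import html

print(html.escape('<a href="x">M & N</a>'))
# prints: &lt;a href=&quot;x&quot;&gt;M &amp; N&lt;/a&gt;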
There’s one important rule: if you invoke an external program (via the
os.system() or os.popen() functions, or others with similar
functionality), make very sure you don’t pass arbitrary strings received from
the client to the shell. This is a well-known security hole whereby clever
hackers anywhere on the Web can exploit a gullible CGI script to invoke
arbitrary shell commands. Even parts of the URL or field names cannot be
trusted, since the request doesn’t have to come from your form!
To be on the safe side, if you must pass a string gotten from a form to a shell
command, you should make sure the string contains only alphanumeric characters,
dashes, underscores, and periods.
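One possible sketch of such a check (the helper name is ours, not part of the
cgi module):

import re

def is_shell_safe(s):
    # Accept only alphanumerics, dashes, underscores, and periods.
    return re.match(r'\A[A-Za-z0-9._-]+\Z', s) is not None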
Read the documentation for your HTTP server and check with your local system
administrator to find the directory where CGI scripts should be installed;
usually this is in a directory cgi-bin in the server tree.
Make sure that your script is readable and executable by “others”; the Unix file
mode should be 0o755 octal (use chmod 0755 filename). Make sure that the
first line of the script contains #! starting in column 1 followed by the
pathname of the Python interpreter, for instance:
#!/usr/local/bin/python
Make sure the Python interpreter exists and is executable by “others”.
Make sure that any files your script needs to read or write are readable or
writable, respectively, by “others” — their mode should be 0o644 for
readable and 0o666 for writable. This is because, for security reasons, the
HTTP server executes your script as user “nobody”, without any special
privileges. It can only read (write, execute) files that everybody can read
(write, execute). The current directory at execution time is also different (it
is usually the server’s cgi-bin directory) and the set of environment variables
is also different from what you get when you log in. In particular, don’t count
on the shell’s search path for executables (PATH) or the Python module
search path (PYTHONPATH) to be set to anything interesting.
If you need to load modules from a directory which is not on Python’s default
module search path, you can change the path in your script, before importing
other modules. For example:
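import sys
sys.path.insert(0, "/usr/home/joe/lib/python")
sys.path.insert(0, "/usr/local/lib/python")

(The directories shown are placeholders for your own library locations.)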
Unfortunately, a CGI script will generally not run when you try it from the
command line, and a script that works perfectly from the command line may fail
mysteriously when run from the server. There’s one reason why you should still
test your script from the command line: if it contains a syntax error, the
Python interpreter won’t execute it at all, and the HTTP server will most likely
send a cryptic error to the client.
Assuming your script has no syntax errors, yet it does not work, you have no
choice but to read the next section.
First of all, check for trivial installation errors — reading the section
above on installing your CGI script carefully can save you a lot of time. If
you wonder whether you have understood the installation procedure correctly, try
installing a copy of this module file (cgi.py) as a CGI script. When
invoked as a script, the file will dump its environment and the contents of the
form in HTML form. Give it the right mode etc, and send it a request. If it’s
installed in the standard cgi-bin directory, it should be possible to
send it a request by entering a URL into your browser of the form:
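http://yourhostname/cgi-bin/cgi.py?name=Joe+Blow&addr=At+Home

(where yourhostname is a placeholder for your server’s host name)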
If this gives an error of type 404, the server cannot find the script – perhaps
you need to install it in a different directory. If it gives another error,
there’s an installation problem that you should fix before trying to go any
further. If you get a nicely formatted listing of the environment and form
content (in this example, the fields should be listed as “addr” with value “At
Home” and “name” with value “Joe Blow”), the cgi.py script has been
installed correctly. If you follow the same procedure for your own script, you
should now be able to debug it.
The next step could be to call the cgi module’s test() function
from your script: replace its main code with the single statement
cgi.test()
This should produce the same results as those gotten from installing the
cgi.py file itself.
When an ordinary Python script raises an unhandled exception (for whatever
reason: a typo in a module name, a file that can’t be opened, etc.), the
Python interpreter prints a nice traceback and exits. While the Python
interpreter will still do this when your CGI script raises an exception, most
likely the traceback will end up in one of the HTTP server’s log files, or be
discarded altogether.
Fortunately, once you have managed to get your script to execute some code,
you can easily send tracebacks to the Web browser using the cgitb module.
If you haven’t done so already, just add the lines:
import cgitb
cgitb.enable()
to the top of your script. Then try running it again; when a problem occurs,
you should see a detailed report that will likely make apparent the cause of the
crash.
If you suspect that there may be a problem in importing the cgitb module,
you can use an even more robust approach (which only uses built-in modules):
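import sys
sys.stderr = sys.stdout
print("Content-Type: text/plain")
print()
...your code here...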
This relies on the Python interpreter to print the traceback. The content type
of the output is set to plain text, which disables all HTML processing. If your
script works, the raw HTML will be displayed by your client. If it raises an
exception, most likely after the first two lines have been printed, a traceback
will be displayed. Because no HTML interpretation is going on, the traceback
will be readable.
Most HTTP servers buffer the output from CGI scripts until the script is
completed. This means that it is not possible to display a progress report on
the client’s display while the script is running.
Check the installation instructions above.
Check the HTTP server’s log files. (tail -f logfile in a separate window
may be useful!)
Always check a script for syntax errors first, by doing something like
python script.py.
If your script does not have any syntax errors, try adding import cgitb; cgitb.enable() to the top of the script.
When invoking external programs, make sure they can be found. Usually, this
means using absolute path names — PATH is usually not set to a very
useful value in a CGI script.
When reading or writing external files, make sure they can be read or written
by the userid under which your CGI script will be running: this is typically the
userid under which the web server is running, or some explicitly specified
userid for a web server’s suexec feature.
Don’t try to give a CGI script a set-uid mode. This doesn’t work on most
systems, and is a security liability as well.
Note that some recent versions of the HTML specification do state what
order the field values should be supplied in, but knowing whether a request
was received from a conforming browser, or even from a browser at all, is
tedious and error-prone.
The cgitb module provides a special exception handler for Python scripts.
(Its name is a bit misleading. It was originally designed to display extensive
traceback information in HTML for CGI scripts. It was later generalized to also
display this information in plain text.) After this module is activated, if an
uncaught exception occurs, a detailed, formatted report will be displayed. The
report includes a traceback showing excerpts of the source code for each level,
as well as the values of the arguments and local variables to currently running
functions, to help you debug the problem. Optionally, you can save this
information to a file instead of sending it to the browser.
To enable this feature, simply add this to the top of your CGI script:
import cgitb
cgitb.enable()
The options to the enable() function control whether the report is
displayed in the browser and whether the report is logged to a file for later
analysis.
This function causes the cgitb module to take over the interpreter’s
default handling for exceptions by setting the value of sys.excepthook.
The optional argument display defaults to 1 and can be set to 0 to
suppress sending the traceback to the browser. If the argument logdir is
present, the traceback reports are written to files. The value of logdir
should be a directory where these files will be placed. The optional argument
context is the number of lines of context to display around the current line
of source code in the traceback; this defaults to 5. If the optional
argument format is "html", the output is formatted as HTML. Any other
value forces plain text output. The default value is "html".
This function handles an exception using the default settings (that is, show a
report in the browser, but don’t log to a file). This can be used when you’ve
caught an exception and want to report it using cgitb. The optional
info argument should be a 3-tuple containing an exception type, exception
value, and traceback object, exactly like the tuple returned by
sys.exc_info(). If the info argument is not supplied, the current
exception is obtained from sys.exc_info().
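For instance, a hedged sketch of reporting a caught exception
(risky_operation() is a stand-in for your own code):

import sys
import cgitb

try:
    risky_operation()   # hypothetical function that may fail
except Exception:
    cgitb.handler(sys.exc_info())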
wsgiref — WSGI Utilities and Reference Implementation
The Web Server Gateway Interface (WSGI) is a standard interface between web
server software and web applications written in Python. Having a standard
interface makes it easy to use an application that supports WSGI with a number
of different web servers.
Only authors of web servers and programming frameworks need to know every detail
and corner case of the WSGI design. You don’t need to understand every detail
of WSGI just to install a WSGI application or to write a web application using
an existing framework.
wsgiref is a reference implementation of the WSGI specification that can
be used to add WSGI support to a web server or framework. It provides utilities
for manipulating WSGI environment variables and response headers, base classes
for implementing WSGI servers, a demo HTTP server that serves WSGI applications,
and a validation tool that checks WSGI servers and applications for conformance
to the WSGI specification (PEP 3333).
See http://www.wsgi.org for more information about WSGI, and links to tutorials
and other resources.
This module provides a variety of utility functions for working with WSGI
environments. A WSGI environment is a dictionary containing HTTP request
variables as described in PEP 3333. All of the functions taking an environ
parameter expect a WSGI-compliant dictionary to be supplied; please see
PEP 3333 for a detailed specification.
Return a guess for whether wsgi.url_scheme should be “http” or “https”, by
checking for a HTTPS environment variable in the environ dictionary. The
return value is a string.
This function is useful when creating a gateway that wraps CGI or a CGI-like
protocol such as FastCGI. Typically, servers providing such protocols will
include a HTTPS variable with a value of “1”, “yes”, or “on” when a request
is received via SSL. So, this function returns “https” if such a value is
found, and “http” otherwise.
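A two-line sketch of the heuristic, with hand-built environ dictionaries:

from wsgiref.util import guess_scheme

print(guess_scheme({'HTTPS': 'on'}))   # 'https'
print(guess_scheme({}))                # 'http'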
Return the full request URI, optionally including the query string, using the
algorithm found in the “URL Reconstruction” section of PEP 3333. If
include_query is false, the query string is not included in the resulting URI.
Similar to request_uri(), except that the PATH_INFO and
QUERY_STRING variables are ignored. The result is the base URI of the
application object addressed by the request.
Shift a single name from PATH_INFO to SCRIPT_NAME and return the name.
The environ dictionary is modified in-place; use a copy if you need to keep
the original PATH_INFO or SCRIPT_NAME intact.
If there are no remaining path segments in PATH_INFO, None is returned.
Typically, this routine is used to process each portion of a request URI path,
for example to treat the path as a series of dictionary keys. This routine
modifies the passed-in environment to make it suitable for invoking another WSGI
application that is located at the target URI. For example, if there is a WSGI
application at /foo, and the request URI path is /foo/bar/baz, and the
WSGI application at /foo calls shift_path_info(), it will receive the
string “bar”, and the environment will be updated to be suitable for passing to
a WSGI application at /foo/bar. That is, SCRIPT_NAME will change from
/foo to /foo/bar, and PATH_INFO will change from /bar/baz to
/baz.
When PATH_INFO is just a “/”, this routine returns an empty string and
appends a trailing slash to SCRIPT_NAME, even though empty path segments are
normally ignored, and SCRIPT_NAME doesn’t normally end in a slash. This is
intentional behavior, to ensure that an application can tell the difference
between URIs ending in /x and ones ending in /x/ when using this
routine to do object traversal.
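Continuing the /foo/bar/baz example above, a minimal sketch with a hand-built
environ dictionary:

from wsgiref.util import shift_path_info

environ = {'SCRIPT_NAME': '/foo', 'PATH_INFO': '/bar/baz'}
print(shift_path_info(environ))    # 'bar'
print(environ['SCRIPT_NAME'])      # '/foo/bar'
print(environ['PATH_INFO'])        # '/baz'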
Update environ with trivial defaults for testing purposes.
This routine adds various parameters required for WSGI, including HTTP_HOST,
SERVER_NAME, SERVER_PORT, REQUEST_METHOD, SCRIPT_NAME,
PATH_INFO, and all of the PEP 3333-defined wsgi.* variables. It
only supplies default values, and does not replace any existing settings for
these variables.
This routine is intended to make it easier for unit tests of WSGI servers and
applications to set up dummy environments. It should NOT be used by actual WSGI
servers or applications, since the data is fake!
Example usage:
from wsgiref.util import setup_testing_defaults
from wsgiref.simple_server import make_server

# A relatively simple WSGI application. It's going to print out the
# environment dictionary after being updated by setup_testing_defaults
def simple_app(environ, start_response):
    setup_testing_defaults(environ)

    status = '200 OK'
    headers = [('Content-type', 'text/plain; charset=utf-8')]

    start_response(status, headers)

    ret = [("%s: %s\n" % (key, value)).encode("utf-8")
           for key, value in environ.items()]
    return ret

httpd = make_server('', 8000, simple_app)
print("Serving on port 8000...")
httpd.serve_forever()
In addition to the environment functions above, the wsgiref.util module
also provides these miscellaneous utilities:
Return true if ‘header_name’ is an HTTP/1.1 “Hop-by-Hop” header, as defined by
RFC 2616.
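A two-line sketch:

from wsgiref.util import is_hop_by_hop

print(is_hop_by_hop('Connection'))     # True -- hop-by-hop
print(is_hop_by_hop('Content-Type'))   # False -- end-to-end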
class wsgiref.util.FileWrapper(filelike, blksize=8192)
A wrapper to convert a file-like object to an iterator. The resulting objects
support both __getitem__() and __iter__() iteration styles, for
compatibility with Python 2.1 and Jython. As the object is iterated over, the
optional blksize parameter will be repeatedly passed to the filelike
object’s read() method to obtain bytestrings to yield. When read()
returns an empty bytestring, iteration is ended and is not resumable.
If filelike has a close() method, the returned object will also have a
close() method, and it will invoke the filelike object’s close()
method when called.
Example usage:
from io import StringIO
from wsgiref.util import FileWrapper

# We're using a StringIO buffer as the file-like object
filelike = StringIO("This is an example file-like object" * 10)
wrapper = FileWrapper(filelike, blksize=5)

for chunk in wrapper:
    print(chunk)
class wsgiref.headers.Headers(headers)
Create a mapping-like object wrapping headers, which must be a list of header
name/value tuples as described in PEP 3333.
Headers objects support typical mapping operations including
__getitem__(), get(), __setitem__(), setdefault(),
__delitem__() and __contains__(). For each of
these methods, the key is the header name (treated case-insensitively), and the
value is the first value associated with that header name. Setting a header
deletes any existing values for that header, then adds a new value at the end of
the wrapped header list. Headers’ existing order is generally maintained, with
new headers added to the end of the wrapped list.
Unlike a dictionary, Headers objects do not raise an error when you try
to get or delete a key that isn’t in the wrapped header list. Getting a
nonexistent header just returns None, and deleting a nonexistent header does
nothing.
Headers objects also support keys(), values(), and
items() methods. The lists returned by keys() and items() can
include the same key more than once if there is a multi-valued header. The
len() of a Headers object is the same as the length of its
items(), which is the same as the length of the wrapped header list. In
fact, the items() method just returns a copy of the wrapped header list.
Calling bytes() on a Headers object returns a formatted bytestring
suitable for transmission as HTTP response headers. Each header is placed on a
line with its value, separated by a colon and a space. Each line is terminated
by a carriage return and line feed, and the bytestring is terminated with a
blank line.
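A hedged sketch of the mapping interface and the bytes() formatting described
above (the header values are made up):

from wsgiref.headers import Headers

h = Headers([('Content-Type', 'text/plain')])
h['X-Served-By'] = 'wsgiref'            # mapping-style assignment
print(h['content-type'])                # case-insensitive lookup
print(bytes(h))                         # formatted for an HTTP response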
In addition to their mapping interface and formatting features, Headers
objects also have the following methods for querying and adding multi-valued
headers, and for adding headers with MIME parameters:
Return a list of all the values for the named header.
The returned list will be sorted in the order they appeared in the original
header list or were added to this instance, and may contain duplicates. Any
fields deleted and re-inserted are always appended to the header list. If no
fields exist with the given name, returns an empty list.
Add a (possibly multi-valued) header, with optional MIME parameters specified
via keyword arguments.
name is the header field to add. Keyword arguments can be used to set MIME
parameters for the header field. Each parameter must be a string or None.
Underscores in parameter names are converted to dashes, since dashes are illegal
in Python identifiers, but many MIME parameter names include dashes. If the
parameter value is a string, it is added to the header value parameters in the
form name="value". If it is None, only the parameter name is added.
(This is used for MIME parameters without a value.) Example usage:
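For instance, with a Headers object h as above, the following call would
produce a header of the form Content-Disposition: attachment; filename="bud.gif":

h.add_header('content-disposition', 'attachment', filename='bud.gif')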
This module implements a simple HTTP server (based on http.server)
that serves WSGI applications. Each server instance serves a single WSGI
application on a given host and port. If you want to serve multiple
applications on a single host and port, you should create a WSGI application
that parses PATH_INFO to select which application to invoke for each
request. (E.g., using the shift_path_info() function from
wsgiref.util.)
Create a new WSGI server listening on host and port, accepting connections
for app. The return value is an instance of the supplied server_class, and
will process requests using the specified handler_class. app must be a WSGI
application object, as defined by PEP 3333.
Example usage:
from wsgiref.simple_server import make_server, demo_app

httpd = make_server('', 8000, demo_app)
print("Serving HTTP on port 8000...")

# Respond to requests until process is killed
httpd.serve_forever()

# Alternative: serve one request, then exit
httpd.handle_request()
This function is a small but complete WSGI application that returns a text page
containing the message “Hello world!” and a list of the key/value pairs provided
in the environ parameter. It’s useful for verifying that a WSGI server (such
as wsgiref.simple_server) is able to run a simple WSGI application
correctly.
class wsgiref.simple_server.WSGIServer(server_address, RequestHandlerClass)
Create a WSGIServer instance. server_address should be a
(host, port) tuple, and RequestHandlerClass should be the subclass of
http.server.BaseHTTPRequestHandler that will be used to process
requests.
You do not normally need to call this constructor, as the make_server()
function can handle all the details for you.
WSGIServer is a subclass of http.server.HTTPServer, so all
of its methods (such as serve_forever() and handle_request()) are
available. WSGIServer also provides these WSGI-specific methods:
Normally, however, you do not need to use these additional methods, as
set_app() is normally called by make_server(), and the
get_app() exists mainly for the benefit of request handler instances.
class wsgiref.simple_server.WSGIRequestHandler(request, client_address, server)
Create an HTTP handler for the given request (i.e. a socket), client_address
(a (host, port) tuple), and server (WSGIServer instance).
You do not need to create instances of this class directly; they are
automatically created as needed by WSGIServer objects. You can,
however, subclass this class and supply it as a handler_class to the
make_server() function. Some possibly relevant methods for overriding in
subclasses:
Returns a dictionary containing the WSGI environment for a request. The default
implementation copies the contents of the WSGIServer object’s
base_environ dictionary attribute and then adds various headers derived
from the HTTP request. Each call to this method should return a new dictionary
containing all of the relevant CGI environment variables as specified in
PEP 3333.
Process the HTTP request. The default implementation creates a handler instance
using a wsgiref.handlers class to implement the actual WSGI application
interface.
When creating new WSGI application objects, frameworks, servers, or middleware,
it can be useful to validate the new code’s conformance using
wsgiref.validate. This module provides a function that creates WSGI
application objects that validate communications between a WSGI server or
gateway and a WSGI application object, to check both sides for protocol
conformance.
Note that this utility does not guarantee complete PEP 3333 compliance; an
absence of errors from this module does not necessarily mean that errors do not
exist. However, if this module does produce an error, then it is virtually
certain that either the server or application is not 100% compliant.
This module is based on the paste.lint module from Ian Bicking’s “Python
Paste” library.
Wrap application and return a new WSGI application object. The returned
application will forward all requests to the original application, and will
check that both the application and the server invoking it are conforming to
the WSGI specification and to RFC 2616.
Any detected nonconformance results in an AssertionError being raised;
note, however, that how these errors are handled is server-dependent. For
example, wsgiref.simple_server and other servers based on
wsgiref.handlers (that don’t override the error handling methods to do
something else) will simply output a message that an error has occurred, and
dump the traceback to sys.stderr or some other error stream.
This wrapper may also generate output using the warnings module to
indicate behaviors that are questionable but which may not actually be
prohibited by PEP 3333. Unless they are suppressed using Python command-line
options or the warnings API, any such warnings will be written to
sys.stderr (not wsgi.errors, unless they happen to be the same
object).
Example usage:
from wsgiref.validate import validator
from wsgiref.simple_server import make_server

# Our callable object which is intentionally not compliant to the
# standard, so the validator is going to break
def simple_app(environ, start_response):
    status = '200 OK'  # HTTP Status
    headers = [('Content-type', 'text/plain')]  # HTTP Headers

    start_response(status, headers)

    # This is going to break because we need to return a list, and
    # the validator is going to inform us
    return b"Hello World"

# This is the application wrapped in a validator
validator_app = validator(simple_app)

httpd = make_server('', 8000, validator_app)
print("Listening on port 8000....")
httpd.serve_forever()
This module provides base handler classes for implementing WSGI servers and
gateways. These base classes handle most of the work of communicating with a
WSGI application, as long as they are given a CGI-like environment, along with
input, output, and error streams.
class wsgiref.handlers.CGIHandler
CGI-based invocation via sys.stdin, sys.stdout, sys.stderr and
os.environ. This is useful when you have a WSGI application and want to run
it as a CGI script. Simply invoke CGIHandler().run(app), where app is
the WSGI application object you wish to invoke.
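A hedged sketch of such a CGI script (myapp.app is a hypothetical WSGI
application object):

#!/usr/bin/env python
from wsgiref.handlers import CGIHandler
from myapp import app   # hypothetical WSGI application object

CGIHandler().run(app)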
This class is a subclass of BaseCGIHandler that sets wsgi.run_once
to true, wsgi.multithread to false, and wsgi.multiprocess to true, and
always uses sys and os to obtain the necessary CGI streams and
environment.
class wsgiref.handlers.IISCGIHandler
A specialized alternative to CGIHandler, for use when deploying on
Microsoft’s IIS web server, without having set the config allowPathInfo
option (IIS>=7) or metabase allowPathInfoForScriptMappings (IIS<7).
By default, IIS gives a PATH_INFO that duplicates the SCRIPT_NAME at
the front, causing problems for WSGI applications that wish to implement
routing. This handler strips any such duplicated path.
IIS can be configured to pass the correct PATH_INFO, but this causes
another bug where PATH_TRANSLATED is wrong. Luckily this variable is
rarely used and is not guaranteed by WSGI. On IIS<7, though, the
setting can only be made on a vhost level, affecting all other script
mappings, many of which break when exposed to the PATH_TRANSLATED bug.
For this reason IIS<7 is almost never deployed with the fix. (Even IIS7
rarely uses it because there is still no UI for it.)
There is no way for CGI code to tell whether the option was set, so a
separate handler class is provided. It is used in the same way as
CGIHandler, i.e., by calling IISCGIHandler().run(app), where
app is the WSGI application object you wish to invoke.
New in version 3.2.
class wsgiref.handlers.BaseCGIHandler(stdin, stdout, stderr, environ, multithread=True, multiprocess=False)
Similar to CGIHandler, but instead of using the sys and
os modules, the CGI environment and I/O streams are specified explicitly.
The multithread and multiprocess values are used to set the
wsgi.multithread and wsgi.multiprocess flags for any applications run by
the handler instance.
This class is a subclass of SimpleHandler intended for use with
software other than HTTP “origin servers”. If you are writing a gateway
protocol implementation (such as CGI, FastCGI, SCGI, etc.) that uses a
Status: header to send an HTTP status, you probably want to subclass this
instead of SimpleHandler.
class wsgiref.handlers.SimpleHandler(stdin, stdout, stderr, environ, multithread=True, multiprocess=False)
Similar to BaseCGIHandler, but designed for use with HTTP origin
servers. If you are writing an HTTP server implementation, you will probably
want to subclass this instead of BaseCGIHandler.
This class is a subclass of BaseHandler. It overrides the
__init__(), get_stdin(), get_stderr(), add_cgi_vars(),
_write(), and _flush() methods to support explicitly setting the
environment and streams via the constructor. The supplied environment and
streams are stored in the stdin, stdout, stderr, and
environ attributes.
This is an abstract base class for running WSGI applications. Each instance
will handle a single HTTP request, although in principle you could create a
subclass that was reusable for multiple requests.
BaseHandler instances have only one method intended for external use:

run(app)
Run the specified WSGI application, app.
All of the other BaseHandler methods are invoked by this method in the
process of running the application, and thus exist primarily to allow
customizing the process.
The following methods MUST be overridden in a subclass:
Buffer the bytes data for transmission to the client. It’s okay if this
method actually transmits the data; BaseHandler just separates write
and flush operations for greater efficiency when the underlying system actually
has such a distinction.
Insert CGI variables for the current request into the environ attribute.
Here are some other methods and attributes you may wish to override. This list
is only a summary, however, and does not include every method that can be
overridden. You should consult the docstrings and source code for additional
information before attempting to create a customized BaseHandler
subclass.
Attributes and methods for customizing the WSGI environment:
The value to be used for the wsgi.multithread environment variable. It
defaults to true in BaseHandler, but may have a different default (or
be set by the constructor) in the other subclasses.
The value to be used for the wsgi.multiprocess environment variable. It
defaults to true in BaseHandler, but may have a different default (or
be set by the constructor) in the other subclasses.
The default environment variables to be included in every request’s WSGI
environment. By default, this is a copy of os.environ at the time that
wsgiref.handlers was imported, but subclasses can create their own
at the class or instance level. Note that the dictionary should be considered
read-only, since the default value is shared between multiple classes and
instances.
If the origin_server attribute is set, this attribute’s value is used to
set the default SERVER_SOFTWARE WSGI environment variable, and also to set a
default Server: header in HTTP responses. It is ignored for handlers (such
as BaseCGIHandler and CGIHandler) that are not HTTP origin
servers.
Return the URL scheme being used for the current request. The default
implementation uses the guess_scheme() function from wsgiref.util
to guess whether the scheme should be “http” or “https”, based on the current
request’s environ variables.
Set the environ attribute to a fully-populated WSGI environment. The
default implementation uses all of the above methods and attributes, plus the
get_stdin(), get_stderr(), and add_cgi_vars() methods and the
wsgi_file_wrapper attribute. It also inserts a SERVER_SOFTWARE key
if not present, as long as the origin_server attribute is a true value
and the server_software attribute is set.
Methods and attributes for customizing exception handling:
Log the exc_info tuple in the server log. exc_info is a (type, value, traceback) tuple. The default implementation simply writes the traceback to
the request’s wsgi.errors stream and flushes it. Subclasses can override
this method to change the format or retarget the output, mail the traceback to
an administrator, or whatever other action may be deemed suitable.
This method is a WSGI application to generate an error page for the user. It is
only invoked if an error occurs before headers are sent to the client.
This method can access the current error information using sys.exc_info(),
and should pass that information to start_response when calling it (as
described in the “Error Handling” section of PEP 3333).
The default implementation just uses the error_status,
error_headers, and error_body attributes to generate an output
page. Subclasses can override this to produce more dynamic error output.
Note, however, that it’s not recommended from a security perspective to spit out
diagnostics to any old user; ideally, you should have to do something special to
enable diagnostic output, which is why the default implementation doesn’t
include any.
The HTTP headers used for error responses. This should be a list of WSGI
response headers ((name,value) tuples), as described in PEP 3333. The
default list just sets the content type to text/plain.
The error response body. This should be an HTTP response body bytestring. It
defaults to the plain text, “A server error occurred. Please contact the
administrator.”
Methods and attributes for PEP 3333‘s “Optional Platform-Specific File
Handling” feature:
Override to implement platform-specific file transmission. This method is
called only if the application’s return value is an instance of the class
specified by the wsgi_file_wrapper attribute. It should return a true
value if it was able to successfully transmit the file, so that the default
transmission code will not be executed. The default implementation of this
method just returns a false value.
This attribute should be set to a true value if the handler’s _write() and
_flush() are being used to communicate directly to the client, rather than
via a CGI-like gateway protocol that wants the HTTP status in a special
Status: header.
Transcode CGI variables from os.environ to PEP 3333 “bytes in unicode”
strings, returning a new dictionary. This function is used by
CGIHandler and IISCGIHandler in place of directly using
os.environ, which is not necessarily WSGI-compliant on all platforms
and web servers using Python 3 – specifically, ones where the OS’s
actual environment is Unicode (i.e. Windows), or ones where the environment
is bytes, but the system encoding used by Python to decode it is anything
other than ISO-8859-1 (e.g. Unix systems using UTF-8).
If you are implementing a CGI-based handler of your own, you probably want
to use this routine instead of just copying values out of os.environ
directly.
from wsgiref.simple_server import make_server

# Every WSGI application must have an application object - a callable
# object that accepts two arguments. For that purpose, we're going to
# use a function (note that you're not limited to a function, you can
# use a class for example). The first argument passed to the function
# is a dictionary containing CGI-style environment variables and the
# second variable is the callable object (see PEP 333).
def hello_world_app(environ, start_response):
    status = '200 OK'  # HTTP Status
    headers = [('Content-type', 'text/plain; charset=utf-8')]  # HTTP Headers
    start_response(status, headers)

    # The returned object is going to be printed
    return [b"Hello World"]

httpd = make_server('', 8000, hello_world_app)
print("Serving on port 8000...")

# Serve until process is killed
httpd.serve_forever()
The urllib.request module defines functions and classes which help in
opening URLs (mostly HTTP) in a complex world — basic and digest
authentication, redirections, cookies and more.
The urllib.request module defines the following functions:
Open the URL url, which can be either a string or a
Request object.
data may be a bytes object specifying additional data to send to the
server, or None if no such data is needed. data may also be an
iterable object and in that case Content-Length value must be specified in
the headers. Currently HTTP requests are the only ones that use data; the
HTTP request will be a POST instead of a GET when the data parameter is
provided. data should be a buffer in the standard
application/x-www-form-urlencoded format. The
urllib.parse.urlencode() function takes a mapping or sequence of
2-tuples and returns a string in this format. The urllib.request module uses
HTTP/1.1 and includes a Connection: close header in its HTTP requests.
The optional timeout parameter specifies a timeout in seconds for
blocking operations like the connection attempt (if not specified,
the global default timeout setting will be used). This actually
only works for HTTP, HTTPS and FTP connections.
The optional cafile and capath parameters specify a set of trusted
CA certificates for HTTPS requests. cafile should point to a single
file containing a bundle of CA certificates, whereas capath should
point to a directory of hashed certificate files. More information can
be found in ssl.SSLContext.load_verify_locations().
Warning
If neither cafile nor capath is specified, an HTTPS request
will not do any verification of the server’s certificate.
This function returns a file-like object with two additional methods from
the urllib.response module:
geturl() — return the URL of the resource retrieved,
commonly used to determine if a redirect was followed
info() — return the meta-information of the page, such as headers
Note that None may be returned if no handler handles the request (though
the default installed global OpenerDirector uses
UnknownHandler to ensure this never happens).
In addition, default installed ProxyHandler makes sure the requests
are handled through the proxy when they are set.
The legacy urllib.urlopen function from Python 2.6 and earlier has been
discontinued; urlopen() corresponds to the old urllib2.urlopen.
Proxy handling, which was done by passing a dictionary parameter to
urllib.urlopen, can be obtained by using ProxyHandler objects.
Changed in version 3.2: cafile and capath were added.
Changed in version 3.2: HTTPS virtual hosts are now supported if possible (that is, if
ssl.HAS_SNI is true).
New in version 3.2: data can be an iterable object.
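A brief sketch of a simple GET request:

import urllib.request

f = urllib.request.urlopen('http://www.python.org/')
print(f.geturl())     # final URL, after any redirects
print(f.read(100))    # first 100 bytes of the body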
Install an OpenerDirector instance as the default global opener.
Installing an opener is only necessary if you want urlopen to use that opener;
otherwise, simply call OpenerDirector.open() instead of urlopen().
The code does not check for a real OpenerDirector, and any class with
the appropriate interface will work.
Convert the pathname path from the local syntax for a path to the form used in
the path component of a URL. This does not produce a complete URL. The return
value will already be quoted using the quote() function.
Convert the path component path from a percent-encoded URL to the local syntax for a
path. This does not accept a complete URL. This function uses unquote()
to decode path.
This helper function returns a dictionary of scheme to proxy server URL
mappings. It first scans the environment for variables named <scheme>_proxy,
on all operating systems; when it cannot find any, it looks for proxy
information in the System Configuration framework on Mac OS X and in the
Registry on Windows.
The following classes are provided:
class urllib.request.Request(url, data=None, headers={}, origin_req_host=None, unverifiable=False)
This class is an abstraction of a URL request.
url should be a string containing a valid URL.
data may be a string specifying additional data to send to the
server, or None if no such data is needed. Currently HTTP
requests are the only ones that use data; the HTTP request will
be a POST instead of a GET when the data parameter is provided.
data should be a buffer in the standard
application/x-www-form-urlencoded format. The
urllib.parse.urlencode() function takes a mapping or sequence
of 2-tuples and returns a string in this format.
headers should be a dictionary, and will be treated as if
add_header() was called with each key and value as arguments.
This is often used to “spoof” the User-Agent header, which is
used by a browser to identify itself – some HTTP servers only
allow requests coming from common browsers as opposed to scripts.
For example, Mozilla Firefox may identify itself as "Mozilla/5.0 (X11; U; Linux
i686) Gecko/20071127 Firefox/2.0.0.11", while urllib’s default user agent
string is "Python-urllib/2.6" (on Python 2.6).
The final two arguments are only of interest for correct handling
of third-party HTTP cookies:
origin_req_host should be the request-host of the origin
transaction, as defined by RFC 2965. It defaults to
http.cookiejar.request_host(self). This is the host name or IP
address of the original request that was initiated by the user.
For example, if the request is for an image in an HTML document,
this should be the request-host of the request for the page
containing the image.
unverifiable should indicate whether the request is unverifiable,
as defined by RFC 2965. It defaults to False. An unverifiable
request is one whose URL the user did not have the option to
approve. For example, if the request is for an image in an HTML
document, and the user had no option to approve the automatic
fetching of the image, this should be true.
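As an illustration of these parameters, a POST request might be built as
follows (the URL and form fields are hypothetical):
import urllib.parse
import urllib.request

# Encode the form fields; Request expects the body as bytes.
data = urllib.parse.urlencode({'name': 'Somebody', 'language': 'Python'})
req = urllib.request.Request('http://www.example.com/cgi-bin/query',
                             data=data.encode('ascii'),
                             headers={'User-Agent': 'Mozilla/5.0'})
# The presence of data makes this a POST rather than a GET.
f = urllib.request.urlopen(req)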
Cause requests to go through a proxy. If proxies is given, it must be a
dictionary mapping protocol names to URLs of proxies. The default is to read the
list of proxies from the environment variables <protocol>_proxy.
If no proxy environment variables are set, in a Windows environment, proxy
settings are obtained from the registry’s Internet Settings section and in a
Mac OS X environment, proxy information is retrieved from the OS X System
Configuration Framework.
To disable autodetected proxies, pass an empty dictionary.
Keep a database of (realm, uri) -> (user, password) mappings.
class urllib.request.HTTPPasswordMgrWithDefaultRealm¶
Keep a database of (realm, uri) -> (user, password) mappings. A realm of
None is considered a catch-all realm, which is searched if no other realm
fits.
class urllib.request.AbstractBasicAuthHandler(password_mgr=None)¶
This is a mixin class that helps with HTTP authentication, both to the remote
host and to a proxy. password_mgr, if given, should be something that is
compatible with HTTPPasswordMgr; refer to section
HTTPPasswordMgr Objects for information on the interface that must be
supported.
class urllib.request.HTTPBasicAuthHandler(password_mgr=None)¶
Handle authentication with the remote host. password_mgr, if given, should be
something that is compatible with HTTPPasswordMgr; refer to section
HTTPPasswordMgr Objects for information on the interface that must be
supported.
class urllib.request.ProxyBasicAuthHandler(password_mgr=None)¶
Handle authentication with the proxy. password_mgr, if given, should be
something that is compatible with HTTPPasswordMgr; refer to section
HTTPPasswordMgr Objects for information on the interface that must be
supported.
class urllib.request.AbstractDigestAuthHandler(password_mgr=None)¶
This is a mixin class that helps with HTTP authentication, both to the remote
host and to a proxy. password_mgr, if given, should be something that is
compatible with HTTPPasswordMgr; refer to section
HTTPPasswordMgr Objects for information on the interface that must be
supported.
class urllib.request.HTTPDigestAuthHandler(password_mgr=None)¶
Handle authentication with the remote host. password_mgr, if given, should be
something that is compatible with HTTPPasswordMgr; refer to section
HTTPPasswordMgr Objects for information on the interface that must be
supported.
class urllib.request.ProxyDigestAuthHandler(password_mgr=None)¶
Handle authentication with the proxy. password_mgr, if given, should be
something that is compatible with HTTPPasswordMgr; refer to section
HTTPPasswordMgr Objects for information on the interface that must be
supported.
The following methods describe Request's public interface,
and so all may be overridden in subclasses. It also defines several
public attributes that can be used by clients to inspect the parsed
request.
Set the Request data to data. This is ignored by all handlers except
HTTP handlers — and there it should be a byte string, and will change the
request to be POST rather than GET.
Add another header to the request. Headers are currently ignored by all
handlers except HTTP handlers, where they are added to the list of headers sent
to the server. Note that there cannot be more than one header with the same
name, and later calls will overwrite previous calls in case the key collides.
Currently, this is no loss of HTTP functionality, since all headers which have
meaning when used more than once have a (header-specific) way of gaining the
same functionality using only one header.
Prepare the request by connecting to a proxy server. The host and type will
replace those of the instance, and the instance’s selector will be the original
URL given in the constructor.
handler should be an instance of BaseHandler. The following methods
are searched, and added to the possible chains (note that HTTP errors are a
special case).
protocol_open() — signal that the handler knows how to open protocol
URLs.
http_error_type() — signal that the handler knows how to handle HTTP
errors with HTTP error code type.
protocol_error() — signal that the handler knows how to handle errors
from (non-http) protocol.
protocol_request() — signal that the handler knows how to pre-process
protocol requests.
protocol_response() — signal that the handler knows how to
post-process protocol responses.
Open the given url (which can be a request object or a string), optionally
passing the given data. Arguments, return values and exceptions raised are
the same as those of urlopen() (which simply calls the open()
method on the currently installed global OpenerDirector). The
optional timeout parameter specifies a timeout in seconds for blocking
operations like the connection attempt (if not specified, the global default
timeout setting will be used). The timeout feature actually works only for
HTTP, HTTPS and FTP connections.
Handle an error of the given protocol. This will call the registered error
handlers for the given protocol with the given arguments (which are protocol
specific). The HTTP protocol is a special case which uses the HTTP response
code to determine the specific error handler; refer to the http_error_*()
methods of the handler classes.
Return values and exceptions raised are the same as those of urlopen().
OpenerDirector objects open URLs in three stages:
The order in which these methods are called within each stage is determined by
sorting the handler instances.
Every handler with a method named like protocol_request() has that
method called to pre-process the request.
Handlers with a method named like protocol_open() are called to handle
the request. This stage ends when a handler either returns a non-None
value (ie. a response), or raises an exception (usually URLError).
Exceptions are allowed to propagate.
In fact, the above algorithm is first tried for methods named
default_open(). If all such methods return None, the algorithm
is repeated for methods named like protocol_open(). If all such methods
return None, the algorithm is repeated for methods named
unknown_open().
Note that the implementation of these methods may involve calls of the parent
OpenerDirector instance’s open() and
error() methods.
Every handler with a method named like protocol_response() has that
method called to post-process the response.
BaseHandler objects provide a couple of methods that are directly
useful, and others that are meant to be used by derived classes. These are
intended for direct use:
The following attribute and methods should only be used by classes derived from
BaseHandler.
Note
The convention has been adopted that subclasses defining
protocol_request() or protocol_response() methods are named
*Processor; all others are named *Handler.
This method is not defined in BaseHandler, but subclasses should
define it if they want to catch all URLs.
This method, if implemented, will be called by the parent
OpenerDirector. It should return a file-like object as described in
the return value of the open() of OpenerDirector, or None.
It should raise URLError, unless a truly exceptional thing happens (for
example, MemoryError should not be mapped to URLError).
This method will be called before any protocol-specific open method.
BaseHandler.protocol_open(req)
This method is not defined in BaseHandler, but subclasses should
define it if they want to handle URLs with the given protocol.
This method, if defined, will be called by the parent OpenerDirector.
Return values should be the same as for default_open().
This method is not defined in BaseHandler, but subclasses should
define it if they want to catch all URLs with no specific registered handler to
open it.
This method is not defined in BaseHandler, but subclasses should
override it if they intend to provide a catch-all for otherwise unhandled HTTP
errors. It will be called automatically by the OpenerDirector getting
the error, and should not normally be called in other circumstances.
req will be a Request object, fp will be a file-like object with
the HTTP error body, code will be the three-digit code of the error, msg
will be the user-visible explanation of the code and hdrs will be a mapping
object with the headers of the error.
Return values and exceptions raised should be the same as those of
urlopen().
nnn should be a three-digit HTTP error code. This method is also not defined
in BaseHandler, but will be called, if it exists, on an instance of a
subclass, when an HTTP error with code nnn occurs.
Subclasses should override this method to handle specific HTTP errors.
Arguments, return values and exceptions raised should be the same as for
http_error_default().
BaseHandler.protocol_request(req)
This method is not defined in BaseHandler, but subclasses should
define it if they want to pre-process requests of the given protocol.
This method, if defined, will be called by the parent OpenerDirector.
req will be a Request object. The return value should be a
Request object.
BaseHandler.protocol_response(req, response)
This method is not defined in BaseHandler, but subclasses should
define it if they want to post-process responses of the given protocol.
This method, if defined, will be called by the parent OpenerDirector.
req will be a Request object. response will be an object
implementing the same interface as the return value of urlopen(). The
return value should implement the same interface as the return value of
urlopen().
Some HTTP redirections require action from this module’s client code. If this
is the case, HTTPError is raised. See RFC 2616 for details of the
precise meanings of the various redirection codes.
An HTTPError exception is raised as a security consideration if the
HTTPRedirectHandler is presented with a redirected URL which is not an HTTP,
HTTPS or FTP URL.
Return a Request or None in response to a redirect. This is called
by the default implementations of the http_error_30*() methods when a
redirection is received from the server. If a redirection should take place,
return a new Request to allow http_error_30*() to perform the
redirect to newurl. Otherwise, raise HTTPError if no other handler
should try to handle this URL, or return None if you can’t but another
handler might.
Note
The default implementation of this method does not strictly follow RFC 2616,
which says that 301 and 302 responses to POST requests must not be
automatically redirected without confirmation by the user. In reality, browsers
do allow automatic redirection of these responses, changing the POST to a
GET, and the default implementation reproduces this behavior.
The ProxyHandler will have a method protocol_open() for every
protocol which has a proxy in the proxies dictionary given in the
constructor. The method will modify requests to go through the proxy, by
calling request.set_proxy(), and call the next handler in the chain to
actually execute the protocol.
uri can be either a single URI, or a sequence of URIs. realm, user and
passwd must be strings. This causes (user, passwd) to be used as
authentication tokens when authentication for realm and a super-URI of any of
the given URIs is given.
Handle an authentication request by getting a user/password pair, and re-trying
the request. authreq should be the name of the header where the information
about the realm is included in the request, host specifies the URL and path to
authenticate for, req should be the (failed) Request object, and
headers should be the error headers.
host is either an authority (e.g. "python.org") or a URL containing an
authority component (e.g. "http://python.org/"). In either case, the
authority must not contain a userinfo component (so, "python.org" and
"python.org:80" are fine, "joe:password@python.org" is not).
authreq should be the name of the header where the information about the realm
is included in the request, host should be the host to authenticate to, req
should be the (failed) Request object, and headers should be the
error headers.
For 200 error codes, the response object is returned immediately.
For non-200 error codes, this simply passes the job on to the
protocol_error_code() handler methods, via OpenerDirector.error().
Eventually, HTTPDefaultErrorHandler will raise an
HTTPError if no other handler handles the error.
This example gets the python.org main page and displays the first 300 bytes of
it.
>>> import urllib.request
>>> f = urllib.request.urlopen('http://www.python.org/')
>>> print(f.read(300))
b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n\n\n<html
xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">\n\n<head>\n
<meta http-equiv="content-type" content="text/html; charset=utf-8" />\n
<title>Python Programming '
Note that urlopen returns a bytes object. This is because there is no way
for urlopen to automatically determine the encoding of the byte stream
it receives from the http server. In general, a program will decode
the returned bytes object to string once it determines or guesses
the appropriate encoding.
The following W3C document, http://www.w3.org/International/O-charset, lists
the various ways in which an (X)HTML or an XML document could have specified its
encoding information.
As the python.org website uses utf-8 encoding as specified in its meta tag, we
will use the same for decoding the bytes object.
>>> import urllib.request
>>> f = urllib.request.urlopen('http://www.python.org/')
>>> print(f.read(100).decode('utf-8'))
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtm
In the following example, we are sending a data-stream to the stdin of a CGI
and reading the data it returns to us. Note that this example will only work
when the Python installation supports SSL.
>>> import urllib.request
>>> req = urllib.request.Request(url='https://localhost/cgi-bin/test.cgi',
...                              data=b'This data is passed to stdin of the CGI')
>>> f = urllib.request.urlopen(req)
>>> print(f.read().decode('utf-8'))
Got Data: "This data is passed to stdin of the CGI"
The code for the sample CGI used in the above example is:
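A minimal sketch of such a CGI, echoing its standard input:
#!/usr/bin/env python
import sys

# Read the POSTed body from stdin and echo it back in a plain-text response.
data = sys.stdin.read()
print('Content-type: text/plain\n\nGot Data: "%s"' % data)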
Use of Basic HTTP Authentication:
import urllib.request
# Create an OpenerDirector with support for Basic HTTP Authentication...
auth_handler = urllib.request.HTTPBasicAuthHandler()
auth_handler.add_password(realm='PDQ Application',
                          uri='https://mahler:8092/site-updates.py',
                          user='klem',
                          passwd='kadidd!ehopper')
opener = urllib.request.build_opener(auth_handler)
# ...and install it globally so it can be used with urlopen.
urllib.request.install_opener(opener)
urllib.request.urlopen('http://www.example.com/login.html')
build_opener() provides many handlers by default, including a
ProxyHandler. By default, ProxyHandler uses the environment
variables named <scheme>_proxy, where <scheme> is the URL scheme
involved. For example, the http_proxy environment variable is read to
obtain the HTTP proxy’s URL.
This example replaces the default ProxyHandler with one that uses
programmatically-supplied proxy URLs, and adds proxy authorization support with
ProxyBasicAuthHandler.
proxy_handler = urllib.request.ProxyHandler({'http': 'http://www.example.com:3128/'})
proxy_auth_handler = urllib.request.ProxyBasicAuthHandler()
proxy_auth_handler.add_password('realm', 'host', 'username', 'password')

opener = urllib.request.build_opener(proxy_handler, proxy_auth_handler)
# This time, rather than install the OpenerDirector, we use it directly:
opener.open('http://www.example.com/login.html')
Adding HTTP headers:
Use the headers argument to the Request constructor, or add them to an
existing request with add_header(), as in this sketch:
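import urllib.request

req = urllib.request.Request('http://www.example.com/')
req.add_header('Referer', 'http://www.python.org/')
r = urllib.request.urlopen(req)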
Also, remember that a few standard headers (Content-Length,
Content-Type and Host) are added when the
Request is passed to urlopen() (or OpenerDirector.open()).
Here is an example session that uses the GET method to retrieve a URL
containing parameters:
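>>> import urllib.request
>>> import urllib.parse
>>> params = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
>>> # www.example.com is a placeholder host
>>> f = urllib.request.urlopen("http://www.example.com/query?%s" % params)
>>> print(f.read().decode('utf-8'))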
The following functions and classes are ported from the Python 2 module
urllib (as opposed to urllib2). They might become deprecated at
some point in the future.
Copy a network object denoted by a URL to a local file, if necessary. If the URL
points to a local file, or a valid cached copy of the object exists, the object
is not copied. Return a tuple (filename, headers) where filename is the
local file name under which the object can be found, and headers is whatever
the info() method of the object returned by urlopen() returned (for
a remote object, possibly cached). Exceptions are the same as for
urlopen().
The second argument, if present, specifies the file location to copy to (if
absent, the location will be a tempfile with a generated name). The third
argument, if present, is a hook function that will be called once on
establishment of the network connection and once after each block read
thereafter. The hook will be passed three arguments; a count of blocks
transferred so far, a block size in bytes, and the total size of the file. The
third argument may be -1 on older FTP servers which do not return a file
size in response to a retrieval request.
If the url uses the http: scheme identifier, the optional data
argument may be given to specify a POST request (normally the request type
is GET). The data argument must be in standard
application/x-www-form-urlencoded format; see the urlencode()
function below.
urlretrieve() will raise ContentTooShortError when it detects that
the amount of data available was less than the expected amount (which is the
size reported by a Content-Length header). This can occur, for example, when
the download is interrupted.
The Content-Length is treated as a lower bound: if there’s more data to read,
urlretrieve() reads more data, but if less data is available, it raises
the exception.
You can still retrieve the downloaded data in this case; it is stored in the
content attribute of the exception instance.
If no Content-Length header was supplied, urlretrieve() can not check
the size of the data it has downloaded, and just returns it. In this case
you just have to assume that the download was successful.
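For example, a download with a progress hook might look like this (the URL
is hypothetical):
import urllib.request

def report(blocks_read, block_size, total_size):
    # total_size is -1 if the server did not send a Content-Length header.
    print('%d blocks read' % blocks_read)

filename, headers = urllib.request.urlretrieve(
    'http://www.example.com/some-file.txt', reporthook=report)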
Clear the cache that may have been built up by previous calls to
urlretrieve().
class urllib.request.URLopener(proxies=None, **x509)¶
Base class for opening and reading URLs. Unless you need to support opening
objects using schemes other than http:, ftp:, or file:,
you probably want to use FancyURLopener.
By default, the URLopener class sends a User-Agent header
of urllib/VVV, where VVV is the urllib version number.
Applications can define their own User-Agent header by subclassing
URLopener or FancyURLopener and setting the class attribute
version to an appropriate string value in the subclass definition.
The optional proxies parameter should be a dictionary mapping scheme names to
proxy URLs, where an empty dictionary turns proxies off completely. Its default
value is None, in which case environmental proxy settings will be used if
present, as discussed in the definition of urlopen(), above.
Additional keyword parameters, collected in x509, may be used for
authentication of the client when using the https: scheme. The keywords
key_file and cert_file are supported to provide an SSL key and certificate;
both are needed to support client authentication.
URLopener objects will raise an IOError exception if the server
returns an error code.
Open fullurl using the appropriate protocol. This method sets up cache and
proxy information, then calls the appropriate open method with its input
arguments. If the scheme is not recognized, open_unknown() is called.
The data argument has the same meaning as the data argument of
urlopen().
Retrieves the contents of url and places it in filename. The return value
is a tuple consisting of a local filename and either a
email.message.Message object containing the response headers (for remote
URLs) or None (for local URLs). The caller must then open and read the
contents of filename. If filename is not given and the URL refers to a
local file, the input filename is returned. If the URL is non-local and
filename is not given, the filename is the output of tempfile.mktemp()
with a suffix that matches the suffix of the last path component of the input
URL. If reporthook is given, it must be a function accepting three numeric
parameters. It will be called after each chunk of data is read from the
network. reporthook is ignored for local URLs.
If the url uses the http: scheme identifier, the optional data
argument may be given to specify a POST request (normally the request type
is GET). The data argument must be in standard
application/x-www-form-urlencoded format; see the urlencode()
function below.
Variable that specifies the user agent of the opener object. To get
urllib to tell servers that it is a particular user agent, set this in a
subclass as a class variable or in the constructor before calling the base
constructor.
FancyURLopener subclasses URLopener providing default handling
for the following HTTP response codes: 301, 302, 303, 307 and 401. For the 30x
response codes listed above, the Location header is used to fetch
the actual URL. For 401 response codes (authentication required), basic HTTP
authentication is performed. For the 30x response codes, recursion is bounded
by the value of the maxtries attribute, which defaults to 10.
For all other response codes, the method http_error_default() is called
which you can override in subclasses to handle the error appropriately.
Note
According to the letter of RFC 2616, 301 and 302 responses to POST requests
must not be automatically redirected without confirmation by the user. In
reality, browsers do allow automatic redirection of these responses, changing
the POST to a GET, and urllib reproduces this behaviour.
The parameters to the constructor are the same as those for URLopener.
Note
When performing basic authentication, a FancyURLopener instance calls
its prompt_user_passwd() method. The default implementation asks the
users for the required information on the controlling terminal. A subclass may
override this method to support more appropriate behavior if needed.
The FancyURLopener class offers one additional method that should be
overloaded to provide the appropriate behavior:
Return information needed to authenticate the user at the given host in the
specified security realm. The return value should be a tuple, (user, password), which can be used for basic authentication.
The implementation prompts for this information on the terminal; an application
should override this method to use an appropriate interaction model in the local
environment.
Currently, only the following protocols are supported: HTTP (versions 0.9 and
1.0), FTP, and local files.
The caching feature of urlretrieve() has been disabled until I find the
time to hack proper processing of Expiration time headers.
There should be a function to query whether a particular URL is in the cache.
For backward compatibility, if a URL appears to point to a local file but the
file can’t be opened, the URL is re-interpreted using the FTP protocol. This
can sometimes cause confusing error messages.
The urlopen() and urlretrieve() functions can cause arbitrarily
long delays while waiting for a network connection to be set up. This means
that it is difficult to build an interactive Web client using these functions
without using threads.
The data returned by urlopen() or urlretrieve() is the raw data
returned by the server. This may be binary data (such as an image), plain text
or (for example) HTML. The HTTP protocol provides type information in the reply
header, which can be inspected by looking at the Content-Type
header. If the returned data is HTML, you can use the module
html.parser to parse it.
The code handling the FTP protocol cannot differentiate between a file and a
directory. This can lead to unexpected behavior when attempting to read a URL
that points to a file that is not accessible. If the URL ends in a /, it is
assumed to refer to a directory and will be handled accordingly. But if an
attempt to read a file leads to a 550 error (meaning the URL cannot be found or
is not accessible, often for permission reasons), then the path is treated as a
directory in order to handle the case when a directory is specified by a URL but
the trailing / has been left off. This can cause misleading results when
you try to fetch a file whose read permissions make it inaccessible; the FTP
code will try to read it, fail with a 550 error, and then perform a directory
listing for the unreadable file. If fine-grained control is needed, consider
using the ftplib module, subclassing FancyURLopener, or changing
_urlopener to meet your needs.
The urllib.response module defines functions and classes which define a
minimal file-like interface, including read() and readline(). The
typical response object is an addinfourl instance, which defines an info()
method that returns headers and a geturl() method that returns the URL.
Functions defined by this module are used internally by the
urllib.request module.
This module defines a standard interface to break Uniform Resource Locator (URL)
strings up in components (addressing scheme, network location, path etc.), to
combine the components back into a URL string, and to convert a “relative URL”
to an absolute URL given a “base URL.”
The module has been designed to match the Internet RFC on Relative Uniform
Resource Locators (and discovered a bug in an earlier draft!). It supports the
following URL schemes: file, ftp, gopher, hdl, http,
https, imap, mailto, mms, news, nntp, prospero,
rsync, rtsp, rtspu, sftp, shttp, sip, sips,
snews, svn, svn+ssh, telnet, wais.
The urllib.parse module defines functions that fall into two broad
categories: URL parsing and URL quoting. These are covered in detail in
the following sections.
Parse a URL into six components, returning a 6-tuple. This corresponds to the
general structure of a URL: scheme://netloc/path;parameters?query#fragment.
Each tuple item is a string, possibly empty. The components are not broken up in
smaller parts (for example, the network location is a single string), and %
escapes are not expanded. The delimiters as shown above are not part of the
result, except for a leading slash in the path component, which is retained if
present. For example:
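>>> from urllib.parse import urlparse
>>> o = urlparse('http://www.cwi.nl:80/%7Eguido/Python.html')
>>> o.scheme
'http'
>>> o.netloc
'www.cwi.nl:80'
>>> o.port
80
>>> o.geturl()
'http://www.cwi.nl:80/%7Eguido/Python.html'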
Following the syntax specifications in RFC 1808, urlparse recognizes
a netloc only if it is properly introduced by '//'. Otherwise the
input is presumed to be a relative URL and thus to start with
a path component.
If the scheme argument is specified, it gives the default addressing
scheme, to be used only if the URL does not specify one. The default value for
this argument is the empty string.
If the allow_fragments argument is false, fragment identifiers are not
allowed, even if the URL’s addressing scheme normally does support them. The
default value for this argument is True.
The return value is actually an instance of a subclass of tuple. This
class has the following additional read-only convenience attributes:
Parse a query string given as a string argument (data of type
application/x-www-form-urlencoded). Data are returned as a
dictionary. The dictionary keys are the unique query variable names and the
values are lists of values for each name.
The optional argument keep_blank_values is a flag indicating whether blank
values in percent-encoded queries should be treated as blank strings. A true value
indicates that blanks should be retained as blank strings. The default false
value indicates that blank values are to be ignored and treated as if they were
not included.
The optional argument strict_parsing is a flag indicating what to do with
parsing errors. If false (the default), errors are silently ignored. If true,
errors raise a ValueError exception.
The optional encoding and errors parameters specify how to decode
percent-encoded sequences into Unicode characters, as accepted by the
bytes.decode() method.
Parse a query string given as a string argument (data of type
application/x-www-form-urlencoded). Data are returned as a list of
name, value pairs.
The optional argument keep_blank_values is a flag indicating whether blank
values in percent-encoded queries should be treated as blank strings. A true value
indicates that blanks should be retained as blank strings. The default false
value indicates that blank values are to be ignored and treated as if they were
not included.
The optional argument strict_parsing is a flag indicating what to do with
parsing errors. If false (the default), errors are silently ignored. If true,
errors raise a ValueError exception.
The optional encoding and errors parameters specify how to decode
percent-encoded sequences into Unicode characters, as accepted by the
bytes.decode() method.
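For instance, duplicate keys are collected into a list by parse_qs() and
kept in order as pairs by parse_qsl():
>>> from urllib.parse import parse_qs, parse_qsl
>>> parse_qs('key=value1&key=value2')
{'key': ['value1', 'value2']}
>>> parse_qsl('key=value1&key=value2')
[('key', 'value1'), ('key', 'value2')]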
Construct a URL from a tuple as returned by urlparse(). The parts
argument can be any six-item iterable. This may result in a slightly
different, but equivalent URL, if the URL that was parsed originally had
unnecessary delimiters (for example, a ? with an empty query; the RFC
states that these are equivalent).
This is similar to urlparse(), but does not split the params from the URL.
This should generally be used instead of urlparse() if the more recent URL
syntax allowing parameters to be applied to each segment of the path portion
of the URL (see RFC 2396) is wanted. A separate function is needed to
separate the path segments and parameters. This function returns a 5-tuple:
(addressing scheme, network location, path, query, fragment identifier).
The return value is actually an instance of a subclass of tuple. This
class has the following additional read-only convenience attributes:
Combine the elements of a tuple as returned by urlsplit() into a
complete URL as a string. The parts argument can be any five-item
iterable. This may result in a slightly different, but equivalent URL, if the
URL that was parsed originally had unnecessary delimiters (for example, a ?
with an empty query; the RFC states that these are equivalent).
Construct a full (“absolute”) URL by combining a “base URL” (base) with
another URL (url). Informally, this uses components of the base URL, in
particular the addressing scheme, the network location and (part of) the
path, to provide missing components in the relative URL. For example:
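>>> from urllib.parse import urljoin
>>> urljoin('http://www.cwi.nl/%7Eguido/Python.html', 'FAQ.html')
'http://www.cwi.nl/%7Eguido/FAQ.html'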
If url contains a fragment identifier, return a modified version of url
with no fragment identifier, and the fragment identifier as a separate
string. If there is no fragment identifier in url, return url unmodified
and an empty string.
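For example:
>>> from urllib.parse import urldefrag
>>> urldefrag('http://www.python.org/doc/#intro')
DefragResult(url='http://www.python.org/doc/', fragment='intro')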
The return value is actually an instance of a subclass of tuple. This
class has the following additional read-only convenience attributes:
The URL parsing functions were originally designed to operate on character
strings only. In practice, it is useful to be able to manipulate properly
quoted and encoded URLs as sequences of ASCII bytes. Accordingly, the
URL parsing functions in this module all operate on bytes and
bytearray objects in addition to str objects.
If str data is passed in, the result will also contain only
str data. If bytes or bytearray data is
passed in, the result will contain only bytes data.
Attempting to mix str data with bytes or
bytearray in a single function call will result in a
TypeError being raised, while attempting to pass in non-ASCII
byte values will trigger UnicodeDecodeError.
To support easier conversion of result objects between str and
bytes, all return values from URL parsing functions provide
either an encode() method (when the result contains str
data) or a decode() method (when the result contains bytes
data). The signatures of these methods match those of the corresponding
str and bytes methods (except that the default encoding
is 'ascii' rather than 'utf-8'). Each produces a value of a
corresponding type that contains either bytes data (for
encode() methods) or str data (for
decode() methods).
Applications that need to operate on potentially improperly quoted URLs
that may contain non-ASCII data will need to do their own decoding from
bytes to characters before invoking the URL parsing methods.
The behaviour described in this section applies only to the URL parsing
functions. The URL quoting functions use their own rules when producing
or consuming byte sequences as detailed in the documentation of the
individual URL quoting functions.
Changed in version 3.2: URL parsing functions now accept ASCII encoded byte sequences
The result objects from the urlparse(), urlsplit() and
urldefrag() functions are subclasses of the tuple type.
These subclasses add the attributes listed in the documentation for
those functions, the encoding and decoding support described in the
previous section, as well as an additional method:
Return the re-combined version of the original URL as a string. This may
differ from the original URL in that the scheme may be normalized to lower
case and empty components may be dropped. Specifically, empty parameters,
queries, and fragment identifiers will be removed.
For urldefrag() results, only empty fragment identifiers will be removed.
For urlsplit() and urlparse() results, all noted changes will be
made to the URL returned by this method.
The result of this method remains unchanged if passed back through the original
parsing function:
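>>> from urllib.parse import urlsplit
>>> url = 'HTTP://www.Python.org/doc/#'
>>> r1 = urlsplit(url)
>>> r1.geturl()
'http://www.Python.org/doc/'
>>> r2 = urlsplit(r1.geturl())
>>> r2.geturl()
'http://www.Python.org/doc/'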
The URL quoting functions focus on taking program data and making it safe
for use as URL components by quoting special characters and appropriately
encoding non-ASCII text. They also support reversing these operations to
recreate the original data from the contents of a URL component if that
task isn’t already covered by the URL parsing functions above.
Replace special characters in string using the %xx escape. Letters,
digits, and the characters '_.-' are never quoted. By default, this
function is intended for quoting the path section of a URL. The optional safe
parameter specifies additional ASCII characters that should not be quoted
— its default value is '/'.
The optional encoding and errors parameters specify how to deal with
non-ASCII characters, as accepted by the str.encode() method.
encoding defaults to 'utf-8'.
errors defaults to 'strict', meaning unsupported characters raise a
UnicodeEncodeError.
encoding and errors must not be supplied if string is a
bytes, or a TypeError is raised.
Note that quote(string, safe, encoding, errors) is equivalent to
quote_from_bytes(string.encode(encoding, errors), safe).
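For example:
>>> from urllib.parse import quote
>>> quote('/El Niño/')
'/El%20Ni%C3%B1o/'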
Like quote(), but also replace spaces by plus signs, as required for
quoting HTML form values when building up a query string to go into a URL.
Plus signs in the original string are escaped unless they are included in
safe. Unlike quote(), safe does not default to '/'.
Replace %xx escapes by their single-character equivalent.
The optional encoding and errors parameters specify how to decode
percent-encoded sequences into Unicode characters, as accepted by the
bytes.decode() method.
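For example:
>>> from urllib.parse import unquote
>>> unquote('/El%20Ni%C3%B1o/')
'/El Niño/'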
Convert a mapping object or a sequence of two-element tuples, which may
either be a str or a bytes, to a “percent-encoded”
string. The resultant string must be converted to bytes using the
user-specified encoding before it is sent to urlopen() as the optional
data argument.
The resulting string is a series of key=value pairs separated by '&'
characters, where both key and value are quoted using quote_plus()
above. When a sequence of two-element tuples is used as the query
argument, the first element of each tuple is a key and the second is a
value. The value element in itself can be a sequence and in that case, if
the optional parameter doseq evaluates to True, individual
key=value pairs separated by '&' are generated for each element of
the value sequence for the key. The order of parameters in the encoded
string will match the order of parameter tuples in the sequence.
When the query parameter is a str, the safe, encoding and errors
parameters are passed down to quote_plus() for encoding.
To reverse this encoding process, parse_qs() and parse_qsl() are
provided in this module to parse query strings into Python data structures.
Refer to urllib examples to find out how the urlencode()
function can be used to generate the query string of a URL or data for a POST request.
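For example, a sequence of 2-tuples preserves parameter order, and doseq
expands sequence values:
>>> from urllib.parse import urlencode
>>> urlencode([('spam', 1), ('eggs', 2)])
'spam=1&eggs=2'
>>> urlencode([('key', ('v1', 'v2'))], doseq=True)
'key=v1&key=v2'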
Changed in version 3.2: Query parameter supports bytes and string objects.
RFC 3986 - Uniform Resource Identifier (URI): Generic Syntax.
This is the current standard (STD66). Any changes to the urllib.parse module
should conform to this. Certain deviations could be observed, which are
mostly for backward compatibility purposes and for certain de-facto
parsing requirements as commonly observed in major browsers.
RFC 2732 - Format for Literal IPv6 Addresses in URL’s.
This specifies the parsing requirements of IPv6 URLs.
This Request For Comments includes the rules for joining an absolute and a
relative URL, including a fair number of “Abnormal Examples” which govern the
treatment of border cases.
Though being an exception (a subclass of URLError), an
HTTPError can also function as a non-exceptional file-like return
value (the same thing that urlopen() returns). This is useful when
handling exotic HTTP errors, such as requests for authentication.
This exception is raised when the urlretrieve() function detects that
the amount of the downloaded data is less than the expected amount (given by
the Content-Length header). The content attribute stores the
downloaded (and supposedly truncated) data.
This module provides a single class, RobotFileParser, which answers
questions about whether or not a particular user agent can fetch a URL on the
Web site that published the robots.txt file. For more details on the
structure of robots.txt files, see http://www.robotstxt.org/orig.html.
Returns the time the robots.txt file was last fetched. This is
useful for long-running web spiders that need to check for new
robots.txt files periodically.
This module defines classes which implement the client side of the HTTP and
HTTPS protocols. It is normally not used directly — the module
urllib.request uses it to handle URLs that use HTTP and HTTPS.
Note
HTTPS support is only available if Python was compiled with SSL support
(through the ssl module).
The module provides the following classes:
class http.client.HTTPConnection(host, port=None[, strict[, timeout[, source_address]]])¶
An HTTPConnection instance represents one transaction with an HTTP
server. It should be instantiated passing it a host and optional port
number. If no port number is passed, the port is extracted from the host
string if it has the form host:port, else the default HTTP port (80) is
used. If the optional timeout parameter is given, blocking
operations (like connection attempts) will timeout after that many seconds
(if it is not given, the global default timeout setting is used).
The optional source_address parameter may be a tuple of a (host, port)
to use as the source address the HTTP connection is made from.
For example, the following calls all create instances that connect to the server
at the same host and port:
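>>> import http.client
>>> h1 = http.client.HTTPConnection('www.cwi.nl')
>>> h2 = http.client.HTTPConnection('www.cwi.nl:80')
>>> h3 = http.client.HTTPConnection('www.cwi.nl', 80)
>>> h4 = http.client.HTTPConnection('www.cwi.nl', 80, timeout=10)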
A subclass of HTTPConnection that uses SSL for communication with
secure servers. Default port is 443. If context is specified, it
must be a ssl.SSLContext instance describing the various SSL
options. If context is specified and has a verify_mode
of either CERT_OPTIONAL or CERT_REQUIRED, then
by default host is matched against the host name(s) allowed by the
server’s certificate. If you want to change that behaviour, you can
explicitly set check_hostname to False.
If you access arbitrary hosts on the Internet, it is recommended to
require certificate checking and feed the context with a set of
trusted CA certificates:
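import ssl
import http.client

context = ssl.SSLContext(ssl.PROTOCOL_TLSv1)
context.verify_mode = ssl.CERT_REQUIRED
# Path to a CA bundle; adjust for your platform.
context.load_verify_locations('/etc/pki/tls/certs/ca-bundle.crt')
conn = http.client.HTTPSConnection('www.python.org', 443, context=context)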
This will send a request to the server using the HTTP request
method method and the selector url. If the body argument is
present, it should be string or bytes object of data to send after
the headers are finished. Strings are encoded as ISO-8859-1, the
default charset for HTTP. To use other encodings, pass a bytes
object. The Content-Length header is set to the length of the
string.
The body may also be an open file object, in which case the
contents of the file is sent; this file object should support fileno()
and read() methods. The header Content-Length is automatically set to
the length of the file as reported by stat. The body argument may also be
an iterable, in which case the Content-Length header should be provided
explicitly.
The headers argument should be a mapping of extra HTTP
headers to send with the request.
Set the debugging level. The default debug level is 0, meaning no
debugging output is printed. Any value greater than 0 will cause all
currently defined debug output to be printed to stdout. The debuglevel
is passed to any new HTTPResponse objects that are created.
This should be the first call after the connection to the server has been made.
It sends a line to the server consisting of the request string, the selector
string, and the HTTP version (HTTP/1.1). To disable automatic sending of
Host: or Accept-Encoding: headers (for example to accept additional
content encodings), specify skip_host or skip_accept_encoding with non-False
values.
Send an RFC 822-style header to the server. It sends a line to the server
consisting of the header, a colon and a space, and the first argument. If more
arguments are given, continuation lines are sent, each consisting of a tab and
an argument.
An HTTPResponse instance wraps the HTTP response from the
server. It provides access to the request headers and the entity
body. The response is an iterable object and can be used in a with
statement.
Return the value of the header name, or default if there is no header
matching name. If there is more than one header with the name name,
return all of the values joined by ', '. If default is any iterable other
than a single string, its elements are similarly returned joined by commas.
Here is an example session that uses the GET method:
>>> import http.client
>>> conn = http.client.HTTPConnection("www.python.org")
>>> conn.request("GET", "/index.html")
>>> r1 = conn.getresponse()
>>> print(r1.status, r1.reason)
200 OK
>>> data1 = r1.read()  # This will return entire content.
>>> # The following example demonstrates reading data in chunks.
>>> conn.request("GET", "/index.html")
>>> r1 = conn.getresponse()
>>> while not r1.closed:
...     print(r1.read(200))  # 200 bytes
b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"...
...
>>> # Example of an invalid request
>>> conn.request("GET", "/parrot.spam")
>>> r2 = conn.getresponse()
>>> print(r2.status, r2.reason)
404 Not Found
>>> data2 = r2.read()
>>> conn.close()
Here is an example session that uses the HEAD method. Note that the
HEAD method never returns any data.
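>>> import http.client
>>> conn = http.client.HTTPConnection("www.python.org")
>>> conn.request("HEAD", "/index.html")
>>> res = conn.getresponse()
>>> print(res.status, res.reason)
200 OK
>>> data = res.read()
>>> print(len(data))
0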
This module defines the class FTP and a few related items. The
FTP class implements the client side of the FTP protocol. You can use
this to write Python programs that perform a variety of automated FTP jobs, such
as mirroring other ftp servers. It is also used by the module
urllib.request to handle URLs that use FTP. For more information on FTP
(File Transfer Protocol), see Internet RFC 959.
>>> from ftplib import FTP
>>> ftp = FTP('ftp.cwi.nl')   # connect to host, default port
>>> ftp.login()               # user anonymous, passwd anonymous@
>>> ftp.retrlines('LIST')     # list directory contents
total 24418
drwxrwsr-x   5 ftp-usr  pdmaint     1536 Mar 20 09:48 .
dr-xr-srwt 105 ftp-usr  pdmaint     1536 Mar 21 14:32 ..
-rw-r--r--   1 ftp-usr  pdmaint     5305 Mar 20 09:48 INDEX
 .
 .
 .
>>> ftp.retrbinary('RETR README', open('README', 'wb').write)
'226 Transfer complete.'
>>> ftp.quit()
The module defines the following items:
class ftplib.FTP(host='', user='', passwd='', acct=''[, timeout])¶
Return a new instance of the FTP class. When host is given, the
method call connect(host) is made. When user is given, additionally
the method call login(user, passwd, acct) is made (where passwd and
acct default to the empty string when not given). The optional timeout
parameter specifies a timeout in seconds for blocking operations like the
connection attempt (if it is not specified, the global default timeout setting
will be used).
The FTP class supports the with statement. Here is a sample of
how to use it (server output will vary):
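>>> from ftplib import FTP
>>> with FTP('ftp1.at.proftpd.org') as ftp:
...     ftp.login()
...     ftp.dir()
...
'230 Anonymous login ok, restrictions apply.'
dr-xr-xr-x   9 ftp      ftp           154 May  6 10:43 .
dr-xr-xr-x   9 ftp      ftp           154 May  6 10:43 ..
dr-xr-xr-x   5 ftp      ftp          4096 May  6 10:43 CentOS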
Changed in version 3.2: Support for the with statement was added.
class ftplib.FTP_TLS(host='', user='', passwd='', acct=''[, keyfile[, certfile[, context[, timeout]]]])¶
An FTP subclass which adds TLS support to FTP as described in
RFC 4217.
Connect as usual to port 21 implicitly securing the FTP control connection
before authenticating. Securing the data connection requires the user to
explicitly ask for it by calling the prot_p() method.
keyfile and certfile are optional – they can contain a PEM formatted
private key and certificate chain file name for the SSL connection.
The context parameter is an ssl.SSLContext object which allows
bundling SSL configuration options, certificates and private keys into a
single (potentially long-lived) structure.
Exception raised when a reply is received from the server that does not fit
the response specifications of the File Transfer Protocol, i.e. does not
begin with a digit in the range 1–5.
The set of all exceptions (as a tuple) that methods of FTP
instances may raise as a result of problems with the FTP connection (as
opposed to programming errors made by the caller). This set includes the
four exceptions listed above as well as socket.error and
IOError.
Parser for the .netrc file format. The file .netrc is
typically used by FTP clients to load user authentication information
before prompting the user.
The file Tools/scripts/ftpmirror.py in the Python source distribution is
a script that can mirror FTP sites, or portions thereof, using the ftplib
module. It can be used as an extended example that applies this module.
Several methods are available in two flavors: one for handling text files and
another for binary files. These are named for the command which is used
followed by lines for the text version or binary for the binary version.
Set the instance’s debugging level. This controls the amount of debugging
output printed. The default, 0, produces no debugging output. A value of
1 produces a moderate amount of debugging output, generally a single line
per request. A value of 2 or higher produces the maximum amount of
debugging output, logging each line sent and received on the control connection.
Connect to the given host and port. The default port number is 21, as
specified by the FTP protocol specification. It is rarely needed to specify a
different port number. This function should be called only once for each
instance; it should not be called at all if a host was given when the instance
was created. All other methods can only be used after a connection has been
made.
The optional timeout parameter specifies a timeout in seconds for the
connection attempt. If no timeout is passed, the global default timeout
setting will be used.
Return the welcome message sent by the server in reply to the initial
connection. (This message sometimes contains disclaimers or help information
that may be relevant to the user.)
Log in as the given user. The passwd and acct parameters are optional and
default to the empty string. If no user is specified, it defaults to
'anonymous'. If user is 'anonymous', the default passwd is
'anonymous@'. This function should be called only once for each instance,
after a connection has been established; it should not be called at all if a
host and user were given when the instance was created. Most FTP commands are
only allowed after the client has logged in. The acct parameter supplies
“accounting information”; few systems implement this.
Send a simple command string to the server and handle the response. Return
nothing if a response code corresponding to success (codes in the range
200–299) is received. Raise error_reply otherwise.
Retrieve a file in binary transfer mode. cmd should be an appropriate
RETR command: 'RETR filename'. The callback function is called for
each block of data received, with a single string argument giving the data
block. The optional blocksize argument specifies the maximum chunk size to
read on the low-level socket object created to do the actual transfer (which
will also be the largest size of the data blocks passed to callback). A
reasonable default is chosen. rest means the same thing as in the
transfercmd() method.
Retrieve a file or directory listing in ASCII transfer mode. cmd should be
an appropriate RETR command (see retrbinary()) or a command such as
LIST, NLST or MLSD (usually just the string 'LIST').
LIST retrieves a list of files and information about those files.
NLST retrieves a list of file names. On some servers, MLSD retrieves
a machine readable list of files and information about those files. The
callback function is called for each line with a string argument containing
the line with the trailing CRLF stripped. The default callback prints the
line to sys.stdout.
Store a file in binary transfer mode. cmd should be an appropriate
STOR command: "STORfilename". file is an open file object
which is read until EOF using its read() method in blocks of size
blocksize to provide the data to be stored. The blocksize argument
defaults to 8192. callback is an optional single parameter callable that
is called on each block of data after it is sent. rest means the same thing
as in the transfercmd() method.
Store a file in ASCII transfer mode. cmd should be an appropriate
STOR command (see storbinary()). Lines are read until EOF from the
open file object file using its readline() method to provide
the data to be stored. callback is an optional single parameter callable
that is called on each line after it is sent.
Initiate a transfer over the data connection. If the transfer is active, send an
EPRT or PORT command and the transfer command specified by cmd, and
accept the connection. If the server is passive, send an EPSV or PASV
command, connect to it, and start the transfer command. Either way, return the
socket for the connection.
If optional rest is given, a REST command is sent to the server, passing
rest as an argument. rest is usually a byte offset into the requested file,
telling the server to restart sending the file’s bytes at the requested offset,
skipping over the initial bytes. Note however that RFC 959 requires only that
rest be a string containing characters in the printable range from ASCII code
33 to ASCII code 126. The transfercmd() method, therefore, converts
rest to a string, but no check is performed on the string’s contents. If the
server does not recognize the REST command, an error_reply exception
will be raised. If this happens, simply call transfercmd() without a
rest argument.
Like transfercmd(), but returns a tuple of the data connection and the
expected size of the data. If the expected size could not be computed, None
will be returned as the expected size. cmd and rest mean the same thing as
in transfercmd().
Return a list of file names as returned by the NLST command. The
optional argument is a directory to list (default is the current server
directory). Multiple arguments can be used to pass non-standard options to
the NLST command.
Produce a directory listing as returned by the LIST command, printing it to
standard output. The optional argument is a directory to list (default is the
current server directory). Multiple arguments can be used to pass non-standard
options to the LIST command. If the last argument is a function, it is used
as a callback function as for retrlines(); the default prints to
sys.stdout. This method returns None.
Remove the file named filename from the server. If successful, returns the
text of the response, otherwise raises error_perm on permission errors or
error_reply on other errors.
Request the size of the file named filename on the server. On success, the
size of the file is returned as an integer, otherwise None is returned.
Note that the SIZE command is not standardized, but is supported by many
common server implementations.
Send a QUIT command to the server and close the connection. This is the
“polite” way to close a connection, but it may raise an exception if the server
responds with an error to the QUIT command. This implies a call to the
close() method which renders the FTP instance useless for
subsequent calls (see below).
Close the connection unilaterally. This should not be applied to an already
closed connection such as after a successful call to quit(). After this
call the FTP instance should not be used any more (after a call to
close() or quit() you cannot reopen the connection by issuing
another login() method).
This module defines a class, POP3, which encapsulates a connection to a
POP3 server and implements the protocol as defined in RFC 1725. The
POP3 class supports both the minimal and optional command sets.
Additionally, this module provides a class POP3_SSL, which provides
support for connecting to POP3 servers that use SSL as an underlying protocol
layer.
Note that POP3, though widely supported, is obsolescent. The implementation
quality of POP3 servers varies widely, and too many are quite poor. If your
mailserver supports IMAP, you would be better off using the
imaplib.IMAP4 class, as IMAP servers tend to be better implemented.
class poplib.POP3(host, port=POP3_PORT[, timeout])¶
This class implements the actual POP3 protocol. The connection is created when
the instance is initialized. If port is omitted, the standard POP3 port (110)
is used. The optional timeout parameter specifies a timeout in seconds for the
connection attempt (if not specified, the global default timeout setting will
be used).
class poplib.POP3_SSL(host, port=POP3_SSL_PORT, keyfile=None, certfile=None, timeout=None, context=None)¶
This is a subclass of POP3 that connects to the server over an SSL
encrypted socket. If port is not specified, 995, the standard POP3-over-SSL
port is used. keyfile and certfile are also optional - they can contain a
PEM formatted private key and certificate chain file for the SSL connection.
timeout works as in the POP3 constructor. The context parameter is an
ssl.SSLContext object which allows bundling SSL configuration
options, certificates and private keys into a single (potentially long-lived)
structure.
Changed in version 3.2: context parameter added.
One exception is defined as an attribute of the poplib module:
Exception raised on any errors from this module (errors from socket
module are not caught). The reason for the exception is passed to the
constructor as a string.
The FAQ for the fetchmail POP/IMAP client collects information on
POP3 server variations and RFC noncompliance that may be useful if you need to
write an application based on the POP protocol.
Set the instance’s debugging level. This controls the amount of debugging
output printed. The default, 0, produces no debugging output. A value of
1 produces a moderate amount of debugging output, generally a single line
per request. A value of 2 or higher produces the maximum amount of
debugging output, logging each line sent and received on the control connection.
Flag message number which for deletion. On most servers deletions are not
actually performed until QUIT (the major exception is Eudora QPOP, which
deliberately violates the RFCs by doing pending deletes on any disconnect).
Retrieves the message header plus howmuch lines of the message after the
header of message number which. Result is in form (response, ['line', ...], octets).
The POP3 TOP command this method uses, unlike the RETR command, doesn’t set the
message’s seen flag; unfortunately, TOP is poorly specified in the RFCs and is
frequently broken in off-brand servers. Test this method by hand against the
POP3 servers you will use before trusting it.
Return message digest (unique id) list. If which is specified, result contains
the unique id for that message in the form 'response mesgnum uid', otherwise
result is list (response, ['mesgnum uid', ...], octets).
Instances of POP3_SSL have no additional methods. The interface of this
subclass is identical to its parent.
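Here is a minimal usage sketch that retrieves and prints all messages from a
mailbox (the server name is hypothetical):
import getpass
import poplib

M = poplib.POP3('mail.example.com')        # hypothetical POP3 server
M.user(getpass.getuser())
M.pass_(getpass.getpass())
num_messages = len(M.list()[1])
for i in range(num_messages):
    for line in M.retr(i + 1)[1]:          # message numbers start at 1
        print(line)
M.quit()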
This module defines three classes, IMAP4, IMAP4_SSL and
IMAP4_stream, which encapsulate a connection to an IMAP4 server and
implement a large subset of the IMAP4rev1 client protocol as defined in
RFC 2060. It is backward compatible with IMAP4 (RFC 1730) servers, but
note that the STATUS command is not supported in IMAP4.
Three classes are provided by the imaplib module, IMAP4 is the
base class:
This class implements the actual IMAP4 protocol. The connection is created and
protocol version (IMAP4 or IMAP4rev1) is determined when the instance is
initialized. If host is not specified, '' (the local host) is used. If
port is omitted, the standard IMAP4 port (143) is used.
Three exceptions are defined as attributes of the IMAP4 class:
exception IMAP4.error¶
Exception raised on any errors. The reason for the exception is passed to the
constructor as a string.
exception IMAP4.abort¶
IMAP4 server errors cause this exception to be raised. This is a sub-class of
IMAP4.error. Note that closing the instance and instantiating a new one
will usually allow recovery from this exception.
exception IMAP4.readonly¶
This exception is raised when a writable mailbox has its status changed by the
server. This is a sub-class of IMAP4.error. Some other client now has
write permission, and the mailbox will need to be re-opened to re-obtain write
permission.
There’s also a subclass for secure connections:
class imaplib.IMAP4_SSL(host='', port=IMAP4_SSL_PORT, keyfile=None, certfile=None)¶
This is a subclass derived from IMAP4 that connects over an SSL
encrypted socket (to use this class you need a socket module that was compiled
with SSL support). If host is not specified, '' (the local host) is used.
If port is omitted, the standard IMAP4-over-SSL port (993) is used. keyfile
and certfile are also optional - they can contain a PEM formatted private key
and certificate chain file for the SSL connection.
The second subclass allows for connections created by a child process:
class imaplib.IMAP4_stream(command)¶
Parse an IMAP4 INTERNALDATE string and return corresponding local
time. The return value is a time.struct_time tuple or
None if the string has the wrong format.
Convert date_time to an IMAP4 INTERNALDATE representation. The
return value is a string in the form: "DD-Mmm-YYYY HH:MM:SS +HHMM"
(including double-quotes). The date_time argument can be a
number (int or float) representing seconds since the epoch (as returned
by time.time()), a 9-tuple representing local time (as returned by
time.localtime()), or a double-quoted string. In the last case, it
is assumed to already be in the correct format.
Note that IMAP4 message numbers change as the mailbox changes; in particular,
after an EXPUNGE command performs deletions the remaining messages are
renumbered. So it is highly advisable to use UIDs instead, with the UID command.
At the end of the module, there is a test section that contains a more extensive
example of usage.
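For a quick start, a minimal session (again with no error checking, assuming
an IMAP server on the local host) that opens a mailbox and retrieves and
prints all messages looks like this:

import getpass, imaplib

M = imaplib.IMAP4()
M.login(getpass.getuser(), getpass.getpass())
M.select()
typ, data = M.search(None, 'ALL')
for num in data[0].split():
    typ, data = M.fetch(num, '(RFC822)')
    print('Message %s\n%s\n' % (num, data[0][1]))
M.close()
M.logout()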
See also
Documents describing the protocol, and sources and binaries for servers
implementing it, can all be found at the University of Washington’s IMAP
Information Center (http://www.washington.edu/imap/).
All IMAP4rev1 commands are represented by methods of the same name, either
upper-case or lower-case.
All arguments to commands are converted to strings, except for AUTHENTICATE,
and the last argument to APPEND which is passed as an IMAP4 literal. If
necessary (the string contains IMAP4 protocol-sensitive characters and isn’t
enclosed with either parentheses or double quotes) each string is quoted.
However, the password argument to the LOGIN command is always quoted. If
you want to avoid having an argument string quoted (eg: the flags argument to
STORE) then enclose the string in parentheses (eg: r'(\Deleted)').
Each command returns a tuple: (type,[data,...]) where type is usually
'OK' or 'NO', and data is either the text from the command response,
or mandated results from the command. Each data is either a string, or a
tuple. If a tuple, then the first part is the header of the response, and the
second part contains the data (ie: ‘literal’ value).
The message_set option to commands below is a string specifying one or more
messages to be acted upon. It may be a simple message number ('1'), a range
of message numbers ('2:4'), or a group of non-contiguous ranges separated by
commas ('1:3,6:9'). A range can contain an asterisk to indicate an infinite
upper bound ('3:*').
mechanism specifies which authentication mechanism is to be used - it should
appear in the instance variable capabilities in the form AUTH=mechanism.
authobject must be a callable object:
data = authobject(response)
It will be called to process server continuation responses. It should return
data that will be encoded and sent to the server. It should return None if
the client abort response * should be sent instead.
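As an illustrative sketch (not part of the module), an authobject for the
PLAIN mechanism could look like the following, assuming the server
advertises AUTH=PLAIN and that user and password are defined elsewhere:

def plain_auth(response):
    # response is the server's continuation data (unused for PLAIN);
    # return the bytes to be encoded and sent, or None to abort with '*'
    return b"\0" + user.encode('utf-8') + b"\0" + password.encode('utf-8')

typ, data = M.authenticate('PLAIN', plain_auth)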
Permanently remove deleted items from selected mailbox. Generates an EXPUNGE
response for each deleted message. Returned data contains a list of EXPUNGE
message numbers in order received.
Fetch (parts of) messages. message_parts should be a string of message part
names enclosed within parentheses, eg: "(UID BODY[TEXT])". Returned data
are tuples of message part envelope and data.
List mailbox names in directory matching pattern. directory defaults to
the top-level mail folder, and pattern defaults to match anything. Returned
data contains a list of LIST responses.
Force use of CRAM-MD5 authentication when identifying the client to protect
the password. Will only work if the server CAPABILITY response includes the
phrase AUTH=CRAM-MD5.
List subscribed mailbox names in directory matching pattern. directory
defaults to the top level directory and pattern defaults to match any mailbox.
Returned data are tuples of message part envelope and data.
Opens socket to port at host. This method is implicitly called by
the IMAP4 constructor. The connection objects established by this
method will be used in the read, readline, send, and shutdown
methods. You may override this method.
Search mailbox for matching messages. charset may be None, in which case
no CHARSET will be specified in the request to the server. The IMAP
protocol requires that at least one criterion be specified; an exception will be
raised when the server returns an error.
Example:
# M is a connected IMAP4 instance...
typ, msgnums = M.search(None, 'FROM', '"LDJ"')

# or:
typ, msgnums = M.search(None, '(FROM "LDJ")')
Select a mailbox. Returned data is the count of messages in mailbox
(EXISTS response). The default mailbox is 'INBOX'. If the readonly
flag is set, modifications to the mailbox are not allowed.
The sort command is a variant of search with sorting semantics for the
results. Returned data contains a space separated list of matching message
numbers.
Sort has two arguments before the search_criterion argument(s); a
parenthesized list of sort_criteria, and the searching charset. Note that
unlike search, the searching charset argument is mandatory. There is also
a uid sort command which corresponds to sort the way that uid search
corresponds to search. The sort command first searches the mailbox for
messages that match the given searching criteria using the charset argument for
the interpretation of strings in the searching criteria. It then returns the
numbers of matching messages.
Send a STARTTLS command. The ssl_context argument is optional
and should be a ssl.SSLContext object. This will enable
encryption on the IMAP connection.
Alters flag dispositions for messages in mailbox. command is specified by
section 6.4.6 of RFC 2060 as being one of “FLAGS”, “+FLAGS”, or “-FLAGS”,
optionally with a suffix of “.SILENT”.
For example, to set the delete flag on all messages:
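A sketch, assuming M is a connected IMAP4 instance with a mailbox selected:

typ, data = M.search(None, 'ALL')
for num in data[0].split():
    M.store(num, '+FLAGS', '\\Deleted')
M.expunge()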
The thread command is a variant of search with threading semantics for
the results. Returned data contains a space separated list of thread members.
Thread members consist of zero or more messages numbers, delimited by spaces,
indicating successive parent and child.
Thread has two arguments before the search_criterion argument(s); a
threading_algorithm, and the searching charset. Note that unlike
search, the searching charset argument is mandatory. There is also a
uid thread command which corresponds to thread the way that uid search
corresponds to search. The thread command first searches the
mailbox for messages that match the given searching criteria using the charset
argument for the interpretation of strings in the searching criteria. It then
returns the matching messages threaded according to the specified threading
algorithm.
Execute command args with messages identified by UID, rather than message
number. Returns response appropriate to command. At least one argument must be
supplied; if none are provided, the server will return an error and an exception
will be raised.
This module defines the class NNTP which implements the client side of
the Network News Transfer Protocol. It can be used to implement a news reader
or poster, or automated news processors. It is compatible with RFC 3977
as well as the older RFC 977 and RFC 2980.
Here are two small examples of how it can be used. To list some statistics
about a newsgroup and print the subjects of the last 10 articles:
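>>> import nntplib
>>> s = nntplib.NNTP('news.gmane.org')
>>> resp, count, first, last, name = s.group('gmane.comp.python.committers')
>>> print('Group', name, 'has', count, 'articles, range', first, 'to', last)
Group gmane.comp.python.committers has 1096 articles, range 1 to 1096
>>> resp, overviews = s.over((last - 9, last))
>>> for id, over in overviews:
...     print(id, nntplib.decode_header(over['subject']))
...
>>> s.quit()

To post an article from a binary file (this assumes that the article has
valid headers, and that you have the right to post on that particular
newsgroup):

>>> s = nntplib.NNTP('news.gmane.org')
>>> f = open('article.txt', 'rb')
>>> resp = s.post(f)
>>> s.quit()

The server name and the output shown here are illustrative.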
class nntplib.NNTP(host, port=119, user=None, password=None, readermode=None, usenetrc=False[, timeout])¶
Return a new NNTP object, representing a connection
to the NNTP server running on host host, listening at port port.
An optional timeout can be specified for the socket connection.
If the optional user and password are provided, or if suitable
credentials are present in ~/.netrc and the optional flag usenetrc
is true, the AUTHINFO USER and AUTHINFO PASS commands are used
to identify and authenticate the user to the server. If the optional
flag readermode is true, then a mode reader command is sent before
authentication is performed. Reader mode is sometimes necessary if you are
connecting to an NNTP server on the local machine and intend to call
reader-specific commands, such as group. If you get unexpected
NNTPPermanentErrors, you might need to set readermode.
Changed in version 3.2: usenetrc is now False by default.
class nntplib.NNTP_SSL(host, port=563, user=None, password=None, ssl_context=None, readermode=None, usenetrc=False[, timeout])¶
Return a new NNTP_SSL object, representing an encrypted
connection to the NNTP server running on host host, listening at
port port. NNTP_SSL objects have the same methods as
NNTP objects. If port is omitted, port 563 (NNTPS) is used.
ssl_context is also optional, and is a SSLContext object.
All other parameters behave the same as for NNTP.
Note that SSL-on-563 is discouraged per RFC 4642, in favor of
STARTTLS as described below. However, some servers only support the
former.
Derived from the standard exception Exception, this is the base
class for all exceptions raised by the nntplib module. Instances
of this class have the following attribute:
response
The response of the server, if available, as a str object.
An integer representing the version of the NNTP protocol supported by the
server. In practice, this should be 2 for servers advertising
RFC 3977 compliance and 1 for others.
The response that is returned as the first item in the return tuple of almost
all methods is the server’s response: a string beginning with a three-digit
code. If the server’s response indicates an error, the method raises one of
the above exceptions.
Many of the following methods take an optional keyword-only argument file.
When the file argument is supplied, it must be either a file object
opened for binary writing, or the name of an on-disk file to be written to.
The method will then write any data returned by the server (except for the
response line and the terminating dot) to the file; any list of lines,
tuples or objects that the method normally returns will be empty.
Changed in version 3.2: Many of the following methods have been reworked and
fixed, which makes them incompatible with their 3.1 counterparts.
Return the welcome message sent by the server in reply to the initial
connection. (This message sometimes contains disclaimers or help information
that may be relevant to the user.)
Return the RFC 3977 capabilities advertised by the server, as a
dict instance mapping capability names to (possibly empty) lists
of values. On legacy servers which don’t understand the CAPABILITIES
command, an empty dictionary is returned instead.
Send AUTHINFO commands with the user name and password. If user
and password are None and usenetrc is True, credentials from
~/.netrc will be used if possible.
Unless intentionally delayed, login is normally performed during the
NNTP object initialization and separately calling this function
is unnecessary. To force authentication to be delayed, you must not set
user or password when creating the object, and must set usenetrc to
False.
Send a STARTTLS command. The ssl_context argument is optional
and should be a ssl.SSLContext object. This will enable
encryption on the NNTP connection.
Note that this may not be done after authentication information has
been transmitted, and authentication occurs by default if possible during a
NNTP object initialization. See NNTP.login() for information
on suppressing this behavior.
Send a NEWGROUPS command. The date argument should be a
datetime.date or datetime.datetime object.
Return a pair (response,groups) where groups is a list representing
the groups that are new since the given date. If file is supplied,
though, then groups will be empty.
Send a NEWNEWS command. Here, group is a group name or '*', and
date has the same meaning as for newgroups(). Return a pair
(response,articles) where articles is a list of message ids.
This command is frequently disabled by NNTP server administrators.
Send a LIST or LIST ACTIVE command. Return a pair
(response, list) where list is a list of tuples representing all
the groups available from this NNTP server, optionally matching the
pattern string group_pattern. Each tuple has the form
(group,last,first,flag), where group is a group name, last
and first are the last and first article numbers, and flag usually
takes one of these values:
y: Local postings and articles from peers are allowed.
m: The group is moderated and all postings must be approved.
n: No local postings are allowed, only articles from peers.
j: Articles from peers are filed in the junk group instead.
x: No local postings, and articles from peers are ignored.
=foo.bar: Articles are filed in the foo.bar group instead.
If flag has another value, then the status of the newsgroup should be
considered unknown.
This command can return very large results, especially if group_pattern
is not specified. It is best to cache the results offline unless you
really need to refresh them.
Send a LIST NEWSGROUPS command, where group_pattern is a wildmat string as
specified in RFC 3977 (it’s essentially the same as DOS or UNIX shell wildcard
strings). Return a pair (response,descriptions), where descriptions
is a dictionary mapping group names to textual descriptions.
>>> resp, descs = s.descriptions('gmane.comp.python.*')
>>> len(descs)
295
>>> descs.popitem()
('gmane.comp.python.bio.general', 'BioPython discussion list (Moderated)')
Get a description for a single group group. If more than one group matches
(if ‘group’ is a real wildmat string), return the first match. If no group
matches, return an empty string.
This elides the response code from the server. If the response code is needed,
use descriptions().
Send a GROUP command, where name is the group name. The group is
selected as the current group, if it exists. Return a tuple
(response,count,first,last,name) where count is the (estimated)
number of articles in the group, first is the first article number in
the group, last is the last article number in the group, and name
is the group name.
Send an OVER command, or an XOVER command on legacy servers.
message_spec can be either a string representing a message id, or
a (first,last) tuple of numbers indicating a range of articles in
the current group, or a (first,None) tuple indicating a range of
articles starting from first to the last article in the current group,
or None to select the current article in the current group.
Return a pair (response,overviews). overviews is a list of
(article_number,overview) tuples, one for each article selected
by message_spec. Each overview is a dictionary with the same number
of items, but this number depends on the server. These items are either
message headers (the key is then the lower-cased header name) or metadata
items (the key is then the metadata name prepended with ":"). The
following items are guaranteed to be present by the NNTP specification:
the subject, from, date, message-id and references
headers
the :bytes metadata: the number of bytes in the entire raw article
(including headers and body)
the :lines metadata: the number of lines in the article body
The value of each item is either a string, or None if not present.
It is advisable to use the decode_header() function on header
values when they may contain non-ASCII characters:
Send a STAT command, where message_spec is either a message id
(enclosed in '<' and '>') or an article number in the current group.
If message_spec is omitted or None, the current article in the
current group is considered. Return a triple (response,number,id)
where number is the article number and id is the message id.
Send an ARTICLE command, where message_spec has the same meaning as
for stat(). Return a tuple (response,info) where info
is a namedtuple with three attributes number,
message_id and lines (in that order). number is the article number
in the group (or 0 if the information is not available), message_id the
message id as a string, and lines a list of lines (without terminating
newlines) comprising the raw message including headers and body.
>>> resp, info = s.article('<20030112190404.GE29873@epoch.metaslash.com>')
>>> info.number
0
>>> info.message_id
'<20030112190404.GE29873@epoch.metaslash.com>'
>>> len(info.lines)
65
>>> info.lines[0]
b'Path: main.gmane.org!not-for-mail'
>>> info.lines[1]
b'From: Neal Norwitz <neal@metaslash.com>'
>>> info.lines[-3:]
[b'There is a patch for 2.3 as well as 2.2.', b'', b'Neal']
Post an article using the POST command. The data argument is either
a file object opened for binary reading, or any iterable of bytes
objects (representing raw lines of the article to be posted). It should
represent a well-formed news article, including the required headers. The
post() method automatically escapes lines beginning with . and
appends the termination line.
If the method succeeds, the server’s response is returned. If the server
refuses posting, a NNTPReplyError is raised.
Send an IHAVE command. message_id is the id of the message to send
to the server (enclosed in '<' and '>'). The data parameter
and the return value are the same as for post().
Set the instance’s debugging level. This controls the amount of debugging
output printed. The default, 0, produces no debugging output. A value of
1 produces a moderate amount of debugging output, generally a single line
per request or response. A value of 2 or higher produces the maximum amount
of debugging output, logging each line sent and received on the connection
(including message text).
The following are optional NNTP extensions defined in RFC 2980. Some of
them have been superseded by newer commands in RFC 3977.
Send an XHDR command. The header argument is a header keyword, e.g.
'subject'. The string argument should have the form 'first-last'
where first and last are the first and last article numbers to search.
Return a pair (response, list), where list is a list of pairs (id, text),
where id is an article number (as a string) and text is the text of
the requested header for that article. If the file parameter is supplied, then
the output of the XHDR command is stored in a file. If file is a string,
then the method will open a file with that name, write to it then close it.
If file is a file object, then it will start calling write() on
it to store the lines of the command output. If file is supplied, then the
returned list is an empty list.
Send an XOVER command. start and end are article numbers
delimiting the range of articles to select. The return value is the
same as for over(). It is recommended to use over()
instead, since it will automatically use the newer OVER command
if available.
Return a pair (resp,path), where path is the directory path to the
article with message ID id. Most of the time, this extension is not
enabled by NNTP server administrators.
Decode a header value, un-escaping any escaped non-ASCII characters.
header_str must be a str object. The unescaped value is
returned. Using this function is recommended to display some headers
in a human readable form:
>>> decode_header("Some subject")
'Some subject'
>>> decode_header("=?ISO-8859-15?Q?D=E9buter_en_Python?=")
'Débuter en Python'
>>> decode_header("Re: =?UTF-8?B?cHJvYmzDqG1lIGRlIG1hdHJpY2U=?=")
'Re: problème de matrice'
The smtplib module defines an SMTP client session object that can be used
to send mail to any Internet machine with an SMTP or ESMTP listener daemon. For
details of SMTP and ESMTP operation, consult RFC 821 (Simple Mail Transfer
Protocol) and RFC 1869 (SMTP Service Extensions).
class smtplib.SMTP(host='', port=0, local_hostname=None[, timeout])¶
An SMTP instance encapsulates an SMTP connection. It has methods
that support a full repertoire of SMTP and ESMTP operations. If the optional
host and port parameters are given, the SMTP connect() method is called
with those parameters during initialization. An SMTPConnectError is
raised if the specified host doesn’t respond correctly. The optional
timeout parameter specifies a timeout in seconds for blocking operations
like the connection attempt (if not specified, the global default timeout
setting will be used).
For normal use, you should only require the initialization/connect,
sendmail(), and quit() methods. An example is included below.
class smtplib.SMTP_SSL(host='', port=0, local_hostname=None, keyfile=None, certfile=None[, timeout])¶
An SMTP_SSL instance behaves exactly the same as instances of
SMTP. SMTP_SSL should be used for situations where SSL is
required from the beginning of the connection and using starttls() is
not appropriate. If host is not specified, the local host is used. If
port is zero, the standard SMTP-over-SSL port (465) is used. keyfile
and certfile are also optional, and can contain a PEM formatted private key
and certificate chain file for the SSL connection. The optional timeout
parameter specifies a timeout in seconds for blocking operations like the
connection attempt (if not specified, the global default timeout setting
will be used).
class smtplib.LMTP(host='', port=LMTP_PORT, local_hostname=None)¶
The LMTP protocol, which is very similar to ESMTP, is heavily based on the
standard SMTP client. It’s common to use Unix sockets for LMTP, so our connect()
method must support that as well as a regular host:port server. To specify a
Unix socket, you must use an absolute path for host, starting with a ‘/’.
Authentication is supported, using the regular SMTP mechanism. When using a Unix
socket, LMTP servers generally don’t support or require any authentication, but
your mileage might vary.
A nice selection of exceptions is defined as well:
This exception is raised when the server unexpectedly disconnects, or when an
attempt is made to use the SMTP instance before connecting it to a
server.
Base class for all exceptions that include an SMTP error code. These exceptions
are generated in some instances when the SMTP server returns an error code. The
error code is stored in the smtp_code attribute of the error, and the
smtp_error attribute is set to the error message.
Sender address refused. In addition to the attributes set on all
SMTPResponseException exceptions, this sets ‘sender’ to the string that
the SMTP server refused.
All recipient addresses refused. The errors for each recipient are accessible
through the attribute recipients, which is a dictionary of exactly the
same sort as SMTP.sendmail() returns.
Definition of the ESMTP extensions for SMTP. This describes a framework for
extending SMTP with new commands, supporting dynamic discovery of the commands
provided by the server, and defines a few additional commands.
Connect to a host on a given port. The defaults are to connect to the local
host at the standard SMTP port (25). If the hostname ends with a colon (':')
followed by a number, that suffix will be stripped off and the number
interpreted as the port number to use. This method is automatically invoked by
the constructor if a host is specified during instantiation.
Send a command cmd to the server. The optional argument args is simply
concatenated to the command, separated by a space.
This returns a 2-tuple composed of a numeric response code and the actual
response line (multiline responses are joined into one long line.)
In normal operation it should not be necessary to call this method explicitly.
It is used to implement other methods and may be useful for testing private
extensions.
If the connection to the server is lost while waiting for the reply,
SMTPServerDisconnected will be raised.
Identify yourself to the SMTP server using HELO. The hostname argument
defaults to the fully qualified domain name of the local host.
The message returned by the server is stored as the helo_resp attribute
of the object.
In normal operation it should not be necessary to call this method explicitly.
It will be implicitly called by sendmail() when necessary.
Identify yourself to an ESMTP server using EHLO. The hostname argument
defaults to the fully qualified domain name of the local host. Examine the
response for ESMTP options and store them for use by has_extn().
Also sets several informational attributes: the message returned by
the server is stored as the ehlo_resp attribute, does_esmtp
is set to true or false depending on whether the server supports ESMTP, and
esmtp_features will be a dictionary containing the names of the
SMTP service extensions this server supports, and their
parameters (if any).
Unless you wish to use has_extn() before sending mail, it should not be
necessary to call this method explicitly. It will be implicitly called by
sendmail() when necessary.
Check the validity of an address on this server using SMTP VRFY. Returns a
tuple consisting of code 250 and a full RFC 822 address (including human
name) if the user address is valid. Otherwise returns an SMTP error code of 400
or greater and an error string.
Note
Many sites disable SMTP VRFY in order to foil spammers.
Log in on an SMTP server that requires authentication. The arguments are the
username and the password to authenticate with. If there has been no previous
EHLO or HELO command this session, this method tries ESMTP EHLO
first. This method will return normally if the authentication was successful, or
may raise the following exceptions:
Send mail. The required arguments are an RFC 822 from-address string, a list
of RFC 822 to-address strings (a bare string will be treated as a list with 1
address), and a message string. The caller may pass a list of ESMTP options
(such as 8bitmime) to be used in MAIL FROM commands as mail_options.
ESMTP options (such as DSN commands) that should be used with all RCPT
commands can be passed as rcpt_options. (If you need to use different ESMTP
options to different recipients you have to use the low-level methods such as
mail(), rcpt() and data() to send the message.)
Note
The from_addr and to_addrs parameters are used to construct the message
envelope used by the transport agents. sendmail does not modify the
message headers in any way.
msg may be a string containing characters in the ASCII range, or a byte
string. A string is encoded to bytes using the ascii codec, and lone \r
and \n characters are converted to \r\n characters. A byte string
is not modified.
If there has been no previous EHLO or HELO command this session, this
method tries ESMTP EHLO first. If the server does ESMTP, message size and
each of the specified options will be passed to it (if the option is in the
feature set the server advertises). If EHLO fails, HELO will be tried
and ESMTP options suppressed.
This method will return normally if the mail is accepted for at least one
recipient. Otherwise it will raise an exception. That is, if this method does
not raise an exception, then someone should get your mail. If this method does
not raise an exception, it returns a dictionary, with one entry for each
recipient that was refused. Each entry contains a tuple of the SMTP error code
and the accompanying error message sent by the server.
All recipients were refused. Nobody got the mail. The recipients
attribute of the exception object is a dictionary with information about the
refused recipients (like the one returned when at least one recipient was
accepted).
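As a sketch of the return value (server responses shown are illustrative),
a refused recipient appears in the dictionary returned by sendmail():

import smtplib

s = smtplib.SMTP("localhost")
tolist = ["one@one.org", "two@two.org", "three@three.org"]
msg = '''\
From: Me@my.org
Subject: test

This is a test'''
refused = s.sendmail("me@my.org", tolist, msg)
# e.g. {'three@three.org': (550, b'User unknown')} if one address is rejected
s.quit()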
This is a convenience method for calling sendmail() with the message
represented by an email.message.Message object. The arguments have
the same meaning as for sendmail(), except that msg is a Message
object.
If from_addr is None or to_addrs is None, send_message fills
those arguments with addresses extracted from the headers of msg as
specified in RFC 2822: from_addr is set to the Sender
field if it is present, and otherwise to the From field.
to_addrs combines the values (if any) of the To,
Cc, and Bcc fields from msg. If exactly one
set of Resent-* headers appear in the message, the regular
headers are ignored and the Resent-* headers are used instead.
If the message contains more than one set of Resent-* headers,
a ValueError is raised, since there is no way to unambiguously detect
the most recent set of Resent- headers.
send_message serializes msg using
BytesGenerator with \r\n as the linesep, and
calls sendmail() to transmit the resulting message. Regardless of the
values of from_addr and to_addrs, send_message does not transmit any
Bcc or Resent-Bcc headers that may appear
in msg.
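A minimal sketch (addresses and content are illustrative):

from email.message import Message
import smtplib

msg = Message()
msg['Subject'] = 'Test'
msg['From'] = 'me@example.org'
msg['To'] = 'you@example.org'
msg.set_payload('Hello')

server = smtplib.SMTP('localhost')
server.send_message(msg)  # from_addr and to_addrs are taken from the headers
server.quit()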
Terminate the SMTP session and close the connection. Return the result of
the SMTP QUIT command.
Low-level methods corresponding to the standard SMTP/ESMTP commands HELP,
RSET, NOOP, MAIL, RCPT, and DATA are also supported.
Normally these do not need to be called directly, so they are not documented
here. For details, consult the module code.
This example prompts the user for addresses needed in the message envelope (‘To’
and ‘From’ addresses), and the message to be delivered. Note that the headers
to be included with the message must be included in the message as entered; this
example doesn’t do any processing of the RFC 822 headers. In particular, the
‘To’ and ‘From’ addresses must be included in the message headers explicitly.
import smtplib

def prompt(prompt):
    return input(prompt).strip()

fromaddr = prompt("From: ")
toaddrs = prompt("To: ").split()
print("Enter message, end with ^D (Unix) or ^Z (Windows):")

# Add the From: and To: headers at the start!
msg = ("From: %s\r\nTo: %s\r\n\r\n"
       % (fromaddr, ", ".join(toaddrs)))
while True:
    try:
        line = input()
    except EOFError:
        break
    if not line:
        break
    msg = msg + line

print("Message length is", len(msg))

server = smtplib.SMTP('localhost')
server.set_debuglevel(1)
server.sendmail(fromaddr, toaddrs, msg)
server.quit()
Note
In general, you will want to use the email package’s features to
construct an email message, which you can then send
via send_message(); see email: Examples.
This module offers several classes to implement SMTP (email) servers.
Several server implementations are present; one is a generic
do-nothing implementation, which can be overridden, while the other two offer
specific mail-sending strategies.
Additionally the SMTPChannel may be extended to implement very specific
interaction behaviour with SMTP clients.
Create a new SMTPServer object, which binds to local address
localaddr. It will treat remoteaddr as an upstream SMTP relayer. It
inherits from asyncore.dispatcher, and so will insert itself into
asyncore’s event loop on instantiation.
Raise NotImplementedError exception. Override this in subclasses to
do something useful with this message. Whatever was passed in the
constructor as remoteaddr will be available as the _remoteaddr
attribute. peer is the remote host’s address, mailfrom is the envelope
originator, rcpttos are the envelope recipients and data is a string
containing the contents of the e-mail (which should be in RFC 2822
format).
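For instance, a minimal sketch of a subclass that merely prints incoming
messages (the class name and port are illustrative):

import asyncore
import smtpd

class PrintingSMTPServer(smtpd.SMTPServer):
    def process_message(self, peer, mailfrom, rcpttos, data):
        # peer is the remote address; mailfrom and rcpttos describe the
        # envelope; data is the message text in RFC 2822 format
        print('Message from', mailfrom, 'to', rcpttos)
        print(data)

server = PrintingSMTPServer(('localhost', 1025), None)
asyncore.loop()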
Create a new pure proxy server. Arguments are as per SMTPServer.
Everything will be relayed to remoteaddr. Note that running this has a good
chance to make you into an open relay, so please be careful.
Create a new pure proxy server. Arguments are as per SMTPServer.
Everything will be relayed to remoteaddr, unless the local mailman configuration
knows about an address, in which case it will be handled via mailman. Note that
running this has a good chance to make you into an open relay, so please be
careful.
Holds the name of the client peer as returned by conn.getpeername(),
where conn is the connection socket.
The SMTPChannel operates by invoking methods named smtp_<command>
upon reception of a command line from the client. Built into the base
SMTPChannel class are methods for handling the following commands
(and responding to them appropriately):
Command   Action taken
HELO      Accepts the greeting from the client and stores it in
          seen_greeting.
NOOP      Takes no action.
QUIT      Closes the connection cleanly.
MAIL      Accepts the “MAIL FROM:” syntax and stores the supplied address
          as mailfrom.
RCPT      Accepts the “RCPT TO:” syntax and stores the supplied addresses
          in the rcpttos list.
The telnetlib module provides a Telnet class that implements the
Telnet protocol. See RFC 854 for details about the protocol. In addition, it
provides symbolic constants for the protocol characters (see below), and for the
telnet options. The symbolic names of the telnet options follow the definitions
in arpa/telnet.h, with the leading TELOPT_ removed. For symbolic names
of options which are traditionally not included in arpa/telnet.h, see the
module source itself.
The symbolic constants for the telnet commands are: IAC, DONT, DO, WONT, WILL,
SE (Subnegotiation End), NOP (No Operation), DM (Data Mark), BRK (Break), IP
(Interrupt process), AO (Abort output), AYT (Are You There), EC (Erase
Character), EL (Erase Line), GA (Go Ahead), SB (Subnegotiation Begin).
class telnetlib.Telnet(host=None, port=0[, timeout])¶
Telnet represents a connection to a Telnet server. The instance is
initially not connected by default; the open() method must be used to
establish a connection. Alternatively, the host name and optional port
number can be passed to the constructor, in which case the connection to
the server will be established before the constructor returns. The optional
timeout parameter specifies a timeout in seconds for blocking operations
like the connection attempt (if not specified, the global default timeout
setting will be used).
Do not reopen an already connected instance.
This class has many read_*() methods. Note that some of them raise
EOFError when the end of the connection is read, because they can return
an empty string for other reasons. See the individual descriptions below.
Read until a given byte string, expected, is encountered or until timeout
seconds have passed.
When no match is found, return whatever is available instead, possibly empty
bytes. Raise EOFError if the connection is closed and no cooked data
is available.
Read everything that can be read without blocking in I/O (eager).
Raise EOFError if connection closed and no cooked data available.
Return b'' if no cooked data available otherwise. Do not block unless in
the midst of an IAC sequence.
Raise EOFError if connection closed and no cooked data available.
Return b'' if no cooked data available otherwise. Do not block unless in
the midst of an IAC sequence.
Process and return data already in the queues (lazy).
Raise EOFError if connection closed and no data available. Return
b'' if no cooked data available otherwise. Do not block unless in the
midst of an IAC sequence.
Return the data collected between a SB/SE pair (suboption begin/end). The
callback should access this data when it is invoked with a SE command.
This method never blocks.
Connect to a host. The optional second argument is the port number, which
defaults to the standard Telnet port (23). The optional timeout parameter
specifies a timeout in seconds for blocking operations like the connection
attempt (if not specified, the global default timeout setting will be used).
Do not try to reopen an already connected instance.
Print a debug message when the debug level is > 0. If extra arguments are
present, they are substituted in the message using the standard string
formatting operator.
Write a byte string to the socket, doubling any IAC characters. This can
block if the connection is blocked. May raise socket.error if the
connection is closed.
Read until one from a list of regular expressions matches.
The first argument is a list of regular expressions, either compiled
(re.RegexObject instances) or uncompiled (byte strings). The
optional second argument is a timeout, in seconds; the default is to block
indefinitely.
Return a tuple of three items: the index in the list of the first regular
expression that matches; the match object returned; and the bytes read up
till and including the match.
If end of file is found and no bytes were read, raise EOFError.
Otherwise, when nothing matches, return (-1,None,data) where data is
the bytes received so far (may be empty bytes if a timeout happened).
If a regular expression ends with a greedy match (such as .*) or if more
than one expression can match the same input, the results are
non-deterministic, and may depend on the I/O timing.
Each time a telnet option is read on the input flow, this callback (if set) is
called with the following parameters: callback(telnet socket, command
(DO/DONT/WILL/WONT), option). No other action is done afterwards by telnetlib.
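A simple example illustrating typical use (the host, the prompts, and the
commands sent are assumptions about the remote system):

import getpass
import telnetlib

HOST = "localhost"
user = input("Enter your remote account: ")
password = getpass.getpass()

tn = telnetlib.Telnet(HOST)

tn.read_until(b"login: ")
tn.write(user.encode('ascii') + b"\n")
if password:
    tn.read_until(b"Password: ")
    tn.write(password.encode('ascii') + b"\n")

tn.write(b"ls\n")
tn.write(b"exit\n")

print(tn.read_all().decode('ascii'))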
This module provides immutable UUID objects (the UUID class)
and the functions uuid1(), uuid3(), uuid4(), uuid5() for
generating version 1, 3, 4, and 5 UUIDs as specified in RFC 4122.
If all you want is a unique ID, you should probably call uuid1() or
uuid4(). Note that uuid1() may compromise privacy since it creates
a UUID containing the computer’s network address. uuid4() creates a
random UUID.
class uuid.UUID(hex=None, bytes=None, bytes_le=None, fields=None, int=None, version=None)¶
Create a UUID from either a string of 32 hexadecimal digits, a string of 16
bytes as the bytes argument, a string of 16 bytes in little-endian order as
the bytes_le argument, a tuple of six integers (32-bit time_low, 16-bit
time_mid, 16-bit time_hi_version, 8-bit clock_seq_hi_variant, 8-bit
clock_seq_low, 48-bit node) as the fields argument, or a single 128-bit
integer as the int argument. When a string of hex digits is given, curly
braces, hyphens, and a URN prefix are all optional. For example, these
expressions all yield the same UUID:
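UUID('{12345678-1234-5678-1234-567812345678}')
UUID('12345678123456781234567812345678')
UUID('urn:uuid:12345678-1234-5678-1234-567812345678')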
Exactly one of hex, bytes, bytes_le, fields, or int must be given.
The version argument is optional; if given, the resulting UUID will have its
variant and version number set according to RFC 4122, overriding bits in the
given hex, bytes, bytes_le, fields, or int.
Get the hardware address as a 48-bit positive integer. The first time this
runs, it may launch a separate program, which could be quite slow. If all
attempts to obtain the hardware address fail, we choose a random 48-bit number
with its eighth bit set to 1 as recommended in RFC 4122. “Hardware address”
means the MAC address of a network interface, and on a machine with multiple
network interfaces the MAC address of any one of them may be returned.
Generate a UUID from a host ID, sequence number, and the current time. If node
is not given, getnode() is used to obtain the hardware address. If
clock_seq is given, it is used as the sequence number; otherwise a random
14-bit sequence number is chosen.
Here are some examples of typical usage of the uuid module:
>>> import uuid

# make a UUID based on the host ID and current time
>>> uuid.uuid1()
UUID('a8098c1a-f86e-11da-bd1a-00112444be1e')

# make a UUID using an MD5 hash of a namespace UUID and a name
>>> uuid.uuid3(uuid.NAMESPACE_DNS, 'python.org')
UUID('6fa459ea-ee8a-3ca4-894e-db77e160355e')

# make a random UUID
>>> uuid.uuid4()
UUID('16fd2706-8baf-433b-82eb-8c7fada847da')

# make a UUID using a SHA-1 hash of a namespace UUID and a name
>>> uuid.uuid5(uuid.NAMESPACE_DNS, 'python.org')
UUID('886313e1-3b8a-5372-9b90-0c9aee199e5d')

# make a UUID from a string of hex digits (braces and hyphens ignored)
>>> x = uuid.UUID('{00010203-0405-0607-0809-0a0b0c0d0e0f}')

# convert a UUID to a string of hex digits in standard form
>>> str(x)
'00010203-0405-0607-0809-0a0b0c0d0e0f'

# get the raw 16 bytes of the UUID
>>> x.bytes
b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f'

# make a UUID from a 16-byte string
>>> uuid.UUID(bytes=x.bytes)
UUID('00010203-0405-0607-0809-0a0b0c0d0e0f')
The socketserver module simplifies the task of writing network servers.
There are four basic server classes: TCPServer uses the Internet TCP
protocol, which provides for continuous streams of data between the client and
server. UDPServer uses datagrams, which are discrete packets of
information that may arrive out of order or be lost while in transit. The more
infrequently used UnixStreamServer and UnixDatagramServer
classes are similar, but use Unix domain sockets; they’re not available on
non-Unix platforms. For more details on network programming, consult a book
such as
W. Richard Stevens' UNIX Network Programming or Ralph Davis's Win32 Network
Programming.
These four classes process requests synchronously; each request must be
completed before the next request can be started. This isn’t suitable if each
request takes a long time to complete, because it requires a lot of computation,
or because it returns a lot of data which the client is slow to process. The
solution is to create a separate process or thread to handle each request; the
ForkingMixIn and ThreadingMixIn mix-in classes can be used to
support asynchronous behaviour.
Creating a server requires several steps. First, you must create a request
handler class by subclassing the BaseRequestHandler class and
overriding its handle() method; this method will process incoming
requests. Second, you must instantiate one of the server classes, passing it
the server’s address and the request handler class. Finally, call the
handle_request() or serve_forever() method of the server object to
process one or many requests.
When inheriting from ThreadingMixIn for threaded connection behavior,
you should explicitly declare how you want your threads to behave on an abrupt
shutdown. The ThreadingMixIn class defines an attribute
daemon_threads, which indicates whether or not the server should wait for
thread termination. You should set the flag explicitly if you would like threads
to behave autonomously; the default is False, meaning that Python will
not exit until all threads created by ThreadingMixIn have exited.
Server classes have the same external methods and attributes, no matter what
network protocol they use.
Note that UnixDatagramServer derives from UDPServer, not from
UnixStreamServer — the only difference between an IP and a Unix
stream server is the address family, which is simply repeated in both Unix
server classes.
Forking and threading versions of each type of server can be created using the
ForkingMixIn and ThreadingMixIn mix-in classes. For instance,
a threading UDP server class is created as follows:
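class ThreadingUDPServer(ThreadingMixIn, UDPServer):
    pass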
The mix-in class must come first, since it overrides a method defined in
UDPServer. Setting the various attributes also changes the
behavior of the underlying server mechanism.
To implement a service, you must derive a class from BaseRequestHandler
and redefine its handle() method. You can then run various versions of
the service by combining one of the server classes with your request handler
class. The request handler class must be different for datagram or stream
services. This can be hidden by using the handler subclasses
StreamRequestHandler or DatagramRequestHandler.
Of course, you still have to use your head! For instance, it makes no sense to
use a forking server if the service contains state in memory that can be
modified by different requests, since the modifications in the child process
would never reach the initial state kept in the parent process and passed to
each child. In this case, you can use a threading server, but you will probably
have to use locks to protect the integrity of the shared data.
On the other hand, if you are building an HTTP server where all data is stored
externally (for instance, in the file system), a synchronous class will
essentially render the service “deaf” while one request is being handled –
which may be for a very long time if a client is slow to receive all the data it
has requested. Here a threading or forking server is appropriate.
In some cases, it may be appropriate to process part of a request synchronously,
but to finish processing in a forked child depending on the request data. This
can be implemented by using a synchronous server and doing an explicit fork in
the request handler class handle() method.
Another approach to handling multiple simultaneous requests in an environment
that supports neither threads nor fork() (or where these are too expensive
or inappropriate for the service) is to maintain an explicit table of partially
finished requests and to use select() to decide which request to work on
next (or whether to handle a new incoming request). This is particularly
important for stream services where each client can potentially be connected for
a long time (if threads or subprocesses cannot be used). See asyncore for
another way to manage this.
This is the superclass of all Server objects in the module. It defines the
interface, given below, but does not implement most of the methods, which is
done in subclasses.
Return an integer file descriptor for the socket on which the server is
listening. This function is most commonly passed to select.select(), to
allow monitoring multiple servers in the same process.
Process a single request. This function calls the following methods in
order: get_request(), verify_request(), and
process_request(). If the user-provided handle() method of the
handler class raises an exception, the server’s handle_error() method
will be called. If no request is received within self.timeout
seconds, handle_timeout() will be called and handle_request()
will return.
The address on which the server is listening. The format of addresses varies
depending on the protocol family; see the documentation for the socket module
for details. For Internet protocols, this is a tuple containing a string giving
the address, and an integer port number: ('127.0.0.1',80), for example.
The size of the request queue. If it takes a long time to process a single
request, any requests that arrive while the server is busy are placed into a
queue, up to request_queue_size requests. Once the queue is full,
further requests from clients will get a “Connection denied” error. The default
value is usually 5, but this can be overridden by subclasses.
Timeout duration, measured in seconds, or None if no timeout is
desired. If handle_request() receives no incoming requests within the
timeout period, the handle_timeout() method is called.
There are various server methods that can be overridden by subclasses of base
server classes like TCPServer; these methods aren’t useful to external
users of the server object.
Must accept a request from the socket, and return a 2-tuple containing the new
socket object to be used to communicate with the client, and the client’s
address.
This function is called if the RequestHandlerClass’s handle()
method raises an exception. The default action is to print the traceback to
standard output and continue handling further requests.
This function is called when the timeout attribute has been set to a
value other than None and the timeout period has passed with no
requests being received. The default action for forking servers is
to collect the status of any child processes that have exited, while
in threading servers this method does nothing.
Calls finish_request() to create an instance of the
RequestHandlerClass. If desired, this function can create a new process
or thread to handle the request; the ForkingMixIn and
ThreadingMixIn classes do this.
Must return a Boolean value; if the value is True, the request will be
processed, and if it’s False, the request will be denied. This function
can be overridden to implement access controls for a server. The default
implementation always returns True.
The request handler class must define a new handle() method, and can
override any of the following methods. A new instance is created for each
request.
Called after the handle() method to perform any clean-up actions
required. The default implementation does nothing. If setup() or
handle() raise an exception, this function will not be called.
This function must do all the work required to service a request. The
default implementation does nothing. Several instance attributes are
available to it; the request is available as self.request; the client
address as self.client_address; and the server instance as
self.server, in case it needs access to per-server information.
The type of self.request is different for datagram or stream
services. For stream services, self.request is a socket object; for
datagram services, self.request is a pair of string and socket.
However, this can be hidden by using the request handler subclasses
StreamRequestHandler or DatagramRequestHandler, which
override the setup() and finish() methods, and provide
self.rfile and self.wfile attributes. self.rfile and
self.wfile can be read or written, respectively, to get the request
data or return data to the client.
import socketserver

class MyTCPHandler(socketserver.BaseRequestHandler):
    """
    The RequestHandler class for our server.

    It is instantiated once per connection to the server, and must
    override the handle() method to implement communication to the
    client.
    """

    def handle(self):
        # self.request is the TCP socket connected to the client
        self.data = self.request.recv(1024).strip()
        print("%s wrote:" % self.client_address[0])
        print(self.data)
        # just send back the same data, but upper-cased
        self.request.send(self.data.upper())

if __name__ == "__main__":
    HOST, PORT = "localhost", 9999

    # Create the server, binding to localhost on port 9999
    server = socketserver.TCPServer((HOST, PORT), MyTCPHandler)

    # Activate the server; this will keep running until you
    # interrupt the program with Ctrl-C
    server.serve_forever()
An alternative request handler class that makes use of streams (file-like
objects that simplify communication by providing the standard file interface):
class MyTCPHandler(socketserver.StreamRequestHandler):

    def handle(self):
        # self.rfile is a file-like object created by the handler;
        # we can now use e.g. readline() instead of raw recv() calls
        self.data = self.rfile.readline().strip()
        print("%s wrote:" % self.client_address[0])
        print(self.data)
        # Likewise, self.wfile is a file-like object used to write back
        # to the client
        self.wfile.write(self.data.upper())
The difference is that the readline() call in the second handler will call
recv() multiple times until it encounters a newline character, while the
single recv() call in the first handler will just return what has been sent
from the client in one send() call.
This is the client side:
import socket
import sys

HOST, PORT = "localhost", 9999
data = " ".join(sys.argv[1:])

# Create a socket (SOCK_STREAM means a TCP socket)
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Connect to server and send data
sock.connect((HOST, PORT))
sock.send(bytes(data + "\n", "utf8"))

# Receive data from the server and shut down
received = sock.recv(1024)
sock.close()

print("Sent:     %s" % data)
print("Received: %s" % received)
The output of the example should look something like this:
Server:
$ python TCPServer.py
127.0.0.1 wrote:
b'hello world with TCP'
127.0.0.1 wrote:
b'python is nice'
Client:
$ python TCPClient.py hello world with TCP
Sent: hello world with TCP
Received: b'HELLO WORLD WITH TCP'
$ python TCPClient.py python is nice
Sent: python is nice
Received: b'PYTHON IS NICE'
import socketserver

class MyUDPHandler(socketserver.BaseRequestHandler):
    """
    This class works similar to the TCP handler class, except that
    self.request consists of a pair of data and client socket, and since
    there is no connection the client address must be given explicitly
    when sending data back via sendto().
    """

    def handle(self):
        data = self.request[0].strip()
        socket = self.request[1]
        print("%s wrote:" % self.client_address[0])
        print(data)
        socket.sendto(data.upper(), self.client_address)

if __name__ == "__main__":
    HOST, PORT = "localhost", 9999
    server = socketserver.UDPServer((HOST, PORT), MyUDPHandler)
    server.serve_forever()
This is the client side:
import socket
import sys

HOST, PORT = "localhost", 9999
data = " ".join(sys.argv[1:])

# SOCK_DGRAM is the socket type to use for UDP sockets
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

# As you can see, there is no connect() call; UDP has no connections.
# Instead, data is directly sent to the recipient via sendto().
sock.sendto(bytes(data + "\n", "utf8"), (HOST, PORT))
received = sock.recv(1024)

print("Sent:     %s" % data)
print("Received: %s" % received)
The output of the example should look exactly like for the TCP server example.
To build asynchronous handlers, use the ThreadingMixIn and
ForkingMixIn classes.
An example for the ThreadingMixIn class:
import socket
import threading
import socketserver

class ThreadedTCPRequestHandler(socketserver.BaseRequestHandler):

    def handle(self):
        data = self.request.recv(1024)
        cur_thread = threading.current_thread()
        response = bytes("%s: %s" % (cur_thread.getName(), data), 'ascii')
        self.request.send(response)

class ThreadedTCPServer(socketserver.ThreadingMixIn, socketserver.TCPServer):
    pass

def client(ip, port, message):
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect((ip, port))
    sock.send(message)
    response = sock.recv(1024)
    print("Received: %s" % response)
    sock.close()

if __name__ == "__main__":
    # Port 0 means to select an arbitrary unused port
    HOST, PORT = "localhost", 0

    server = ThreadedTCPServer((HOST, PORT), ThreadedTCPRequestHandler)
    ip, port = server.server_address

    # Start a thread with the server -- that thread will then start one
    # more thread for each request
    server_thread = threading.Thread(target=server.serve_forever)
    # Exit the server thread when the main thread terminates
    server_thread.setDaemon(True)
    server_thread.start()
    print("Server loop running in thread:", server_thread.name)

    client(ip, port, b"Hello World 1")
    client(ip, port, b"Hello World 2")
    client(ip, port, b"Hello World 3")

    server.shutdown()
The output of the example should look something like this:
$ python ThreadedTCPServer.py
Server loop running in thread: Thread-1
Received: b"Thread-2: b'Hello World 1'"
Received: b"Thread-3: b'Hello World 2'"
Received: b"Thread-4: b'Hello World 3'"
The ForkingMixIn class is used in the same way, except that the server
will spawn a new process for each request.
This module defines classes for implementing HTTP servers (Web servers).
One class, HTTPServer, is a socketserver.TCPServer subclass.
It creates and listens at the HTTP socket, dispatching the requests to a
handler. Code to create and run the server looks like this:
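def run(server_class=HTTPServer, handler_class=BaseHTTPRequestHandler):
    server_address = ('', 8000)
    httpd = server_class(server_address, handler_class)
    httpd.serve_forever()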
class http.server.HTTPServer(server_address, RequestHandlerClass)¶
This class builds on the TCPServer class by storing the server
address as instance variables named server_name and
server_port. The server is accessible by the handler, typically
through the handler’s server instance variable.
The HTTPServer must be given a RequestHandlerClass on instantiation,
of which this module provides three different variants:
class http.server.BaseHTTPRequestHandler(request, client_address, server)¶
This class is used to handle the HTTP requests that arrive at the server. By
itself, it cannot respond to any actual HTTP requests; it must be subclassed
to handle each request method (e.g. GET or POST).
BaseHTTPRequestHandler provides a number of class and instance
variables, and methods for use by subclasses.
The handler will parse the request and the headers, then call a method
specific to the request type. The method name is constructed from the
request. For example, for the request method SPAM, the do_SPAM()
method will be called with no arguments. All of the relevant information is
stored in instance variables of the handler. Subclasses should not need to
override or extend the __init__() method.
Specifies the server software version. You may want to override this. The
format is multiple whitespace-separated strings, where each string is of
the form name[/version]. For example, 'BaseHTTP/0.2'.
Specifies a format string for building an error response to the client. It
uses parenthesized, keyed format specifiers, so the format operand must be
a dictionary. The code key should be an integer, specifying the numeric
HTTP error code value. message should be a string containing a
(detailed) error message of what occurred, and explain should be an
explanation of the error code number. Default message and explain
values can be found in the responses class variable.
This specifies the HTTP protocol version used in responses. If set to
'HTTP/1.1', the server will permit HTTP persistent connections;
however, your server must then include an accurate Content-Length
header (using send_header()) in all of its responses to clients.
For backwards compatibility, the setting defaults to 'HTTP/1.0'.
This variable contains a mapping of error code integers to two-element tuples
containing a short and long message. For example, {code: (shortmessage, longmessage)}. The shortmessage is usually used as the message key in an
error response, and longmessage as the explain key (see the
error_message_format class variable).
Calls handle_one_request() once (or, if persistent connections are
enabled, multiple times) to handle incoming HTTP requests. You should
never need to override it; instead, implement appropriate do_*()
methods.
When an HTTP/1.1 compliant server receives an Expect: 100-continue
request header, it responds with a 100 Continue followed by 200 OK headers.
This method can be overridden to raise an error if the server does not
want the client to continue. For example, the server can choose to send
417 Expectation Failed as a response header and return False.
Sends and logs a complete error reply to the client. The numeric code
specifies the HTTP error code, with message as optional, more specific text. A
complete set of headers is sent, followed by text composed using the
error_message_format class variable.
Sends a response header and logs the accepted request. The HTTP response
line is sent, followed by Server and Date headers. The values for
these two headers are picked up from the version_string() and
date_time_string() methods, respectively.
Stores the HTTP header to an internal buffer which will be written to the
output stream when end_headers() method is invoked.
keyword should specify the header keyword, with value
specifying its value.
Changed in version 3.2: Storing the headers in an internal buffer.
Sends the response header only, for use when a 100 Continue response is
sent by the server to the client. The headers are not buffered; they are
sent directly to the output stream. If the message is not specified, the
HTTP message corresponding to the response code is sent.
Logs an accepted (successful) request. code should specify the numeric
HTTP code associated with the response. If a size of the response is
available, then it should be passed as the size parameter.
Logs an error when a request cannot be fulfilled. By default, it passes
the message to log_message(), so it takes the same arguments
(format and additional values).
Logs an arbitrary message to sys.stderr. This is typically overridden
to create custom error logging mechanisms. The format argument is a
standard printf-style format string, where the additional arguments to
log_message() are applied as inputs to the formatting. The client
address and current date and time are prefixed to every message logged.
Returns the date and time given by timestamp (which must be None or in
the format returned by time.time()), formatted for a message
header. If timestamp is omitted, it uses the current date and time.
Returns the client address, formatted for logging. A name lookup is
performed on the client’s IP address.
class http.server.SimpleHTTPRequestHandler(request, client_address, server)
This class serves files from the current directory and below, directly
mapping the directory structure to HTTP requests.
A lot of the work, such as parsing the request, is done by the base class
BaseHTTPRequestHandler. This class implements the do_GET()
and do_HEAD() functions.
A dictionary mapping suffixes into MIME types. The default is
signified by an empty string, and is considered to be
application/octet-stream. The mapping is used case-insensitively,
and so should contain only lower-cased keys.
This method serves the 'HEAD' request type: it sends the headers it
would send for the equivalent GET request. See the do_GET()
method for a more complete explanation of the possible headers.
The request is mapped to a local file by interpreting the request as a
path relative to the current working directory.
If the request was mapped to a directory, the directory is checked for a
file named index.html or index.htm (in that order). If found, the
file’s contents are returned; otherwise a directory listing is generated
by calling the list_directory() method. This method uses
os.listdir() to scan the directory, and returns a 404 error
response if the listdir() fails.
If the request was mapped to a file, it is opened and the contents are
returned. Any IOError exception in opening the requested file is
mapped to a 404, 'File not found' error. Otherwise, the content
type is guessed by calling the guess_type() method, which in turn
uses the extensions_map variable.
A 'Content-type:' header with the guessed content type is output,
followed by a 'Content-Length:' header with the file’s size and a
'Last-Modified:' header with the file’s modification time.
Then follows a blank line signifying the end of the headers, and then the
contents of the file are output. If the file’s MIME type starts with
text/ the file is opened in text mode; otherwise binary mode is used.
For example usage, see the implementation of the test() function
invocation in the http.server module.
The SimpleHTTPRequestHandler class can be used in the following
manner in order to create a very basic webserver serving files relative to
the current directory.
import http.server
import socketserver

PORT = 8000

Handler = http.server.SimpleHTTPRequestHandler

httpd = socketserver.TCPServer(("", PORT), Handler)

print("serving at port", PORT)
httpd.serve_forever()
http.server can also be invoked directly using the -m
switch of the interpreter with a port number argument. Similar to
the previous example, this serves files relative to the current directory.
python -m http.server 8000
class http.server.CGIHTTPRequestHandler(request, client_address, server)
This class is used to serve either files or output of CGI scripts from the
current directory and below. Note that mapping HTTP hierarchic structure to
local directory structure is exactly as in SimpleHTTPRequestHandler.
Note
CGI scripts run by the CGIHTTPRequestHandler class cannot execute
redirects (HTTP code 302), because code 200 (script output follows) is
sent prior to execution of the CGI script. This pre-empts the status
code.
The class will, however, run the CGI script instead of serving it as a file,
if it guesses it to be a CGI script. Only directory-based CGI scripts are used —
the other common server configuration is to treat special extensions as
denoting CGI scripts.
The do_GET() and do_HEAD() functions are modified to run CGI scripts
and serve the output, instead of serving files, if the request leads to
somewhere below the cgi_directories path.
This method serves the 'POST' request type, only allowed for CGI
scripts. Error 501, “Can only POST to CGI scripts”, is output when trying
to POST to a non-CGI URL.
Note that CGI scripts will be run with UID of user nobody, for security
reasons. Problems with the CGI script will be translated to error 403.
The http.cookies module defines classes for abstracting the concept of
cookies, an HTTP state management mechanism. It supports both simple string-only
cookies, and provides an abstraction for having any serializable data-type as
cookie value.
The module formerly strictly applied the parsing rules described in the
RFC 2109 and RFC 2068 specifications. It has since been discovered that
MSIE 3.0x doesn’t follow the character rules outlined in those specs. As a
result, the parsing rules used are a bit less strict.
Note
On encountering an invalid cookie, CookieError is raised, so if your
cookie data comes from a browser you should always prepare for invalid data
and catch CookieError on parsing.
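For instance, a minimal sketch of round-tripping a cookie and guarding against
bad input might look like this (the cookie name and value are arbitrary):

import http.cookies

C = http.cookies.SimpleCookie()
C["fig"] = "newton"           # store a string-only cookie value
C["fig"]["path"] = "/"        # set an RFC 2109 attribute on the Morsel
print(C.output())             # emits: Set-Cookie: fig=newton; Path=/

try:
    C["invalid name"] = "x"   # spaces are illegal in cookie names
except http.cookies.CookieError:
    print("refused an invalid cookie")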
This class is a dictionary-like object whose keys are strings and whose values
are Morsel instances. Note that upon setting a key to a value, the
value is first converted to a Morsel containing the key and the value.
If input is given, it is passed to the load() method.
Return a decoded value from a string representation. Return value can be any
type. This method does nothing in BaseCookie — it exists so it can be
overridden.
Return an encoded value. val can be any type, but return value must be a
string. This method does nothing in BaseCookie — it exists so it can
be overridden.
In general, it should be the case that value_encode() and
value_decode() are inverses on the range of value_decode.
Return a string representation suitable to be sent as HTTP headers. attrs and
header are sent to each Morsel‘s output() method. sep is used
to join the headers together, and is by default the combination '\r\n'
(CRLF).
Abstract a key/value pair, which has some RFC 2109 attributes.
Morsels are dictionary-like objects, whose set of keys is constant — the valid
RFC 2109 attributes, which are
expires
path
comment
domain
max-age
secure
version
httponly
The attribute httponly specifies that the cookie is only transferred
in HTTP requests, and is not accessible through JavaScript. This is intended
to mitigate some forms of cross-site scripting.
Return a string representation of the Morsel, suitable to be sent as an HTTP
header. By default, all the attributes are included, unless attrs is given, in
which case it should be a list of attributes to use. header is by default
"Set-Cookie:".
The http.cookiejar module defines classes for automatic handling of HTTP
cookies. It is useful for accessing web sites that require small pieces of data
– cookies – to be set on the client machine by an HTTP response from a
web server, and then returned to the server in later HTTP requests.
Both the regular Netscape cookie protocol and the protocol defined by
RFC 2965 are handled. RFC 2965 handling is switched off by default.
RFC 2109 cookies are parsed as Netscape cookies and subsequently treated
either as Netscape or RFC 2965 cookies according to the ‘policy’ in effect.
Note that the great majority of cookies on the Internet are Netscape cookies.
http.cookiejar attempts to follow the de-facto Netscape cookie protocol (which
differs substantially from that set out in the original Netscape specification),
including taking note of the max-age and port cookie-attributes
introduced with RFC 2965.
Note
The various named parameters found in Set-Cookie and
Set-Cookie2 headers (eg. domain and expires) are
conventionally referred to as attributes. To distinguish them from
Python attributes, the documentation for this module uses the term
cookie-attribute instead.
policy is an object implementing the CookiePolicy interface.
The CookieJar class stores HTTP cookies. It extracts cookies from HTTP
requests, and returns them in HTTP responses. CookieJar instances
automatically expire contained cookies when necessary. Subclasses are also
responsible for storing and retrieving cookies from a file or database.
class http.cookiejar.FileCookieJar(filename, delayload=None, policy=None)
policy is an object implementing the CookiePolicy interface. For the
other arguments, see the documentation for the corresponding attributes.
Constructor arguments should be passed as keyword arguments only.
blocked_domains is a sequence of domain names that we never accept cookies
from, nor return cookies to. allowed_domains if not None, this is a
sequence of the only domains for which we accept and return cookies. For all
other arguments, see the documentation for CookiePolicy and
DefaultCookiePolicy objects.
DefaultCookiePolicy implements the standard accept / reject rules for
Netscape and RFC 2965 cookies. By default, RFC 2109 cookies (ie. cookies
received in a Set-Cookie header with a version cookie-attribute of
1) are treated according to the RFC 2965 rules. However, if RFC 2965 handling
is turned off or rfc2109_as_netscape is True, RFC 2109 cookies are
‘downgraded’ by the CookieJar instance to Netscape cookies, by
setting the version attribute of the Cookie instance to 0.
DefaultCookiePolicy also provides some parameters to allow some
fine-tuning of policy.
This class represents Netscape, RFC 2109 and RFC 2965 cookies. It is not
expected that users of http.cookiejar construct their own Cookie
instances. Instead, if necessary, call make_cookies() on a
CookieJar instance.
The specification of the original Netscape cookie protocol. Though this is
still the dominant protocol, the ‘Netscape cookie protocol’ implemented by all
the major browsers (and http.cookiejar) only bears a passing resemblance to
the one sketched out in cookie_spec.html.
If policy allows (ie. the rfc2965 and hide_cookie2 attributes of
the CookieJar‘s CookiePolicy instance are true and false
respectively), the Cookie2 header is also added when appropriate.
The request object (usually a urllib.request.Request instance)
must support the methods get_full_url(), get_host(),
get_type(), unverifiable(), get_origin_req_host(),
has_header(), get_header(), header_items(), and
add_unredirected_header(), as documented by urllib.request.
Extract cookies from HTTP response and store them in the CookieJar,
where allowed by policy.
The CookieJar will look for allowable Set-Cookie and
Set-Cookie2 headers in the response argument, and store cookies
as appropriate (subject to the CookiePolicy.set_ok() method’s approval).
The request object (usually a urllib.request.Request instance)
must support the methods get_full_url(), get_host(),
unverifiable(), and get_origin_req_host(), as documented by
urllib.request. The request is used to set default values for
cookie-attributes as well as for checking that the cookie is allowed to be
set.
If invoked without arguments, clear all cookies. If given a single argument,
only cookies belonging to that domain will be removed. If given two arguments,
cookies belonging to the specified domain and URL path are removed. If
given three arguments, then the cookie with the specified domain, path and
name is removed.
Discards all contained cookies that have a true discard attribute
(usually because they had either no max-age or expires cookie-attribute,
or an explicit discard cookie-attribute). For interactive browsers, the end
of a session usually corresponds to closing the browser window.
Note that the save() method won’t save session cookies anyway, unless you
ask otherwise by passing a true ignore_discard argument.
FileCookieJar implements the following additional methods:
This base class raises NotImplementedError. Subclasses may leave this
method unimplemented.
filename is the name of file in which to save cookies. If filename is not
specified, self.filename is used (whose default is the value passed to
the constructor, if any); if self.filename is None,
ValueError is raised.
ignore_discard: save even cookies set to be discarded. ignore_expires: save
even cookies that have expired.
The file is overwritten if it already exists, thus wiping all the cookies it
contains. Saved cookies can be restored later using the load() or
revert() methods.
The named file must be in the format understood by the class, or
LoadError will be raised. Also, IOError may be raised, for
example if the file does not exist.
If true, load cookies lazily from disk. This attribute should not be assigned
to. This is only a hint, since this only affects performance, not behaviour
(unless the cookies on disk are changing). A CookieJar object may
ignore it. None of the FileCookieJar classes included in the standard
library lazily loads cookies.
FileCookieJar subclasses and co-operation with web browsers
The following CookieJar subclasses are provided for reading and
writing.
class http.cookiejar.MozillaCookieJar(filename, delayload=None, policy=None)
A FileCookieJar that can load from and save cookies to disk in the
Mozilla cookies.txt file format (which is also used by the Lynx and Netscape
browsers).
Note
This loses information about RFC 2965 cookies, and also about newer or
non-standard cookie-attributes such as port.
Warning
Back up your cookies before saving if you have cookies whose loss / corruption
would be inconvenient (there are some subtleties which may lead to slight
changes in the file over a load / save round-trip).
Also note that cookies saved while Mozilla is running will get clobbered by
Mozilla.
class http.cookiejar.LWPCookieJar(filename, delayload=None, policy=None)
A FileCookieJar that can load from and save cookies to disk in format
compatible with the libwww-perl library’s Set-Cookie3 file format. This is
convenient if you want to store cookies in a human-readable file.
Return false if cookies should not be returned, given cookie domain.
This method is an optimization. It removes the need for checking every cookie
with a particular domain (which might involve reading many files). Returning
true from domain_return_ok() and path_return_ok() leaves all the
work to return_ok().
Note that domain_return_ok() is called for every cookie domain, not just
for the request domain. For example, the function might be called with both
".example.com" and "www.example.com" if the request domain is
"www.example.com". The same goes for path_return_ok().
The request argument is as documented for return_ok().
In addition to implementing the methods above, implementations of the
CookiePolicy interface must also supply the following attributes,
indicating which protocols should be used, and how. All of these attributes may
be assigned to.
Don’t add Cookie2 header to requests (the presence of this header
indicates to the server that we understand RFC 2965 cookies).
The most useful way to define a CookiePolicy class is by subclassing
from DefaultCookiePolicy and overriding some or all of the methods
above. CookiePolicy itself may be used as a ‘null policy’ to allow
setting and receiving any and all cookies (this is unlikely to be useful).
Implements the standard rules for accepting and returning cookies.
Both RFC 2965 and Netscape cookies are covered. RFC 2965 handling is switched
off by default.
The easiest way to provide your own policy is to override this class and call
its methods in your overridden implementations before adding your own additional
checks:
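import http.cookiejar

# a minimal sketch; the extra rule rejecting names that start with
# "__tracking" is invented purely for illustration
class BlockerPolicy(http.cookiejar.DefaultCookiePolicy):
    def set_ok(self, cookie, request):
        # apply the standard accept / reject rules first
        if not http.cookiejar.DefaultCookiePolicy.set_ok(self, cookie, request):
            return False
        # then add a check of our own
        return not cookie.name.startswith("__tracking")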
In addition to the features required to implement the CookiePolicy
interface, this class allows you to block and allow domains from setting and
receiving cookies. There are also some strictness switches that allow you to
tighten up the rather loose Netscape protocol rules a little bit (at the cost of
blocking some benign cookies).
A domain blacklist and whitelist is provided (both off by default). Only domains
not in the blacklist and present in the whitelist (if the whitelist is active)
participate in cookie setting and returning. Use the blocked_domains
constructor argument, and blocked_domains() and
set_blocked_domains() methods (and the corresponding argument and methods
for allowed_domains). If you set a whitelist, you can turn it off again by
setting it to None.
Domains in block or allow lists that do not start with a dot must equal the
cookie domain to be matched. For example, "example.com" matches a blacklist
entry of "example.com", but "www.example.com" does not. Domains that do
start with a dot are matched by more specific domains too. For example, both
"www.example.com" and "www.coyote.example.com" match ".example.com"
(but "example.com" itself does not). IP addresses are an exception, and
must match exactly. For example, if blocked_domains contains "192.168.1.2"
and ".168.1.2", 192.168.1.2 is blocked, but 193.168.1.2 is not.
Return whether domain is not on the whitelist for setting or receiving
cookies.
DefaultCookiePolicy instances have the following attributes, which are
all initialised from the constructor arguments of the same name, and which may
all be assigned to.
If true, request that the CookieJar instance downgrade RFC 2109 cookies
(ie. cookies received in a Set-Cookie header with a version
cookie-attribute of 1) to Netscape cookies by setting the version attribute of
the Cookie instance to 0. The default value is None, in which
case RFC 2109 cookies are downgraded if and only if RFC 2965 handling is turned
off. Therefore, RFC 2109 cookies are downgraded by default.
Don’t allow sites to set two-component domains with country-code top-level
domains like .co.uk, .gov.uk, .co.nz, etc. This is far from perfect
and isn’t guaranteed to work!
Follow RFC 2965 rules on unverifiable transactions (usually, an unverifiable
transaction is one resulting from a redirect or a request for an image hosted on
another site). If this is false, cookies are never blocked on the basis of
verifiability.
Don’t allow setting cookies whose path doesn’t path-match request URI.
strict_ns_domain is a collection of flags. Its value is constructed by
or-ing together (for example, DomainStrictNoDots|DomainStrictNonDomain means
both flags are set).
Cookies that did not explicitly specify a domain cookie-attribute can only
be returned to a domain equal to the domain that set the cookie (eg.
spam.example.com won’t be returned cookies from example.com that had no
domain cookie-attribute).
Cookie instances have Python attributes roughly corresponding to the
standard cookie-attributes specified in the various cookie standards. The
correspondence is not one-to-one, because there are complicated rules for
assigning default values, because the max-age and expires
cookie-attributes contain equivalent information, and because RFC 2109 cookies
may be ‘downgraded’ by http.cookiejar from version 1 to version 0 (Netscape)
cookies.
Assignment to these attributes should not be necessary other than in rare
circumstances in a CookiePolicy method. The class does not enforce
internal consistency, so you should know what you’re doing if you do that.
Integer or None. Netscape cookies have version 0. RFC 2965 and
RFC 2109 cookies have a version cookie-attribute of 1. However, note that
http.cookiejar may ‘downgrade’ RFC 2109 cookies to Netscape cookies, in which
case version is 0.
True if this cookie was received as an RFC 2109 cookie (ie. the cookie
arrived in a Set-Cookie header, and the value of the Version
cookie-attribute in that header was 1). This attribute is provided because
http.cookiejar may ‘downgrade’ RFC 2109 cookies to Netscape cookies, in
which case version is 0.
True if cookie has passed the time at which the server requested it should
expire. If now is given (in seconds since the epoch), return whether the
cookie has expired at the specified time.
This example illustrates how to open a URL using your Netscape, Mozilla, or Lynx
cookies (assumes Unix/Netscape convention for location of the cookies file):
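import os
import http.cookiejar
import urllib.request

# a sketch; the cookies.txt location below follows the Unix/Netscape
# convention and may differ on your system
cookie_file = os.path.join(os.environ["HOME"], ".netscape", "cookies.txt")
cj = http.cookiejar.MozillaCookieJar()
cj.load(cookie_file)
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
r = opener.open("http://example.com/")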
The next example illustrates the use of DefaultCookiePolicy. Turn on
RFC 2965 cookies, be more strict about domains when setting and returning
Netscape cookies, and block some domains from setting cookies or having them
returned:
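import http.cookiejar
import urllib.request

# a sketch; the blocked domains are placeholders
policy = http.cookiejar.DefaultCookiePolicy(
    rfc2965=True,
    strict_ns_domain=http.cookiejar.DefaultCookiePolicy.DomainStrict,
    blocked_domains=["ads.net", ".ads.net"])
cj = http.cookiejar.CookieJar(policy)
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
r = opener.open("http://example.com/")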
XML-RPC is a Remote Procedure Call method that uses XML passed via HTTP as a
transport. With it, a client can call methods with parameters on a remote
server (the server is named by a URI) and get back structured data. This module
supports writing XML-RPC client code; it handles all the details of translating
between conformable Python objects and XML on the wire.
class xmlrpc.client.ServerProxy(uri, transport=None, encoding=None, verbose=False, allow_none=False, use_datetime=False)
A ServerProxy instance is an object that manages communication with a
remote XML-RPC server. The required first argument is a URI (Uniform Resource
Identifier), and will normally be the URL of the server. The optional second
argument is a transport factory instance; by default it is an internal
SafeTransport instance for https: URLs and an internal HTTP
Transport instance otherwise. The optional third argument is an
encoding, by default UTF-8. The optional fourth argument is a debugging flag.
If allow_none is true, the Python constant None will be translated into
XML; the default behaviour is for None to raise a TypeError. This is
a commonly-used extension to the XML-RPC specification, but isn’t supported by
all clients and servers; see http://ontosys.com/xml-rpc/extensions.php for a
description. The use_datetime flag can be used to cause date/time values to
be presented as datetime.datetime objects; this is false by default.
datetime.datetime objects may be passed to calls.
Both the HTTP and HTTPS transports support the URL syntax extension for HTTP
Basic Authentication: http://user:pass@host:port/path. The user:pass
portion will be base64-encoded as an HTTP ‘Authorization’ header, and sent to
the remote server as part of the connection process when invoking an XML-RPC
method. You only need to use this if the remote server requires a Basic
Authentication user and password.
The returned instance is a proxy object with methods that can be used to invoke
corresponding RPC calls on the remote server. If the remote server supports the
introspection API, the proxy can also be used to query the remote server for the
methods it supports (service discovery) and fetch other server-associated
metadata.
ServerProxy instance methods take Python basic types and objects as
arguments and return Python basic types and classes. Types that are conformable
(e.g. that can be marshalled through XML), include the following (and except
where noted, they are unmarshalled as the same Python type):
arrays
    Any Python sequence type containing conformable elements. Arrays are
    returned as lists.
structures
    A Python dictionary. Keys must be strings, values may be any conformable
    type. Objects of user-defined classes can be passed in; only their
    __dict__ attribute is transmitted.
dates
    In seconds since the epoch (pass in an instance of the DateTime
    class) or a datetime.datetime instance.
binary data
    Pass in an instance of the Binary wrapper class.
This is the full set of data types supported by XML-RPC. Method calls may also
raise a special Fault instance, used to signal XML-RPC server errors, or
ProtocolError used to signal an error in the HTTP/HTTPS transport layer.
Both Fault and ProtocolError derive from a base class called
Error. Note that the xmlrpc.client module currently does not marshal
instances of subclasses of built-in types.
When passing strings, characters special to XML such as <, >, and &
will be automatically escaped. However, it’s the caller’s responsibility to
ensure that the string is free of characters that aren’t allowed in XML, such as
the control characters with ASCII values between 0 and 31 (except, of course,
tab, newline and carriage return); failing to do this will result in an XML-RPC
request that isn’t well-formed XML. If you have to pass arbitrary strings via
XML-RPC, use the Binary wrapper class described below.
Server is retained as an alias for ServerProxy for backwards
compatibility. New code should use ServerProxy.
A good description of XML-RPC operation and client software in several languages.
Contains pretty much everything an XML-RPC client developer needs to know.
Fredrik Lundh’s “unofficial errata, intended to clarify certain
details in the XML-RPC specification, as well as hint at
‘best practices’ to use when designing your own XML-RPC
implementations.”
A ServerProxy instance has a method corresponding to each remote
procedure call accepted by the XML-RPC server. Calling the method performs an
RPC, dispatched by both name and argument signature (e.g. the same method name
can be overloaded with multiple argument signatures). The RPC finishes by
returning a value, which may be either returned data in a conformant type or a
Fault or ProtocolError object indicating an error.
Servers that support the XML introspection API support some common methods
grouped under the reserved system attribute:
This method takes one parameter, the name of a method implemented by the XML-RPC
server. It returns an array of possible signatures for this method. A signature
is an array of types. The first of these types is the return type of the method,
the rest are parameters.
Because multiple signatures (i.e. overloading) are permitted, this method
returns a list of signatures rather than a singleton.
Signatures themselves are restricted to the top level parameters expected by a
method. For instance if a method expects one array of structs as a parameter,
and it returns a string, its signature is simply “string, array”. If it expects
three integers and returns a string, its signature is “string, int, int, int”.
If no signature is defined for the method, a non-array value is returned. In
Python this means that the type of the returned value will be something other
than list.
This method takes one parameter, the name of a method implemented by the XML-RPC
server. It returns a documentation string describing the use of that method. If
no such string is available, an empty string is returned. The documentation
string may contain HTML markup.
A working example follows. The server code:
from xmlrpc.server import SimpleXMLRPCServer

def is_even(n):
    return n % 2 == 0

server = SimpleXMLRPCServer(("localhost", 8000))
print("Listening on port 8000...")
server.register_function(is_even, "is_even")
server.serve_forever()
The client code for the preceding server:
import xmlrpc.client

proxy = xmlrpc.client.ServerProxy("http://localhost:8000/")
print("3 is even: %s" % str(proxy.is_even(3)))
print("100 is even: %s" % str(proxy.is_even(100)))
This class may be initialized with seconds since the epoch, a time
tuple, an ISO 8601 time/date string, or a datetime.datetime
instance. It has the following methods, supported mainly for internal
use by the marshalling/unmarshalling code:
Write the XML-RPC encoding of this DateTime item to the out stream
object.
It also supports certain of Python’s built-in operators through rich comparison
and __repr__() methods.
A working example follows. The server code:
import datetime
from xmlrpc.server import SimpleXMLRPCServer
import xmlrpc.client

def today():
    today = datetime.datetime.today()
    return xmlrpc.client.DateTime(today)

server = SimpleXMLRPCServer(("localhost", 8000))
print("Listening on port 8000...")
server.register_function(today, "today")
server.serve_forever()
The client code for the preceding server:
import xmlrpc.client
import datetime

proxy = xmlrpc.client.ServerProxy("http://localhost:8000/")

today = proxy.today()
# convert the ISO8601 string to a datetime object
converted = datetime.datetime.strptime(today.value, "%Y%m%dT%H:%M:%S")
print("Today: %s" % converted.strftime("%d.%m.%Y, %H:%M"))
This class may be initialized from string data (which may include NULs). The
primary access to the content of a Binary object is provided by an
attribute:
Write the XML-RPC base 64 encoding of this binary item to the out stream object.
The encoded data will have newlines every 76 characters as per
RFC 2045 section 6.8,
which was the de facto standard base64 specification when the
XML-RPC spec was written.
It also supports certain of Python’s built-in operators through __eq__()
and __ne__() methods.
Example usage of the binary objects. We’re going to transfer an image over
XMLRPC:
from xmlrpc.server import SimpleXMLRPCServer
import xmlrpc.client

def python_logo():
    with open("python_logo.jpg", "rb") as handle:
        return xmlrpc.client.Binary(handle.read())

server = SimpleXMLRPCServer(("localhost", 8000))
print("Listening on port 8000...")
server.register_function(python_logo, 'python_logo')
server.serve_forever()
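A matching client might fetch the image and save it to a local file (a sketch
that assumes the server above is running; the output file name is arbitrary):

import xmlrpc.client

proxy = xmlrpc.client.ServerProxy("http://localhost:8000/")
with open("fetched_python_logo.jpg", "wb") as handle:
    handle.write(proxy.python_logo().data)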
A string containing a diagnostic message associated with the fault.
In the following example we’re going to intentionally cause a Fault by
returning a complex type object. The server code:
from xmlrpc.server import SimpleXMLRPCServer

# A marshalling error is going to occur because we're returning a
# complex number
def add(x, y):
    return x + y + 0j

server = SimpleXMLRPCServer(("localhost", 8000))
print("Listening on port 8000...")
server.register_function(add, 'add')
server.serve_forever()
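Client code for the preceding server might look like this (a sketch that
assumes the server above is running):

import xmlrpc.client

proxy = xmlrpc.client.ServerProxy("http://localhost:8000/")
try:
    proxy.add(2, 5)
except xmlrpc.client.Fault as err:
    print("A fault occurred")
    print("Fault code: %d" % err.faultCode)
    print("Fault string: %s" % err.faultString)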
A ProtocolError object describes a protocol error in the underlying
transport layer (such as a 404 ‘not found’ error if the server named by the URI
does not exist). It has the following attributes:
A dict containing the headers of the HTTP/HTTPS request that triggered the
error.
In the following example we’re going to intentionally cause a ProtocolError
by providing an invalid URI:
import xmlrpc.client

# create a ServerProxy with a URI that doesn't respond to XML-RPC requests
proxy = xmlrpc.client.ServerProxy("http://google.com/")

try:
    proxy.some_method()
except xmlrpc.client.ProtocolError as err:
    print("A protocol error occurred")
    print("URL: %s" % err.url)
    print("HTTP/HTTPS headers: %s" % err.headers)
    print("Error code: %d" % err.errcode)
    print("Error message: %s" % err.errmsg)
Create an object used to boxcar method calls. server is the eventual target of
the call. Calls can be made to the result object, but they will immediately
return None, and only store the call name and parameters in the
MultiCall object. Calling the object itself causes all stored calls to
be transmitted as a single system.multicall request. The result of this call
is a generator; iterating over this generator yields the individual
results.
A usage example of this class follows. The server code:
from xmlrpc.server import SimpleXMLRPCServer

def add(x, y):
    return x + y

def subtract(x, y):
    return x - y

def multiply(x, y):
    return x * y

def divide(x, y):
    return x / y

# A simple server with simple arithmetic functions
server = SimpleXMLRPCServer(("localhost", 8000))
print("Listening on port 8000...")
server.register_multicall_functions()
server.register_function(add, 'add')
server.register_function(subtract, 'subtract')
server.register_function(multiply, 'multiply')
server.register_function(divide, 'divide')
server.serve_forever()
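The client code for the preceding server might look like this (a sketch that
assumes the server above is running):

import xmlrpc.client

proxy = xmlrpc.client.ServerProxy("http://localhost:8000/")
multicall = xmlrpc.client.MultiCall(proxy)
multicall.add(7, 3)
multicall.subtract(7, 3)
multicall.multiply(7, 3)
multicall.divide(7, 3)
result = multicall()

print("7+3=%d, 7-3=%d, 7*3=%d, 7/3=%d" % tuple(result))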
Convert params into an XML-RPC request, or into a response if methodresponse
is true. params can be either a tuple of arguments or an instance of the
Fault exception class. If methodresponse is true, only a single value
can be returned, meaning that params must be of length 1. encoding, if
supplied, is the encoding to use in the generated XML; the default is UTF-8.
Python’s None value cannot be used in standard XML-RPC; to allow using
it via an extension, provide a true value for allow_none.
Convert an XML-RPC request or response into Python objects, a
(params, methodname) pair. params is a tuple of arguments; methodname is a
string, or None if no method name is present in the packet. If the XML-RPC
packet represents a fault condition, this function will raise a Fault exception.
The use_datetime flag can be used to cause date/time values to be presented as
datetime.datetime objects; this is false by default.
# simple test program (from the XML-RPC specification)
from xmlrpc.client import ServerProxy, Error

# server = ServerProxy("http://localhost:8000") # local server
server = ServerProxy("http://betty.userland.com")

print(server)

try:
    print(server.examples.getStateName(41))
except Error as v:
    print("ERROR", v)
To access an XML-RPC server through a proxy, you need to define a custom
transport. The following example shows how:
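import http.client
import xmlrpc.client

# a sketch; it relies on internal Transport hooks (make_connection,
# send_request, send_host) of this version of the library, and the
# proxy host and target URL below are placeholders
class ProxiedTransport(xmlrpc.client.Transport):
    def set_proxy(self, proxy):
        self.proxy = proxy
    def make_connection(self, host):
        self.realhost = host
        return http.client.HTTPConnection(self.proxy)
    def send_request(self, connection, handler, request_body):
        connection.putrequest("POST", 'http://%s%s' % (self.realhost, handler))
    def send_host(self, connection, host):
        connection.putheader('Host', self.realhost)

p = ProxiedTransport()
p.set_proxy('proxy-server:8080')
server = xmlrpc.client.ServerProxy('http://time.xmlrpc.com/RPC2', transport=p)
print(server.currentTime.getCurrentTime())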
The xmlrpc.server module provides a basic server framework for XML-RPC
servers written in Python. Servers can either be free standing, using
SimpleXMLRPCServer, or embedded in a CGI environment, using
CGIXMLRPCRequestHandler.
class xmlrpc.server.SimpleXMLRPCServer(addr, requestHandler=SimpleXMLRPCRequestHandler, logRequests=True, allow_none=False, encoding=None, bind_and_activate=True)
Create a new server instance. This class provides methods for registration of
functions that can be called by the XML-RPC protocol. The requestHandler
parameter should be a factory for request handler instances; it defaults to
SimpleXMLRPCRequestHandler. The addr and requestHandler parameters
are passed to the socketserver.TCPServer constructor. If logRequests
is true (the default), requests will be logged; setting this parameter to false
will turn off logging. The allow_none and encoding parameters are passed
on to xmlrpc.client and control the XML-RPC responses that will be returned
from the server. The bind_and_activate parameter controls whether
server_bind() and server_activate() are called immediately by the
constructor; it defaults to true. Setting it to false allows code to manipulate
the allow_reuse_address class variable before the address is bound.
class xmlrpc.server.CGIXMLRPCRequestHandler(allow_none=False, encoding=None)
Create a new instance to handle XML-RPC requests in a CGI environment. The
allow_none and encoding parameters are passed on to xmlrpc.client
and control the XML-RPC responses that will be returned from the server.
Create a new request handler instance. This request handler supports POST
requests and modifies logging so that the logRequests parameter to the
SimpleXMLRPCServer constructor is honored.
Register a function that can respond to XML-RPC requests. If name is given,
it will be the method name associated with function, otherwise
function.__name__ will be used. name can be either a normal or Unicode
string, and may contain characters not legal in Python identifiers, including
the period character.
Register an object which is used to expose method names which have not been
registered using register_function(). If instance contains a
_dispatch() method, it is called with the requested method name and the
parameters from the request. Its API is def _dispatch(self, method, params)
(note that params does not represent a variable argument list). If it calls
an underlying function to perform its task, that function is called as
func(*params), expanding the parameter list. The return value from
_dispatch() is returned to the client as the result. If instance does
not have a _dispatch() method, it is searched for an attribute matching
the name of the requested method.
If the optional allow_dotted_names argument is true and the instance does not
have a _dispatch() method, then if the requested method name contains
periods, each component of the method name is searched for individually, with
the effect that a simple hierarchical search is performed. The value found from
this search is then called with the parameters from the request, and the return
value is passed back to the client.
Warning
Enabling the allow_dotted_names option allows intruders to access your
module’s global variables and may allow intruders to execute arbitrary code on
your machine. Only use this option on a secure, closed network.
An attribute value that must be a tuple listing valid path portions of the URL
for receiving XML-RPC requests. Requests posted to other paths will result in a
404 “no such page” HTTP error. If this tuple is empty, all paths will be
considered valid. The default value is ('/', '/RPC2').
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.server import SimpleXMLRPCRequestHandler

# Restrict to a particular path.
class RequestHandler(SimpleXMLRPCRequestHandler):
    rpc_paths = ('/RPC2',)

# Create server
server = SimpleXMLRPCServer(("localhost", 8000),
                            requestHandler=RequestHandler)
server.register_introspection_functions()

# Register pow() function; this will use the value of
# pow.__name__ as the name, which is just 'pow'.
server.register_function(pow)

# Register a function under a different name
def adder_function(x, y):
    return x + y
server.register_function(adder_function, 'add')

# Register an instance; all the methods of the instance are
# published as XML-RPC methods (in this case, just 'mul').
class MyFuncs:
    def mul(self, x, y):
        return x * y

server.register_instance(MyFuncs())

# Run the server's main loop
server.serve_forever()
The following client code will call the methods made available by the preceding
server:
import xmlrpc.client

s = xmlrpc.client.ServerProxy('http://localhost:8000')
print(s.pow(2, 3))  # Returns 2**3 = 8
print(s.add(2, 3))  # Returns 5
print(s.mul(5, 2))  # Returns 5*2 = 10

# Print list of available methods
print(s.system.listMethods())
Register a function that can respond to XML-RPC requests. If name is given,
it will be the method name associated with function, otherwise
function.__name__ will be used. name can be either a normal or Unicode
string, and may contain characters not legal in Python identifiers, including
the period character.
Register an object which is used to expose method names which have not been
registered using register_function(). If instance contains a
_dispatch() method, it is called with the requested method name and the
parameters from the request; the return value is returned to the client as the
result. If instance does not have a _dispatch() method, it is searched
for an attribute matching the name of the requested method; if the requested
method name contains periods, each component of the method name is searched for
individually, with the effect that a simple hierarchical search is performed.
The value found from this search is then called with the parameters from the
request, and the return value is passed back to the client.
Handle an XML-RPC request. If request_text is given, it should be the POST
data provided by the HTTP server, otherwise the contents of stdin will be used.
These classes extend the above classes to serve HTML documentation in response
to HTTP GET requests. Servers can either be free standing, using
DocXMLRPCServer, or embedded in a CGI environment, using
DocCGIXMLRPCRequestHandler.
class xmlrpc.server.DocXMLRPCServer(addr, requestHandler=DocXMLRPCRequestHandler, logRequests=True, allow_none=False, encoding=None, bind_and_activate=True)
Create a new request handler instance. This request handler supports XML-RPC
POST requests, documentation GET requests, and modifies logging so that the
logRequests parameter to the DocXMLRPCServer constructor is honored.
The DocXMLRPCServer class is derived from SimpleXMLRPCServer
and provides a means of creating self-documenting, stand-alone XML-RPC
servers. HTTP POST requests are handled as XML-RPC method calls. HTTP GET
requests are handled by generating pydoc-style HTML documentation. This allows a
server to provide its own web-based documentation.
Set the description used in the generated HTML documentation. This description
will appear as a paragraph, below the server name, in the documentation.
The DocCGIXMLRPCRequestHandler class is derived from
CGIXMLRPCRequestHandler and provides a means of creating
self-documenting, XML-RPC CGI scripts. HTTP POST requests are handled as XML-RPC
method calls. HTTP GET requests are handled by generating pydoc-style HTML
documentation. This allows a server to provide its own web-based documentation.
Set the description used in the generated HTML documentation. This description
will appear as a paragraph, below the server name, in the documentation.
The modules described in this chapter implement various algorithms or interfaces
that are mainly useful for multimedia applications. They are available at the
discretion of the installation. Here’s an overview:
The audioop module contains some useful operations on sound fragments.
It operates on sound fragments consisting of signed integer samples 8, 16 or 32
bits wide, stored in Python strings. All scalar items are integers, unless
specified otherwise.
This module provides support for a-LAW, u-LAW and Intel/DVI ADPCM encodings.
A few of the more complicated operations only take 16-bit samples, otherwise the
sample size (in bytes) is always a parameter of the operation.
The module defines the following variables and functions:
Return a fragment which is the addition of the two samples passed as parameters.
width is the sample width in bytes, either 1, 2 or 4. Both
fragments should have the same length.
Decode an Intel/DVI ADPCM coded fragment to a linear fragment. See the
description of lin2adpcm() for details on ADPCM coding. Return a tuple
(sample, newstate) where the sample has the width specified in width.
Convert sound fragments in a-LAW encoding to linearly encoded sound fragments.
a-LAW encoding always uses 8-bit samples, so width refers only to the sample
width of the output fragment here.
Return a factor F such that rms(add(fragment, mul(reference, -F))) is
minimal, i.e., return the factor with which you should multiply reference to
make it match as well as possible to fragment. The fragments should both
contain 2-byte samples.
The time taken by this routine is proportional to len(fragment).
Try to match reference as well as possible to a portion of fragment (which
should be the longer fragment). This is (conceptually) done by taking slices
out of fragment, using findfactor() to compute the best match, and
minimizing the result. The fragments should both contain 2-byte samples.
Return a tuple (offset, factor) where offset is the (integer) offset into
fragment where the optimal match started and factor is the (floating-point)
factor as per findfactor().
Search fragment for a slice of length length samples (not bytes!) with
maximum energy, i.e., return i for which rms(fragment[i*2:(i+length)*2])
is maximal. The fragments should both contain 2-byte samples.
The routine takes time proportional to len(fragment).
Convert samples to 4 bit Intel/DVI ADPCM encoding. ADPCM coding is an adaptive
coding scheme, whereby each 4 bit number is the difference between one sample
and the next, divided by a (varying) step. The Intel/DVI ADPCM algorithm has
been selected for use by the IMA, so it may well become a standard.
state is a tuple containing the state of the coder. The coder returns a tuple
(adpcmfrag, newstate), and newstate should be passed to the next call
of lin2adpcm(). In the initial call, None can be passed as the state.
adpcmfrag is the ADPCM coded fragment packed two 4-bit values per byte.
Convert samples in the audio fragment to a-LAW encoding and return this as a
Python string. a-LAW is an audio encoding format whereby you get a dynamic
range of about 13 bits using only 8 bit samples. It is used by the Sun audio
hardware, among others.
Convert samples between 1-, 2- and 4-byte formats.
Note
In some audio formats, such as .WAV files, 16 and 32 bit samples are
signed, but 8 bit samples are unsigned. So when converting to 8 bit wide
samples for these formats, you need to also add 128 to the result:
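import audioop

frames = b'\x00\x10\x00\x20'                   # placeholder 16-bit sample data
new_frames = audioop.lin2lin(frames, 2, 1)     # convert 16-bit to 8-bit samples
new_frames = audioop.bias(new_frames, 1, 128)  # shift signed to unsigned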
Convert samples in the audio fragment to u-LAW encoding and return this as a
Python string. u-LAW is an audio encoding format whereby you get a dynamic
range of about 14 bits using only 8 bit samples. It is used by the Sun audio
hardware, among others.
state is a tuple containing the state of the converter. The converter returns
a tuple (newfragment, newstate), and newstate should be passed to the next
call of ratecv(). The initial call should pass None as the state.
The weightA and weightB arguments are parameters for a simple digital filter
and default to 1 and 0 respectively.
Convert a stereo fragment to a mono fragment. The left channel is multiplied by
lfactor and the right channel by rfactor before adding the two channels to
give a mono signal.
Generate a stereo fragment from a mono fragment. Each pair of samples in the
stereo fragment are computed from the mono sample, whereby left channel samples
are multiplied by lfactor and right channel samples by rfactor.
Convert sound fragments in u-LAW encoding to linearly encoded sound fragments.
u-LAW encoding always uses 8-bit samples, so width refers only to the sample
width of the output fragment here.
Note that operations such as mul() or max() make no distinction
between mono and stereo fragments, i.e. all samples are treated equal. If this
is a problem the stereo fragment should be split into two mono fragments first
and recombined later. Here is an example of how to do that:
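import audioop

def mul_stereo(sample, width, lfactor, rfactor):
    # split the stereo fragment into two mono fragments
    lsample = audioop.tomono(sample, width, 1, 0)
    rsample = audioop.tomono(sample, width, 0, 1)
    # operate on each channel separately
    lsample = audioop.mul(lsample, width, lfactor)
    rsample = audioop.mul(rsample, width, rfactor)
    # recombine the two mono fragments into a stereo fragment
    lsample = audioop.tostereo(lsample, width, 1, 0)
    rsample = audioop.tostereo(rsample, width, 0, 1)
    return audioop.add(lsample, rsample, width)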
If you use the ADPCM coder to build network packets and you want your protocol
to be stateless (i.e. to be able to tolerate packet loss) you should not only
transmit the data but also the state. Note that you should send the initial
state (the one you passed to lin2adpcm()) along to the decoder, not the
final state (as returned by the coder). If you want to use the
struct module to store the state in binary, you can code the first
element (the predicted value) in 16 bits and the second (the delta index) in 8.
The ADPCM coders have never been tried against other ADPCM coders, only against
themselves. It could well be that I misinterpreted the standards in which case
they will not be interoperable with the respective standards.
The find*() routines might look a bit funny at first sight. They are
primarily meant to do echo cancellation. A reasonably fast way to do this is to
pick the most energetic piece of the output sample, locate that in the input
sample and subtract the whole output sample from the input sample:
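import audioop

def echocancel(outputdata, inputdata):
    # a sketch for fragments of 2-byte samples; 800 samples is one
    # tenth of a second at 8 kHz (an assumed sampling rate)
    pos = audioop.findmax(outputdata, 800)
    out_test = outputdata[pos*2:]
    in_test = inputdata[pos*2:]
    ipos, factor = audioop.findfit(in_test, out_test)
    prefill = b'\0' * (pos + ipos) * 2
    postfill = b'\0' * (len(inputdata) - len(prefill) - len(outputdata))
    outputdata = prefill + audioop.mul(outputdata, 2, -factor) + postfill
    return audioop.add(inputdata, outputdata, 2)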
This module provides support for reading and writing AIFF and AIFF-C files.
AIFF is Audio Interchange File Format, a format for storing digital audio
samples in a file. AIFF-C is a newer version of the format that includes the
ability to compress the audio data.
Note
Some operations may only work under IRIX; these will raise ImportError
when attempting to import the cl module, which is only available on
IRIX.
Audio files have a number of parameters that describe the audio data. The
sampling rate or frame rate is the number of times per second the sound is
sampled. The number of channels indicate if the audio is mono, stereo, or
quadro. Each frame consists of one sample per channel. The sample size is the
size in bytes of each sample. Thus a frame consists of
nchannels * samplesize bytes, and a second’s worth of audio consists of
nchannels * samplesize * framerate bytes.
For example, CD quality audio has a sample size of two bytes (16 bits), uses two
channels (stereo) and has a frame rate of 44,100 frames/second. This gives a
frame size of 4 bytes (2*2), and a second’s worth occupies 2*2*44100 bytes
(176,400 bytes).
Open an AIFF or AIFF-C file and return an object instance with methods that are
described below. The argument file is either a string naming a file or a
file object. mode must be 'r' or 'rb' when the file must be
opened for reading, or 'w' or 'wb' when the file must be opened for writing.
If omitted, file.mode is used if it exists, otherwise 'rb' is used. When
used for writing, the file object should be seekable, unless you know ahead of
time how many samples you are going to write in total and use
writeframesraw() and setnframes().
Objects returned by open() when a file is opened for reading have the
following methods:
Return a bytes array convertible to a human-readable description
of the type of compression used in the audio file. For AIFF files,
the returned value is b'not compressed'.
Return a list of markers in the audio file. A marker consists of a tuple of
three elements. The first is the mark ID (an integer), the second is the mark
position in frames from the beginning of the data (an integer), the third is the
name of the mark (a string).
Read and return the next nframes frames from the audio file. The returned
data is a string containing for each frame the uncompressed samples of all
channels.
Close the AIFF file. After calling this method, the object can no longer be
used.
Objects returned by open() when a file is opened for writing have all the
above methods, except for readframes() and setpos(). In addition
the following methods exist. The get*() methods can only be called after
the corresponding set*() methods have been called. Before the first
writeframes() or writeframesraw(), all parameters except for the
number of frames must be filled in.
Create an AIFF file. The default is that an AIFF-C file is created, unless the
name of the file ends in '.aiff' in which case the default is an AIFF file.
Create an AIFF-C file. The default is that an AIFF-C file is created, unless
the name of the file ends in '.aiff' in which case the default is an AIFF
file.
Specify the number of frames that are to be written to the audio file. If this
parameter is not set, or not set correctly, the file needs to support seeking.
Specify the compression type. If not specified, the audio data will
not be compressed. In AIFF files, compression is not possible.
The name parameter should be a human-readable description of the
compression type as a bytes array, the type parameter should be a
bytes array of length 4. Currently the following compression types
are supported: b'NONE', b'ULAW', b'ALAW', b'G722'.
Set all the above parameters at once. The argument is a tuple consisting of the
various parameters. This means that it is possible to use the result of a
getparams() call as argument to setparams().
Like writeframes(), except that the header of the audio file is not
updated.
aifc.close()
Close the AIFF file. The header of the file is updated to reflect the actual
size of the audio data. After calling this method, the object can no longer be
used.
The sunau module provides a convenient interface to the Sun AU sound
format. Note that this module is interface-compatible with the modules
aifc and wave.
An audio file consists of a header followed by the data. The fields of the
header are:
Field            Contents
magic word       The four bytes .snd.
header size      Size of the header, including info, in bytes.
data size        Physical size of the data, in bytes.
encoding         Indicates how the audio samples are encoded.
sample rate      The sampling rate.
# of channels    The number of channels in the samples.
info             ASCII string giving a description of the audio file
                 (padded with null bytes).
Apart from the info field, all header fields are 4 bytes in size. They are all
32-bit unsigned integers encoded in big-endian byte order.
Reads and returns at most n frames of audio, as a string of bytes. The data
will be returned in linear format. If the original data is in u-LAW format, it
will be converted.
The wave module provides a convenient interface to the WAV sound format.
It does not support compression/decompression, but it does support mono/stereo.
The wave module defines the following function and exception:
If file is a string, open the file by that name, otherwise treat it as a
seekable file-like object. mode can be any of
'r', 'rb'
Read only mode.
'w', 'wb'
Write only mode.
Note that it does not allow read/write WAV files.
A mode of 'r' or 'rb' returns a Wave_read object, while a
mode of 'w' or 'wb' returns a Wave_write object. If
mode is omitted and a file-like object is passed as file, file.mode
is used as the default value for mode (the 'b' flag is still added if
necessary).
If you pass in a file-like object, the wave object will not close it when its
close() method is called; it is the caller’s responsibility to close
the file object.
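Reading the parameters and frames of a WAV file might look like this (a
sketch; the file name is hypothetical):

import wave

w = wave.open('example.wav', 'rb')
print(w.getnchannels(), w.getsampwidth(), w.getframerate())
frames = w.readframes(w.getnframes())   # the raw audio data as bytes
w.close()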
This module provides an interface for reading files that use EA IFF 85 chunks.
[1] This format is used in at least the Audio Interchange File Format
(AIFF/AIFF-C) and the Real Media File Format (RMFF). The WAVE audio file format
is closely related and can also be read using this module.
A chunk has the following structure:
Offset    Length    Contents
0         4         Chunk ID
4         4         Size of chunk in big-endian byte order, not including
                    the header
8         n         Data bytes, where n is the size given in the preceding
                    field
8 + n     0 or 1    Pad byte needed if n is odd and chunk alignment is used
The ID is a 4-byte string which identifies the type of chunk.
The size field (a 32-bit value, encoded using big-endian byte order) gives the
size of the chunk data, not including the 8-byte header.
Usually an IFF-type file consists of one or more chunks. The proposed usage of
the Chunk class defined here is to instantiate an instance at the start
of each chunk and read from the instance until it reaches the end, after which a
new instance can be instantiated. At the end of the file, creating a new
instance will fail with an EOFError exception.
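That usage might look like the following sketch (the file name is
hypothetical):

import chunk

with open('sample.aiff', 'rb') as f:
    while True:
        try:
            c = chunk.Chunk(f)
        except EOFError:
            break                        # no more chunks in the file
        print(c.getname(), c.getsize())  # the 4-byte ID and the data size
        c.skip()                         # position at the start of the next chunk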
class chunk.Chunk(file, align=True, bigendian=True, inclheader=False)
Class which represents a chunk. The file argument is expected to be a
file-like object. An instance of this class is specifically allowed as the
file argument (so chunks can be nested). The only method that is needed is
read(). If the methods seek() and
tell() are present and don’t raise an exception, they are also used.
If these methods are present and raise an exception, they are expected to not
have altered the object. If the optional argument align is true, chunks
are assumed to be aligned on 2-byte boundaries. If align is false, no
alignment is assumed. The default value is true. If the optional argument
bigendian is false, the chunk size is assumed to be in little-endian order.
This is needed for WAVE audio files. The default value is true. If the
optional argument inclheader is true, the size given in the chunk header
includes the size of the header. The default value is false.
Set the chunk’s current position. The whence argument is optional and
defaults to 0 (absolute file positioning); other values are 1
(seek relative to the current position) and 2 (seek relative to the
file’s end). There is no return value. If the underlying file does not
allow seek, only forward seeks are allowed.
Read at most size bytes from the chunk (less if the read hits the end of
the chunk before obtaining size bytes). If the size argument is
negative or omitted, read all data until the end of the chunk. The bytes
are returned as a string object. An empty string is returned when the end
of the chunk is encountered immediately.
Skip to the end of the chunk. All further calls to read() for the
chunk will return ''. If you are not interested in the contents of
the chunk, this method should be called so that the file points to the
start of the next chunk.
The colorsys module defines bidirectional conversions of color values
between colors expressed in the RGB (Red Green Blue) color space used in
computer monitors and three other coordinate systems: YIQ, HLS (Hue Lightness
Saturation) and HSV (Hue Saturation Value). Coordinates in all of these color
spaces are floating point values. In the YIQ space, the Y coordinate is between
0 and 1, but the I and Q coordinates can be positive or negative. In all other
spaces, the coordinates are all between 0 and 1.
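For example, converting between RGB and HSV:

>>> import colorsys
>>> colorsys.rgb_to_hsv(0.2, 0.4, 0.4)
(0.5, 0.5, 0.4)
>>> colorsys.hsv_to_rgb(0.5, 0.5, 0.4)
(0.2, 0.4, 0.4)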
Tests the image data contained in the file named by filename, and returns a
string describing the image type. If optional h is provided, the filename
is ignored and h is assumed to contain the byte stream to test.
The following image types are recognized, as listed below with the return value
from what():
Value     Image format
'rgb'     SGI ImgLib Files
'gif'     GIF 87a and 89a Files
'pbm'     Portable Bitmap Files
'pgm'     Portable Graymap Files
'ppm'     Portable Pixmap Files
'tiff'    TIFF Files
'rast'    Sun Raster Files
'xbm'     X Bitmap Files
'jpeg'    JPEG data in JFIF or Exif formats
'bmp'     BMP files
'png'     Portable Network Graphics
You can extend the list of file types imghdr can recognize by appending
to this variable:
A list of functions performing the individual tests. Each function takes two
arguments: the byte-stream and an open file-like object. When what() is
called with a byte-stream, the file-like object will be None.
The test function should return a string describing the image type if the test
succeeded, or None if it failed.
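For example, a sketch of a custom test for a made-up 'spam' image format (the name and magic bytes are hypothetical):

import imghdr

def test_spam(h, f):
    # h is the byte stream; f is an open file-like object or None
    if h.startswith(b'\x89SPAM'):
        return 'spam'

imghdr.tests.append(test_spam)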
The sndhdr module provides utility functions which attempt to determine the type
of sound data which is in a file. When these functions are able to determine
what type of sound data is stored in a file, they return a tuple
(type, sampling_rate, channels, frames, bits_per_sample). The value for type
indicates the data type and will be one of the strings 'aifc', 'aiff',
'au', 'hcom', 'sndr', 'sndt', 'voc', 'wav', '8svx',
'sb', 'ub', or 'ul'. The sampling_rate will be either the actual
value or 0 if unknown or difficult to decode. Similarly, channels will be
either the number of channels or 0 if it cannot be determined or if the
value is difficult to decode. The value for frames will be either the number
of frames or -1. The last item in the tuple, bits_per_sample, will either
be the sample size in bits or 'A' for A-LAW or 'U' for u-LAW.
Determines the type of sound data stored in the file filename using
whathdr(). If it succeeds, returns a tuple as described above, otherwise
None is returned.
Determines the type of sound data stored in a file based on the file header.
The name of the file is given by filename. This function returns a tuple as
described above on success, or None.
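For example (the file name and the returned values are illustrative):

>>> import sndhdr
>>> sndhdr.what('example.wav')
('wav', 44100, 2, -1, 16)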
ossaudiodev — Access to OSS-compatible audio devices
This module allows you to access the OSS (Open Sound System) audio interface.
OSS is available for a wide range of open-source and commercial Unices, and is
the standard audio interface for Linux and recent versions of FreeBSD.
This exception is raised on certain errors. The argument is a string describing
what went wrong.
(If ossaudiodev receives an error from a system call such as
open(), write(), or ioctl(), it raises IOError.
Errors detected directly by ossaudiodev result in OSSAudioError.)
(For backwards compatibility, the exception class is also available as
ossaudiodev.error.)
Open an audio device and return an OSS audio device object. This object
supports many file-like methods, such as read(), write(), and
fileno() (although there are subtle differences between conventional Unix
read/write semantics and those of OSS audio devices). It also supports a number
of audio-specific methods; see below for the complete list of methods.
device is the audio device filename to use. If it is not specified, this
module first looks in the environment variable AUDIODEV for a device
to use. If not found, it falls back to /dev/dsp.
mode is one of 'r' for read-only (record) access, 'w' for
write-only (playback) access and 'rw' for both. Since many sound cards
only allow one process to have the recorder or player open at a time, it is a
good idea to open the device only for the activity needed. Further, some
sound cards are half-duplex: they can be opened for reading or writing, but
not both at once.
Note the unusual calling syntax: the first argument is optional, and the
second is required. This is a historical artifact for compatibility with the
older linuxaudiodev module which ossaudiodev supersedes.
Open a mixer device and return an OSS mixer device object. device is the
mixer device filename to use. If it is not specified, this module first looks
in the environment variable MIXERDEV for a device to use. If not
found, it falls back to /dev/mixer.
Before you can write to or read from an audio device, you must call three
methods in the correct order:
setfmt() to set the output format
channels() to set the number of channels
speed() to set the sample rate
Alternately, you can use the setparameters() method to set all three audio
parameters at once. This is more convenient, but may not be as flexible in all
cases.
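A sketch of the required call order for CD-quality playback; the single setparameters() call shown in the comment is equivalent:

import ossaudiodev

dsp = ossaudiodev.open('w')               # default device, playback
dsp.setfmt(ossaudiodev.AFMT_S16_LE)       # 1. sample format
dsp.channels(2)                           # 2. number of channels
dsp.speed(44100)                          # 3. sampling rate
# equivalently: dsp.setparameters(ossaudiodev.AFMT_S16_LE, 2, 44100)
# ... write audio data with dsp.write(data) ...
dsp.close()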
The audio device objects returned by open() define the following methods
and (read-only) attributes:
Explicitly close the audio device. When you are done writing to or reading from
an audio device, you should explicitly close it. A closed device cannot be used
again.
Read size bytes from the audio input and return them as a Python string.
Unlike most Unix device drivers, OSS audio devices in blocking mode (the
default) will block read() until the entire requested amount of data is
available.
Write the Python string data to the audio device and return the number of
bytes written. If the audio device is in blocking mode (the default), the
entire string is always written (again, this is different from usual Unix device
semantics). If the device is in non-blocking mode, some data may not be written
—see writeall().
Write the entire Python string data to the audio device: waits until the audio
device is able to accept data, writes as much data as it will accept, and
repeats until data has been completely written. If the device is in blocking
mode (the default), this has the same effect as write(); writeall()
is only useful in non-blocking mode. Has no return value, since the amount of
data written is always equal to the amount of data supplied.
Changed in version 3.2: Audio device objects also support the context manager protocol, i.e. they can
be used in a with statement.
The following methods each map to exactly one ioctl() system call. The
correspondence is obvious: for example, setfmt() corresponds to the
SNDCTL_DSP_SETFMT ioctl, and sync() to SNDCTL_DSP_SYNC (this can
be useful when consulting the OSS documentation). If the underlying
ioctl() fails, they all raise IOError.
Return a bitmask of the audio output formats supported by the soundcard. Some
of the formats supported by OSS are:
Format          Description
AFMT_MU_LAW     a logarithmic encoding (used by Sun .au files and /dev/audio)
AFMT_A_LAW      a logarithmic encoding
AFMT_IMA_ADPCM  a 4:1 compressed format defined by the Interactive Multimedia Association
AFMT_U8         Unsigned, 8-bit audio
AFMT_S16_LE     Signed, 16-bit audio, little-endian byte order (as used by Intel processors)
AFMT_S16_BE     Signed, 16-bit audio, big-endian byte order (as used by 68k, PowerPC, Sparc)
AFMT_S8         Signed, 8 bit audio
AFMT_U16_LE     Unsigned, 16-bit little-endian audio
AFMT_U16_BE     Unsigned, 16-bit big-endian audio
Consult the OSS documentation for a full list of audio formats, and note that
most devices support only a subset of these formats. Some older devices only
support AFMT_U8; the most common format used today is
AFMT_S16_LE.
Try to set the current audio format to format—see getfmts() for a
list. Returns the audio format that the device was set to, which may not be the
requested format. May also be used to return the current audio format—do this
by passing an “audio format” of AFMT_QUERY.
Set the number of output channels to nchannels. A value of 1 indicates
monophonic sound, 2 stereophonic. Some devices may have more than 2 channels,
and some high-end devices may not support mono. Returns the number of channels
the device was set to.
Try to set the audio sampling rate to samplerate samples per second. Returns
the rate actually set. Most sound devices don’t support arbitrary sampling
rates. Common rates are:
Rate    Description
8000    default rate for /dev/audio
11025   speech recording
22050
44100   CD quality audio (at 16 bits/sample and 2 channels)
Wait until the sound device has played every byte in its buffer. (This happens
implicitly when the device is closed.) The OSS documentation recommends closing
and re-opening the device rather than using sync().
Immediately stop playing or recording and return the device to a state where it
can accept commands. The OSS documentation recommends closing and re-opening
the device after calling reset().
Tell the driver that there is likely to be a pause in the output, making it
possible for the device to handle the pause more intelligently. You might use
this after playing a spot sound effect, before waiting for user input, or before
doing disk I/O.
The following convenience methods combine several ioctls, or one ioctl and some
simple calculations.
Set the key audio sampling parameters—sample format, number of channels, and
sampling rate—in one method call. format, nchannels, and samplerate
should be as specified in the setfmt(), channels(), and
speed() methods. If strict is true, setparameters() checks to
see if each parameter was actually set to the requested value, and raises
OSSAudioError if not. Returns a tuple (format, nchannels,
samplerate) indicating the parameter values that were actually set by the
device driver (i.e., the same as the return values of setfmt(),
channels(), and speed()).
This method returns a bitmask specifying the available mixer controls (“Control”
being a specific mixable “channel”, such as SOUND_MIXER_PCM or
SOUND_MIXER_SYNTH). This bitmask indicates a subset of all available
mixer controls—the SOUND_MIXER_* constants defined at module level.
To determine if, for example, the current mixer object supports a PCM mixer, use
the following Python code:
mixer = ossaudiodev.openmixer()
if mixer.controls() & (1 << ossaudiodev.SOUND_MIXER_PCM):
    # PCM is supported
    ... code ...
For most purposes, the SOUND_MIXER_VOLUME (master volume) and
SOUND_MIXER_PCM controls should suffice—but code that uses the mixer
should be flexible when it comes to choosing mixer controls. On the Gravis
Ultrasound, for example, SOUND_MIXER_VOLUME does not exist.
Returns a bitmask indicating stereo mixer controls. If a bit is set, the
corresponding control is stereo; if it is unset, the control is either
monophonic or not supported by the mixer (use in combination with
controls() to determine which).
See the code example for the controls() function for an example of getting
data from a bitmask.
Returns a bitmask specifying the mixer controls that may be used to record. See
the code example for controls() for an example of reading from a bitmask.
Returns the volume of a given mixer control. The returned volume is a 2-tuple
(left_volume,right_volume). Volumes are specified as numbers from 0
(silent) to 100 (full volume). If the control is monophonic, a 2-tuple is still
returned, but both volumes are the same.
Raises OSSAudioError if an invalid control is specified, or
IOError if an unsupported control is specified.
Sets the volume for a given mixer control to (left,right). left and
right must be ints and between 0 (silent) and 100 (full volume). On
success, the new volume is returned as a 2-tuple. Note that this may not be
exactly the same as the volume specified, because of the limited resolution of
some soundcard’s mixers.
Raises OSSAudioError if an invalid mixer control was specified, or if the
specified volumes were out-of-range.
Call this function to specify a recording source. Returns a bitmask indicating
the new recording source (or sources) if successful; raises IOError if an
invalid source was specified. To set the current recording source to the
microphone input:
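mixer.setrecsrc(1 << ossaudiodev.SOUND_MIXER_MIC)   # 'mixer' as returned by openmixer()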
The modules described in this chapter help you write software that is
independent of language and locale by providing mechanisms for selecting a
language to be used in program messages or by tailoring output to match local
conventions.
The gettext module provides internationalization (I18N) and localization
(L10N) services for your Python modules and applications. It supports both the
GNU gettext message catalog API and a higher level, class-based API that may
be more appropriate for Python files. The interface described below allows you
to write your module and application messages in one natural language, and
provide a catalog of translated messages for running under different natural
languages.
Some hints on localizing your Python modules and applications are also given.
The gettext module defines the following API, which is very similar to
the GNU gettext API. If you use this API you will affect the
translation of your entire application globally. Often this is what you want if
your application is monolingual, with the choice of language dependent on the
locale of your user. If you are localizing a Python module, or if your
application needs to switch languages on the fly, you probably want to use the
class-based API instead.
Bind the domain to the locale directory localedir. More concretely,
gettext will look for binary .mo files for the given domain using
the path (on Unix): localedir/language/LC_MESSAGES/domain.mo, where
language is searched for in the environment variables LANGUAGE,
LC_ALL, LC_MESSAGES, and LANG respectively.
If localedir is omitted or None, then the current binding for domain is
returned. [1]
Bind the domain to codeset, changing the encoding of strings returned by the
gettext() family of functions. If codeset is omitted, then the current
binding is returned.
Change or query the current global domain. If domain is None, then the
current global domain is returned, otherwise the global domain is set to
domain, which is returned.
Return the localized translation of message, based on the current global
domain, language, and locale directory. This function is usually aliased as
_() in the local namespace (see examples below).
Equivalent to gettext(), but the translation is returned in the
preferred system encoding, if no other encoding was explicitly set with
bind_textdomain_codeset().
Equivalent to dgettext(), but the translation is returned in the
preferred system encoding, if no other encoding was explicitly set with
bind_textdomain_codeset().
Like gettext(), but consider plural forms. If a translation is found,
apply the plural formula to n, and return the resulting message (some
languages have more than two plural forms). If no translation is found, return
singular if n is 1; return plural otherwise.
The Plural formula is taken from the catalog header. It is a C or Python
expression that has a free variable n; the expression evaluates to the index
of the plural in the catalog. See the GNU gettext documentation for the precise
syntax to be used in .po files and the formulas for a variety of
languages.
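For example, a catalog for a language with two plural forms (such as English) typically carries a header entry like this:

Plural-Forms: nplurals=2; plural=(n != 1);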
Equivalent to ngettext(), but the translation is returned in the
preferred system encoding, if no other encoding was explicitly set with
bind_textdomain_codeset().
Equivalent to dngettext(), but the translation is returned in the
preferred system encoding, if no other encoding was explicitly set with
bind_textdomain_codeset().
Note that GNU gettext also defines a dcgettext() method, but
this was deemed not useful and so it is currently unimplemented.
Here’s an example of typical usage for this API:
import gettext
gettext.bindtextdomain('myapplication', '/path/to/my/language/directory')
gettext.textdomain('myapplication')
_ = gettext.gettext
# ...
print(_('This is a translatable string.'))
The class-based API of the gettext module gives you more flexibility and
greater convenience than the GNU gettext API. It is the recommended
way of localizing your Python applications and modules. gettext defines
a “translations” class which implements the parsing of GNU .mo format
files, and has methods for returning strings. Instances of this “translations”
class can also install themselves in the built-in namespace as the function
_().
This function implements the standard .mo file search algorithm. It
takes a domain, identical to what textdomain() takes. Optional
localedir is as in bindtextdomain(). Optional languages is a list of
strings, where each string is a language code.
If localedir is not given, then the default system locale directory is used.
[2] If languages is not given, then the following environment variables are
searched: LANGUAGE, LC_ALL, LC_MESSAGES, and
LANG. The first one returning a non-empty value is used for the
languages variable. The environment variables should contain a colon separated
list of languages, which will be split on the colon to produce the expected list
of language code strings.
find() then expands and normalizes the languages, and then iterates
through them, searching for an existing file built of these components:
localedir/language/LC_MESSAGES/domain.mo
The first such file name that exists is returned by find(). If no such
file is found, then None is returned. If all is given, it returns a list
of all file names, in the order in which they appear in the languages list or
the environment variables.
Return a Translations instance based on the domain, localedir,
and languages, which are first passed to find() to get a list of the
associated .mo file paths. Instances with identical .mo file
names are cached. The actual class instantiated is either class_ if
provided, otherwise GNUTranslations. The class’s constructor must
take a single file object argument. If provided, codeset will change
the charset used to encode translated strings in the lgettext() and
lngettext() methods.
If multiple files are found, later files are used as fallbacks for earlier ones.
To allow setting the fallback, copy.copy() is used to clone each
translation object from the cache; the actual instance data is still shared with
the cache.
If no .mo file is found, this function raises IOError if
fallback is false (which is the default), and returns a
NullTranslations instance if fallback is true.
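A sketch of typical use, assuming a hypothetical domain 'myapplication' with catalogs under /usr/share/locale:

import gettext

t = gettext.translation('myapplication', '/usr/share/locale',
                        languages=['de'], fallback=True)
_ = t.gettext        # never raises IOError, thanks to fallback=True
print(_('This is a translatable string.'))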
This installs the function _() in Python’s builtins namespace, based on
domain, localedir, and codeset which are passed to the function
translation().
For the names parameter, please see the description of the translation
object’s install() method.
As seen below, you usually mark the strings in your application that are
candidates for translation, by wrapping them in a call to the _()
function, like this:
print(_('This string will be translated.'))
For convenience, you want the _() function to be installed in Python’s
builtins namespace, so it is easily accessible in all modules of your
application.
Translation classes are what actually implement the translation of original
source file message strings to translated message strings. The base class used
by all translation classes is NullTranslations; this provides the basic
interface you can use to write your own specialized translation classes. Here
are the methods of NullTranslations:
Takes an optional file object fp, which is ignored by the base class.
Initializes “protected” instance variables _info and _charset which are set
by derived classes, as well as _fallback, which is set through
add_fallback(). It then calls self._parse(fp) if fp is not
None.
No-op’d in the base class, this method takes file object fp, and reads
the data from the file, initializing its message catalog. If you have an
unsupported message catalog file format, you should override this method
to parse your format.
Add fallback as the fallback object for the current translation object.
A translation object should consult the fallback if it cannot provide a
translation for a given message.
This method installs self.gettext() into the built-in namespace,
binding it to _.
If the names parameter is given, it must be a sequence containing the
names of functions you want to install in the builtins namespace in
addition to _(). Supported names are 'gettext' (bound to
self.gettext()), 'ngettext' (bound to self.ngettext()),
'lgettext' and 'lngettext'.
Note that this is only one way, albeit the most convenient way, to make
the _() function available to your application. Because it affects
the entire application globally, and specifically the built-in namespace,
localized modules should never install _(). Instead, they should use
this code to make _() available to their module:
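import gettext
t = gettext.translation('mymodule', '/path/to/locale/dir')   # hypothetical localedir
_ = t.gettext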
The gettext module provides one additional class derived from
NullTranslations: GNUTranslations. This class overrides
_parse() to enable reading GNU gettext format .mo files
in both big-endian and little-endian format.
GNUTranslations parses optional meta-data out of the translation
catalog. It is convention with GNU gettext to include meta-data as
the translation for the empty string. This meta-data is in RFC 822-style
key:value pairs, and should contain the Project-Id-Version key. If the
key Content-Type is found, then the charset property is used to
initialize the “protected” _charset instance variable, defaulting to
None if not found. If the charset encoding is specified, then all message
ids and message strings read from the catalog are converted to Unicode using
this encoding, else ASCII encoding is assumed.
Since message ids are read as Unicode strings too, all *gettext() methods
will assume message ids as Unicode strings, not byte strings.
The entire set of key/value pairs are placed into a dictionary and set as the
“protected” _info instance variable.
If the .mo file’s magic number is invalid, or if other problems occur
while reading the file, instantiating a GNUTranslations class can raise
IOError.
The following methods are overridden from the base class implementation:
Look up the message id in the catalog and return the corresponding message
string, as a Unicode string. If there is no entry in the catalog for the
message id, and a fallback has been set, the look up is forwarded to the
fallback’s gettext() method. Otherwise, the message id is returned.
Equivalent to gettext(), but the translation is returned as a
bytestring encoded in the selected output charset, or in the preferred system
encoding if no encoding was explicitly set with set_output_charset().
Do a plural-forms lookup of a message id. singular is used as the message id
for purposes of lookup in the catalog, while n is used to determine which
plural form to use. The returned message string is a Unicode string.
If the message id is not found in the catalog, and a fallback is specified, the
request is forwarded to the fallback’s ngettext() method. Otherwise, when
n is 1 singular is returned, and plural is returned in all other cases.
Here is an example:
n = len(os.listdir('.'))
cat = GNUTranslations(somefile)
message = cat.ngettext(
    'There is %(num)d file in this directory',
    'There are %(num)d files in this directory',
    n) % {'num': n}
Equivalent to gettext(), but the translation is returned as a
bytestring encoded in the selected output charset, or in the preferred system
encoding if no encoding was explicitly set with set_output_charset().
The Solaris operating system defines its own binary .mo file format, but
since no documentation can be found on this format, it is not supported at this
time.
For compatibility with this older module, the function Catalog() is an
alias for the translation() function described above.
One difference between this module and Henstridge’s: his catalog objects
supported access through a mapping API, but this appears to be unused and so is
not currently supported.
Internationalization (I18N) refers to the operation by which a program is made
aware of multiple languages. Localization (L10N) refers to the adaptation of
your program, once internationalized, to the local language and cultural habits.
In order to provide multilingual messages for your Python programs, you need to
take the following steps:
prepare your program or module by specially marking translatable strings
run a suite of tools over your marked files to generate raw messages catalogs
create language specific translations of the message catalogs
use the gettext module so that message strings are properly translated
In order to prepare your code for I18N, you need to look at all the strings in
your files. Any string that needs to be translated should be marked by wrapping
it in _('...') — that is, a call to the function _(). For example:
filename = 'mylog.txt'
message = _('writing a log message')
fp = open(filename, 'w')
fp.write(message)
fp.close()
In this example, the string 'writing a log message' is marked as a candidate
for translation, while the strings 'mylog.txt' and 'w' are not.
The Python distribution comes with two tools which help you generate the message
catalogs once you’ve prepared your source code. These may or may not be
available from a binary distribution, but they can be found in a source
distribution, in the Tools/i18n directory.
The pygettext [3] program scans all your Python source code looking
for the strings you previously marked as translatable. It is similar to the GNU
gettext program except that it understands all the intricacies of
Python source code, but knows nothing about C or C++ source code. You don’t
need GNU gettext unless you’re also going to be translating C code (such as
C extension modules).
pygettext generates textual Uniforum-style human readable message
catalog .pot files, essentially structured human readable files which
contain every marked string in the source code, along with a placeholder for the
translation strings. pygettext is a command line script that supports
a similar command line interface as xgettext; for details on its use,
run:
pygettext.py --help
Copies of these .pot files are then handed over to the individual human
translators who write language-specific versions for every supported natural
language. They send you back the filled in language-specific versions as a
.po file. Using the msgfmt.py [4] program (in the
Tools/i18n directory), you take the .po files from your
translators and generate the machine-readable .mo binary catalog files.
The .mo files are what the gettext module uses for the actual
translation processing during run-time.
How you use the gettext module in your code depends on whether you are
internationalizing a single module or your entire application. The next two
sections will discuss each case.
If you are localizing your module, you must take care not to make global
changes, e.g. to the built-in namespace. You should not use the GNU gettext
API but instead the class-based API.
Let’s say your module is called “spam” and the module’s various natural language
translation .mo files reside in /usr/share/locale in GNU
gettext format. Here’s what you would put at the top of your
module:
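import gettext
t = gettext.translation('spam', '/usr/share/locale')
_ = t.gettext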
If you are localizing your application, you can install the _() function
globally into the built-in namespace, usually in the main driver file of your
application. This will let all your application-specific files just use
_('...') without having to explicitly install it in each file.
In the simple case then, you need only add the following bit of code to the main
driver file of your application:
import gettext
gettext.install('myapplication')
If you need to set the locale directory, you can pass these into the
install() function:
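import gettext
gettext.install('myapplication', '/usr/share/locale')   # '/usr/share/locale' is an example path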
If your program needs to support many languages at the same time, you may want
to create multiple translation instances and then switch between them
explicitly, like so:
import gettext

lang1 = gettext.translation('myapplication', languages=['en'])
lang2 = gettext.translation('myapplication', languages=['fr'])
lang3 = gettext.translation('myapplication', languages=['de'])

# start by using language1
lang1.install()

# ... time goes by, user selects language 2
lang2.install()

# ... more time goes by, user selects language 3
lang3.install()
In most coding situations, strings are translated where they are coded.
Occasionally however, you need to mark strings for translation, but defer actual
translation until later. A classic example is:
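def _(message): return message

animals = [_('mollusk'),
           _('albatross'),
           _('rat'),
           _('penguin'),
           _('python'), ]

del _

# ...

for a in animals:
    print(_(a))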
This works because the dummy definition of _() simply returns the string
unchanged. And this dummy definition will temporarily override any definition
of _() in the built-in namespace (until the del command). Take
care, though if you have a previous definition of _() in the local
namespace.
Note that the second use of _() will not identify “a” as being
translatable to the pygettext program, since it is not a string.
Another way to handle this is with the following example:
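def N_(message): return message

animals = [N_('mollusk'),
           N_('albatross'),
           N_('rat'),
           N_('penguin'),
           N_('python'), ]

# ...

for a in animals:
    print(_(a))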
In this case, you are marking translatable strings with the function N_(),
[5] which won’t conflict with any definition of _(). However, you will
need to teach your message extraction program to look for translatable strings
marked with N_(). pygettext and xpot both support
this through the use of command line switches.
The default locale directory is system dependent; for example, on RedHat Linux
it is /usr/share/locale, but on Solaris it is /usr/lib/locale.
The gettext module does not try to support these system dependent
defaults; instead its default is sys.prefix/share/locale. For this
reason, it is always best to call bindtextdomain() with an explicit
absolute path at the start of your application.
François Pinard has written a program called xpot which does a
similar job. It is available as part of his po-utils package at
http://po-utils.progiciels-bpi.ca/.
msgfmt.py is binary compatible with GNU msgfmt except that
it provides a simpler, all-Python implementation. With this and
pygettext.py, you generally won’t need to install the GNU
gettext package to internationalize your Python applications.
The locale module opens access to the POSIX locale database and
functionality. The POSIX locale mechanism allows programmers to deal with
certain cultural issues in an application, without requiring the programmer to
know all the specifics of each country where the software is executed.
The locale module is implemented on top of the _locale module,
which in turn uses an ANSI C locale implementation if available.
The locale module defines the following exception and functions:
If locale is specified, it may be a string, a tuple of the form
(language code, encoding), or None. If it is a tuple, it is converted to a string
using the locale aliasing engine. If locale is given and not None,
setlocale() modifies the locale setting for the category. The available
categories are listed in the data description below. The value is the name of a
locale. An empty string specifies the user’s default settings. If the
modification of the locale fails, the exception Error is raised. If
successful, the new locale setting is returned.
If locale is omitted or None, the current setting for category is
returned.
setlocale() is not thread-safe on most systems. Applications typically
start with a call of
import locale
locale.setlocale(locale.LC_ALL, '')
This sets the locale for all categories to the user’s default setting (typically
specified in the LANG environment variable). If the locale is not
changed thereafter, using multithreading should not cause problems.
The 'grouping' entry returned by localeconv() is a sequence of numbers
specifying the relative positions in which the 'thousands_sep' is expected.
If the sequence is terminated with CHAR_MAX, no further grouping is
performed. If the sequence terminates with a 0, the last group size is
repeatedly used.
Return some locale-specific information as a string. This function is not
available on all systems, and the set of possible options might also vary
across platforms. The possible argument values are numbers, for which
symbolic constants are available in the locale module.
The nl_langinfo() function accepts one of the following keys. Most
descriptions are taken from the corresponding description in the GNU C
library.
Get the currency symbol, preceded by “-” if the symbol should appear before
the value, “+” if the symbol should appear after the value, or ”.” if the
symbol should replace the radix character.
Get a string that represents the era used in the current locale.
Most locales do not define this value. An example of a locale which does
define this value is the Japanese one. In Japan, the traditional
representation of dates includes the name of the era corresponding to the
then-emperor’s reign.
Normally it should not be necessary to use this value directly. Specifying
the E modifier in their format strings causes the strftime()
function to use this information. The format of the returned string is not
specified, and therefore you should not assume knowledge of it on different
systems.
Tries to determine the default locale settings and returns them as a tuple of
the form (language code, encoding).
According to POSIX, a program which has not called setlocale(LC_ALL,'')
runs using the portable 'C' locale. Calling setlocale(LC_ALL,'') lets
it use the default locale as defined by the LANG variable. Since we
do not want to interfere with the current locale setting we thus emulate the
behavior in the way described above.
To maintain compatibility with other platforms, not only the LANG
variable is tested, but a list of variables given as envvars parameter. The
first found to be defined will be used. envvars defaults to the search
path used in GNU gettext; it must always contain the variable name
'LANG'. The GNU gettext search path contains 'LC_ALL',
'LC_CTYPE', 'LANG' and 'LANGUAGE', in that order.
Except for the code 'C', the language code corresponds to RFC 1766.
language code and encoding may be None if their values cannot be
determined.
Returns the current setting for the given locale category as a sequence containing
language code, encoding. category may be one of the LC_* values
except LC_ALL. It defaults to LC_CTYPE.
Except for the code 'C', the language code corresponds to RFC 1766.
language code and encoding may be None if their values cannot be
determined.
Return the encoding used for text data, according to user preferences. User
preferences are expressed differently on different systems, and might not be
available programmatically on some systems, so this function only returns a
guess.
On some systems, it is necessary to invoke setlocale() to obtain the user
preferences, so this function is not thread-safe. If invoking setlocale is not
necessary or desired, do_setlocale should be set to False.
Returns a normalized locale code for the given locale name. The returned locale
code is formatted for use with setlocale(). If normalization fails, the
original name is returned unchanged.
If the given encoding is not known, the function defaults to the default
encoding for the locale code just like setlocale().
Compares two strings according to the current LC_COLLATE setting. As
with any other compare function, returns a negative value, a positive value, or
0, depending on whether string1 collates before or after string2 or is equal
to it.
Transforms a string to one that can be used in locale-aware
comparisons. For example, strxfrm(s1) < strxfrm(s2) is
equivalent to strcoll(s1, s2) < 0. This function can be used
when the same string is compared repeatedly, e.g. when collating a
sequence of strings.
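For example, a sketch of sorting a list with strxfrm() as the key function (assumes a suitable LC_COLLATE locale can be set):

import locale

locale.setlocale(locale.LC_COLLATE, '')   # use the user's collation rules
words = ['peach', 'Päckchen', 'pear']     # hypothetical word list
words.sort(key=locale.strxfrm)            # locale-aware ordering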
Formats a number val according to the current LC_NUMERIC setting.
The format follows the conventions of the % operator. For floating point
values, the decimal point is modified if appropriate. If grouping is true,
also takes the grouping into account.
If monetary is true, the conversion uses monetary thousands separator and
grouping strings.
Please note that this function will only work for exactly one %char specifier.
For whole format strings, use format_string().
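For example, in a locale that groups by thousands with ',' as the separator (the locale name and output shown are illustrative):

>>> import locale
>>> locale.setlocale(locale.LC_NUMERIC, 'en_US.UTF-8')
'en_US.UTF-8'
>>> locale.format('%.2f', 1234567.89, grouping=True)
'1,234,567.89'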
Formats a number val according to the current LC_MONETARY settings.
The returned string includes the currency symbol if symbol is true, which is
the default. If grouping is true (which is not the default), grouping is done
with the value. If international is true (which is not the default), the
international currency symbol is used.
Note that this function will not work with the ‘C’ locale, so you have to set a
locale via setlocale() first.
Locale category for the character type functions. Depending on the settings of
this category, the functions of module string dealing with case change
their behaviour.
Locale category for message display. Python currently does not support
application specific locale-aware messages. Messages displayed by the operating
system, like those returned by os.strerror() might be affected by this
category.
Locale category for formatting numbers. The functions format(),
atoi(), atof() and str() of the locale module are
affected by that category. All other numeric formatting operations are not
affected.
Combination of all locale settings. If this flag is used when the locale is
changed, setting the locale for all categories is attempted. If that fails for
any category, no category is changed at all. When the locale is retrieved using
this flag, a string indicating the setting for all categories is returned. This
string can be later used to restore the settings.
This is a symbolic constant used for different values returned by
localeconv().
Example:
>>> import locale
>>> loc = locale.getlocale()  # get current locale
# use German locale; name might vary with platform
>>> locale.setlocale(locale.LC_ALL, 'de_DE')
>>> locale.strcoll('f\xe4n', 'foo')  # compare a string containing an umlaut
>>> locale.setlocale(locale.LC_ALL, '')  # use user's preferred locale
>>> locale.setlocale(locale.LC_ALL, 'C')  # use default (C) locale
>>> locale.setlocale(locale.LC_ALL, loc)  # restore saved locale
The C standard defines the locale as a program-wide property that may be
relatively expensive to change. On top of that, some implementations are broken
in such a way that frequent locale changes may cause core dumps. This makes the
locale somewhat painful to use correctly.
Initially, when a program is started, the locale is the C locale, no matter
what the user’s preferred locale is. The program must explicitly say that it
wants the user’s preferred locale settings by calling setlocale(LC_ALL,'').
It is generally a bad idea to call setlocale() in some library routine,
since as a side effect it affects the entire program. Saving and restoring it
is almost as bad: it is expensive and affects other threads that happen to run
before the settings have been restored.
If, when coding a module for general use, you need a locale independent version
of an operation that is affected by the locale (such as
certain formats used with time.strftime()), you will have to find a way to
do it without using the standard library routine. Even better is convincing
yourself that using locale settings is okay. Only as a last resort should you
document that your module is not compatible with non-C locale settings.
The only way to perform numeric operations according to the locale is to use the
special functions defined by this module: atof(), atoi(),
format(), str().
There is no way to perform case conversions and character classifications
according to the locale. For (Unicode) text strings these are done according
to the character value only, while for byte strings, the conversions and
classifications are done according to the ASCII value of the byte, and bytes
whose high bit is set (i.e., non-ASCII bytes) are never converted or considered
part of a character class such as letter or whitespace.
For extension writers and programs that embed Python
Extension modules should never call setlocale(), except to find out what
the current locale is. But since the return value can only be used portably to
restore it, that is not very useful (except perhaps to find out whether or not
the locale is C).
When Python code uses the locale module to change the locale, this also
affects the embedding application. If the embedding application doesn’t want
this to happen, it should remove the _locale extension module (which does
all the work) from the table of built-in modules in the config.c file,
and make sure that the _locale module is not accessible as a shared
library.
The locale module exposes the C library’s gettext interface on systems that
provide this interface. It consists of the functions gettext(),
dgettext(), dcgettext(), textdomain(), bindtextdomain(),
and bind_textdomain_codeset(). These are similar to the same functions in
the gettext module, but use the C library’s binary format for message
catalogs, and the C library’s search algorithms for locating message catalogs.
Python applications should normally find no need to invoke these functions, and
should use gettext instead. A known exception to this rule are
applications that link with additional C libraries which internally invoke
gettext() or dcgettext(). For these applications, it may be
necessary to bind the text domain, so that the libraries can properly locate
their message catalogs.
The modules described in this chapter are frameworks that will largely dictate
the structure of your program. Currently the modules described here are all
oriented toward writing command-line interfaces.
The full list of modules described in this chapter is:
Turtle graphics is a popular way for introducing programming to kids. It was
part of the original Logo programming language developed by Wally Feurzig and
Seymour Papert in 1966.
Imagine a robotic turtle starting at (0, 0) in the x-y plane. After an import turtle, give it the
command turtle.forward(15), and it moves (on-screen!) 15 pixels in the
direction it is facing, drawing a line as it moves. Give it the command
turtle.right(25), and it rotates in-place 25 degrees clockwise.
Turtle star
Turtle can draw intricate shapes using programs that repeat simple
moves.
By combining together these and similar commands, intricate shapes and pictures
can easily be drawn.
The turtle module is an extended reimplementation of the same-named
module from the Python standard distribution up to version Python 2.5.
It tries to keep the merits of the old turtle module and to be (nearly) 100%
compatible with it. This means in the first place to enable the learning
programmer to use all the commands, classes and methods interactively when using
the module from within IDLE run with the -n switch.
The turtle module provides turtle graphics primitives, in both object-oriented
and procedure-oriented ways. Because it uses tkinter for the underlying
graphics, it needs a version of Python installed with Tk support.
The object-oriented interface uses essentially two+two classes:
The TurtleScreen class defines graphics windows as a playground for
the drawing turtles. Its constructor needs a tkinter.Canvas or a
ScrolledCanvas as argument. It should be used when turtle is
used as part of some application.
The function Screen() returns a singleton object of a
TurtleScreen subclass. This function should be used when
turtle is used as a standalone tool for doing graphics.
As a singleton object, inheriting from its class is not possible.
All methods of TurtleScreen/Screen also exist as functions, i.e. as part of
the procedure-oriented interface.
RawTurtle (alias: RawPen) defines Turtle objects which draw
on a TurtleScreen. Its constructor needs a Canvas, ScrolledCanvas
or TurtleScreen as argument, so the RawTurtle objects know where to draw.
Derived from RawTurtle is the subclass Turtle (alias: Pen),
which draws on “the” Screen instance which is automatically
created, if not already present.
All methods of RawTurtle/Turtle also exist as functions, i.e. part of the
procedure-oriented interface.
The procedural interface provides functions which are derived from the methods
of the classes Screen and Turtle. They have the same names as
the corresponding methods. A screen object is automatically created whenever a
function derived from a Screen method is called. An (unnamed) turtle object is
automatically created whenever any of the functions derived from a Turtle method
is called.
To use multiple turtles on a screen one has to use the object-oriented interface.
Note
In the following documentation the argument list for functions is given.
Methods, of course, have the additional first argument self which is
omitted here.
Turn turtle right by angle units. (Units are by default degrees, but
can be set via the degrees() and radians() functions.) Angle
orientation depends on the turtle mode, see mode().
Turn turtle left by angle units. (Units are by default degrees, but
can be set via the degrees() and radians() functions.) Angle
orientation depends on the turtle mode, see mode().
Draw a circle with given radius. The center is radius units left of
the turtle; extent – an angle – determines which part of the circle
is drawn. If extent is not given, draw the entire circle. If extent
is not a full circle, one endpoint of the arc is the current pen
position. Draw the arc in counterclockwise direction if radius is
positive, otherwise in clockwise direction. Finally the direction of the
turtle is changed by the amount of extent.
As the circle is approximated by an inscribed regular polygon, steps
determines the number of steps to use. If not given, it will be
calculated automatically. May be used to draw regular polygons.
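For example (a sketch; the turtle starts at the home position):

>>> turtle.home()
>>> turtle.circle(50)          # draw a full circle with radius 50
>>> turtle.circle(120, 180)    # draw a semicircle with radius 120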
Stamp a copy of the turtle shape onto the canvas at the current turtle
position. Return a stamp_id for that stamp, which can be used to delete
it by calling clearstamp(stamp_id).
x – a number or a pair/vector of numbers or a turtle instance
y – a number if x is a number, else None
Return the angle between the line from turtle position to position specified
by (x,y), the vector or the other turtle. This depends on the turtle’s start
orientation, which depends on the mode - “standard”/“world” or “logo”.
Set angle measurement units, i.e. set number of “degrees” for a full circle.
Default value is 360 degrees.
>>> turtle.home()
>>> turtle.left(90)
>>> turtle.heading()
90.0

Change angle measurement unit to grad (also known as gon, grade, or gradian; equals 1/100th of the right angle):

>>> turtle.degrees(400.0)
>>> turtle.heading()
100.0
>>> turtle.degrees(360)
>>> turtle.heading()
90.0
Set the line thickness to width or return it. If resizemode is set to
“auto” and turtleshape is a polygon, that polygon is drawn with the same line
thickness. If no argument is given, the current pensize is returned.
>>> turtle.pensize()
1
>>> turtle.pensize(10)   # from here on lines of width 10 are drawn
This dictionary can be used as argument for a subsequent call to pen()
to restore the former pen-state. Moreover one or more of these attributes
can be provided as keyword-arguments. This can be used to set several pen
attributes in one statement.
Return the current pencolor as color specification string or
as a tuple (see example). May be used as input to another
color/pencolor/fillcolor call.
pencolor(colorstring)
Set pencolor to colorstring, which is a Tk color specification string,
such as "red", "yellow", or "#33cc8c".
pencolor((r,g,b))
Set pencolor to the RGB color represented by the tuple of r, g, and
b. Each of r, g, and b must be in the range 0..colormode, where
colormode is either 1.0 or 255 (see colormode()).
pencolor(r,g,b)
Set pencolor to the RGB color represented by r, g, and b. Each of
r, g, and b must be in the range 0..colormode.
If turtleshape is a polygon, the outline of that polygon is drawn with the
newly set pencolor.
Return the current fillcolor as color specification string, possibly
in tuple format (see example). May be used as input to another
color/pencolor/fillcolor call.
fillcolor(colorstring)
Set fillcolor to colorstring, which is a Tk color specification string,
such as "red", "yellow", or "#33cc8c".
fillcolor((r,g,b))
Set fillcolor to the RGB color represented by the tuple of r, g, and
b. Each of r, g, and b must be in the range 0..colormode, where
colormode is either 1.0 or 255 (see colormode()).
fillcolor(r,g,b)
Set fillcolor to the RGB color represented by r, g, and b. Each of
r, g, and b must be in the range 0..colormode.
If turtleshape is a polygon, the interior of that polygon is drawn
with the newly set fillcolor.
Delete the turtle’s drawings from the screen. Do not move turtle. State and
position of the turtle as well as drawings of other turtles are not affected.
align – one of the strings “left”, “center” or “right”
font – a triple (fontname, fontsize, fonttype)
Write text - the string representation of arg - at the current turtle
position according to align (“left”, “center” or “right”) and with the given
font. If move is True, the pen is moved to the bottom-right corner of the
text. By default, move is False.
Make the turtle invisible. It’s a good idea to do this while you’re in the
middle of doing some complex drawing, because hiding the turtle speeds up the
drawing observably.
Set turtle shape to shape with given name or, if name is not given, return
name of current shape. Shape with name must exist in the TurtleScreen’s
shape dictionary. Initially there are the following polygon shapes: “arrow”,
“turtle”, “circle”, “square”, “triangle”, “classic”. To learn about how to
deal with shapes see Screen method register_shape().
rmode – one of the strings “auto”, “user”, “noresize”
Set resizemode to one of the values: “auto”, “user”, “noresize”. If rmode
is not given, return current resizemode. Different resizemodes have the
following effects:
“auto”: adapts the appearance of the turtle corresponding to the value of pensize.
“user”: adapts the appearance of the turtle according to the values of
stretchfactor and outlinewidth (outline), which are set by
shapesize().
“noresize”: no adaption of the turtle’s appearance takes place.
resizemode(“user”) is called by shapesize() when used with arguments.
Return or set the pen’s attributes x/y-stretchfactors and/or outline. Set
resizemode to “user”. If and only if resizemode is set to “user”, the turtle
will be displayed stretched according to its stretchfactors: stretch_wid is
stretchfactor perpendicular to its orientation, stretch_len is
stretchfactor in direction of its orientation, outline determines the width
of the shapes’s outline.
Set or return the current shearfactor. Shear the turtleshape according to
the given shearfactor shear, which is the tangent of the shear angle.
Do not change the turtle’s heading (direction of movement).
If shear is not given: return the current shearfactor, i. e. the
tangent of the shear angle, by which lines parallel to the
heading of the turtle are sheared.
Rotate the turtleshape to point in the direction specified by angle,
regardless of its current tilt-angle. Do not change the turtle’s heading
(direction of movement).
Set or return the current tilt-angle. If angle is given, rotate the
turtleshape to point in the direction specified by angle,
regardless of its current tilt-angle. Do not change the turtle’s
heading (direction of movement).
If angle is not given: return the current tilt-angle, i. e. the angle
between the orientation of the turtleshape and the heading of the
turtle (its direction of movement).
Set or return the current transformation matrix of the turtle shape.
If none of the matrix elements are given, return the transformation
matrix as a tuple of 4 elements.
Otherwise set the given elements and transform the turtleshape
according to the matrix consisting of first row t11, t12 and
second row t21, t22. The determinant t11 * t22 - t12 * t21 must not be
zero, otherwise an error is raised.
Modify stretchfactor, shearfactor and tiltangle according to the
given matrix.
fun – a function with two arguments which will be called with the
coordinates of the clicked point on the canvas
num – number of the mouse-button, defaults to 1 (left mouse button)
add – True or False – if True, a new binding will be
added, otherwise it will replace a former binding
Bind fun to mouse-click events on this turtle. If fun is None,
existing bindings are removed. Example for the anonymous turtle, i.e. the
procedural way:
>>> def turn(x, y):
...     left(180)
...
>>> onclick(turn)   # Now clicking into the turtle will turn it.
>>> onclick(None)   # event-binding will be removed
Set or disable undobuffer. If size is an integer an empty undobuffer of
given size is installed. size gives the maximum number of turtle actions
that can be undone by the undo() method/function. If size is
None, the undobuffer is disabled.
To use compound turtle shapes, which consist of several polygons of different
color, you must use the helper class Shape explicitly as described
below:
Create an empty Shape object of type “compound”.
Add as many components to this object as desired, using the
addcomponent() method.
The Shape class is used internally by the register_shape()
method in different ways. The application programmer has to deal with the
Shape class only when using compound shapes like shown above!
Methods of TurtleScreen/Screen and corresponding functions
Most of the examples in this section refer to a TurtleScreen instance called
screen.
picname – a string, name of a gif-file or "nopic", or None
Set background image or return name of current backgroundimage. If picname
is a filename, set the corresponding image as background. If picname is
"nopic", delete background image, if present. If picname is None,
return the filename of the current backgroundimage.
Delete all drawings and all turtles from the TurtleScreen. Reset the now
empty TurtleScreen to its initial state: white background, no background
image, no event bindings and tracing on.
Note
This TurtleScreen method is available as a global function only under the
name clearscreen. The global function clear is a different one
derived from the Turtle method clear.
Reset all Turtles on the Screen to their initial state.
Note
This TurtleScreen method is available as a global function only under the
name resetscreen. The global function reset is another one
derived from the Turtle method reset.
canvwidth – positive integer, new width of canvas in pixels
canvheight – positive integer, new height of canvas in pixels
bg – colorstring or color-tuple, new background color
If no arguments are given, return current (canvaswidth, canvasheight). Else
resize the canvas the turtles are drawing on. Do not alter the drawing
window. To observe hidden parts of the canvas, use the scrollbars. With this
method, one can make visible those parts of a drawing which were outside the
canvas before.
llx – a number, x-coordinate of lower left corner of canvas
lly – a number, y-coordinate of lower left corner of canvas
urx – a number, x-coordinate of upper right corner of canvas
ury – a number, y-coordinate of upper right corner of canvas
Set up user-defined coordinate system and switch to mode “world” if
necessary. This performs a screen.reset(). If mode “world” is already
active, all drawings are redrawn according to the new coordinates.
ATTENTION: in user-defined coordinate systems angles may appear
distorted.
>>> screen.reset()
>>> screen.setworldcoordinates(-50, -7.5, 50, 7.5)
>>> for _ in range(72):
...     left(10)
...
>>> for _ in range(8):
...     left(45); fd(2)   # a regular octagon
Set or return the drawing delay in milliseconds. (This is approximately
the time interval between two consecutive canvas updates.) The longer the
drawing delay, the slower the animation.
Turn turtle animation on/off and set delay for update drawings. If
n is given, only each n-th regular screen update is really
performed. (Can be used to accelerate the drawing of complex
graphics.) When called without arguments, returns the currently
stored value of n. Second argument sets delay value (see
delay()).
key – a string: key (e.g. “a”) or key-symbol (e.g. “space”)
Bind fun to key-release event of key. If fun is None, event bindings
are removed. Remark: in order to be able to register key-events, TurtleScreen
must have the focus. (See method listen().)
key – a string: key (e.g. “a”) or key-symbol (e.g. “space”)
Bind fun to key-press event of key if key is given,
or to any key-press-event if no key is given.
Remark: in order to be able to register key-events, TurtleScreen
must have focus. (See method listen().)
fun – a function with two arguments which will be called with the
coordinates of the clicked point on the canvas
num – number of the mouse-button, defaults to 1 (left mouse button)
add – True or False – if True, a new binding will be
added, otherwise it will replace a former binding
Bind fun to mouse-click events on this screen. If fun is None,
existing bindings are removed.
Example for a TurtleScreen instance named screen and a Turtle instance
named turtle:
>>> screen.onclick(turtle.goto)   # Subsequently clicking into the TurtleScreen will
>>>                               # make the turtle move to the clicked point.
>>> screen.onclick(None)          # remove event binding again
Note
This TurtleScreen method is available as a global function only under the
name onscreenclick. The global function onclick is another one
derived from the Turtle method onclick.
Starts event loop - calling Tkinter’s mainloop function.
Must be the last statement in a turtle graphics program.
Must not be used if a script is run from within IDLE in -n mode
(no subprocess), i.e. when turtle graphics is used interactively.
Pop up a dialog window for input of a string. Parameter title is
the title of the dialog window, prompt is a text mostly describing
what information to input.
Return the string input. If the dialog is canceled, return None.
>>> screen.textinput("NIM","Name of first player:")
Pop up a dialog window for input of a number. title is the title of the
dialog window, prompt is a text mostly describing what numerical information
to input. default: default value, minval: minimum value for input,
maxval: maximum value for input
The number input must be in the range minval .. maxval if these are
given. If not, a hint is issued and the dialog remains open for
correction.
Return the number input. If the dialog is canceled, return None.
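A typical call (assuming a Screen instance named screen):

>>> screen.numinput("Poker", "Your stakes:", 1000, minval=10, maxval=10000)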
mode – one of the strings “standard”, “logo” or “world”
Set turtle mode (“standard”, “logo” or “world”) and perform reset. If mode
is not given, current mode is returned.
Mode “standard” is compatible with old turtle. Mode “logo” is
compatible with most Logo turtle graphics. Mode “world” uses user-defined
“world coordinates”. Attention: in this mode angles appear distorted if
x/y unit-ratio doesn’t equal 1.
Mode          Initial turtle heading    positive angles
------------  ------------------------  ----------------
"standard"    to the right (east)       counterclockwise
"logo"        upward (north)            clockwise
>>> mode("logo")   # resets turtle heading to north
>>> mode()
'logo'
If the value “using_IDLE” in the configuration dictionary is False
(default value), also enter mainloop. Remark: If IDLE with the -n switch
(no subprocess) is used, this value should be set to True in
turtle.cfg. In this case IDLE’s own mainloop is active also for the
client script.
Set the size and position of the main window. Default values of arguments
are stored in the configuration dictionary and can be changed via a
turtle.cfg file.
Parameters:
width – if an integer, a size in pixels, if a float, a fraction of the
screen; default is 50% of screen
height – if an integer, the height in pixels, if a float, a fraction of
the screen; default is 75% of screen
startx – if positive, starting position in pixels from the left
edge of the screen, if negative from the right edge, if None,
center window horizontally
starty – if positive, starting position in pixels from the top
edge of the screen, if negative from the bottom edge, if None,
center window vertically
>>> screen.setup(width=200, height=200, startx=0, starty=0)
>>> # sets window to 200x200 pixels, in upper left of screen
>>> screen.setup(width=.75, height=0.5, startx=None, starty=None)
>>> # sets window to 75% of screen by 50% of screen and centers
poly – a polygon, i.e. a tuple of pairs of numbers
fill – a color the poly will be filled with
outline – a color for the poly’s outline (if given)
Example:
>>> poly = ((0, 0), (10, -5), (0, 10), (-10, -5))
>>> s = Shape("compound")
>>> s.addcomponent(poly, "red", "blue")
>>> # ... add more components and then use register_shape()
A two-dimensional vector class, used as a helper class for implementing
turtle graphics. May be useful for turtle graphics programs too. Derived
from tuple, so a vector is a tuple!
The public methods of the Screen and Turtle classes are documented extensively
via docstrings. So these can be used as online-help via the Python help
facilities:
When using IDLE, tooltips show the signatures and first lines of the
docstrings of typed in function-/method calls.
Calling help() on methods or functions displays the docstrings:
>>> help(Screen.bgcolor)
Help on method bgcolor in module turtle:

bgcolor(self, *args) unbound turtle.Screen method
    Set or return backgroundcolor of the TurtleScreen.

    Arguments (if given): a color string or three numbers
    in the range 0..colormode or a 3-tuple of such numbers.

      >>> screen.bgcolor("orange")
      >>> screen.bgcolor()
      "orange"
      >>> screen.bgcolor(0.5,0,0.5)
      >>> screen.bgcolor()
      "#800080"

>>> help(Turtle.penup)
Help on method penup in module turtle:

penup(self) unbound turtle.Turtle method
    Pull the pen up -- no drawing when moving.

    Aliases: penup | pu | up

    No argument

    >>> turtle.penup()
The docstrings of the functions which are derived from methods have a modified
form:
>>> help(bgcolor)
Help on function bgcolor in module turtle:

bgcolor(*args)
    Set or return backgroundcolor of the TurtleScreen.

    Arguments (if given): a color string or three numbers
    in the range 0..colormode or a 3-tuple of such numbers.

    Example::

      >>> bgcolor("orange")
      >>> bgcolor()
      "orange"
      >>> bgcolor(0.5,0,0.5)
      >>> bgcolor()
      "#800080"

>>> help(penup)
Help on function penup in module turtle:

penup()
    Pull the pen up -- no drawing when moving.

    Aliases: penup | pu | up

    No argument

    Example:
    >>> penup()
These modified docstrings are created automatically together with the function
definitions that are derived from the methods at import time.
Translation of docstrings into different languages¶
There is a utility to create a dictionary the keys of which are the method names
and the values of which are the docstrings of the public methods of the classes
Screen and Turtle.
Create and write docstring-dictionary to a Python script with the given
filename. This function has to be called explicitly (it is not used by the
turtle graphics classes). The docstring dictionary will be written to the
Python script filename.py. It is intended to serve as a template
for translation of the docstrings into different languages.
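For example (the target language in the filename is illustrative):

>>> import turtle
>>> turtle.write_docstringdict("turtle_docstringdict_german")   # writes turtle_docstringdict_german.py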
If you (or your students) want to use turtle with online help in your
native language, you have to translate the docstrings and save the resulting
file as e.g. turtle_docstringdict_german.py.
If you have an appropriate entry in your turtle.cfg file this dictionary
will be read in at import time and will replace the original English docstrings.
At the time of this writing there are docstring dictionaries in German and in
Italian. (Requests please to glingl@aon.at.)
The built-in default configuration mimics the appearance and behaviour of the
old turtle module in order to retain best possible compatibility with it.
If you want to use a different configuration which better reflects the features
of this module or which better fits to your needs, e.g. for use in a classroom,
you can prepare a configuration file turtle.cfg which will be read at import
time and modify the configuration according to its settings.
The built-in configuration would correspond to the following turtle.cfg:
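(Reconstructed here from the module's built-in defaults; the authoritative values live in the module source.)

width = 0.5
height = 0.75
leftright = None
topbottom = None
canvwidth = 400
canvheight = 300
mode = standard
colormode = 1.0
delay = 10
undobuffersize = 1000
shape = classic
pencolor = black
fillcolor = black
resizemode = noresize
visible = True
language = english
exampleturtle = turtle
examplescreen = screen
title = Python Turtle Graphics
using_IDLE = False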
The first four lines correspond to the arguments of the Screen.setup()
method.
Line 5 and 6 correspond to the arguments of the method
Screen.screensize().
shape can be any of the built-in shapes, e.g: arrow, turtle, etc. For more
info try help(shape).
If you want to use no fillcolor (i.e. make the turtle transparent), you have
to write fillcolor="" (but all nonempty strings must not have quotes in
the cfg-file).
If you want the turtle to reflect its state (for example its pensize), you have to use resizemode=auto.
If you set e.g. language=italian the docstringdict
turtle_docstringdict_italian.py will be loaded at import time (if
present on the import path, e.g. in the same directory as turtle).
The entries exampleturtle and examplescreen define the names of these
objects as they occur in the docstrings. The transformation of
method-docstrings to function-docstrings will delete these names from the
docstrings.
using_IDLE: Set this to True if you regularly work with IDLE and its -n
switch (“no subprocess”). This will prevent exitonclick() from entering the
mainloop.
There can be a turtle.cfg file in the directory where turtle is
stored and an additional one in the current working directory. The latter will
override the settings of the first one.
The Lib/turtledemo directory contains a turtle.cfg file. You can
study it as an example and see its effects when running the demos (preferably
not from within the demo-viewer).
There is a set of demo scripts in the turtledemo package. These
scripts can be run and viewed using the supplied demo viewer as follows:
python -m turtledemo
Alternatively, you can run the demo scripts individually. For example,
python -m turtledemo.bytedesign
The turtledemo package directory contains:
a set of 15 demo scripts demonstrating different features of the new module
turtle;
a demo viewer __main__.py which can be used to view the sourcecode
of the scripts and run them at the same time. 14 of the examples can be
accessed via the Examples menu; all of them can also be run standalone.
The example turtledemo.two_canvases demonstrates the simultaneous
use of two canvases with the turtle module. Therefore, it can only be run
standalone.
There is a turtle.cfg file in this directory, which serves as an
example for how to write and use such files.
The methods Turtle.tracer(), Turtle.window_width() and
Turtle.window_height() have been eliminated.
Methods with these names and functionality are now available only
as methods of Screen. The functions derived from these remain
available. (In fact already in Python 2.6 these methods were merely
duplications of the corresponding
TurtleScreen/Screen-methods.)
The method Turtle.fill() has been eliminated.
The behaviour of begin_fill() and end_fill()
has changed slightly: now every filling process must be completed with an
end_fill() call.
A method Turtle.filling() has been added. It returns a boolean
value: True if a filling process is under way, False otherwise.
This behaviour corresponds to a fill() call without arguments in
Python 2.6.
The methods Turtle.shearfactor(), Turtle.shapetransform() and
Turtle.get_shapepoly() have been added. Thus the full range of
regular linear transforms is now available for transforming turtle shapes.
Turtle.tiltangle() has been enhanced in functionality: it now can
be used to get or set the tiltangle. Turtle.settiltangle() has been
deprecated.
The method Screen.onkeypress() has been added as a complement to
Screen.onkey() which in fact binds actions to the keyrelease event.
Accordingly the latter got an alias: Screen.onkeyrelease().
The method Screen.mainloop() has been added, so when working only
with Screen and Turtle objects one no longer needs to import
mainloop() additionally.
Two input methods have been added: Screen.textinput() and
Screen.numinput(). These pop up input dialogs and return
strings and numbers respectively.
Two example scripts tdemo_nim.py and tdemo_round_dance.py
have been added to the Lib/turtledemo directory.
cmd — Support for line-oriented command interpreters
The Cmd class provides a simple framework for writing line-oriented
command interpreters. These are often useful for test harnesses, administrative
tools, and prototypes that will later be wrapped in a more sophisticated
interface.
class cmd.Cmd(completekey='tab', stdin=None, stdout=None)
A Cmd instance or subclass instance is a line-oriented interpreter
framework. There is no good reason to instantiate Cmd itself; rather,
it’s useful as a superclass of an interpreter class you define yourself in order
to inherit Cmd's methods and encapsulate action methods.
The optional argument completekey is the readline name of a completion
key; it defaults to Tab. If completekey is not None and
readline is available, command completion is done automatically.
The optional arguments stdin and stdout specify the input and output file
objects that the Cmd instance or subclass instance will use for input and
output. If not specified, they will default to sys.stdin and
sys.stdout.
If you want a given stdin to be used, make sure to set the instance’s
use_rawinput attribute to False, otherwise stdin will be
ignored.
Repeatedly issue a prompt, accept input, parse an initial prefix off the
received input, and dispatch to action methods, passing them the remainder of
the line as argument.
The optional argument is a banner or intro string to be issued before the first
prompt (this overrides the intro class attribute).
If the readline module is loaded, input will automatically inherit
bash-like history-list editing (e.g. Control-P scrolls back
to the last command, Control-N forward to the next one, Control-F
moves the cursor to the right non-destructively, Control-B moves the
cursor to the left non-destructively, etc.).
An end-of-file on input is passed back as the string 'EOF'.
An interpreter instance will recognize a command name foo if and only if it
has a method do_foo(). As a special case, a line beginning with the
character '?' is dispatched to the method do_help(). As another
special case, a line beginning with the character '!' is dispatched to the
method do_shell() (if such a method is defined).
This method will return when the postcmd() method returns a true value.
The stop argument to postcmd() is the return value from the command’s
corresponding do_*() method.
If completion is enabled, completing commands will be done automatically, and
completion of command arguments is done by calling complete_foo() with
arguments text, line, begidx, and endidx. text is the string prefix
we are attempting to match: all returned matches must begin with it. line is
the current input line with leading whitespace removed, begidx and endidx
are the beginning and ending indexes of the prefix text, which could be used to
provide different completion depending upon which position the argument is in.
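A minimal completer sketch (the ColorShell class and its color list are hypothetical, not part of the module):

import cmd

class ColorShell(cmd.Cmd):
    prompt = '(color) '
    colors = ['black', 'blue', 'green', 'red']

    def do_color(self, arg):
        'Set the current color: COLOR BLUE'
        print('color set to', arg)

    def complete_color(self, text, line, begidx, endidx):
        # offer only those colors that start with the prefix typed so far
        return [c for c in self.colors if c.startswith(text)]

    def do_EOF(self, arg):
        'Quit: EOF'
        return True

if __name__ == '__main__':
    ColorShell().cmdloop()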
All subclasses of Cmd inherit a predefined do_help(). This
method, called with an argument 'bar', invokes the corresponding method
help_bar(), and if that is not present, prints the docstring of
do_bar(), if available. With no argument, do_help() lists all
available help topics (that is, all commands with corresponding
help_*() methods or commands that have docstrings), and also lists any
undocumented commands.
Interpret the argument as though it had been typed in response to the prompt.
This may be overridden, but should not normally need to be; see the
precmd() and postcmd() methods for useful execution hooks. The
return value is a flag indicating whether interpretation of commands by the
interpreter should stop. If there is a do_*() method for the command
str, the return value of that method is returned, otherwise the return value
from the default() method is returned.
Hook method executed just before the command line line is interpreted, but
after the input prompt is generated and issued. This method is a stub in
Cmd; it exists to be overridden by subclasses. The return value is
used as the command which will be executed by the onecmd() method; the
precmd() implementation may re-write the command or simply return line
unchanged.
Hook method executed just after a command dispatch is finished. This method is
a stub in Cmd; it exists to be overridden by subclasses. line is the
command line which was executed, and stop is a flag which indicates whether
execution will be terminated after the call to postcmd(); this will be the
return value of the onecmd() method. The return value of this method will
be used as the new value for the internal flag which corresponds to stop;
returning false will cause interpretation to continue.
The header to issue if the help output has a section for miscellaneous help
topics (that is, there are help_*() methods without corresponding
do_*() methods).
The header to issue if the help output has a section for undocumented commands
(that is, there are do_*() methods without corresponding help_*()
methods).
A flag, defaulting to true. If true, cmdloop() uses input() to
display a prompt and read the next command; if false, sys.stdout.write()
and sys.stdin.readline() are used. (This means that by importing
readline, on systems that support it, the interpreter will automatically
support Emacs-like line editing and command-history keystrokes.)
The cmd module is mainly useful for building custom shells that let a
user work with a program interactively.
This section presents a simple example of how to build a shell around a few of
the commands in the turtle module.
Basic turtle commands such as forward() are added to a
Cmd subclass with method named do_forward(). The argument is
converted to a number and dispatched to the turtle module. The docstring is
used in the help utility provided by the shell.
The example also includes a basic record and playback facility implemented with
the precmd() method which is responsible for converting the input to
lowercase and writing the commands to a file. The do_playback() method
reads the file and adds the recorded commands to the cmdqueue for
immediate playback:
import cmd, sys
from turtle import *

class TurtleShell(cmd.Cmd):
    intro = 'Welcome to the turtle shell. Type help or ? to list commands.\n'
    prompt = '(turtle) '
    file = None

    # ----- basic turtle commands -----
    def do_forward(self, arg):
        'Move the turtle forward by the specified distance: FORWARD 10'
        forward(*parse(arg))
    def do_right(self, arg):
        'Turn turtle right by given number of degrees: RIGHT 20'
        right(*parse(arg))
    def do_left(self, arg):
        'Turn turtle left by given number of degrees: LEFT 90'
        left(*parse(arg))
    def do_goto(self, arg):
        'Move turtle to an absolute position with changing orientation. GOTO 100 200'
        goto(*parse(arg))
    def do_home(self, arg):
        'Return turtle to the home position: HOME'
        home()
    def do_circle(self, arg):
        'Draw circle with given radius and optional extent and steps: CIRCLE 50'
        circle(*parse(arg))
    def do_position(self, arg):
        'Print the current turtle position: POSITION'
        print('Current position is %d %d\n' % position())
    def do_heading(self, arg):
        'Print the current turtle heading in degrees: HEADING'
        print('Current heading is %d\n' % (heading(),))
    def do_color(self, arg):
        'Set the color: COLOR BLUE'
        color(arg.lower())
    def do_undo(self, arg):
        'Undo (repeatedly) the last turtle action(s): UNDO'
        undo()
    def do_reset(self, arg):
        'Clear the screen and return turtle to center: RESET'
        reset()
    def do_bye(self, arg):
        'Stop recording, close the turtle window, and exit: BYE'
        print('Thank you for using Turtle')
        self.close()
        bye()
        sys.exit(0)

    # ----- record and playback -----
    def do_record(self, arg):
        'Save future commands to filename: RECORD rose.cmd'
        self.file = open(arg, 'w')
    def do_playback(self, arg):
        'Playback commands from a file: PLAYBACK rose.cmd'
        self.close()
        with open(arg) as f:     # close the file promptly after reading
            self.cmdqueue.extend(f.read().splitlines())
    def precmd(self, line):
        line = line.lower()
        if self.file and 'playback' not in line:
            print(line, file=self.file)
        return line
    def close(self):
        if self.file:
            self.file.close()
            self.file = None

def parse(arg):
    'Convert a series of zero or more numbers to an argument tuple'
    return tuple(map(int, arg.split()))

if __name__ == '__main__':
    TurtleShell().cmdloop()
Here is a sample session with the turtle shell showing the help functions, using
blank lines to repeat commands, and the simple record and playback facility:
Welcome to the turtle shell. Type help or ? to list commands.
(turtle) ?
Documented commands (type help <topic>):
========================================
bye color goto home playback record right
circle forward heading left position reset undo
(turtle) help forward
Move the turtle forward by the specified distance: FORWARD 10
(turtle) record spiral.cmd
(turtle) position
Current position is 0 0
(turtle) heading
Current heading is 0
(turtle) reset
(turtle) circle 20
(turtle) right 30
(turtle) circle 40
(turtle) right 30
(turtle) circle 60
(turtle) right 30
(turtle) circle 80
(turtle) right 30
(turtle) circle 100
(turtle) right 30
(turtle) circle 120
(turtle) right 30
(turtle) circle 120
(turtle) heading
Current heading is 180
(turtle) forward 100
(turtle)
(turtle) right 90
(turtle) forward 100
(turtle)
(turtle) right 90
(turtle) forward 400
(turtle) right 90
(turtle) forward 500
(turtle) right 90
(turtle) forward 400
(turtle) right 90
(turtle) forward 300
(turtle) playback spiral.cmd
Current position is 0 0
Current heading is 0
Current heading is 180
(turtle) bye
Thank you for using Turtle
The shlex class makes it easy to write lexical analyzers for simple
syntaxes resembling that of the Unix shell. This will often be useful for
writing minilanguages (for example, in run control files for Python
applications) or for parsing quoted strings.
Split the string s using shell-like syntax. If comments is False
(the default), the parsing of comments in the given string will be disabled
(setting the commenters attribute of the shlex instance to
the empty string). This function operates in POSIX mode by default, but uses
non-POSIX mode if the posix argument is false.
Note
Since the split() function instantiates a shlex instance,
passing None for s will read the string to split from standard
input.
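For example, a quoted argument survives the split as a single token:

>>> import shlex
>>> shlex.split('cp "my file.txt" /tmp')
['cp', 'my file.txt', '/tmp']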
class shlex.shlex(instream=None, infile=None, posix=False)
A shlex instance or subclass instance is a lexical analyzer object.
The initialization argument, if present, specifies where to read characters
from. It must be a file-/stream-like object with read() and
readline() methods, or a string. If no argument is given, input will
be taken from sys.stdin. The second optional argument is a filename
string, which sets the initial value of the infile attribute. If the
instream argument is omitted or equal to sys.stdin, this second
argument defaults to “stdin”. The posix argument defines the operational
mode: when posix is not true (default), the shlex instance will
operate in compatibility mode. When operating in POSIX mode, shlex
will try to be as close as possible to the POSIX shell parsing rules.
Return a token. If tokens have been stacked using push_token(), pop a
token off the stack. Otherwise, read one from the input stream. If reading
encounters an immediate end-of-file, self.eof is returned (the empty
string ('') in non-POSIX mode, and None in POSIX mode).
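A short sketch of reading tokens one by one (non-POSIX mode, so the quotes are kept):

>>> import shlex
>>> lexer = shlex.shlex('a "b c" d  # a comment')
>>> lexer.get_token()
'a'
>>> lexer.get_token()
'"b c"'
>>> lexer.get_token()
'd'
>>> lexer.get_token()     # the comment is skipped; EOF yields self.eof
''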
Read a raw token. Ignore the pushback stack, and do not interpret source
requests. (This is not ordinarily a useful entry point, and is documented here
only for the sake of completeness.)
When shlex detects a source request (see source below) this
method is given the following token as argument, and expected to return a tuple
consisting of a filename and an open file-like object.
Normally, this method first strips any quotes off the argument. If the result
is an absolute pathname, or there was no previous source request in effect, or
the previous source was a stream (such as sys.stdin), the result is left
alone. Otherwise, if the result is a relative pathname, the directory part of
the name of the file immediately before it on the source inclusion stack is
prepended (this behavior is like the way the C preprocessor handles #include "file.h").
The result of the manipulations is treated as a filename, and returned as the
first component of the tuple, with open() called on it to yield the second
component. (Note: this is the reverse of the order of arguments in instance
initialization!)
This hook is exposed so that you can use it to implement directory search paths,
addition of file extensions, and other namespace hacks. There is no
corresponding ‘close’ hook, but a shlex instance will call the close()
method of the sourced input stream when it returns EOF.
Push an input source stream onto the input stack. If the filename argument is
specified it will later be available for use in error messages. This is the
same method used internally by the sourcehook() method.
This method generates an error message leader in the format of a Unix C compiler
error label; the format is '"%s", line %d: ', where the %s is replaced
with the name of the current source file and the %d with the current input
line number (the optional arguments can be used to override these).
This convenience is provided to encourage shlex users to generate error
messages in the standard, parseable format understood by Emacs and other Unix
tools.
Instances of shlex subclasses have some public instance variables which
either control lexical analysis or can be used for debugging:
The string of characters that are recognized as comment beginners. All
characters from the comment beginner to end of line are ignored. Includes just
'#' by default.
Characters that will be considered string quotes. The token accumulates until
the same quote is encountered again (thus, different quote types protect each
other as in the shell.) By default, includes ASCII single and double quotes.
If True, tokens will only be split on whitespace. This is useful, for
example, for parsing command lines with shlex, getting tokens in a
similar way to shell arguments.
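A sketch of the effect in POSIX mode (note that punctuation no longer splits the token):

>>> import shlex
>>> lex = shlex.shlex('--output=/tmp/out,backup', posix=True)
>>> lex.whitespace_split = True
>>> list(lex)
['--output=/tmp/out,backup']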
The name of the current input file, as initially set at class instantiation time
or stacked by later source requests. It may be useful to examine this when
constructing error messages.
This attribute is None by default. If you assign a string to it, that
string will be recognized as a lexical-level inclusion request similar to the
source keyword in various shells. That is, the immediately following token
will be opened as a filename and input taken from that stream until EOF, at which
point the close() method of that stream will be called and the input
source will again become the original input stream. Source requests may be
stacked any number of levels deep.
If this attribute is numeric and 1 or more, a shlex instance will
print verbose progress output on its behavior. If you need to use this, you can
read the module source code to learn the details.
When operating in non-POSIX mode, shlex will try to obey the
following rules.
Quote characters are not recognized within words (Do"Not"Separate is
parsed as the single word Do"Not"Separate);
Escape characters are not recognized;
Enclosing characters in quotes preserve the literal value of all characters
within the quotes;
Closing quotes separate words ("Do"Separate is parsed as "Do" and
Separate);
If whitespace_split is False, any character not declared to be a
word character, whitespace, or a quote will be returned as a single-character
token. If it is True, shlex will only split words in whitespaces;
EOF is signaled with an empty string ('');
It’s not possible to parse empty strings, even if quoted.
When operating in POSIX mode, shlex will try to obey the following
parsing rules.
Quotes are stripped out, and do not separate words ("Do"Not"Separate" is
parsed as the single word DoNotSeparate);
Non-quoted escape characters (e.g. '\') preserve the literal value of the
next character that follows;
Enclosing characters in quotes which are not part of escapedquotes
(e.g. "'") preserve the literal value of all characters within the quotes;
Enclosing characters in quotes which are part of escapedquotes (e.g.
'"') preserve the literal value of all characters within the quotes, with
the exception of the characters mentioned in escape. The escape
characters retain their special meaning only when followed by the quote in use, or
the escape character itself. Otherwise the escape character will be considered a
normal character.
Tk/Tcl has long been an integral part of Python. It provides a robust and
platform independent windowing toolkit, that is available to Python programmers
using the tkinter package, and its extension, the tkinter.tix and
the tkinter.ttk modules.
The tkinter package is a thin object-oriented layer on top of Tcl/Tk. To
use tkinter, you don’t need to write Tcl code, but you will need to
consult the Tk documentation, and occasionally the Tcl documentation.
tkinter is a set of wrappers that implement the Tk widgets as Python
classes. In addition, the internal module _tkinter provides a threadsafe
mechanism which allows Python and Tcl to interact.
tkinter's chief virtues are that it is fast, and that it usually comes
bundled with Python. Although its standard documentation is weak, good
material is available, which includes: references, tutorials, a book and
others. tkinter is also famous for having an outdated look and feel,
which has been vastly improved in Tk 8.5. Nevertheless, there are many other
GUI libraries that you could be interested in. For more information about
alternatives, see the Other Graphical User Interface Packages section.
The tkinter package (“Tk interface”) is the standard Python interface to
the Tk GUI toolkit. Both Tk and tkinter are available on most Unix
platforms, as well as on Windows systems. (Tk itself is not part of Python; it
is maintained at ActiveState.) You can check that tkinter is properly
installed on your system by running python -m tkinter from the command line;
this should open a window demonstrating a simple Tk interface.
Most of the time, tkinter is all you really need, but a number of
additional modules are available as well. The Tk interface is located in a
binary module named _tkinter. This module contains the low-level
interface to Tk, and should never be used directly by application programmers.
It is usually a shared library (or DLL), but might in some cases be statically
linked with the Python interpreter.
In addition to the Tk interface module, tkinter includes a number of
Python modules, tkinter.constants being one of the most important.
Importing tkinter will automatically import tkinter.constants,
so, usually, to use Tkinter all you need is a simple import statement:
import tkinter
Or, more often:
from tkinter import *
class tkinter.Tk(screenName=None, baseName=None, className='Tk', useTk=1)
The Tk class is instantiated without arguments. This creates a toplevel
widget of Tk which usually is the main window of an application. Each instance
has its own associated Tcl interpreter.
The Tcl() function is a factory function which creates an object much like
that created by the Tk class, except that it does not initialize the Tk
subsystem. This is most often useful when driving the Tcl interpreter in an
environment where one doesn’t want to create extraneous toplevel windows, or
where one cannot (such as Unix/Linux systems without an X server). An object
created by the Tcl() function can have a Toplevel window created (and the Tk
subsystem initialized) by calling its loadtk() method.
This section is not designed to be an exhaustive tutorial on either Tk or
Tkinter. Rather, it is intended as a stop gap, providing some introductory
orientation on the system.
Credits:
Tk was written by John Ousterhout while at Berkeley.
Tkinter was written by Steen Lumholt and Guido van Rossum.
This Life Preserver was written by Matt Conway at the University of Virginia.
The HTML rendering, and some liberal editing, was produced from a FrameMaker
version by Ken Manheimer.
Fredrik Lundh elaborated and revised the class interface descriptions, to get
them current with Tk 4.2.
Mike Clarkson converted the documentation to LaTeX, and compiled the User
Interface chapter of the reference manual.
This section is designed in two parts: the first half (roughly) covers
background material, while the second half can be taken to the keyboard as a
handy reference.
When trying to answer questions of the form “how do I do blah”, it is often best
to find out how to do "blah" in straight Tk, and then convert this back into the
corresponding tkinter call. Python programmers can often guess at the
correct Python command by looking at the Tk documentation. This means that in
order to use Tkinter, you will have to know a little bit about Tk. This document
can’t fulfill that role, so the best we can do is point you to the best
documentation that exists. Here are some hints:
The authors strongly suggest getting a copy of the Tk man pages.
Specifically, the man pages in the manN directory are most useful.
The man3 man pages describe the C interface to the Tk library and thus
are not especially helpful for script writers.
Addison-Wesley publishes a book called Tcl and the Tk Toolkit by John
Ousterhout (ISBN 0-201-63337-X) which is a good introduction to Tcl and Tk for
the novice. The book is not exhaustive, and for many details it defers to the
man pages.
tkinter/__init__.py is a last resort for most, but can be a good
place to go when nothing else makes sense.
The class hierarchy looks complicated, but in actual practice, application
programmers almost always refer to the classes at the very bottom of the
hierarchy.
Notes:
These classes are provided for the purposes of organizing certain functions
under one namespace. They aren’t meant to be instantiated independently.
The Tk class is meant to be instantiated only once in an application.
Application programmers need not instantiate one explicitly, the system creates
one whenever any of the other classes are instantiated.
The Widget class is not meant to be instantiated, it is meant only
for subclassing to make “real” widgets (in C++, this is called an ‘abstract
class’).
To make use of this reference material, there will be times when you will need
to know how to read short passages of Tk and how to identify the various parts
of a Tk command. (See section Mapping Basic Tk into Tkinter for the
tkinter equivalents of what’s below.)
Tk scripts are Tcl programs. Like all Tcl programs, Tk scripts are just lists
of tokens separated by spaces. A Tk widget is just its class, the options
that help configure it, and the actions that make it do useful things.
To make a widget in Tk, the command is always of the form:
classCommand newPathname options
classCommand
denotes which kind of widget to make (a button, a label, a menu...)
newPathname
is the new name for this widget. All names in Tk must be unique. To help
enforce this, widgets in Tk are named with pathnames, just like files in a
file system. The top level widget, the root, is called . (period) and
children are delimited by more periods. For example,
.myApp.controlPanel.okButton might be the name of a widget.
options
configure the widget’s appearance and in some cases, its behavior. The options
come in the form of a list of flags and values. Flags are preceded by a ‘-‘,
like Unix shell command flags, and values are put in quotes if they are more
than one word.
For example:
button .fred -fg red -text "hi there"
^ ^ \______________________/
| | |
class new options
command widget (-opt val -opt val ...)
Once created, the pathname to the widget becomes a new command. This new
widget command is the programmer’s handle for getting the new widget to
perform some action. In C, you’d express this as someAction(fred,
someOptions), in C++, you would express this as fred.someAction(someOptions),
and in Tk, you say:
.fred someAction someOptions
Note that the object name, .fred, starts with a dot.
As you’d expect, the legal values for someAction will depend on the widget’s
class: .fred disable works if fred is a button (fred gets greyed out), but
does not work if fred is a label (disabling of labels is not supported in Tk).
The legal values of someOptions is action dependent. Some actions, like
disable, require no arguments, others, like a text-entry box’s delete
command, would need arguments to specify what range of text to delete.
Class commands in Tk correspond to class constructors in Tkinter.
button .fred =====> fred = Button()
The master of an object is implicit in the new name given to it at creation
time. In Tkinter, masters are specified explicitly.
button .panel.fred =====> fred = Button(panel)
The configuration options in Tk are given in lists of hyphened tags followed by
values. In Tkinter, options are specified as keyword-arguments in the instance
constructor, and keyword-args for configure calls or as instance indices, in
dictionary style, for established instances. See section
Setting Options on setting options.
button .fred -fg red =====> fred = Button(panel, fg="red")
.fred configure -fg red =====> fred["fg"] = "red"
OR ==> fred.config(fg="red")
In Tk, to perform an action on a widget, use the widget name as a command, and
follow it with an action name, possibly with arguments (options). In Tkinter,
you call methods on the class instance to invoke actions on the widget. The
actions (methods) that a given widget can perform are listed in
tkinter/__init__.py.
.fred invoke =====> fred.invoke()
To give a widget to the packer (geometry manager), you call pack with optional
arguments. In Tkinter, the Pack class holds all this functionality, and the
various forms of the pack command are implemented as methods. All widgets in
tkinter are subclassed from the Packer, and so inherit all the packing
methods. See the tkinter.tix module documentation for additional
information on the Form geometry manager.
pack .fred -side left =====> fred.pack(side="left")
tkinter (Python)
This call (say, for example, creating a button widget) is implemented in
the tkinter package, which is written in Python. This Python
function will parse the commands and the arguments and convert them into a
form that makes them look as if they had come from a Tk script instead of
a Python script.
_tkinter (C)
These commands and their arguments will be passed to a C function in the
_tkinter - note the underscore - extension module.
Tk Widgets (C and Tcl)
This C function is able to make calls into other C modules, including the C
functions that make up the Tk library. Tk is implemented in C and some Tcl.
The Tcl part of the Tk widgets is used to bind certain default behaviors to
widgets, and is executed once at the point where the Python tkinter
package is imported. (The user never sees this stage).
Tk (C)
The Tk part of the Tk Widgets implements the final mapping to ...
Options control things like the color and border width of a widget. Options can
be set in three ways:
At object creation time, using keyword arguments
fred = Button(self, fg="red", bg="blue")
After object creation, treating the option name like a dictionary index
fred["fg"] = "red"
fred["bg"] = "blue"
Use the config() method to update multiple attributes subsequent to object creation
fred.config(fg="red", bg="blue")
For a complete explanation of a given option and its behavior, see the Tk man
pages for the widget in question.
Note that the man pages list “STANDARD OPTIONS” and “WIDGET SPECIFIC OPTIONS”
for each widget. The former is a list of options that are common to many
widgets, the latter are the options that are idiosyncratic to that particular
widget. The Standard Options are documented on the options(3) man
page.
No distinction between standard and widget-specific options is made in this
document. Some options don’t apply to some kinds of widgets. Whether a given
widget responds to a particular option depends on the class of the widget;
buttons have a command option, labels do not.
The options supported by a given widget are listed in that widget’s man page, or
can be queried at runtime by calling the config() method without
arguments, or by calling the keys() method on that widget. The return
value of these calls is a dictionary whose key is the name of the option as a
string (for example, 'relief') and whose values are 5-tuples.
Some options, like bg are synonyms for common options with long names
(bg is shorthand for “background”). Passing the config() method the name
of a shorthand option will return a 2-tuple, not a 5-tuple. The 2-tuple passed
back will contain the name of the synonym and the “real” option (such as
('bg','background')).
The packer is one of Tk’s geometry-management mechanisms. Geometry managers
are used to specify the relative positioning of widgets
within their container - their mutual master. In contrast to the more
cumbersome placer (which is used less commonly, and we do not cover here), the
packer takes qualitative relationship specification - above, to the left of,
filling, etc - and works everything out to determine the exact placement
coordinates for you.
The size of any master widget is determined by the size of the “slave widgets”
inside. The packer is used to control where slave widgets appear inside the
master into which they are packed. You can pack widgets into frames, and frames
into other frames, in order to achieve the kind of layout you desire.
Additionally, the arrangement is dynamically adjusted to accommodate incremental
changes to the configuration, once it is packed.
Note that widgets do not appear until they have had their geometry specified
with a geometry manager. It’s a common early mistake to leave out the geometry
specification, and then be surprised when the widget is created but nothing
appears. A widget will appear only after it has had, for example, the packer’s
pack() method applied to it.
The pack() method can be called with keyword-option/value pairs that control
where the widget is to appear within its container, and how it is to behave when
the main application window is resized. Here are some examples:
fred.pack()                # defaults to side = "top"
fred.pack(side="left")
fred.pack(expand=1)
The current-value setting of some widgets (like text entry widgets) can be
connected directly to application variables by using special options. These
options are variable, textvariable, onvalue, offvalue, and
value. This connection works both ways: if the variable changes for any
reason, the widget it’s connected to will be updated to reflect the new value.
Unfortunately, in the current implementation of tkinter it is not
possible to hand over an arbitrary Python variable to a widget through a
variable or textvariable option. The only kinds of variables for which
this works are variables that are subclassed from a class called Variable,
defined in tkinter.
There are many useful subclasses of Variable already defined:
StringVar, IntVar, DoubleVar, and
BooleanVar. To read the current value of such a variable, call the
get() method on it, and to change its value you call the set()
method. If you follow this protocol, the widget will always track the value of
the variable, with no further intervention on your part.
For example:
class App(Frame):
    def __init__(self, master=None):
        Frame.__init__(self, master)
        self.pack()

        self.entrythingy = Entry()
        self.entrythingy.pack()

        # here is the application variable
        self.contents = StringVar()
        # set it to some value
        self.contents.set("this is a variable")
        # tell the entry widget to watch this variable
        self.entrythingy["textvariable"] = self.contents

        # and here we get a callback when the user hits return.
        # we will have the program print out the value of the
        # application variable when the user hits return
        self.entrythingy.bind('<Key-Return>', self.print_contents)

    def print_contents(self, event):
        print("hi. contents of entry is now ---->",
              self.contents.get())
In Tk, there is a utility command, wm, for interacting with the window
manager. Options to the wm command allow you to control things like titles,
placement, icon bitmaps, and the like. In tkinter, these commands have
been implemented as methods on the Wm class. Toplevel widgets are
subclassed from the Wm class, and so can call the Wm methods
directly.
To get at the toplevel window that contains a given widget, you can often just
refer to the widget’s master. Of course if the widget has been packed inside of
a frame, the master won’t represent a toplevel window. To get at the toplevel
window that contains an arbitrary widget, you can call the _root() method.
This method begins with an underscore to denote the fact that this function is
part of the implementation, and not an interface to Tk functionality.
Here are some examples of typical usage:
from tkinter import *

class App(Frame):
    def __init__(self, master=None):
        Frame.__init__(self, master)
        self.pack()

# create the application
myapp = App()

#
# here are method calls to the window manager class
#
myapp.master.title("My Do-Nothing Application")
myapp.master.maxsize(1000, 400)

# start the program
myapp.mainloop()
anchor
Legal values are points of the compass: "n", "ne", "e", "se",
"s", "sw", "w", "nw", and also "center".
bitmap
There are eight built-in, named bitmaps: 'error', 'gray25',
'gray50', 'hourglass', 'info', 'questhead', 'question',
'warning'. To specify an X bitmap filename, give the full path to the file,
preceded with an @, as in "@/usr/contrib/bitmap/gumby.bit".
boolean
You can pass integers 0 or 1 or the strings "yes" or "no".
callback
This is any Python function that takes no arguments. For example (fred and print_it below are illustrative names):
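def print_it():
    print("hi there")
fred["command"] = print_it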
color
Colors can be given as the names of X colors in the rgb.txt file, or as strings
representing RGB values in 4 bit: "#RGB", 8 bit: "#RRGGBB", 12 bit:
"#RRRGGGBBB", or 16 bit: "#RRRRGGGGBBBB" ranges, where R,G,B here
represent any legal hex digit. See page 160 of Ousterhout’s book for details.
cursor
The standard X cursor names from cursorfont.h can be used, without the
XC_ prefix. For example to get a hand cursor (XC_hand2), use the
string "hand2". You can also specify a bitmap and mask file of your own.
See page 179 of Ousterhout’s book.
distance
Screen distances can be specified in either pixels or absolute distances.
Pixels are given as numbers and absolute distances as strings, with the trailing
character denoting units: c for centimetres, i for inches, m for
millimetres, p for printer’s points. For example, 3.5 inches is expressed
as "3.5i".
font
Tk uses a list font name format, such as {courier 10 bold}. Font sizes with
positive numbers are measured in points; sizes with negative numbers are
measured in pixels.
geometry
This is a string of the form widthxheight, where width and height are
measured in pixels for most widgets (in characters for widgets displaying text).
For example: fred["geometry"] = "200x100".
justify
Legal values are the strings: "left", "center", "right", and
"fill".
region
This is a string with four space-delimited elements, each of which is a legal
distance (see above). For example: "2 3 4 5" and "3i 2i 4.5i 2i" and
"3c 2c 4c 10.43c" are all legal regions.
relief
Determines what the border style of a widget will be. Legal values are:
"raised", "sunken", "flat", "groove", and "ridge".
scrollcommand
This is almost always the set() method of some scrollbar widget, but can
be any widget method that takes a single argument.
The bind method from the widget command allows you to watch for certain events
and to have a callback function trigger when that event type occurs. The form
of the bind method is:
def bind(self, sequence, func, add=''):
where:
sequence
is a string that denotes the target kind of event. (See the bind man page and
page 201 of John Ousterhout’s book for details).
func
is a Python function, taking one argument, to be invoked when the event occurs.
An Event instance will be passed as the argument. (Functions deployed this way
are commonly known as callbacks.)
add
is optional, either '' or '+'. Passing an empty string denotes that
this binding is to replace any other bindings that this event is associated
with. Passing a '+' means that this function is to be added to the list
of functions bound to this event type.
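For example, a sketch (assuming a widget class with a Button attribute self.button) that turns the button red whenever the mouse enters it:

def turnRed(self, event):
    event.widget["activeforeground"] = "red"

self.button.bind("<Enter>", self.turnRed)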
Notice how the widget field of the event is being accessed in the
turnRed() callback. This field contains the widget that caught the X
event. The following table lists the other event fields you can access, and how
they are denoted in Tk, which can be useful when referring to the Tk man pages.
A number of widgets require “index” parameters to be passed. These are used to
point at a specific place in a Text widget, or to particular characters in an
Entry widget, or to particular menu items in a Menu widget.
Entry widget indexes (index, view index, etc.)
Entry widgets have options that refer to character positions in the text being
displayed. You can use these tkinter functions to access these special
points in text widgets:
AtEnd()
refers to the last position in the text
AtInsert()
refers to the point where the text cursor is
AtSelFirst()
indicates the beginning point of the selected text
AtSelLast()
denotes the last point of the selected text and finally
At(x[, y])
refers to the character at pixel location x, y (with y not used in the
case of a text entry widget, which contains a single line of text).
Text widget indexes
The index notation for Text widgets is very rich and is best described in the Tk
man pages.
Menu indexes (menu.invoke(), menu.entryconfig(), etc.)
Some options and methods for menus manipulate specific menu entries. Anytime a
menu index is needed for an option or a parameter, you may pass in:
an integer which refers to the numeric position of the entry in the widget,
counted from the top, starting with 0;
the string "active", which refers to the menu position that is currently
under the cursor;
the string "last" which refers to the last menu item;
An integer preceded by @, as in @6, where the integer is interpreted
as a y pixel coordinate in the menu’s coordinate system;
the string "none", which indicates no menu entry at all, most often used
with menu.activate() to deactivate all entries, and finally,
a text string that is pattern matched against the label of the menu entry, as
scanned from the top of the menu to the bottom. Note that this index type is
considered after all the others, which means that matches for menu items
labelled last, active, or none may be interpreted as the above
literals, instead.
Bitmap/Pixelmap images can be created through the subclasses of
tkinter.Image:
BitmapImage can be used for X11 bitmap data.
PhotoImage can be used for GIF and PPM/PGM color bitmaps.
Either type of image is created through either the file or the data
option (other options are available as well).
The image object can then be used wherever an image option is supported by
some widget (e.g. labels, buttons, menus). In these cases, Tk will not keep a
reference to the image. When the last Python reference to the image object is
deleted, the image data is deleted as well, and Tk will display an empty box
wherever the image was used.
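A minimal sketch (the GIF filename is hypothetical); note that the program keeps its own reference to the image for exactly this reason:

import tkinter as tk

root = tk.Tk()
logo = tk.PhotoImage(file="logo.gif")   # hypothetical file; keep this reference alive
button = tk.Button(root, image=logo)
button.pack()
root.mainloop()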
The tkinter.ttk module provides access to the Tk themed widget set,
introduced in Tk 8.5. If Python has not been compiled against Tk 8.5, this
module can still be accessed if Tile has been installed. The former
method using Tk 8.5 provides additional benefits including anti-aliased font
rendering under X11 and window transparency (requiring a composition
window manager on X11).
The basic idea for tkinter.ttk is to separate, to the extent possible,
the code implementing a widget’s behavior from the code implementing its
appearance.
To override the basic Tk widgets, the import should follow the Tk import:
from tkinter import *
from tkinter.ttk import *
That code causes several tkinter.ttk widgets (Button,
Checkbutton, Entry, Frame, Label,
LabelFrame, Menubutton, PanedWindow,
Radiobutton, Scale and Scrollbar) to
automatically replace the Tk widgets.
This has the direct benefit of using the new widgets which gives a better
look and feel across platforms; however, the replacement widgets are not
completely compatible. The main difference is that widget options such as
“fg”, “bg” and others related to widget styling are no
longer present in Ttk widgets. Instead, use the ttk.Style class
for improved styling effects.
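A small sketch of the Style approach (the style name "BW.TLabel" is arbitrary):

from tkinter import Tk
from tkinter.ttk import Label, Style

root = Tk()
style = Style()
# colors are configured on a named style, not on the widget itself
style.configure("BW.TLabel", foreground="black", background="white")
Label(root, text="Styled with ttk", style="BW.TLabel").pack()
root.mainloop()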
Ttk comes with 17 widgets, eleven of which already existed in tkinter:
Button, Checkbutton, Entry, Frame,
Label, LabelFrame, Menubutton, PanedWindow,
Radiobutton, Scale and Scrollbar. The other six are
new: Combobox, Notebook, Progressbar,
Separator, Sizegrip and Treeview. And all of them are
subclasses of Widget.
Using the Ttk widgets gives the application an improved look and feel.
As discussed above, there are differences in how the styling is coded.
All ttk widgets accept the following options:
Option
Description
class
Specifies the window class. The class is used when querying
the option database for the window’s other options, to
determine the default bindtags for the window, and to select
the widget’s default layout and style. This is a read-only option
which may only be specified when the window is created.
cursor
Specifies the mouse cursor to be used for the widget. If set
to the empty string (the default), the cursor is inherited from
the parent widget.
takefocus
Determines whether the window accepts the focus during
keyboard traversal. 0, 1 or an empty string is returned.
If 0 is returned, it means that the window should be skipped
entirely during keyboard traversal. If 1, it means that the
window should receive the input focus as long as it is
viewable. And an empty string means that the traversal
scripts make the decision about whether or not to focus
on the window.
The following options are supported by widgets that are controlled by a
scrollbar.
option
description
xscrollcommand
Used to communicate with horizontal scrollbars.
When the view in the widget’s window changes, the widget
will generate a Tcl command based on the scrollcommand.
Usually this option consists of the method
Scrollbar.set() of some scrollbar. This will cause
the scrollbar to be updated whenever the view in the
window changes.
yscrollcommand
Used to communicate with vertical scrollbars.
For some more information, see above.
The following options are supported by labels, buttons and other button-like
widgets.
option
description
text
Specifies a text string to be displayed inside the widget.
textvariable
Specifies a name whose value will be used in place of the
text option resource.
underline
If set, specifies the index (0-based) of a character to
underline in the text string. The underline character is
used for mnemonic activation.
image
Specifies an image to display. This is a list of 1 or more
elements. The first element is the default image name. The
rest of the list is a sequence of statespec/value pairs as
defined by Style.map(), specifying different images
to use when the widget is in a particular state or a
combination of states. All images in the list should have
the same size.
compound
Specifies how to display the image relative to the text,
in the case both text and images options are present.
Valid values are:
text: display text only
image: display image only
top, bottom, left, right: display image above, below,
left of, or right of the text, respectively.
none: the default. display the image if present,
otherwise the text.
width
If greater than zero, specifies how much space, in
character widths, to allocate for the text label; if less
than zero, specifies a minimum width. If zero or
unspecified, the natural width of the text label is used.
state
May be set to “normal” or “disabled” to control the “disabled”
state bit. This is a write-only option: setting it changes the
widget state, but the Widget.state() method does not
affect this option.
The widget state is a bitmap of independent state flags.
flag
description
active
The mouse cursor is over the widget and pressing a mouse
button will cause some action to occur
disabled
Widget is disabled under program control
focus
Widget has keyboard focus
pressed
Widget is being pressed
selected
“On”, “true”, or “current” for things like Checkbuttons and
radiobuttons
background
Windows and Mac have a notion of an “active” or foreground
window. The background state is set for widgets in a
background window, and cleared for those in the foreground
window
readonly
Widget should not allow user modification
alternate
A widget-specific alternate display format
invalid
The widget’s value is invalid
A state specification is a sequence of state names, optionally prefixed with
an exclamation point indicating that the bit is off.
Test the widget’s state. If a callback is not specified, returns True
if the widget state matches statespec and False otherwise. If callback
is specified then it is called with args if widget state matches
statespec.
Modify or inquire widget state. If statespec is specified, sets the
widget state according to it and return a new statespec indicating
which flags were changed. If statespec is not specified, returns
the currently-enabled state flags.
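A sketch of reading and changing state flags (using a ttk Button as an example):

from tkinter import Tk
from tkinter.ttk import Button

root = Tk()
b = Button(root, text="Press me")
b.pack()
b.state(['disabled'])             # turn the disabled flag on
print(b.instate(['disabled']))    # True
b.state(['!disabled'])            # the '!' prefix turns the flag off again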
The ttk.Combobox widget combines a text field with a pop-down list of
values. This widget is a subclass of Entry.
Besides the methods inherited from Widget: Widget.cget(),
Widget.configure(), Widget.identify(), Widget.instate()
and Widget.state(), and the following inherited from Entry:
Entry.bbox(), Entry.delete(), Entry.icursor(),
Entry.index(), Entry.insert(), Entry.selection(),
Entry.xview(), it has some other methods, described at
ttk.Combobox.
This widget accepts the following specific options:
option
description
exportselection
Boolean value. If set, the widget selection is linked
to the Window Manager selection (which can be returned
by invoking Misc.selection_get, for example).
justify
Specifies how the text is aligned within the widget.
One of “left”, “center”, or “right”.
height
Specifies the height of the pop-down listbox, in rows.
postcommand
A script (possibly registered with Misc.register) that
is called immediately before displaying the values. It
may specify which values to display.
state
One of “normal”, “readonly”, or “disabled”. In the
“readonly” state, the value may not be edited directly,
and the user may only select one of the values from the
dropdown list. In the “normal” state, the text field is
directly editable. In the “disabled” state, no
interaction is possible.
textvariable
Specifies a name whose value is linked to the widget
value. Whenever the value associated with that name
changes, the widget value is updated, and vice versa.
See tkinter.StringVar.
values
Specifies the list of values to display in the
drop-down listbox.
width
Specifies an integer value indicating the desired width
of the entry window, in average-size characters of the
widget’s font.
Combobox.current(newindex=None)
If newindex is specified, sets the combobox value to the element at
position newindex. Otherwise, returns the index of the current value, or
-1 if the current value is not in the values list.
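A short sketch of typical Combobox usage (the values are illustrative):

from tkinter import ttk
import tkinter

root = tkinter.Tk()
combo = ttk.Combobox(root, values=['January', 'February', 'March'],
                     state='readonly')   # editing only via the dropdown
combo.pack()
combo.current(0)          # select the first value
print(combo.current())    # 0
print(combo.get())        # 'January'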
The ttk.Notebook widget manages a collection of windows and displays a single
one at a time. Each child window is associated with a tab, which the user
may select to change the currently-displayed window.
This widget accepts the following specific options:
option
description
height
If present and greater than zero, specifies the desired height
of the pane area (not including internal padding or tabs).
Otherwise, the maximum height of all panes is used.
padding
Specifies the amount of extra space to add around the outside
of the notebook. The padding is a list up to four length
specifications left top right bottom. If fewer than four
elements are specified, bottom defaults to top, right defaults
to left, and top defaults to left.
width
If present and greater than zero, specifies the desired width
of the pane area (not including internal padding). Otherwise,
the maximum width of all panes is used.
This widget also supports the following specific tab options:
option
description
state
Either “normal”, “disabled” or “hidden”. If “disabled”, then
the tab is not selectable. If “hidden”, then the tab is not
shown.
sticky
Specifies how the child window is positioned within the pane
area. Value is a string containing zero or more of the
characters “n”, “s”, “e” or “w”. Each letter refers to a
side (north, south, east or west) that the child window will
stick to, as per the grid() geometry manager.
padding
Specifies the amount of extra space to add between the
notebook and this pane. Syntax is the same as for the option
padding used by this widget.
text
Specifies a text to be displayed in the tab.
image
Specifies an image to display in the tab. See the option
image described in Widget.
compound
Specifies how to display the image relative to the text, in
the case both options text and image are present. See
Label Options for legal values.
underline
Specifies the index (0-based) of a character to underline in
the text string. The underlined character is used for
mnemonic activation if Notebook.enable_traversal() is
called.
The tab will not be displayed, but the associated window remains
managed by the notebook and its configuration remembered. Hidden tabs
may be restored with the add() command.
pos is either the string “end”, an integer index, or the name of a
managed child. If child is already managed by the notebook, moves it to
the specified position.
See Tab Options for the list of available options.
The associated child window will be displayed, and the
previously-selected window (if different) is unmapped. If tab_id is
omitted, returns the widget name of the currently selected pane.
Query or modify the options of the specific tab_id.
If kw is not given, returns a dictionary of the tab option values. If
option is specified, returns the value of that option. Otherwise,
sets the options to the corresponding values.
Enable keyboard traversal for a toplevel window containing this notebook.
This will extend the bindings for the toplevel window containing the
notebook as follows:
Control-Tab: selects the tab following the currently selected one.
Shift-Control-Tab: selects the tab preceding the currently selected one.
Alt-K: where K is the mnemonic (underlined) character of any tab, will
select that tab.
Multiple notebooks in a single toplevel may be enabled for traversal,
including nested notebooks. However, notebook traversal only works
properly if all panes have the notebook they are in as master.
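A minimal sketch of a notebook with traversal enabled (tab names are
illustrative):

from tkinter import ttk
import tkinter

root = tkinter.Tk()
notebook = ttk.Notebook(root)
frame1 = ttk.Frame(notebook)
frame2 = ttk.Frame(notebook)
notebook.add(frame1, text='General', underline=0)   # Alt-G selects this tab
notebook.add(frame2, text='Details', underline=0)   # Alt-D selects this tab
notebook.pack(fill='both', expand=True)
notebook.enable_traversal()   # Control-Tab / Shift-Control-Tab also work
root.mainloop()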
The ttk.Progressbar widget shows the status of a long-running
operation. It can operate in two modes: 1) the determinate mode which shows the
amount completed relative to the total amount of work to be done and 2) the
indeterminate mode which provides an animated display to let the user know that
work is progressing.
This widget accepts the following specific options:
option
description
orient
One of “horizontal” or “vertical”. Specifies the orientation
of the progress bar.
length
Specifies the length of the long axis of the progress bar
(width if horizontal, height if vertical).
mode
One of “determinate” or “indeterminate”.
maximum
A number specifying the maximum value. Defaults to 100.
value
The current value of the progress bar. In “determinate” mode,
this represents the amount of work completed. In
“indeterminate” mode, it is interpreted as modulo maximum;
that is, the progress bar completes one “cycle” when its value
increases by maximum.
variable
A name which is linked to the option value. If specified, the
value of the progress bar is automatically set to the value of
this name whenever the latter is modified.
phase
Read-only option. The widget periodically increments the value
of this option whenever its value is greater than 0 and, in
determinate mode, less than maximum. This option may be used
by the current theme to provide additional animation effects.
Progressbar.start(interval=None)
Begin autoincrement mode: schedules a recurring timer event that calls
Progressbar.step() every interval milliseconds. If omitted,
interval defaults to 50 milliseconds.
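For example, a minimal sketch of an indeterminate progress bar:

from tkinter import ttk
import tkinter

root = tkinter.Tk()
bar = ttk.Progressbar(root, orient='horizontal', length=200,
                      mode='indeterminate')
bar.pack()
bar.start(25)                 # autoincrement every 25 milliseconds
root.after(3000, bar.stop)    # stop the animation after three seconds
root.mainloop()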
On MacOS X, toplevel windows automatically include a built-in size grip
by default. Adding a Sizegrip is harmless, since the built-in
grip will just mask the widget.
If the containing toplevel’s position was specified relative to the right
or bottom of the screen (e.g. ....), the Sizegrip widget will
not resize the window.
The ttk.Treeview widget displays a hierarchical collection of items.
Each item has a textual label, an optional image, and an optional list of data
values. The data values are displayed in successive columns after the tree
label.
The order in which data values are displayed may be controlled by setting
the widget option displaycolumns. The tree widget can also display column
headings. Columns may be accessed by number or symbolic names listed in the
widget option columns. See Column Identifiers.
Each item is identified by a unique name. The widget will generate item IDs
if they are not supplied by the caller. There is a distinguished root item,
named {}. The root item itself is not displayed; its children appear at the
top level of the hierarchy.
Each item also has a list of tags, which can be used to associate event bindings
with individual items and control the appearance of the item.
This widget accepts the following specific options:
option
description
columns
A list of column identifiers, specifying the number of
columns and their names.
displaycolumns
A list of column identifiers (either symbolic or
integer indices) specifying which data columns are
displayed and the order in which they appear, or the
string “#all”.
height
Specifies the number of rows which should be visible.
Note: the requested width is determined from the sum
of the column widths.
padding
Specifies the internal padding for the widget. The
padding is a list of up to four length specifications.
selectmode
Controls how the built-in class bindings manage the
selection. One of “extended”, “browse” or “none”.
If set to “extended” (the default), multiple items may
be selected. If “browse”, only a single item will be
selected at a time. If “none”, the selection will not
be changed.
Note that the application code and tag bindings can set
the selection however they wish, regardless of the
value of this option.
show
A list containing zero or more of the following values,
specifying which elements of the tree to display.
tree: display tree labels in column #0.
headings: display the heading row.
The default is “tree headings”, i.e., show all
elements.
Note: Column #0 always refers to the tree column,
even if show=”tree” is not specified.
The following item options may be specified for items in the insert and item
widget commands.
option
description
text
The textual label to display for the item.
image
A Tk Image, displayed to the left of the label.
values
The list of values associated with the item.
Each item should have the same number of values as the widget
option columns. If there are fewer values than columns, the
remaining values are assumed empty. If there are more values
than columns, the extra values are ignored.
open
True/False value indicating whether the item’s children should
be displayed or hidden.
Column identifiers take any of the following forms:
A symbolic name from the list of columns option.
An integer n, specifying the nth data column.
A string of the form #n, where n is an integer, specifying the nth display
column.
Notes:
Item’s option values may be displayed in a different order than the order
in which they are stored.
Column #0 always refers to the tree column, even if show=”tree” is not
specified.
A data column number is an index into an item’s option values list; a display
column number is the column number in the tree where the values are displayed.
Tree labels are displayed in column #0. If option displaycolumns is not set,
then data column n is displayed in column #n+1. Again, column #0 always
refers to the tree column.
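A short sketch tying these pieces together (column names and contents are
illustrative):

from tkinter import ttk
import tkinter

root = tkinter.Tk()
tree = ttk.Treeview(root, columns=('size', 'modified'))
tree.heading('#0', text='Name')            # '#0' is always the tree column
tree.heading('size', text='Size')          # data column by symbolic name
tree.heading('modified', text='Modified')
tree.column('size', width=80, anchor='e')

folder = tree.insert('', 'end', text='docs', open=True)
tree.insert(folder, 'end', text='intro.txt', values=('1 KB', 'today'))
tree.pack(fill='both', expand=True)
root.mainloop()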
Returns the bounding box (relative to the treeview widget’s window) of
the specified item in the form (x, y, width, height).
If column is specified, returns the bounding box of that cell. If the
item is not visible (i.e., if it is a descendant of a closed item or is
scrolled offscreen), returns an empty string.
Children present in item that are not present in newchildren are
detached from the tree. No items in newchildren may be an ancestor of
item. Note that not specifying newchildren results in detaching
item’s children.
Query or modify the options for the specified column.
If kw is not given, returns a dict of the column option values. If
option is specified then the value for that option is returned.
Otherwise, sets the options to the corresponding values.
The valid options/values are:
id
Returns the column name. This is a read-only option.
anchor: One of the standard Tk anchor values.
Specifies how the text in this column should be aligned with respect
to the cell.
minwidth: width
The minimum width of the column in pixels. The treeview widget will
not make the column any smaller than specified by this option when
the widget is resized or the user drags a column.
stretch: True/False
Specifies whether the column’s width should be adjusted when
the widget is resized.
width: width
The width of the column in pixels.
To configure the tree column, call this with column = “#0”.
Query or modify the heading options for the specified column.
If kw is not given, returns a dict of the heading option values. If
option is specified then the value for that option is returned.
Otherwise, sets the options to the corresponding values.
The valid options/values are:
text: text
The text to display in the column heading.
image: imageName
Specifies an image to display to the right of the column heading.
anchor: anchor
Specifies how the heading text should be aligned. One of the standard
Tk anchor values.
command: callback
A callback to be invoked when the heading label is pressed.
To configure the tree column heading, call this with column = “#0”.
Returns a description of the specified component under the point given
by x and y, or the empty string if no such component is present at
that position.
Creates a new item and returns the item identifier of the newly created
item.
parent is the item ID of the parent item, or the empty string to create
a new top-level item. index is an integer, or the value “end”,
specifying where in the list of parent’s children to insert the new item.
If index is less than or equal to zero, the new node is inserted at
the beginning; if index is greater than or equal to the current number
of children, it is inserted at the end. If iid is specified, it is used
as the item identifier; iid must not already exist in the tree.
Otherwise, a new unique identifier is generated.
See Item Options for the list of available options.
Query or modify the options for the specified item.
If no options are given, a dict with options/values for the item is
returned.
If option is specified then the value for that option is returned.
Otherwise, sets the options to the corresponding values as given by kw.
Moves item to position index in parent’s list of children.
It is illegal to move an item under one of its descendants. If index is
less than or equal to zero, item is moved to the beginning; if greater
than or equal to the number of children, it is moved to the end. If item
was detached it is reattached.
With one argument, returns a dictionary of column/value pairs for the
specified item. With two arguments, returns the current value of the
specified column. With three arguments, sets the value of given
column in given item to the specified value.
Bind a callback for the given event sequence to the tag tagname.
When an event is delivered to an item, the callbacks for each of the
item’s tags option are called.
Query or modify the options for the specified tagname.
If kw is not given, returns a dict of the option settings for
tagname. If option is specified, returns the value for that option
for the specified tagname. Otherwise, sets the options to the
corresponding values for the given tagname.
If item is specified, returns 1 or 0 depending on whether the specified
item has the given tagname. Otherwise, returns a list of all items
that have the specified tag.
Each widget in ttk is assigned a style, which specifies the set of
elements making up the widget and how they are arranged, along with dynamic
and default settings for element options. By default the style name is the
same as the widget’s class name, but it may be overridden by the widget’s style
option. If you don’t know the class name of a widget, use the method
Misc.winfo_class() (somewidget.winfo_class()).
Query or sets dynamic values of the specified option(s) in style.
Each key in kw is an option and each value should be a list or a
tuple (usually) containing statespecs grouped in tuples, lists, or
some other preference. A statespec is a compound of one
or more states and then a value.
Note that the order of the (states, value) sequences for an option does
matter: if the order were changed to [('active', 'blue'), ('pressed', 'red')]
for the foreground option, for example, the result would be a blue
foreground when the widget is in the active or pressed states.
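For instance, a minimal sketch (the style name “C.TButton” is illustrative):

from tkinter import ttk
import tkinter

root = tkinter.Tk()
style = ttk.Style()
style.map("C.TButton",
          foreground=[('pressed', 'red'), ('active', 'blue')],
          background=[('pressed', '!disabled', 'black'),
                      ('active', 'white')])
ttk.Button(root, text="Sample", style="C.TButton").pack()
root.mainloop()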
If state is specified, it is expected to be a sequence of one or more
states. If the default argument is set, it is used as a fallback value
in case no specification for option is found.
Define the widget layout for given style. If layoutspec is omitted,
return the layout specification for given style.
layoutspec, if specified, is expected to be a list or some other
sequence type (excluding strings), where each item should be a tuple and
the first item is the layout name and the second item should have the
format described in Layouts.
To understand the format, see the following example (it is not
intended to do anything useful):
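A sketch along those lines, assuming the standard Menubutton element names:

from tkinter import ttk
import tkinter

root = tkinter.Tk()

style = ttk.Style()
style.layout("TMenubutton", [
   ("Menubutton.background", None),
   ("Menubutton.button", {"children":
       [("Menubutton.focus", {"children":
           [("Menubutton.padding", {"children":
               [("Menubutton.label", {"side": "left", "expand": 1})]
           })]
       })]
   }),
])

mbtn = ttk.Menubutton(text='Text')
mbtn.pack()
root.mainloop()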
Create a new element in the current theme, of the given etype which is
expected to be either “image”, “from” or “vsapi”. The latter is only
available in Tk 8.6a for Windows XP and Vista and is not described here.
If “image” is used, args should contain the default image name followed
by statespec/value pairs (this is the imagespec), and kw may have the
following options:
border=padding
padding is a list of up to four integers, specifying the left, top,
right, and bottom borders, respectively.
height=height
Specifies a minimum height for the element. If less than zero, the
base image’s height is used as a default.
padding=padding
Specifies the element’s interior padding. Defaults to border’s value
if not specified.
sticky=spec
Specifies how the image is placed within the final parcel. spec
contains zero or more characters “n”, “s”, “w”, or “e”.
width=width
Specifies a minimum width for the element. If less than zero, the
base image’s width is used as a default.
If “from” is used as the value of etype,
element_create() will clone an existing
element. args is expected to contain a themename, from which
the element will be cloned, and optionally an element to clone from.
If this element to clone from is not specified, an empty element will
be used. kw is discarded.
Create a new theme. It is an error if themename already exists. If
parent is specified, the new theme will inherit styles, elements and
layouts from the parent theme. If settings are present, they are expected
to have the same syntax used for theme_settings().
Temporarily sets the current theme to themename, applies the specified
settings, and then restores the previous theme.
Each key in settings is a style and each value may contain the keys
‘configure’, ‘map’, ‘layout’ and ‘element create’ and they are expected
to have the same format as specified by the methods
Style.configure(), Style.map(), Style.layout() and
Style.element_create() respectively.
As an example, let’s change the Combobox for the default theme a bit:
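A sketch of what that might look like (the padding and colors are
illustrative):

from tkinter import ttk
import tkinter

root = tkinter.Tk()

style = ttk.Style()
style.theme_settings("default", {
   "TCombobox": {
       "configure": {"padding": 5},
       "map": {
           "background": [("active", "green2"),
                          ("!disabled", "green4")],
           "fieldbackground": [("!disabled", "green3")],
           "foreground": [("focus", "OliveDrab1"),
                          ("!disabled", "OliveDrab2")]
       }
   }
})

combo = ttk.Combobox()
combo.pack()
root.mainloop()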
If themename is not given, returns the theme in use. Otherwise, sets
the current theme to themename, refreshes all widgets and emits a
<<ThemeChanged>> event.
A layout can be just None, if it takes no options, or a dict of
options specifying how to arrange the element. The layout mechanism
uses a simplified version of the pack geometry manager: given an
initial cavity, each element is allocated a parcel. Valid
options/values are:
side: whichside
Specifies which side of the cavity to place the element; one of
top, right, bottom or left. If omitted, the element occupies the
entire cavity.
sticky: nswe
Specifies where the element is placed inside its allocated parcel.
unit: 0 or 1
If set to 1, causes the element and all of its descendants to be treated as
a single element for the purposes of Widget.identify() et al. It’s
used for things like scrollbar thumbs with grips.
children: [sublayout... ]
Specifies a list of elements to place inside the element. Each
element is a tuple (or other sequence type) where the first item is
the layout name, and the other is a Layout.
The tkinter.tix (Tk Interface Extension) module provides an additional
rich set of widgets. Although the standard Tk library has many useful widgets,
they are far from complete. The tkinter.tix library provides most of the
commonly needed widgets that are missing from standard Tk: HList,
ComboBox, Control (a.k.a. SpinBox) and an assortment of
scrollable widgets.
tkinter.tix also includes many more widgets that are generally useful in
a wide range of applications: NoteBook, FileEntry,
PanedWindow, etc; there are more than 40 of them.
With all these new widgets, you can introduce new interaction techniques into
applications, creating more useful and more intuitive user interfaces. You can
design your application by choosing the most appropriate widgets to match the
special needs of your application and users.
Tide provides applications for the development of Tix and Tkinter programs.
Tide applications work under Tk or Tkinter, and include TixInspect, an
inspector to remotely modify and debug Tix/Tk/Tkinter applications.
class tkinter.tix.Tk(screenName=None, baseName=None, className='Tix')
Toplevel widget of Tix which represents mostly the main window of an
application. It has an associated Tcl interpreter.
The classes in the tkinter.tix module subclass the classes in
tkinter. The former imports the latter, so to use tkinter.tix
with Tkinter, all you need to do is to import one module. In general, you
can just import tkinter.tix, and replace the toplevel call to
tkinter.Tk with tix.Tk:
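A minimal sketch:

from tkinter import tix
from tkinter.constants import *

root = tix.Tk()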
To use tkinter.tix, you must have the Tix widgets installed, usually
alongside your installation of the Tk widgets. To test your installation, try
the following:
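One quick check (it raises a TclError if the Tix package cannot be loaded):

import tkinter.tix

root = tkinter.tix.Tk()
root.tk.eval('package require Tix')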
If this fails, you have a Tk installation problem which must be resolved before
proceeding. Use the environment variable TIX_LIBRARY to point to the
installed Tix library directory, and make sure you have the dynamic
object library (tix8183.dll or libtix8183.so) in the same
directory that contains your Tk dynamic object library (tk8183.dll or
libtk8183.so). The directory with the dynamic object library should also
have a file called pkgIndex.tcl (case sensitive), which contains the
line:
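For the library names above, that line would presumably look something like
(the exact version and file name depend on your installation):

package ifneeded Tix 8.1 [list load "[file join $dir tix8183.dll]" Tix]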
A Balloon that
pops up over a widget to provide help. When the user moves the cursor inside a
widget to which a Balloon widget has been bound, a small pop-up window with a
descriptive message will be shown on the screen.
The ComboBox
widget is similar to the combo box control in MS Windows. The user can select a
choice by either typing in the entry subwidget or selecting from the listbox
subwidget.
The Control
widget is also known as the SpinBox widget. The user can adjust the
value by pressing the two arrow buttons or by entering the value directly into
the entry. The new value will be checked against the user-defined upper and
lower limits.
The LabelEntry
widget packages an entry widget and a label into one mega widget. It can be
used to simplify the creation of “entry-form” type interfaces.
The LabelFrame
widget packages a frame widget and a label into one mega widget. To create
widgets inside a LabelFrame widget, one creates the new widgets relative to the
frame subwidget and manages them inside the frame subwidget.
The PopupMenu
widget can be used as a replacement for the tk_popup command. The advantage
of the TixPopupMenu widget is that it requires less application code
to manipulate.
The DirList
widget displays a list view of a directory, its previous directories and its
sub-directories. The user can choose one of the directories displayed in the
list or change to another directory.
The DirTree
widget displays a tree view of a directory, its previous directories and its
sub-directories. The user can choose one of the directories displayed in the
list or change to another directory.
The DirSelectDialog
widget presents the directories in the file system in a dialog window. The user
can use this dialog window to navigate through the file system to select the
desired directory.
The DirSelectBox is similar to the standard Motif(TM)
directory-selection box. It is generally used for the user to choose a
directory. DirSelectBox stores the directories most recently selected into
a ComboBox widget so that they can be quickly selected again.
The ExFileSelectBox
widget is usually embedded in a tixExFileSelectDialog widget. It provides a
convenient method for the user to select files. The style of the
ExFileSelectBox widget is very similar to the standard file dialog on
MS Windows 3.1.
The FileSelectBox
is similar to the standard Motif(TM) file-selection box. It is generally used
for the user to choose a file. FileSelectBox stores the files most recently
selected into a ComboBox widget so that they can be quickly selected
again.
The FileEntry
widget can be used to input a filename. The user can type in the filename
manually. Alternatively, the user can press the button widget that sits next to
the entry, which will bring up a file selection dialog.
The HList widget
can be used to display any data that have a hierarchical structure, for example,
file system directory trees. The list entries are indented and connected by
branch lines according to their places in the hierarchy.
The CheckList
widget displays a list of items to be selected by the user. CheckList acts
similarly to the Tk checkbutton or radiobutton widgets, except it is capable of
handling many more items than checkbuttons or radiobuttons.
The Tree widget
can be used to display hierarchical data in a tree form. The user can adjust the
view of the tree by opening or closing parts of the tree.
The TList widget
can be used to display data in a tabular format. The list entries of a
TList widget are similar to the entries in the Tk listbox widget. The
main differences are (1) the TList widget can display the list entries
in a two dimensional format and (2) you can use graphical images as well as
multiple colors and fonts for the list entries.
The PanedWindow
widget allows the user to interactively manipulate the sizes of several panes.
The panes can be arranged either vertically or horizontally. The user changes
the sizes of the panes by dragging the resize handle between two panes.
The ListNoteBook
widget is very similar to the TixNoteBook widget: it can be used to
display many windows in a limited space using a notebook metaphor. The notebook
is divided into a stack of pages (windows). At one time only one of these pages
can be shown. The user can navigate through these pages by choosing the name of
the desired page in the hlist subwidget.
The NoteBook
widget can be used to display many windows in a limited space using a notebook
metaphor. The notebook is divided into a stack of pages. At one time only one of
these pages can be shown. The user can navigate through these pages by choosing
the visual “tabs” at the top of the NoteBook widget.
The pixmap image type provides the
capability for all tkinter.tix and tkinter widgets to create
color images from XPM files.
Compound image
types can be used to create images that consist of multiple horizontal lines;
each line is composed of a series of items (texts, bitmaps, images or spaces)
arranged from left to right. For example, a compound image can be used to
display a bitmap and a text string simultaneously in a Tk Button
widget.
The tix commands provide
access to miscellaneous elements of Tix’s internal state and the
Tix application context. Most of the information manipulated by these
methods pertains to the application as a whole, or to a screen or display,
rather than to a particular window.
To view the current settings, the common usage is:
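For example, a minimal sketch:

import tkinter.tix

root = tkinter.tix.Tk()
print(root.tix_configure())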
Query or modify the configuration options of the Tix application context. If no
option is specified, returns a dictionary of all the available options. If
option is specified with no value, then the method returns a list describing the
one named option (this list will be identical to the corresponding sublist of
the value returned if no option is specified). If one or more option-value
pairs are specified, then the method modifies the given option(s) to have the
given value(s); in this case the method returns an empty string. Option may be
any of the configuration options.
Locates a bitmap file of the name name.xpm or name in one of the bitmap
directories (see the tix_addbitmapdir() method). By using
tix_getbitmap(), you can avoid hard coding the pathnames of the bitmap
files in your application. When successful, it returns the complete pathname of
the bitmap file, prefixed with the character @. The returned value can be
used to configure the bitmap option of the Tk and Tix widgets.
Tix maintains a list of directories under which the tix_getimage() and
tix_getbitmap() methods will search for image files. The standard bitmap
directory is $TIX_LIBRARY/bitmaps. The tix_addbitmapdir() method
adds directory into this list. By using this method, the image files of an
application can also be located using the tix_getimage() or
tix_getbitmap() method.
Returns the file selection dialog that may be shared among different calls from
this application. This method will create a file selection dialog widget when
it is called the first time. This dialog will be returned by all subsequent
calls to tix_filedialog(). An optional dlgclass parameter can be passed
as a string to specify what type of file selection dialog widget is desired.
Possible options are tix, FileSelectDialog or tixExFileSelectDialog.
Locates an image file of the name name.xpm, name.xbm or
name.ppm in one of the bitmap directories (see the
tix_addbitmapdir() method above). If more than one file with the same name
(but different extensions) exist, then the image type is chosen according to the
depth of the X display: xbm images are chosen on monochrome displays and color
images are chosen on color displays. By using tix_getimage(), you can
avoid hard coding the pathnames of the image files in your application. When
successful, this method returns the name of the newly created image, which can
be used to configure the image option of the Tk and Tix widgets.
Resets the scheme and fontset of the Tix application to newScheme and
newFontSet, respectively. This affects only those widgets created after this
call. Therefore, it is best to call the resetoptions method before the creation
of any widgets in a Tix application.
The optional parameter newScmPrio can be given to reset the priority level of
the Tk options set by the Tix schemes.
Because of the way Tk handles the X option database, after Tix has been
imported and initialized, it is not possible to reset the color schemes and
font sets
using the tix_config() method. Instead, the tix_resetoptions()
method must be used.
The tkinter.scrolledtext module provides a class of the same name which
implements a basic text widget which has a vertical scroll bar configured to do
the “right thing.” Using the ScrolledText class is a lot easier than
setting up a text widget and scroll bar directly. The constructor is the same
as that of the tkinter.Text class.
The text widget and scrollbar are packed together in a Frame, and the
methods of the Grid and Pack geometry managers are acquired
from the Frame object. This allows the ScrolledText widget to
be used directly to achieve most normal geometry management behavior.
Should more specific control be necessary, the following attributes are
available:
frame
The frame which surrounds the text and scroll bar widgets.
vbar
The scroll bar widget.
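A minimal sketch of typical usage:

import tkinter
from tkinter.scrolledtext import ScrolledText

root = tkinter.Tk()
text = ScrolledText(root, width=40, height=10)
text.pack(fill='both', expand=True)      # geometry methods come from the Frame
text.insert('end', 'Hello, world!\n')    # Text methods work as usual
root.mainloop()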
After a block-opening statement, the next line is indented by 4 spaces (in the
Python Shell window by one tab). After certain keywords (break, return etc.)
the next line is dedented. In leading indentation, Backspace deletes up
to 4 spaces if they are there. Tab inserts 1-4 spaces (in the Python
Shell window one tab). See also the indent/dedent region commands in the edit
menu.
The coloring is applied in a background “thread,” so you may occasionally see
uncolorized text. To change the color scheme, edit the [Colors] section in
config.txt.
Upon startup with the -s option, IDLE will execute the file referenced by
the environment variables IDLESTARTUP or PYTHONSTARTUP.
Idle first checks for IDLESTARTUP; if IDLESTARTUP is present the file
referenced is run. If IDLESTARTUP is not present, Idle checks for
PYTHONSTARTUP. Files referenced by these environment variables are
convenient places to store functions that are used frequently from the Idle
shell, or for executing import statements to import common modules.
In addition, Tk also loads a startup file if it is present. Note that the
Tk file is loaded unconditionally. This additional file is .Idle.py and is
looked for in the user’s home directory. Statements in this file will be
executed in the Tk namespace, so this file is not useful for importing functions
to be used from Idle’s Python shell.
idle.py [-c command] [-d] [-e] [-s] [-t title] [arg] ...
-c command run this command
-d enable debugger
-e edit mode; arguments are files to be edited
-s run $IDLESTARTUP or $PYTHONSTARTUP first
-t title set title of shell window
If there are arguments:
If -e is used, arguments are files opened for editing and
sys.argv reflects the arguments passed to IDLE itself.
Otherwise, if -c is used, all arguments are placed in
sys.argv[1:...], with sys.argv[0] set to '-c'.
Otherwise, if neither -e nor -c is used, the first
argument is a script which is executed with the remaining arguments in
sys.argv[1:...] and sys.argv[0] set to the script name. If the script
name is ‘-’, no script is executed but an interactive Python session is started;
the arguments are still available in sys.argv.
Pmw (Python megawidgets) is a toolkit for building high-level compound widgets
in Python using the tkinter package. It consists of a set of base classes and a library of
flexible and extensible megawidgets built on this foundation. These megawidgets
include notebooks, comboboxes, selection widgets, paned widgets, scrolled
widgets, dialog windows, etc. Also, with the Pmw.Blt interface to BLT, the
busy, graph, stripchart, tabset and vector commands become available.
The initial ideas for Pmw were taken from the Tk itcl extensions [incrTk] by Michael McLennan and [incrWidgets] by Mark Ulferts. Several of the
megawidgets are direct translations from the itcl to Python. It offers most of
the range of widgets that [incrWidgets] does, and is almost as complete as
Tix, lacking however Tix’s fast HList widget for drawing trees.
The Widget Construction Kit (WCK) is a library that allows you to write new
Tkinter widgets in pure Python. The
WCK framework gives you full control over widget creation, configuration, screen
appearance, and event handling. WCK widgets can be very fast and light-weight,
since they can operate directly on Python data structures, without having to
transfer data through the Tk/Tcl layer.
The major cross-platform (Windows, Mac OS X, Unix-like) GUI toolkits that are
also available for Python:
PyGTK is a set of bindings for the GTK widget set. It
provides an object oriented interface that is slightly higher level than
the C one. It comes with many more widgets than Tkinter provides, and has
good Python-specific reference documentation. There are also bindings to
GNOME. One well known PyGTK application is
PythonCAD. An online tutorial is available.
PyQt is a sip-wrapped binding to the Qt toolkit. Qt is an
extensive C++ GUI application development framework that is
available for Unix, Windows and Mac OS X. sip is a tool
for generating bindings for C++ libraries as Python classes, and
is specifically designed for Python. The PyQt3 bindings have a
book, GUI Programming with Python: QT Edition by Boudewijn
Rempt. The PyQt4 bindings also have a book, Rapid GUI Programming
with Python and Qt, by Mark
Summerfield.
wxPython is a cross-platform GUI toolkit for Python that is built around
the popular wxWidgets (formerly wxWindows)
C++ toolkit. It provides a native look and feel for applications on
Windows, Mac OS X, and Unix systems by using each platform’s native
widgets wherever possible (GTK+ on Unix-like systems). In addition to
an extensive set of widgets, wxPython provides classes for online
documentation and context sensitive help, printing, HTML viewing,
low-level device context drawing, drag and drop, system clipboard access,
an XML-based resource format and more, including an ever growing library
of user-contributed modules. wxPython has a book, wxPython in Action, by Noel Rappin and
Robin Dunn.
PyGTK, PyQt, and wxPython all have a modern look and feel and more
widgets than Tkinter. In addition, there are many other GUI toolkits for
Python, both cross-platform and platform-specific. See the GUI Programming page in the Python Wiki for a
much more complete list, and also for links to documents where the
different GUI toolkits are compared.
The modules described in this chapter help you write software. For example, the
pydoc module takes a module and generates documentation based on the
module’s contents. The doctest and unittest modules contain
frameworks for writing unit tests that automatically exercise code and verify
that the expected output is produced. 2to3 can translate Python 2.x
source code into valid Python 3.x code.
The list of modules described in this chapter is:
pydoc — Documentation generator and online help system
The pydoc module automatically generates documentation from Python
modules. The documentation can be presented as pages of text on the console,
served to a Web browser, or saved to HTML files.
The built-in function help() invokes the online help system in the
interactive interpreter, which uses pydoc to generate its documentation
as text on the console. The same text documentation can also be viewed from
outside the Python interpreter by running pydoc as a script at the
operating system’s command prompt. For example, running
pydoc sys
at a shell prompt will display documentation on the sys module, in a
style similar to the manual pages shown by the Unix man command. The
argument to pydoc can be the name of a function, module, or package,
or a dotted reference to a class, method, or function within a module or module
in a package. If the argument to pydoc looks like a path (that is,
it contains the path separator for your operating system, such as a slash in
Unix), and refers to an existing Python source file, then documentation is
produced for that file.
Note
In order to find objects and their documentation, pydoc imports the
module(s) to be documented. Therefore, any code on module level will be
executed on that occasion. Use an if __name__ == '__main__': guard to
only execute code when a file is invoked as a script and not just imported.
Specifying a -w flag before the argument will cause HTML documentation
to be written out to a file in the current directory, instead of displaying text
on the console.
Specifying a -k flag before the argument will search the synopsis
lines of all available modules for the keyword given as the argument, again in a
manner similar to the Unix man command. The synopsis line of a
module is the first line of its documentation string.
You can also use pydoc to start an HTTP server on the local machine
that will serve documentation to visiting Web browsers. pydoc -p 1234
will start an HTTP server on port 1234, allowing you to browse the
documentation at http://localhost:1234/ in your preferred Web browser.
Specifying 0 as the port number will select an arbitrary unused port.
pydoc -g will start the server and additionally bring up a
small tkinter-based graphical interface to help you search for
documentation pages. The -g option is deprecated, since the server can
now be controlled directly from HTTP clients.
pydoc -b will start the server and additionally open a web
browser to a module index page. Each served page has a navigation bar at the
top where you can Get help on an individual item, Search all modules with a
keyword in their synopsis line, and go to the Module index, Topics and
Keywords pages.
When pydoc generates documentation, it uses the current environment
and path to locate modules. Thus, invoking pydoc spam
documents precisely the version of the module you would get if you started the
Python interpreter and typed import spam.
Module docs for core modules are assumed to reside in
http://docs.python.org/X.Y/library/ where X and Y are the
major and minor version numbers of the Python interpreter. This can
be overridden by setting the PYTHONDOCS environment variable
to a different URL or to a local directory containing the Library
Reference Manual pages.
Changed in version 3.2: Added the -b option, deprecated the -g option.
The doctest module searches for pieces of text that look like interactive
Python sessions, and then executes those sessions to verify that they work
exactly as shown. There are several common ways to use doctest:
To check that a module’s docstrings are up-to-date by verifying that all
interactive examples still work as documented.
To perform regression testing by verifying that interactive examples from a
test file or a test object work as expected.
To write tutorial documentation for a package, liberally illustrated with
input-output examples. Depending on whether the examples or the expository text
are emphasized, this has the flavor of “literate testing” or “executable
documentation”.
Here’s a complete but small example module:
"""This is the "example" module.The example module supplies one function, factorial(). For example,>>> factorial(5)120"""deffactorial(n):"""Return the factorial of n, an exact integer >= 0. >>> [factorial(n) for n in range(6)] [1, 1, 2, 6, 24, 120] >>> factorial(30) 265252859812191058636308480000000 >>> factorial(-1) Traceback (most recent call last): ... ValueError: n must be >= 0 Factorials of floats are OK, but the float must be an exact integer: >>> factorial(30.1) Traceback (most recent call last): ... ValueError: n must be exact integer >>> factorial(30.0) 265252859812191058636308480000000 It must also not be ridiculously large: >>> factorial(1e100) Traceback (most recent call last): ... OverflowError: n too large """importmathifnotn>=0:raiseValueError("n must be >= 0")ifmath.floor(n)!=n:raiseValueError("n must be exact integer")ifn+1==n:# catch a value like 1e300raiseOverflowError("n too large")result=1factor=2whilefactor<=n:result*=factorfactor+=1returnresultif__name__=="__main__":importdoctestdoctest.testmod()
If you run example.py directly from the command line, doctest
works its magic:
$ python example.py
$
There’s no output! That’s normal, and it means all the examples worked. Pass
-v to the script, and doctest prints a detailed log of what
it’s trying, and prints a summary at the end:
$ python example.py -v
Trying:
factorial(5)
Expecting:
120
ok
Trying:
[factorial(n) for n in range(6)]
Expecting:
[1, 1, 2, 6, 24, 120]
ok
And so on, eventually ending with:
Trying:
factorial(1e100)
Expecting:
Traceback (most recent call last):
...
OverflowError: n too large
ok
2 items passed all tests:
1 tests in __main__
8 tests in __main__.factorial
9 tests in 2 items.
9 passed and 0 failed.
Test passed.
$
That’s all you need to know to start making productive use of doctest!
Jump in. The following sections provide full details. Note that there are many
examples of doctests in the standard Python test suite and libraries.
Especially useful examples can be found in the standard test file
Lib/test/test_doctest.py.
Running the module as a script causes the examples in the docstrings to get
executed and verified:
python M.py
This won’t display anything unless an example fails, in which case the failing
example(s) and the cause(s) of the failure(s) are printed to stdout, and the
final line of output is ***Test Failed*** N failures., where N is the
number of examples that failed.
Run it with the -v switch instead:
python M.py -v
and a detailed report of all examples tried is printed to standard output, along
with assorted summaries at the end.
You can force verbose mode by passing verbose=True to testmod(), or
prohibit it by passing verbose=False. In either of those cases,
sys.argv is not examined by testmod() (so passing -v or not
has no effect).
There is also a command line shortcut for running testmod(). You can
instruct the Python interpreter to run the doctest module directly from the
standard library and pass the module name(s) on the command line:
python -m doctest -v example.py
This will import example.py as a standalone module and run
testmod() on it. Note that this may not work correctly if the file is
part of a package and imports other submodules from that package.
Another simple application of doctest is testing interactive examples in a text
file. This can be done with the testfile() function:
importdoctestdoctest.testfile("example.txt")
That short script executes and verifies any interactive Python examples
contained in the file example.txt. The file content is treated as if it
were a single giant docstring; the file doesn’t need to contain a Python
program! For example, perhaps example.txt contains this:
The ``example`` module
======================
Using ``factorial``
-------------------
This is an example text file in reStructuredText format. First import
``factorial`` from the ``example`` module:
>>> from example import factorial
Now use it:
>>> factorial(6)
120
Running doctest.testfile("example.txt") then finds the error in this
documentation:
File "./example.txt", line 14, in example.txt
Failed example:
factorial(6)
Expected:
120
Got:
720
As with testmod(), testfile() won’t display anything unless an
example fails. If an example does fail, then the failing example(s) and the
cause(s) of the failure(s) are printed to stdout, using the same format as
testmod().
By default, testfile() looks for files in the calling module’s directory.
See section Basic API for a description of the optional arguments
that can be used to tell it to look for files in other locations.
Like testmod(), testfile()’s verbosity can be set with the
-v command-line switch or with the optional keyword argument
verbose.
There is also a command line shortcut for running testfile(). You can
instruct the Python interpreter to run the doctest module directly from the
standard library and pass the file name(s) on the command line:
python -m doctest -v example.txt
Because the file name does not end with .py, doctest infers that
it must be run with testfile(), not testmod().
This section examines in detail how doctest works: which docstrings it looks at,
how it finds interactive examples, what execution context it uses, how it
handles exceptions, and how option flags can be used to control its behavior.
This is the information that you need to know to write doctest examples; for
information about actually running doctest on these examples, see the following
sections.
The module docstring, and all function, class and method docstrings are
searched. Objects imported into the module are not searched.
In addition, if M.__test__ exists and “is true”, it must be a dict, and each
entry maps a (string) name to a function object, class object, or string.
Function and class object docstrings found from M.__test__ are searched, and
strings are treated as if they were docstrings. In output, a key K in
M.__test__ appears with name
<name of M>.__test__.K
Any classes found are recursively searched similarly, to test docstrings in
their contained methods and nested classes.
In most cases a copy-and-paste of an interactive console session works fine,
but doctest isn’t trying to do an exact emulation of any specific Python shell.
Any expected output must immediately follow the final '>>>' or '...'
line containing the code, and the expected output (if any) extends to the next
'>>>' or all-whitespace line.
The fine print:
Expected output cannot contain an all-whitespace line, since such a line is
taken to signal the end of expected output. If expected output does contain a
blank line, put <BLANKLINE> in your doctest example each place a blank line
is expected.
All hard tab characters are expanded to spaces, using 8-column tab stops.
Tabs in output generated by the tested code are not modified. Because any
hard tabs in the sample output are expanded, this means that if the code
output includes hard tabs, the only way the doctest can pass is if the
NORMALIZE_WHITESPACE option or directive is in effect.
Alternatively, the test can be rewritten to capture the output and compare it
to an expected value as part of the test. This handling of tabs in the
source was arrived at through trial and error, and has proven to be the least
error prone way of handling them. It is possible to use a different
algorithm for handling tabs by writing a custom DocTestParser class.
Output to stdout is captured, but not output to stderr (exception tracebacks
are captured via a different means).
If you continue a line via backslashing in an interactive session, or for any
other reason use a backslash, you should use a raw docstring, which will
preserve your backslashes exactly as you type them:
>>> def f(x):
...     r'''Backslashes in a raw docstring: m\n'''
>>> print(f.__doc__)
Backslashes in a raw docstring: m\n
Otherwise, the backslash will be interpreted as part of the string. For example,
the “\n” above would be interpreted as a newline character. Alternatively, you
can double each backslash in the doctest version (and not use a raw string):
>>> def f(x):
...     '''Backslashes in a raw docstring: m\\n'''
>>> print(f.__doc__)
Backslashes in a raw docstring: m\n
The starting column doesn’t matter:
>>> assert"Easy!" >>> import math >>> math.floor(1.9) 1
and as many leading whitespace characters are stripped from the expected output
as appeared in the initial '>>>' line that started the example.
By default, each time doctest finds a docstring to test, it uses a
shallow copy of M‘s globals, so that running tests doesn’t change the
module’s real globals, and so that one test in M can’t leave behind
crumbs that accidentally allow another test to work. This means examples can
freely use any names defined at top-level in M, and names defined earlier
in the docstring being run. Examples cannot see names defined in other
docstrings.
You can force use of your own dict as the execution context by passing
globs=your_dict to testmod() or testfile() instead.
No problem, provided that the traceback is the only output produced by the
example: just paste in the traceback. [1] Since tracebacks contain details
that are likely to change rapidly (for example, exact file paths and line
numbers), this is one case where doctest works hard to be flexible in what it
accepts.
Simple example:
>>> [1, 2, 3].remove(42)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: list.remove(x): x not in list

That doctest succeeds if ValueError is raised, with the
list.remove(x): x not in list detail as shown.
The expected output for an exception must start with a traceback header, which
may be either of the following two lines, indented the same as the first line of
the example:

Traceback (most recent call last):
Traceback (innermost last):
The traceback header is followed by an optional traceback stack, whose contents
are ignored by doctest. The traceback stack is typically omitted, or copied
verbatim from an interactive session.
The traceback stack is followed by the most interesting part: the line(s)
containing the exception type and detail. This is usually the last line of a
traceback, but can extend across multiple lines if the exception has a
multi-line detail:
>>> raise ValueError('multi\n    line\ndetail')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: multi
    line
detail
The last three lines (starting with ValueError) are compared against the
exception’s type and detail, and the rest are ignored.
Best practice is to omit the traceback stack, unless it adds significant
documentation value to the example. So the last example is probably better as:
>>> raise ValueError('multi\n    line\ndetail')
Traceback (most recent call last):
    ...
ValueError: multi
    line
detail
Note that tracebacks are treated very specially. In particular, in the
rewritten example, the use of ... is independent of doctest’s
ELLIPSIS option. The ellipsis in that example could be left out, or
could just as well be three (or three hundred) commas or digits, or an indented
transcript of a Monty Python skit.
Some details you should read once, but won’t need to remember:
Doctest can’t guess whether your expected output came from an exception
traceback or from ordinary printing. So, e.g., an example that expects
ValueError: 42 is prime will pass whether ValueError is actually
raised or if the example merely prints that traceback text. In practice,
ordinary output rarely begins with a traceback header line, so this doesn’t
create real problems.
Each line of the traceback stack (if present) must be indented further than
the first line of the example, or start with a non-alphanumeric character.
The first line following the traceback header indented the same and starting
with an alphanumeric is taken to be the start of the exception detail. Of
course this does the right thing for genuine tracebacks.
When the IGNORE_EXCEPTION_DETAIL doctest option is specified,
everything following the leftmost colon and any module information in the
exception name is ignored.
The interactive shell omits the traceback header line for some
SyntaxErrors. But doctest uses the traceback header line to
distinguish exceptions from non-exceptions. So in the rare case where you need
to test a SyntaxError that omits the traceback header, you will need to
manually add the traceback header line to your test example.
For some SyntaxErrors, Python displays the character position of the
syntax error, using a ^ marker:
>>> 1 1
  File "<stdin>", line 1
    1 1
      ^
SyntaxError: invalid syntax
Since the lines showing the position of the error come before the exception type
and detail, they are not checked by doctest. For example, the following test
would pass, even though it puts the ^ marker in the wrong location:
>>> 1 1
  File "<stdin>", line 1
    1 1
   ^
SyntaxError: invalid syntax
A number of option flags control various aspects of doctest’s behavior.
Symbolic names for the flags are supplied as module constants, which can be
or’ed together and passed to various functions. The names can also be used in
doctest directives (see below).
The first group of options define test semantics, controlling aspects of how
doctest decides whether actual output matches an example’s expected output:
By default, if an expected output block contains just 1, an actual output
block containing just 1 or just True is considered to be a match, and
similarly for 0 versus False. When DONT_ACCEPT_TRUE_FOR_1 is
specified, neither substitution is allowed. The default behavior caters to
the fact that Python changed the return type of many functions from integer to
boolean; doctests expecting “little integer” output still work in these cases. This
option will probably go away, but not for several years.
By default, if an expected output block contains a line containing only the
string <BLANKLINE>, then that line will match a blank line in the actual
output. Because a genuinely blank line delimits the expected output, this is
the only way to communicate that a blank line is expected. When
DONT_ACCEPT_BLANKLINE is specified, this substitution is not allowed.
When specified, all sequences of whitespace (blanks and newlines) are treated as
equal. Any sequence of whitespace within the expected output will match any
sequence of whitespace within the actual output. By default, whitespace must
match exactly. NORMALIZE_WHITESPACE is especially useful when a line of
expected output is very long, and you want to wrap it across multiple lines in
your source.
When specified, an ellipsis marker (...) in the expected output can match
any substring in the actual output. This includes substrings that span line
boundaries, and empty substrings, so it’s best to keep usage of this simple.
Complicated uses can lead to the same kinds of “oops, it matched too much!”
surprises that .* is prone to in regular expressions.
When specified, an example that expects an exception passes if an exception of
the expected type is raised, even if the exception detail does not match. For
example, an example expecting ValueError: 42 will pass if the actual
exception raised is ValueError: 3*14, but will fail, e.g., if
TypeError is raised.
It will also ignore the module name used in Python 3 doctest reports. Hence
both these variations will work regardless of whether the test is run under
Python 2.7 or Python 3.2 (or later versions):
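For example, something along these lines (CustomError and my_module are
hypothetical placeholder names):

>>> raise CustomError('message')
Traceback (most recent call last):
CustomError: message

>>> raise CustomError('message')
Traceback (most recent call last):
my_module.CustomError: message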
Note that ELLIPSIS can also be used to ignore the
details of the exception message, but such a test may still fail based
on whether or not the module details are printed as part of the
exception name. Using IGNORE_EXCEPTION_DETAIL and the details
from Python 2.3 is also the only clear way to write a doctest that doesn’t
care about the exception detail yet continues to pass under Python 2.3 or
earlier (those releases do not support doctest directives and ignore them
as irrelevant comments). For example,
>>> (1, 2)[3] = 'moo'
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: object doesn't support item assignment
passes under Python 2.3 and later Python versions, even though the detail
changed in Python 2.4 to say “does not” instead of “doesn’t”.
Changed in version 3.2:
Changed in version 3.2: IGNORE_EXCEPTION_DETAIL now also ignores any information relating
to the module containing the exception under test.
When specified, do not run the example at all. This can be useful in contexts
where doctest examples serve as both documentation and test cases, and an
example should be included for documentation purposes, but should not be
checked. E.g., the example’s output might be random; or the example might
depend on resources which would be unavailable to the test driver.
The SKIP flag can also be used for temporarily “commenting out” examples.
When specified, differences are computed by difflib.Differ, using the same
algorithm as the popular ndiff.py utility. This is the only method that
marks differences within lines as well as across lines. For example, if a line
of expected output contains digit 1 where actual output contains letter
l, a line is inserted with a caret marking the mismatching column positions.
When specified, display the first failing example in each doctest, but suppress
output for all remaining examples. This will prevent doctest from reporting
correct examples that break because of earlier failures; but it might also hide
incorrect examples that fail independently of the first failure. When
REPORT_ONLY_FIRST_FAILURE is specified, the remaining examples are
still run, and still count towards the total number of failures reported; only
the output is suppressed.
A bitmask or’ing together all the reporting flags above.
“Doctest directives” may be used to modify the option flags for individual
examples. Doctest directives are expressed as a special Python comment
following an example’s source code, of the form # doctest: +OPTION_NAME or
# doctest: -OPTION_NAME.
Whitespace is not allowed between the + or - and the directive option
name. The directive option name can be any of the option flag names explained
above.
An example’s doctest directives modify doctest’s behavior for that single
example. Use + to enable the named behavior, or - to disable it.
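For example, this test passes:

>>> print(list(range(20))) # doctest: +NORMALIZE_WHITESPACE
[0,   1,  2,  3,  4,  5,  6,  7,  8,  9,
10,  11, 12, 13, 14, 15, 16, 17, 18, 19]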
Without the directive it would fail, both because the actual output doesn’t have
two blanks before the single-digit list elements, and because the actual output
is on a single line. This test also passes, and also requires a directive to do
so:
>>> print(list(range(20))) # doctest: +ELLIPSIS
[0, 1, ..., 18, 19]
Multiple directives can be used on a single physical line, separated by commas:
>>> print(list(range(20))) # doctest: +ELLIPSIS, +NORMALIZE_WHITESPACE
[0,    1, ...,   18,    19]
If multiple directive comments are used for a single example, then they are
combined:
>>> print(list(range(20))) # doctest: +ELLIPSIS
...                        # doctest: +NORMALIZE_WHITESPACE
[0,    1, ...,   18,    19]
As the previous example shows, you can add ... lines to your example
containing only directives. This can be useful when an example is too long for
a directive to comfortably fit on the same line:
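>>> print(list(range(5)) + list(range(10, 20)) + list(range(30, 40)))
... # doctest: +ELLIPSIS
[0, ..., 4, 10, ..., 19, 30, ..., 39]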
Note that since all options are disabled by default, and directives apply only
to the example they appear in, enabling options (via + in a directive) is
usually the only meaningful choice. However, option flags can also be passed to
functions that run doctests, establishing different defaults. In such cases,
disabling an option via - in a directive can be useful.
There’s also a way to register new option flag names, although this isn’t useful
unless you intend to extend doctest internals via subclassing:
Create a new option flag with a given name, and return the new flag’s integer
value. register_optionflag() can be used when subclassing
OutputChecker or DocTestRunner to create new options that are
supported by your subclasses. register_optionflag() should always be
called using the following idiom:
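MY_FLAG = register_optionflag('MY_FLAG')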
doctest is serious about requiring exact matches in expected output. If
even a single character doesn’t match, the test fails. This will probably
surprise you a few times, as you learn exactly what Python does and doesn’t
guarantee about output. For example, when printing a dict, Python doesn’t
guarantee that the key-value pairs will be printed in any particular order, so a
test that simply prints a dict is vulnerable to spurious failures.
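One workaround, sketched here with a hypothetical foo() that returns a dict,
is to compare against the expected value so the printed result is
order-independent:

>>> foo() == {"Hermione": "hippogryph", "Harry": "broomstick"}
True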
Another bad idea is to print things that embed an object address, like
>>> id(1.0) # certain to fail some of the time
7948648
>>> class C: pass
>>> C()   # the default repr() for instances embeds an address
<__main__.C instance at 0x00AC18F0>
The ELLIPSIS directive gives a nice approach for the last example:
>>> C() # doctest: +ELLIPSIS
<__main__.C instance at 0x...>
Floating-point numbers are also subject to small output variations across
platforms, because Python defers to the platform C library for float formatting,
and C libraries vary widely in quality here.
>>> 1./7  # risky
0.14285714285714285
>>> print(1./7) # safer
0.142857142857
>>> print(round(1./7, 6)) # much safer
0.142857
Numbers of the form I/2.**J are safe across all platforms, and I often
contrive doctest examples to produce numbers of that form:
>>> 3./4  # utterly safe
0.75
Simple fractions are also easier for people to understand, and that makes for
better documentation.
All arguments except filename are optional, and should be specified in keyword
form.
Test examples in the file named filename. Return (failure_count, test_count).
Optional argument module_relative specifies how the filename should be
interpreted:
If module_relative is True (the default), then filename specifies an
OS-independent module-relative path. By default, this path is relative to the
calling module’s directory; but if the package argument is specified, then it
is relative to that package. To ensure OS-independence, filename should use
/ characters to separate path segments, and may not be an absolute path
(i.e., it may not begin with /).
If module_relative is False, then filename specifies an OS-specific
path. The path may be absolute or relative; relative paths are resolved with
respect to the current working directory.
Optional argument name gives the name of the test; by default, or if None,
os.path.basename(filename) is used.
Optional argument package is a Python package or the name of a Python package
whose directory should be used as the base directory for a module-relative
filename. If no package is specified, then the calling module’s directory is
used as the base directory for module-relative filenames. It is an error to
specify package if module_relative is False.
Optional argument globs gives a dict to be used as the globals when executing
examples. A new shallow copy of this dict is created for the doctest, so its
examples start with a clean slate. By default, or if None, a new empty dict
is used.
Optional argument extraglobs gives a dict merged into the globals used to
execute examples. This works like dict.update(): if globs and
extraglobs have a common key, the associated value in extraglobs appears in
the combined dict. By default, or if None, no extra globals are used. This
is an advanced feature that allows parameterization of doctests. For example, a
doctest can be written for a base class, using a generic name for the class,
then reused to test any number of subclasses by passing an extraglobs dict
mapping the generic name to the subclass to be tested.
Optional argument verbose prints lots of stuff if true, and prints only
failures if false; by default, or if None, it’s true if and only if '-v'
is in sys.argv.
Optional argument report prints a summary at the end when true, else prints
nothing at the end. In verbose mode, the summary is detailed, else the summary
is very brief (in fact, empty if all tests passed).
Optional argument raise_on_error defaults to false. If true, an exception is
raised upon the first failure or unexpected exception in an example. This
allows failures to be post-mortem debugged. Default behavior is to continue
running examples.
Optional argument parser specifies a DocTestParser (or subclass) that
should be used to extract tests from the files. It defaults to a normal parser
(i.e., DocTestParser()).
Optional argument encoding specifies an encoding that should be used to
convert the file to unicode.
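A minimal invocation might look like this (example.txt is an illustrative
file name):

import doctest
doctest.testfile('example.txt')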
All arguments are optional, and all except for m should be specified in
keyword form.
Test examples in docstrings in functions and classes reachable from module m
(or module __main__ if m is not supplied or is None), starting with
m.__doc__.
Also test examples reachable from dict m.__test__, if it exists and is not
None. m.__test__ maps names (strings) to functions, classes and
strings; function and class docstrings are searched for examples; strings are
searched directly, as if they were docstrings.
Only docstrings attached to objects belonging to module m are searched.
Return (failure_count, test_count).
Optional argument name gives the name of the module; by default, or if
None, m.__name__ is used.
Optional argument exclude_empty defaults to false. If true, objects for which
no doctests are found are excluded from consideration. The default is a backward
compatibility hack, so that code still using doctest.master.summarize() in
conjunction with testmod() continues to get output for objects with no
tests. The exclude_empty argument to the newer DocTestFinder
constructor defaults to true.
Optional arguments extraglobs, verbose, report, optionflags,
raise_on_error, and globs are the same as for function testfile()
above, except that globs defaults to m.__dict__.
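A minimal use might look like this, run from within the module whose
docstrings contain the examples:

if __name__ == '__main__':
    import doctest
    doctest.testmod()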
There’s also a function to run the doctests associated with a single object.
This function is provided for backward compatibility. There are no plans to
deprecate it, but it’s rarely useful:
Test examples associated with object f; for example, f may be a module,
function, or class object.
A shallow copy of dictionary argument globs is used for the execution context.
Optional argument name is used in failure messages, and defaults to
"NoName".
If optional argument verbose is true, output is generated even if there are no
failures. By default, output is generated only in case of an example failure.
Optional argument compileflags gives the set of flags that should be used by
the Python compiler when running the examples. By default, or if None,
flags are deduced corresponding to the set of future features found in globs.
Optional argument optionflags works as for function testfile() above.
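A small sketch (the add function here is illustrative):

import doctest

def add(a, b):
    """Return the sum of a and b.

    >>> add(2, 3)
    5
    """
    return a + b

doctest.run_docstring_examples(add, globals(), name='add')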
As your collection of doctest’ed modules grows, you’ll want a way to run all
their doctests systematically. doctest provides two functions that can
be used to create unittest test suites from modules and text files
containing doctests. To integrate with unittest test discovery, include
a load_tests() function in your test module:
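import unittest
import doctest
import my_module_with_doctests   # placeholder for your own module

def load_tests(loader, tests, ignore):
    tests.addTests(doctest.DocTestSuite(my_module_with_doctests))
    return tests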
Convert doctest tests from one or more text files to a
unittest.TestSuite.
The returned unittest.TestSuite is to be run by the unittest framework
and runs the interactive examples in each file. If an example in any file
fails, then the synthesized unit test fails, and a failureException
exception is raised showing the name of the file containing the test and a
(sometimes approximate) line number.
Pass one or more paths (as strings) to text files to be examined.
Options may be provided as keyword arguments:
Optional argument module_relative specifies how the filenames in paths
should be interpreted:
If module_relative is True (the default), then each filename in
paths specifies an OS-independent module-relative path. By default, this
path is relative to the calling module’s directory; but if the package
argument is specified, then it is relative to that package. To ensure
OS-independence, each filename should use / characters to separate path
segments, and may not be an absolute path (i.e., it may not begin with
/).
If module_relative is False, then each filename in paths specifies
an OS-specific path. The path may be absolute or relative; relative paths
are resolved with respect to the current working directory.
Optional argument package is a Python package or the name of a Python
package whose directory should be used as the base directory for
module-relative filenames in paths. If no package is specified, then the
calling module’s directory is used as the base directory for module-relative
filenames. It is an error to specify package if module_relative is
False.
Optional argument setUp specifies a set-up function for the test suite.
This is called before running the tests in each file. The setUp function
will be passed a DocTest object. The setUp function can access the
test globals as the globs attribute of the test passed.
Optional argument tearDown specifies a tear-down function for the test
suite. This is called after running the tests in each file. The tearDown
function will be passed a DocTest object. The setUp function can
access the test globals as the globs attribute of the test passed.
Optional argument globs is a dictionary containing the initial global
variables for the tests. A new copy of this dictionary is created for each
test. By default, globs is a new empty dictionary.
Optional argument optionflags specifies the default doctest options for the
tests, created by or-ing together individual option flags. See section
Option Flags and Directives. See function set_unittest_reportflags() below
for a better way to set reporting options.
Optional argument parser specifies a DocTestParser (or subclass)
that should be used to extract tests from the files. It defaults to a normal
parser (i.e., DocTestParser()).
Optional argument encoding specifies an encoding that should be used to
convert the file to unicode.
The global __file__ is added to the globals provided to doctests loaded
from a text file using DocFileSuite().
The returned unittest.TestSuite is to be run by the unittest framework
and runs each doctest in the module. If any of the doctests fail, then the
synthesized unit test fails, and a failureException exception is raised
showing the name of the file containing the test and a (sometimes approximate)
line number.
Optional argument module provides the module to be tested. It can be a module
object or a (possibly dotted) module name. If not specified, the module calling
this function is used.
Optional argument globs is a dictionary containing the initial global
variables for the tests. A new copy of this dictionary is created for each
test. By default, globs is a new empty dictionary.
Optional argument extraglobs specifies an extra set of global variables, which
is merged into globs. By default, no extra globals are used.
Optional argument test_finder is the DocTestFinder object (or a
drop-in replacement) that is used to extract doctests from the module.
Optional arguments setUp, tearDown, and optionflags are the same as for
function DocFileSuite() above.
This function uses the same search technique as testmod().
Under the covers, DocTestSuite() creates a unittest.TestSuite out
of doctest.DocTestCase instances, and DocTestCase is a
subclass of unittest.TestCase. DocTestCase isn’t documented
here (it’s an internal detail), but studying its code can answer questions about
the exact details of unittest integration.
Similarly, DocFileSuite() creates a unittest.TestSuite out of
doctest.DocFileCase instances, and DocFileCase is a subclass
of DocTestCase.
So both ways of creating a unittest.TestSuite run instances of
DocTestCase. This is important for a subtle reason: when you run
doctest functions yourself, you can control the doctest options in
use directly, by passing option flags to doctest functions. However, if
you’re writing a unittest framework, unittest ultimately controls
when and how tests get run. The framework author typically wants to control
doctest reporting options (perhaps, e.g., specified by command line
options), but there’s no way to pass options through unittest to
doctest test runners.
For this reason, doctest also supports a notion of doctest
reporting flags specific to unittest support, via this function:
Argument flags or’s together option flags. See section
Option Flags and Directives. Only “reporting flags” can be used.
This is a module-global setting, and affects all future doctests run by module
unittest: the runTest() method of DocTestCase looks at
the option flags specified for the test case when the DocTestCase
instance was constructed. If no reporting flags were specified (which is the
typical and expected case), doctest’s unittest reporting flags are
or’ed into the option flags, and the option flags so augmented are passed to the
DocTestRunner instance created to run the doctest. If any reporting
flags were specified when the DocTestCase instance was constructed,
doctest’s unittest reporting flags are ignored.
The value of the unittest reporting flags in effect before the function
was called is returned by the function.
The basic API is a simple wrapper that’s intended to make doctest easy to use.
It is fairly flexible, and should meet most users’ needs; however, if you
require more fine-grained control over testing, or wish to extend doctest’s
capabilities, then you should use the advanced API.
The advanced API revolves around two container classes, which are used to store
the interactive examples extracted from doctest cases:
Example: A single Python statement, paired with its expected
output.
DocTest: A collection of Examples, typically extracted
from a single docstring or text file.
Additional processing classes are defined to find, parse, run, and check
doctest examples:
DocTestFinder: Finds all docstrings in a given module, and uses a
DocTestParser to create a DocTest from every docstring that
contains interactive examples.
DocTestParser: Creates a DocTest object from a string (such
as an object’s docstring).
DocTestRunner: Executes the examples in a DocTest, and
verifies their output.
OutputChecker: Checks whether the actual output from a doctest
example matches the expected output.
class doctest.DocTest(examples, globs, name, filename, lineno, docstring)
A collection of doctest examples that should be run in a single namespace. The
constructor arguments are used to initialize the attributes of the same names.
DocTest defines the following attributes. They are initialized by
the constructor, and should not be modified directly.
The namespace (aka globals) that the examples should be run in. This is a
dictionary mapping names to values. Any changes to the namespace made by the
examples (such as binding new variables) will be reflected in globs
after the test is run.
The line number within filename where this DocTest begins, or
None if the line number is unavailable. This line number is zero-based
with respect to the beginning of the file.
class doctest.Example(source, want, exc_msg=None, lineno=0, indent=0, options=None)
A single interactive example, consisting of a Python statement and its expected
output. The constructor arguments are used to initialize the attributes of
the same names.
Example defines the following attributes. They are initialized by
the constructor, and should not be modified directly.
A string containing the example’s source code. This source code consists of a
single Python statement, and always ends with a newline; the constructor adds
a newline when necessary.
The expected output from running the example’s source code (either from
stdout, or a traceback in case of exception). want ends with a
newline unless no output is expected, in which case it’s an empty string. The
constructor adds a newline when necessary.
The exception message generated by the example, if the example is expected to
generate an exception; or None if it is not expected to generate an
exception. This exception message is compared against the return value of
traceback.format_exception_only(). exc_msg ends with a newline
unless it’s None. The constructor adds a newline if needed.
The line number within the string containing this example where the example
begins. This line number is zero-based with respect to the beginning of the
containing string.
A dictionary mapping from option flags to True or False, which is used
to override default options for this example. Any option flags not contained
in this dictionary are left at their default value (as specified by the
DocTestRunner’s optionflags). By default, no options are set.
class doctest.DocTestFinder(verbose=False, parser=DocTestParser(), recurse=True, exclude_empty=True)
A processing class used to extract the DocTests that are relevant to
a given object, from its docstring and the docstrings of its contained objects.
DocTests can currently be extracted from the following object types:
modules, functions, classes, methods, staticmethods, classmethods, and
properties.
The optional argument verbose can be used to display the objects searched by
the finder. It defaults to False (no output).
The optional argument parser specifies the DocTestParser object (or a
drop-in replacement) that is used to extract doctests from docstrings.
If the optional argument recurse is false, then DocTestFinder.find()
will only examine the given object, and not any contained objects.
If the optional argument exclude_empty is false, then
DocTestFinder.find() will include tests for objects with empty docstrings.
Return a list of the DocTests that are defined by obj’s
docstring, or by any of its contained objects’ docstrings.
The optional argument name specifies the object’s name; this name will be
used to construct names for the returned DocTests. If name is
not specified, then obj.__name__ is used.
The optional parameter module is the module that contains the given object.
If the module is not specified or is None, then the test finder will attempt
to automatically determine the correct module. The object’s module is used:
As a default namespace, if globs is not specified.
To prevent the DocTestFinder from extracting DocTests from objects that are
imported from other modules. (Contained objects with modules other than
module are ignored.)
To find the name of the file containing the object.
To help find the line number of the object within its file.
If module is False, no attempt to find the module will be made. This is
obscure, of use mostly in testing doctest itself: if module is False, or
is None but cannot be found automatically, then all objects are considered
to belong to the (non-existent) module, so all contained objects will
(recursively) be searched for doctests.
The globals for each DocTest are formed by combining globs and
extraglobs (bindings in extraglobs override bindings in globs). A new
shallow copy of the globals dictionary is created for each DocTest.
If globs is not specified, then it defaults to the module’s __dict__, if
specified, or {} otherwise. If extraglobs is not specified, then it
defaults to {}.
Extract all doctest examples from the given string, and return them as a list
of Example objects. Line numbers are 0-based. The optional argument
name is a name identifying this string, and is only used for error messages.
Divide the given string into examples and intervening text, and return them as
a list of alternating Examples and strings. Line numbers for the
Examples are 0-based. The optional argument name is a name
identifying this string, and is only used for error messages.
class doctest.DocTestRunner(checker=None, verbose=None, optionflags=0)
A processing class used to execute and verify the interactive examples in a
DocTest.
The comparison between expected outputs and actual outputs is done by an
OutputChecker. This comparison may be customized with a number of
option flags; see section Option Flags and Directives for more information. If the
option flags are insufficient, then the comparison may also be customized by
passing a subclass of OutputChecker to the constructor.
The test runner’s display output can be controlled in two ways. First, an output
function can be passed to TestRunner.run(); this function will be called
with strings that should be displayed. It defaults to sys.stdout.write. If
capturing the output is not sufficient, then the display output can be also
customized by subclassing DocTestRunner, and overriding the methods
report_start(), report_success(),
report_unexpected_exception(), and report_failure().
The optional keyword argument checker specifies the OutputChecker
object (or drop-in replacement) that should be used to compare the expected
outputs to the actual outputs of doctest examples.
The optional keyword argument verbose controls the DocTestRunner’s
verbosity. If verbose is True, then information is printed about each
example, as it is run. If verbose is False, then only failures are
printed. If verbose is unspecified, or None, then verbose output is used
iff the command-line switch -v is used.
The optional keyword argument optionflags can be used to control how the test
runner compares expected output to actual output, and how it displays failures.
For more information, see section Option Flags and Directives.
Report that the test runner is about to process the given example. This method
is provided to allow subclasses of DocTestRunner to customize their
output; it should not be called directly.
example is the example about to be processed. test is the test
containing example. out is the output function that was passed to
DocTestRunner.run().
Report that the given example ran successfully. This method is provided to
allow subclasses of DocTestRunner to customize their output; it
should not be called directly.
example is the example about to be processed. got is the actual output
from the example. test is the test containing example. out is the
output function that was passed to DocTestRunner.run().
Report that the given example failed. This method is provided to allow
subclasses of DocTestRunner to customize their output; it should not
be called directly.
example is the example about to be processed. got is the actual output
from the example. test is the test containing example. out is the
output function that was passed to DocTestRunner.run().
report_unexpected_exception(out, test, example, exc_info)
Report that the given example raised an unexpected exception. This method is
provided to allow subclasses of DocTestRunner to customize their
output; it should not be called directly.
example is the example about to be processed. exc_info is a tuple
containing information about the unexpected exception (as returned by
sys.exc_info()). test is the test containing example. out is the
output function that was passed to DocTestRunner.run().
Run the examples in test (a DocTest object), and display the
results using the writer function out.
The examples are run in the namespace test.globs. If clear_globs is
true (the default), then this namespace will be cleared after the test runs,
to help with garbage collection. If you would like to examine the namespace
after the test completes, then use clear_globs=False.
compileflags gives the set of flags that should be used by the Python
compiler when running the examples. If not specified, then it will default to
the set of future-import flags that apply to globs.
The output of each example is checked using the DocTestRunner’s
output checker, and the results are formatted by the
DocTestRunner.report_*() methods.
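As a sketch of how these pieces fit together, a hypothetical helper (not
part of doctest) might find and run every test in a module:

import doctest

def run_module_doctests(module):
    # Find each DocTest in the module, run it, then print a summary.
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    for test in finder.find(module):
        runner.run(test)
    runner.summarize()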
A class used to check whether the actual output from a doctest example
matches the expected output. OutputChecker defines two methods:
check_output(), which compares a given pair of outputs, and returns true
if they match; and output_difference(), which returns a string describing
the differences between two outputs.
Return True iff the actual output from an example (got) matches the
expected output (want). These strings are always considered to match if
they are identical; but depending on what option flags the test runner is
using, several non-exact match types are also possible. See section
Option Flags and Directives for more information about option flags.
Return a string describing the differences between the expected output for a
given example (example) and the actual output (got). optionflags is the
set of option flags used to compare want and got.
Doctest provides several mechanisms for debugging doctest examples:
Several functions convert doctests to executable Python programs, which can be
run under the Python debugger, pdb.
The DebugRunner class is a subclass of DocTestRunner that
raises an exception for the first failing example, containing information about
that example. This information can be used to perform post-mortem debugging on
the example.
You can add a call to pdb.set_trace() in a doctest example, and you’ll
drop into the Python debugger when that line is executed. Then you can inspect
current values of variables, and so on. For example, suppose a.py
contains just this module docstring:
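"""
>>> def f(x):
...     g(x*2)
>>> def g(x):
...     print(x+3)
...     import pdb; pdb.set_trace()
>>> f(3)
9
"""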
Argument s is a string containing doctest examples. The string is converted
to a Python script, where doctest examples in s are converted to regular code,
and everything else is converted to Python comments. The generated script is
returned as a string. For example,
import doctest

print(doctest.script_from_examples(r"""
    Set x and y to 1 and 2.
    >>> x, y = 1, 2

    Print their sum:
    >>> print(x+y)
    3
"""))
displays:
# Set x and y to 1 and 2.
x, y = 1, 2
#
# Print their sum:
print(x+y)
# Expected:
## 3
This function is used internally by other functions (see below), but can also be
useful when you want to transform an interactive Python session into a Python
script.
Argument module is a module object, or dotted name of a module, containing the
object whose doctests are of interest. Argument name is the name (within the
module) of the object with the doctests of interest. The result is a string,
containing the object’s docstring converted to a Python script, as described for
script_from_examples() above. For example, if module a.py
contains a top-level function f(), then
import a, doctest
print(doctest.testsource(a, "a.f"))
prints a script version of function f()’s docstring, with doctests
converted to code, and the rest placed in comments.
The module and name arguments are the same as for function
testsource() above. The synthesized Python script for the named object’s
docstring is written to a temporary file, and then that file is run under the
control of the Python debugger, pdb.
A shallow copy of module.__dict__ is used for both local and global
execution context.
Optional argument pm controls whether post-mortem debugging is used. If pm
has a true value, the script file is run directly, and the debugger gets
involved only if the script terminates via raising an unhandled exception. If
it does, then post-mortem debugging is invoked, via pdb.post_mortem(),
passing the traceback object from the unhandled exception. If pm is not
specified, or is false, the script is run under the debugger from the start, via
passing an appropriate exec() call to pdb.run().
This is like function debug() above, except that a string containing
doctest examples is specified directly, via the src argument.
Optional argument pm has the same meaning as in function debug() above.
Optional argument globs gives a dictionary to use as both local and global
execution context. If not specified, or None, an empty dictionary is used.
If specified, a shallow copy of the dictionary is used.
The DebugRunner class, and the special exceptions it may raise, are of
most interest to testing framework authors, and will only be sketched here. See
the source code, and especially DebugRunner’s docstring (which is a
doctest!) for more details:
class doctest.DebugRunner(checker=None, verbose=None, optionflags=0)
A subclass of DocTestRunner that raises an exception as soon as a
failure is encountered. If an unexpected exception occurs, an
UnexpectedException exception is raised, containing the test, the
example, and the original exception. If the output doesn’t match, then a
DocTestFailure exception is raised, containing the test, the example, and
the actual output.
For information about the constructor parameters and methods, see the
documentation for DocTestRunner in section Advanced API.
There are two exceptions that may be raised by DebugRunner instances:
exception doctest.DocTestFailure(test, example, got)
An exception raised by DocTestRunner to signal that a doctest example’s
actual output did not match its expected output. The constructor arguments are
used to initialize the attributes of the same names.
exception doctest.UnexpectedException(test, example, exc_info)
An exception raised by DocTestRunner to signal that a doctest
example raised an unexpected exception. The constructor arguments are used
to initialize the attributes of the same names.
As mentioned in the introduction, doctest has grown to have three primary
uses:
Checking examples in docstrings.
Regression testing.
Executable documentation / literate testing.
These uses have different requirements, and it is important to distinguish them.
In particular, filling your docstrings with obscure test cases makes for bad
documentation.
When writing a docstring, choose docstring examples with care. There’s an art to
this that needs to be learned—it may not be natural at first. Examples should
add genuine value to the documentation. A good example can often be worth many
words. If done with care, the examples will be invaluable for your users, and
will pay back the time it takes to collect them many times over as the years go
by and things change. I’m still amazed at how often one of my doctest
examples stops working after a “harmless” change.
Doctest also makes an excellent tool for regression testing, especially if you
don’t skimp on explanatory text. By interleaving prose and examples, it becomes
much easier to keep track of what’s actually being tested, and why. When a test
fails, good prose can make it much easier to figure out what the problem is, and
how it should be fixed. It’s true that you could write extensive comments in
code-based testing, but few programmers do. Many have found that using doctest
approaches instead leads to much clearer tests. Perhaps this is simply because
doctest makes writing prose a little easier than writing code, while writing
comments in code is a little harder. I think it goes deeper than just that:
the natural attitude when writing a doctest-based test is that you want to
explain the fine points of your software, and illustrate them with examples.
This in turn naturally leads to test files that start with the simplest
features, and logically progress to complications and edge cases. A coherent
narrative is the result, instead of a collection of isolated functions that test
isolated bits of functionality seemingly at random. It’s a different attitude,
and produces different results, blurring the distinction between testing and
explaining.
Regression testing is best confined to dedicated objects or files. There are
several options for organizing tests:
Write text files containing test cases as interactive examples, and test the
files using testfile() or DocFileSuite(). This is recommended,
although it is easiest to do for new projects, designed from the start to use
doctest.
Define functions named _regrtest_topic that consist of single docstrings,
containing test cases for the named topics. These functions can be included in
the same file as the module, or separated out into a separate test file.
Define a __test__ dictionary mapping from regression test topics to
docstrings containing test cases (a minimal sketch follows this list).
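A minimal sketch of the last option (the topic name is illustrative):

__test__ = {
    'arithmetic': """
Addition works as expected:

    >>> 2 + 2
    4
""",
}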
Examples containing both expected output and an exception are not supported.
Trying to guess where one ends and the other begins is too error-prone, and that
also makes for a confusing test.
(If you are already familiar with the basic concepts of testing, you might want
to skip to the list of assert methods.)
The Python unit testing framework, sometimes referred to as “PyUnit,” is a
Python language version of JUnit, by Kent Beck and Erich Gamma. JUnit is, in
turn, a Java version of Kent’s Smalltalk testing framework. Each is the de
facto standard unit testing framework for its respective language.
unittest supports test automation, sharing of setup and shutdown code for
tests, aggregation of tests into collections, and independence of the tests from
the reporting framework. The unittest module provides classes that make
it easy to support these qualities for a set of tests.
To achieve this, unittest supports some important concepts:
test fixture
A test fixture represents the preparation needed to perform one or more
tests, and any associate cleanup actions. This may involve, for example,
creating temporary or proxy databases, directories, or starting a server
process.
test case
A test case is the smallest unit of testing. It checks for a specific
response to a particular set of inputs. unittest provides a base class,
TestCase, which may be used to create new test cases.
test suite
A test suite is a collection of test cases, test suites, or both. It is
used to aggregate tests that should be executed together.
test runner
A test runner is a component which orchestrates the execution of tests
and provides the outcome to the user. The runner may use a graphical interface,
a textual interface, or return a special value to indicate the results of
executing the tests.
The test case and test fixture concepts are supported through the
TestCase and FunctionTestCase classes; the former should be
used when creating new tests, and the latter can be used when integrating
existing test code with a unittest-driven framework. When building test
fixtures using TestCase, the setUp() and
tearDown() methods can be overridden to provide initialization
and cleanup for the fixture. With FunctionTestCase, existing functions
can be passed to the constructor for these purposes. When the test is run, the
fixture initialization is run first; if it succeeds, the cleanup method is run
after the test has been executed, regardless of the outcome of the test. Each
instance of the TestCase will only be used to run a single test method,
so a new fixture is created for each test.
Test suites are implemented by the TestSuite class. This class allows
individual tests and test suites to be aggregated; when the suite is executed,
all tests added directly to the suite and in “child” test suites are run.
A test runner is an object that provides a single method,
run(), which accepts a TestCase or TestSuite
object as a parameter, and returns a result object. The class
TestResult is provided for use as the result object. unittest
provides the TextTestRunner as an example test runner which reports
test results on the standard error stream by default. Alternate runners can be
implemented for other environments (such as graphical environments) without any
need to derive from a specific class.
Many new features were added to unittest in Python 2.7, including test
discovery. unittest2 allows you to use these features with earlier
versions of Python.
A special-interest-group for discussion of testing, and testing tools,
in Python.
The script Tools/unittestgui/unittestgui.py in the Python source distribution is
a GUI tool for test discovery and execution. This is intended largely for ease of use
for those new to unit testing. For production environments it is recommended that
tests be driven by a continuous integration system such as Hudson
or Buildbot.
The unittest module provides a rich set of tools for constructing and
running tests. This section demonstrates that a small subset of the tools
suffice to meet the needs of most users.
Here is a short script to test three functions from the random module:
import random
import unittest

class TestSequenceFunctions(unittest.TestCase):

    def setUp(self):
        self.seq = list(range(10))

    def test_shuffle(self):
        # make sure the shuffled sequence does not lose any elements
        random.shuffle(self.seq)
        self.seq.sort()
        self.assertEqual(self.seq, list(range(10)))

        # should raise an exception for an immutable sequence
        self.assertRaises(TypeError, random.shuffle, (1, 2, 3))

    def test_choice(self):
        element = random.choice(self.seq)
        self.assertTrue(element in self.seq)

    def test_sample(self):
        with self.assertRaises(ValueError):
            random.sample(self.seq, 20)
        for element in random.sample(self.seq, 5):
            self.assertTrue(element in self.seq)

if __name__ == '__main__':
    unittest.main()
A testcase is created by subclassing unittest.TestCase. The three
individual tests are defined with methods whose names start with the letters
test. This naming convention informs the test runner about which methods
represent tests.
The crux of each test is a call to assertEqual() to check for an
expected result; assertTrue() to verify a condition; or
assertRaises() to verify that an expected exception gets raised.
These methods are used instead of the assert statement so the test
runner can accumulate all test results and produce a report.
When a setUp() method is defined, the test runner will run that
method prior to each test. Likewise, if a tearDown() method is
defined, the test runner will invoke that method after each test. In the
example, setUp() was used to create a fresh sequence for each
test.
The final block shows a simple way to run the tests. unittest.main()
provides a command-line interface to the test script. When run from the command
line, the above script produces an output that looks like this:
...
----------------------------------------------------------------------
Ran 3 tests in 0.000s
OK
Instead of unittest.main(), there are other ways to run the tests with a
finer level of control, less terse output, and no requirement to be run from the
command line. For example, the last two lines may be replaced with:
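suite = unittest.TestLoader().loadTestsFromTestCase(TestSequenceFunctions)
unittest.TextTestRunner(verbosity=2).run(suite)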
Running the revised script from the interpreter or another script produces the
following output:
test_choice (__main__.TestSequenceFunctions) ... ok
test_sample (__main__.TestSequenceFunctions) ... ok
test_shuffle (__main__.TestSequenceFunctions) ... ok
----------------------------------------------------------------------
Ran 3 tests in 0.110s
OK
The above examples show the most commonly used unittest features which
are sufficient to meet many everyday testing needs. The remainder of the
documentation explores the full feature set from first principles.
You can pass in a list with any combination of module names, and fully
qualified class or method names.
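For example:

python -m unittest test_module1 test_module2
python -m unittest test_module.TestClass
python -m unittest test_module.TestClass.test_method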
Test modules can be specified by file path as well:
python -m unittest tests/test_something.py
This allows you to use the shell filename completion to specify the test module.
The file specified must still be importable as a module. The path is converted
to a module name by removing the ‘.py’ and converting path separators into ‘.’.
If you want to execute a test file that isn’t importable as a module you should
execute the file directly instead.
You can run tests with more detail (higher verbosity) by passing in the -v flag:
python -m unittest -v test_module
When executed without arguments Test Discovery is started:
python -m unittest
For a list of all the command-line options:
python -m unittest -h
Changed in version 3.2: In earlier versions it was only possible to run individual test methods and
not modules or classes.
The standard output and standard error streams are buffered during the test
run. Output during a passing test is discarded. Output is echoed normally
on test fail or error and is added to the failure messages.
Control-C during the test run waits for the current test to end and then
reports all the results so far. A second control-C raises the normal
KeyboardInterrupt exception.
See Signal Handling for the functions that provide this functionality.
Unittest supports simple test discovery. In order to be compatible with test
discovery, all of the test files must be modules or
packages importable from the top-level directory of
the project (this means that their filenames must be valid
identifiers).
Test discovery is implemented in TestLoader.discover(), but can also be
used from the command line. The basic command-line usage is:
cd project_directory
python -m unittest discover
Note
As a shortcut, python -m unittest is the equivalent of
python -m unittest discover. If you want to pass arguments to test
discovery the discover sub-command must be used explicitly.
The discover sub-command has the following options:
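-v, --verbose
    Verbose output
-s, --start-directory directory
    Directory to start discovery ('.' default)
-p, --pattern pattern
    Pattern to match test files ('test*.py' default)
-t, --top-level-directory directory
    Top level directory of project (defaults to start directory)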
As well as being a path it is possible to pass a package name, for example
myproject.subpackage.test, as the start directory. The package name you
supply will then be imported and its location on the filesystem will be used
as the start directory.
Caution
Test discovery loads tests by importing them. Once test discovery has found
all the test files from the start directory you specify it turns the paths
into package names to import. For example foo/bar/baz.py will be
imported as foo.bar.baz.
If you have a package installed globally and attempt test discovery on
a different copy of the package then the import could happen from the
wrong place. If this happens test discovery will warn you and exit.
If you supply the start directory as a package name rather than a
path to a directory then discover assumes that whichever location it
imports from is the location you intended, so you will not get the
warning.
Test modules and packages can customize test loading and discovery through
the load_tests protocol.
The basic building blocks of unit testing are test cases — single
scenarios that must be set up and checked for correctness. In unittest,
test cases are represented by unittest.TestCase instances.
To make your own test cases you must write subclasses of
TestCase or use FunctionTestCase.
An instance of a TestCase-derived class is an object that can
completely run a single test method, together with optional set-up and tidy-up
code.
The testing code of a TestCase instance should be entirely self
contained, such that it can be run either in isolation or in arbitrary
combination with any number of other test cases.
The simplest TestCase subclass will simply override the
runTest() method in order to perform specific testing code:
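import unittest

class DefaultWidgetSizeTestCase(unittest.TestCase):
    def runTest(self):
        widget = Widget('The widget')
        self.assertEqual(widget.size(), (50, 50), 'incorrect default size')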
Note that in order to test something, we use one of the assert*()
methods provided by the TestCase base class. If the test fails, an
exception will be raised, and unittest will identify the test case as a
failure. Any other exceptions will be treated as errors. This
helps you identify where the problem is: failures are caused by incorrect
results - a 5 where you expected a 6. Errors are caused by incorrect
code - e.g., a TypeError caused by an incorrect function call.
The way to run a test case will be described later. For now, note that to
construct an instance of such a test case, we call its constructor without
arguments:
testCase = DefaultWidgetSizeTestCase()
Now, such test cases can be numerous, and their set-up can be repetitive. In
the above case, constructing a Widget in each of 100 Widget test case
subclasses would mean unsightly duplication.
Luckily, we can factor out such set-up code by implementing a method called
setUp(), which the testing framework will automatically call for
us when we run the test:
import unittest

class SimpleWidgetTestCase(unittest.TestCase):
    def setUp(self):
        self.widget = Widget('The widget')

class DefaultWidgetSizeTestCase(SimpleWidgetTestCase):
    def runTest(self):
        self.assertEqual(self.widget.size(), (50, 50),
                         'incorrect default size')

class WidgetResizeTestCase(SimpleWidgetTestCase):
    def runTest(self):
        self.widget.resize(100, 150)
        self.assertEqual(self.widget.size(), (100, 150),
                         'wrong size after resize')
If the setUp() method raises an exception while the test is
running, the framework will consider the test to have suffered an error, and the
runTest() method will not be executed.
Similarly, we can provide a tearDown() method that tidies up
after the runTest() method has been run:
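import unittest

class SimpleWidgetTestCase(unittest.TestCase):
    def setUp(self):
        self.widget = Widget('The widget')

    def tearDown(self):
        self.widget.dispose()
        self.widget = None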
If setUp() succeeded, the tearDown() method will
be run whether runTest() succeeded or not.
Such a working environment for the testing code is called a fixture.
Often, many small test cases will use the same fixture. In this case, we would
end up subclassing SimpleWidgetTestCase into many small one-method
classes such as DefaultWidgetSizeTestCase. This is time-consuming and
discouraging, so in the same vein as JUnit, unittest provides a simpler
mechanism:
import unittest

class WidgetTestCase(unittest.TestCase):
    def setUp(self):
        self.widget = Widget('The widget')

    def tearDown(self):
        self.widget.dispose()
        self.widget = None

    def test_default_size(self):
        self.assertEqual(self.widget.size(), (50, 50),
                         'incorrect default size')

    def test_resize(self):
        self.widget.resize(100, 150)
        self.assertEqual(self.widget.size(), (100, 150),
                         'wrong size after resize')
Here we have not provided a runTest() method, but have instead
provided two different test methods. Class instances will now each run one of
the test_*() methods, with self.widget created and destroyed
separately for each instance. When creating an instance we must specify the
test method it is to run. We do this by passing the method name in the
constructor:
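defaultSizeTestCase = WidgetTestCase('test_default_size')
resizeTestCase = WidgetTestCase('test_resize')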
Test case instances are grouped together according to the features they test.
unittest provides a mechanism for this: the test suite,
represented by unittest’s TestSuite class:
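widgetTestSuite = unittest.TestSuite()
widgetTestSuite.addTest(WidgetTestCase('test_default_size'))
widgetTestSuite.addTest(WidgetTestCase('test_resize'))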
For the ease of running tests, as we will see later, it is a good idea to
provide in each test module a callable object that returns a pre-built test
suite:
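def suite():
    suite = unittest.TestSuite()
    suite.addTest(WidgetTestCase('test_default_size'))
    suite.addTest(WidgetTestCase('test_resize'))
    return suite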
Since it is a common pattern to create a TestCase subclass with many
similarly named test functions, unittest provides a TestLoader
class that can be used to automate the process of creating a test suite and
populating it with individual tests. For example,
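suite = unittest.TestLoader().loadTestsFromTestCase(WidgetTestCase)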
will create a test suite that will run WidgetTestCase.test_default_size() and
WidgetTestCase.test_resize(). TestLoader uses the 'test' method
name prefix to identify test methods automatically.
Note that the order in which the various test cases will be run is
determined by sorting the test function names with respect to the
built-in ordering for strings.
Often it is desirable to group suites of test cases together, so as to run tests
for the whole system at once. This is easy, since TestSuite instances
can be added to a TestSuite just as TestCase instances can be
added to a TestSuite:
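suite1 = module1.TheTestSuite()   # module1 and module2 are illustrative
suite2 = module2.TheTestSuite()   # test modules providing suite factories
alltests = unittest.TestSuite([suite1, suite2])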
You can place the definitions of test cases and test suites in the same modules
as the code they are to test (such as widget.py), but there are several
advantages to placing the test code in a separate module, such as
test_widget.py:
The test module can be run standalone from the command line.
The test code can more easily be separated from shipped code.
There is less temptation to change test code to fit the code it tests without
a good reason.
Test code should be modified much less frequently than the code it tests.
Tested code can be refactored more easily.
Tests for modules written in C must be in separate modules anyway, so why not
be consistent?
If the testing strategy changes, there is no need to change the source code.
Some users will find that they have existing test code that they would like to
run from unittest, without converting every old test function to a
TestCase subclass.
For this reason, unittest provides a FunctionTestCase class.
This subclass of TestCase can be used to wrap an existing test
function. Set-up and tear-down functions can also be provided.
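For example, given a pre-existing test function, a test case can be built
with optional set-up and tear-down callables (makeSomething(),
makeSomethingDB, and deleteSomethingDB are illustrative helpers):

def testSomething():
    something = makeSomething()
    assert something.name is not None

testcase = unittest.FunctionTestCase(testSomething,
                                     setUp=makeSomethingDB,
                                     tearDown=deleteSomethingDB)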
To make migrating existing test suites easier, unittest supports tests
raising AssertionError to indicate test failure. However, it is
recommended that you use the explicit TestCase.fail*() and
TestCase.assert*() methods instead, as future versions of unittest
may treat AssertionError differently.
Note
Even though FunctionTestCase can be used to quickly convert an
existing test base over to a unittest-based system, this approach is
not recommended. Taking the time to set up proper TestCase
subclasses will make future test refactorings infinitely easier.
In some cases, the existing tests may have been written using the doctest
module. If so, doctest provides a DocTestSuite class that can
automatically build unittest.TestSuite instances from the existing
doctest-based tests.
Unittest supports skipping individual test methods and even whole classes of
tests. In addition, it supports marking a test as an “expected failure,” a test
that is broken and will fail, but shouldn’t be counted as a failure on a
TestResult.
Skipping a test is simply a matter of using the skip() decorator
or one of its conditional variants.
Basic skipping looks like this:
class MyTestCase(unittest.TestCase):

    @unittest.skip("demonstrating skipping")
    def test_nothing(self):
        self.fail("shouldn't happen")

    @unittest.skipIf(mylib.__version__ < (1, 3),
                     "not supported in this library version")
    def test_format(self):
        # Tests that work for only a certain version of the library.
        pass

    @unittest.skipUnless(sys.platform.startswith("win"), "requires Windows")
    def test_windows_support(self):
        # windows specific testing code
        pass
This is the output of running the example above in verbose mode:
test_format (__main__.MyTestCase) ... skipped 'not supported in this library version'
test_nothing (__main__.MyTestCase) ... skipped 'demonstrating skipping'
test_windows_support (__main__.MyTestCase) ... skipped 'requires Windows'
----------------------------------------------------------------------
Ran 3 tests in 0.005s
OK (skipped=3)
Classes can be skipped just like methods:
@unittest.skip("showing class skipping")
class MySkippedTestCase(unittest.TestCase):
    def test_not_run(self):
        pass
TestCase.setUp() can also skip the test. This is useful when a resource
that needs to be set up is not available.
It’s easy to roll your own skipping decorators by making a decorator that calls
skip() on the test when it wants it to be skipped. This decorator skips
the test unless the passed object has a certain attribute:
def skipUnlessHasattr(obj, attr):
    if hasattr(obj, attr):
        return lambda func: func
    return unittest.skip("{0!r} doesn't have {1!r}".format(obj, attr))
The skip(), skipIf(), and skipUnless() decorators, together with
expectedFailure(), implement test skipping and expected failures.
Instances of the TestCase class represent the smallest testable units
in the unittest universe. This class is intended to be used as a base
class, with specific tests being implemented by concrete subclasses. This class
implements the interface needed by the test runner to allow it to drive the
test, and methods that the test code can use to check for and report various
kinds of failure.
Each instance of TestCase will run a single test method: the method
named methodName. If you remember, we had an earlier example that went
something like this:
Here, we create two instances of WidgetTestCase, each of which runs a
single test.
Changed in version 3.2: TestCase can be instantiated successfully without providing a method
name. This makes it easier to experiment with TestCase from the
interactive interpreter.
methodName defaults to runTest().
TestCase instances provide three groups of methods: one group used
to run the test, another used by the test implementation to check conditions
and report failures, and some inquiry methods allowing information about the
test itself to be gathered.
Methods in the first group (running the test) are:
Method called to prepare the test fixture. This is called immediately
before calling the test method; any exception raised by this method will
be considered an error rather than a test failure. The default
implementation does nothing.
Method called immediately after the test method has been called and the
result recorded. This is called even if the test method raised an
exception, so the implementation in subclasses may need to be particularly
careful about checking internal state. Any exception raised by this
method will be considered an error rather than a test failure. This
method will only be called if the setUp() succeeds, regardless of
the outcome of the test method. The default implementation does nothing.
A class method called before tests in an individual class run.
setUpClass is called with the class as the only argument
and must be decorated as a classmethod():
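@classmethod
def setUpClass(cls):
    ...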
A class method called after tests in an individual class have run.
tearDownClass is called with the class as the only argument
and must be decorated as a classmethod():
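@classmethod
def tearDownClass(cls):
    ...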
Run the test, collecting the result into the test result object passed as
result. If result is omitted or None, a temporary result
object is created (by calling the defaultTestResult() method) and
used. The result object is not returned to run()‘s caller.
The same effect may be had by simply calling the TestCase
instance.
Run the test without collecting the result. This allows exceptions raised
by the test to be propagated to the caller, and can be used to support
running tests under a debugger.
The TestCase class provides a number of methods to check for and
report failures, such as:
Test that first and second are equal. If the values do not
compare equal, the test will fail.
In addition, if first and second are the exact same type and one of
list, tuple, dict, set, frozenset or str or any type that a subclass
registers with addTypeEqualityFunc() the type specific equality
function will be called in order to generate a more useful default
error message (see also the list of type-specific methods).
Changed in version 3.1: Added the automatic calling of type specific equality function.
Changed in version 3.2: assertMultiLineEqual() added as the default type equality
function for comparing strings.
Note that this is equivalent to bool(expr) is True and not to expr
is True (use assertIs(expr, True) for the latter). This method
should also be avoided when more specific methods are available (e.g.
assertEqual(a, b) instead of assertTrue(a == b)), because they
provide a better error message in case of failure.
Test that an exception is raised when callable is called with any
positional or keyword arguments that are also passed to
assertRaises(). The test passes if exception is raised, is an
error if another exception is raised, or fails if no exception is raised.
To catch any of a group of exceptions, a tuple containing the exception
classes may be passed as exception.
If only the exception argument is given, returns a context manager so
that the code under test can be written inline rather than as a function:
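with self.assertRaises(SomeException):
    do_something()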
The context manager will store the caught exception object in its
exception attribute. This can be useful if the intention
is to perform additional checks on the exception raised:
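with self.assertRaises(SomeException) as cm:
    do_something()

the_exception = cm.exception
self.assertEqual(the_exception.error_code, 3)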
Like assertRaises() but also tests that regex matches
on the string representation of the raised exception. regex may be
a regular expression object or a string containing a regular expression
suitable for use by re.search(). Examples:
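self.assertRaisesRegex(ValueError, "invalid literal for.*XYZ'$",
                       int, 'XYZ')

with self.assertRaisesRegex(ValueError, 'literal'):
    int('XYZ')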
Test that a warning is triggered when callable is called with any
positional or keyword arguments that are also passed to
assertWarns(). The test passes if warning is triggered and
fails if it isn’t. Also, any unexpected exception is an error.
To catch any of a group of warnings, a tuple containing the warning
classes may be passed as warnings.
If only the warning argument is given, returns a context manager so
that the code under test can be written inline rather than as a function:
with self.assertWarns(SomeWarning):
    do_something()
The context manager will store the caught warning object in its
warning attribute, and the source line which triggered the
warnings in the filename and lineno attributes.
This can be useful if the intention is to perform additional checks
on the warning caught:
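with self.assertWarns(SomeWarning) as cm:
    do_something()

self.assertIn('myfile.py', cm.filename)
self.assertEqual(320, cm.lineno)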
Like assertWarns() but also tests that regex matches on the
message of the triggered warning. regex may be a regular expression
object or a string containing a regular expression suitable for use
by re.search(). Example:
self.assertWarnsRegex(DeprecationWarning,
                      r'legacy_function\(\) is deprecated',
                      legacy_function, 'XYZ')
Test that first and second are approximately (or not approximately)
equal by computing the difference, rounding to the given number of
decimal places (default 7), and comparing to zero. Note that these
methods round the values to the given number of decimal places (i.e.
like the round() function) and not significant digits.
If delta is supplied instead of places then the difference
between first and second must be less (or more) than delta.
Supplying both delta and places raises a TypeError.
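As a quick sketch (the values here are chosen only for illustration), rounding
to seven decimal places means a difference of 1e-8 compares equal, while the
delta form checks the raw difference:
self.assertAlmostEqual(1.00000001, 1.0)       # difference rounds to 0 at 7 places
self.assertAlmostEqual(1.05, 1.0, delta=0.1)  # |1.05 - 1.0| < 0.1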
Changed in version 3.2: assertAlmostEqual() automatically considers almost equal objects
that compare equal. assertNotAlmostEqual() automatically fails
if the objects compare equal. Added the delta keyword argument.
Test that a regex search matches (or does not match) text. In case
of failure, the error message will include the pattern and the text (or
the pattern and the part of text that unexpectedly matched). regex
may be a regular expression object or a string containing a regular
expression suitable for use by re.search().
New in version 3.1: Added under the name assertRegexpMatches.
Changed in version 3.2: The method assertRegexpMatches() has been renamed to
assertRegex().
Tests whether the key/value pairs in dictionary are a superset of
those in subset. If not, an error message listing the missing keys
and mismatched values is generated.
Note that the arguments are in the opposite order of what the method name
suggests. Instead, consider using the set-methods on dictionary
views, for example: d.keys() <= e.keys() or
d.items() <= e.items().
Test that sequence first contains the same elements as second,
regardless of their order. When they don’t, an error message listing the
differences between the sequences will be generated.
Duplicate elements are not ignored when comparing first and
second. It verifies whether each element has the same count in both
sequences. Equivalent to:
assertEqual(Counter(list(first)), Counter(list(second)))
but works with sequences of unhashable objects as well.
Test that sequence first contains the same elements as second,
regardless of their order. When they don’t, an error message listing
the differences between the sequences will be generated.
Duplicate elements are ignored when comparing first and second.
It is the equivalent of assertEqual(set(first),set(second))
but it works with sequences of unhashable objects as well. Because
duplicates are ignored, this method has been deprecated in favour of
assertCountEqual().
New in version 3.1.
Deprecated since version 3.2.
The assertEqual() method dispatches the equality check for objects of
the same type to different type-specific methods. These methods are already
implemented for most of the built-in types, but it’s also possible to
register new methods using addTypeEqualityFunc():
Registers a type-specific method called by assertEqual() to check
if two objects of exactly the same type obj (not subclasses) compare
equal. function must take two positional arguments and a third msg=None
keyword argument just as assertEqual() does. It must raise
self.failureException(msg) when inequality
between the first two parameters is detected, possibly providing useful
information and explaining the inequalities in detail in the error
message.
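A minimal sketch of such a registration; the Money class, its attributes and
the assert_money_equal method are hypothetical names:
class MoneyTest(unittest.TestCase):
    def setUp(self):
        self.addTypeEqualityFunc(Money, self.assert_money_equal)

    def assert_money_equal(self, first, second, msg=None):
        # Called by assertEqual() whenever both arguments are Money instances.
        if (first.amount, first.currency) != (second.amount, second.currency):
            raise self.failureException(msg or '%r != %r' % (first, second))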
New in version 3.1.
The list of type-specific methods automatically used by
assertEqual() are summarized in the following table. Note
that it’s usually not necessary to invoke these methods directly.
Test that the multiline string first is equal to the string second.
When they are not equal, a diff of the two strings highlighting the
differences will be included in the error message. This method is used by
default when comparing strings with assertEqual().
Tests that two sequences are equal. If a seq_type is supplied, both
first and second must be instances of seq_type or a failure will
be raised. If the sequences are different an error message is
constructed that shows the difference between the two.
Tests that two lists or tuples are equal. If not, an error message is
constructed that shows only the differences between the two. An error
is also raised if either of the parameters are of the wrong type.
These methods are used by default when comparing lists or tuples with
assertEqual().
Tests that two sets are equal. If not, an error message is constructed
that lists the differences between the sets. This method is used by
default when comparing sets or frozensets with assertEqual().
Fails if either of first or second does not have a set.difference()
method.
Test that two dictionaries are equal. If not, an error message is
constructed that shows the differences in the dictionaries. This
method will be used by default to compare dictionaries in
calls to assertEqual().
New in version 3.1.
Finally, the TestCase provides the following methods and attributes:
This class attribute gives the exception raised by the test method. If a
test framework needs to use a specialized exception, possibly to carry
additional information, it must subclass this exception in order to “play
fair” with the framework. The initial value of this attribute is
AssertionError.
If set to True then any explicit failure message you pass in to the
assert methods will be appended to the end of the
normal failure message. The normal messages contain useful information
about the objects involved, for example the message from assertEqual
shows you the repr of the two unequal objects. Setting this attribute
to True allows you to have a custom error message in addition to the
normal one.
This attribute defaults to True. If set to False then a custom message
passed to an assert method will silence the normal message.
The class setting can be overridden in individual tests by assigning an
instance attribute to True or False before calling the assert methods.
This attribute controls the maximum length of diffs output by assert
methods that report diffs on failure. It defaults to 80*8 characters.
Assert methods affected by this attribute are
assertSequenceEqual() (including all the sequence comparison
methods that delegate to it), assertDictEqual() and
assertMultiLineEqual().
Setting maxDiff to None means that there is no maximum length of
diffs.
New in version 3.2.
Testing frameworks can use the following methods to collect information on
the test:
Return an instance of the test result class that should be used for this
test case class (if no other result instance is provided to the
run() method).
For TestCase instances, this will always be an instance of
TestResult; subclasses of TestCase should override this
as necessary.
Returns a description of the test, or None if no description
has been provided. The default implementation of this method
returns the first line of the test method’s docstring, if available,
or None.
Changed in version 3.1: In 3.1 this was changed to add the test name to the short description
even in the presence of a docstring. This caused compatibility issues with
unittest extensions, so adding the test name was moved to the
TextTestResult in Python 3.2.
Add a function to be called after tearDown() to cleanup resources
used during the test. Functions will be called in reverse order to the
order they are added (LIFO). They are called with any arguments and
keyword arguments passed into addCleanup() when they are
added.
If setUp() fails, meaning that tearDown() is not called,
then any cleanup functions added will still be called.
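A short sketch of typical usage; the temporary file is just an example
resource:
import os
import tempfile
import unittest

class TempFileTest(unittest.TestCase):
    def setUp(self):
        fd, self.path = tempfile.mkstemp()
        os.close(fd)
        # Runs after tearDown(), and even if setUp() fails past this point.
        self.addCleanup(os.remove, self.path)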
This method is called unconditionally after tearDown(), or
after setUp() if setUp() raises an exception.
It is responsible for calling all the cleanup functions added by
addCleanup(). If you need cleanup functions to be called
prior to tearDown() then you can call doCleanups()
yourself.
doCleanups() pops methods off the stack of cleanup
functions one at a time, so it can be called at any time.
New in version 3.1.
class unittest.FunctionTestCase(testFunc, setUp=None, tearDown=None, description=None)
This class implements the portion of the TestCase interface which
allows the test runner to drive the test, but does not provide the methods
which test code can use to check and report errors. This is used to create
test cases using legacy test code, allowing it to be integrated into a
unittest-based test framework.
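For example, an existing test function might be wrapped like this
(legacy_check is a stand-in name):
def legacy_check():
    assert 2 + 2 == 4

testcase = unittest.FunctionTestCase(legacy_check,
                                     description='sanity check from old test code')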
For historical reasons, some of the TestCase methods had one or more
aliases that are now deprecated. The following table lists the correct names
along with their deprecated aliases:
This class represents an aggregation of individual test cases and test suites.
The class presents the interface needed by the test runner to allow it to be run
as any other test case. Running a TestSuite instance is the same as
iterating over the suite, running each test individually.
If tests is given, it must be an iterable of individual test cases or other
test suites that will be used to build the suite initially. Additional methods
are provided to add test cases and suites to the collection later on.
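For example, a suite might be assembled by hand like this (WidgetTestCase is
a placeholder class):
suite = unittest.TestSuite()
suite.addTest(WidgetTestCase('test_default_size'))   # a single test case
suite.addTests(unittest.TestLoader().loadTestsFromTestCase(WidgetTestCase))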
TestSuite objects behave much like TestCase objects, except
they do not actually implement a test. Instead, they are used to aggregate
tests into groups of tests that should be run together. Some additional
methods are available to add tests to TestSuite instances:
Run the tests associated with this suite, collecting the result into the
test result object passed as result. Note that unlike
TestCase.run(), TestSuite.run() requires the result object to
be passed in.
Run the tests associated with this suite without collecting the
result. This allows exceptions raised by the test to be propagated to the
caller and can be used to support running tests under a debugger.
Tests grouped by a TestSuite are always accessed by iteration.
Subclasses can lazily provide tests by overriding __iter__(). Note
that this method may be called several times on a single suite
(for example when counting tests or comparing for equality),
so the tests returned must be the same for repeated iterations.
Changed in version 3.2: In earlier versions the TestSuite accessed tests directly rather
than through iteration, so overriding __iter__() wasn’t sufficient
for providing tests.
In the typical usage of a TestSuite object, the run() method
is invoked by a TestRunner rather than by the end-user test harness.
The TestLoader class is used to create test suites from classes and
modules. Normally, there is no need to create an instance of this class; the
unittest module provides an instance that can be shared as
unittest.defaultTestLoader. Using a subclass or instance, however, allows
customization of some configurable properties.
Return a suite of all test cases contained in the given module. This
method searches module for classes derived from TestCase and
creates an instance of the class for each test method defined for the
class.
Note
While using a hierarchy of TestCase-derived classes can be
convenient in sharing fixtures and helper functions, defining test
methods on base classes that are not intended to be instantiated
directly does not play well with this method. Doing so, however, can
be useful when the fixtures are different and defined in subclasses.
If a module provides a load_tests function it will be called to
load the tests. This allows modules to customize test loading.
This is the load_tests protocol.
Changed in version 3.2: Support for load_tests added.
Return a suite of all test cases given a string specifier.
The specifier name is a “dotted name” that may resolve either to a
module, a test case class, a test method within a test case class, a
TestSuite instance, or a callable object which returns a
TestCase or TestSuite instance. These checks are
applied in the order listed here; that is, a method on a possible test
case class will be picked up as “a test method within a test case class”,
rather than “a callable object”.
For example, if you have a module SampleTests containing a
TestCase-derived class SampleTestCase with three test
methods (test_one(), test_two(), and test_three()), the
specifier 'SampleTests.SampleTestCase' would cause this method to
return a suite which will run all three test methods. Using the specifier
'SampleTests.SampleTestCase.test_two' would cause it to return a test
suite which will run only the test_two() test method. The specifier
can refer to modules and packages which have not been imported; they will
be imported as a side-effect.
The method optionally resolves name relative to the given module.
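Continuing the example above, a sketch of the call:
loader = unittest.TestLoader()
suite = loader.loadTestsFromName('SampleTests.SampleTestCase.test_two')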
Similar to loadTestsFromName(), but takes a sequence of names rather
than a single name. The return value is a test suite which supports all
the tests defined for each name.
Find and return all test modules from the specified start directory,
recursing into subdirectories to find them. Only test files that match
pattern will be loaded. (Using shell style pattern matching.) Only
module names that are importable (i.e. are valid Python identifiers) will
be loaded.
All test modules must be importable from the top level of the project. If
the start directory is not the top level directory then the top level
directory must be specified separately.
If importing a module fails, for example due to a syntax error, then this
will be recorded as a single error and discovery will continue.
If a test package name (directory with __init__.py) matches the
pattern then the package will be checked for a load_tests
function. If this exists then it will be called with loader, tests,
pattern.
If load_tests exists then discovery does not recurse into the package,
load_tests is responsible for loading all tests in the package.
The pattern is deliberately not stored as a loader attribute so that
packages can continue discovery themselves. top_level_dir is stored so
load_tests does not need to pass this argument in to
loader.discover().
start_dir can be a dotted module name as well as a directory.
New in version 3.2.
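A minimal sketch of programmatic discovery, assuming the project keeps its
tests in a tests/ subdirectory:
loader = unittest.TestLoader()
suite = loader.discover(start_dir='tests', pattern='test*.py',
                        top_level_dir='.')
unittest.TextTestRunner(verbosity=2).run(suite)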
The following attributes of a TestLoader can be configured either by
subclassing or assignment on an instance:
Callable object that constructs a test suite from a list of tests. No
methods on the resulting object are needed. The default value is the
TestSuite class.
This class is used to compile information about which tests have succeeded
and which have failed.
A TestResult object stores the results of a set of tests. The
TestCase and TestSuite classes ensure that results are
properly recorded; test authors do not need to worry about recording the
outcome of tests.
Testing frameworks built on top of unittest may want access to the
TestResult object generated by running a set of tests for reporting
purposes; a TestResult instance is returned by the
TestRunner.run() method for this purpose.
TestResult instances have the following attributes that will be of
interest when inspecting the results of running a set of tests:
A list containing 2-tuples of TestCase instances and strings
holding formatted tracebacks. Each tuple represents a test which raised an
unexpected exception.
A list containing 2-tuples of TestCase instances and strings
holding formatted tracebacks. Each tuple represents a test where a failure
was explicitly signalled using the TestCase.fail*() or
TestCase.assert*() methods.
If set to true, sys.stdout and sys.stderr will be buffered in between
startTest() and stopTest() being called. Collected output will
only be echoed onto the real sys.stdout and sys.stderr if the test
fails or errors. Any output is also attached to the failure / error message.
This method can be called to signal that the set of tests being run should
be aborted by setting the shouldStop attribute to True.
TestRunner objects should respect this flag and return without
running any additional tests.
For example, this feature is used by the TextTestRunner class to
stop the test framework when the user signals an interrupt from the
keyboard. Interactive tools which provide TestRunner
implementations can use this in a similar manner.
The following methods of the TestResult class are used to maintain
the internal data structures, and may be extended in subclasses to support
additional reporting requirements. This is particularly useful in building
tools which support interactive reporting while tests are being run.
Called when the test case test raises an unexpected exception. err is a
tuple of the form returned by sys.exc_info(): (type, value, traceback).
The default implementation appends a tuple (test,formatted_err) to
the instance’s errors attribute, where formatted_err is a
formatted traceback derived from err.
Called when the test case test signals a failure. err is a tuple of
the form returned by sys.exc_info(): (type, value, traceback).
The default implementation appends a tuple (test,formatted_err) to
the instance’s failures attribute, where formatted_err is a
formatted traceback derived from err.
Called when the test case test fails, but was marked with the
expectedFailure() decorator.
The default implementation appends a tuple (test,formatted_err) to
the instance’s expectedFailures attribute, where formatted_err
is a formatted traceback derived from err.
Instance of the TestLoader class intended to be shared. If no
customization of the TestLoader is needed, this instance can be used
instead of repeatedly creating new instances.
class unittest.TextTestRunner(stream=None, descriptions=True, verbosity=1, resultclass=None, warnings=None)
A basic test runner implementation that outputs results to a stream. If stream
is None, the default, sys.stderr is used as the output stream. This class
has a few configurable parameters, but is essentially very simple. Graphical
applications which run test suites should provide alternate implementations.
By default this runner shows DeprecationWarning,
PendingDeprecationWarning, and ImportWarning even if they are
ignored by default. Deprecation warnings caused by
deprecated unittest methods are also
special-cased and, when the warning filters are 'default' or 'always',
they will appear only once per-module, in order to avoid too many warning
messages. This behavior can be overridden using the -Wd or
-Wa options and leaving warnings to None.
Changed in version 3.2: Added the warnings argument.
Changed in version 3.2: The default stream is set to sys.stderr at instantiation time rather
than at import time.
This method returns the instance of TestResult used by run().
It is not intended to be called directly, but can be overridden in
subclasses to provide a custom TestResult.
_makeResult() instantiates the class or callable passed in the
TextTestRunner constructor as the resultclass argument. It
defaults to TextTestResult if no resultclass is provided.
The result class is instantiated with the following arguments:
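stream, descriptions, verbosity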
A command-line program that runs a set of tests; this is primarily for making
test modules conveniently executable. The simplest use for this function is to
include the following line at the end of a test script:
if __name__ == '__main__':
    unittest.main()
You can run tests with more detailed information by passing in the verbosity
argument:
if __name__ == '__main__':
    unittest.main(verbosity=2)
The testRunner argument can either be a test runner class or an already
created instance of it. By default main calls sys.exit() with
an exit code indicating success or failure of the tests run.
main supports being used from the interactive interpreter by passing in the
argument exit=False. This displays the result on standard output without
calling sys.exit():
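>>> from unittest import main
>>> main(module='test_module', exit=False)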
The failfast, catchbreak and buffer parameters have the same
effect as the same-name command-line options.
The warnings argument specifies the warning filter
that should be used while running the tests. If it’s not specified, it will
remain None if a -W option is passed to python,
otherwise it will be set to 'default'.
Calling main actually returns an instance of the TestProgram class.
This stores the result of the tests run as the result attribute.
Changed in version 3.1: The exit parameter was added.
Changed in version 3.2: The verbosity, failfast, catchbreak, buffer
and warnings parameters were added.
Modules or packages can customize how tests are loaded from them during normal
test runs or test discovery by implementing a function called load_tests.
loader is the instance of TestLoader doing the loading.
standard_tests are the tests that would be loaded by default from the
module. It is common for test modules to only want to add or remove tests
from the standard set of tests.
The third argument is used when loading packages as part of test discovery.
A typical load_tests function that loads tests from a specific set of
TestCase classes may look like:
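test_cases = (TestCase1, TestCase2, TestCase3)

def load_tests(loader, tests, pattern):
    suite = TestSuite()
    for test_class in test_cases:
        tests = loader.loadTestsFromTestCase(test_class)
        suite.addTests(tests)
    return suite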
If discovery is started, either from the command line or by calling
TestLoader.discover(), with a pattern that matches a package
name then the package __init__.py will be checked for load_tests.
Note
The default pattern is ‘test*.py’. This matches all Python files
that start with ‘test’ but won’t match any test directories.
A pattern like ‘test*’ will match test packages as well as
modules.
If the package __init__.py defines load_tests then it will be
called and discovery is not continued into the package. load_tests
is called with the following arguments:
load_tests(loader, standard_tests, pattern)
This should return a TestSuite representing all the tests
from the package. (standard_tests will only contain tests
collected from __init__.py.)
Because the pattern is passed into load_tests the package is free to
continue (and potentially modify) test discovery. A ‘do nothing’
load_tests function for a test package would look like:
def load_tests(loader, standard_tests, pattern):
    # top level directory cached on loader instance
    this_dir = os.path.dirname(__file__)
    package_tests = loader.discover(start_dir=this_dir, pattern=pattern)
    standard_tests.addTests(package_tests)
    return standard_tests
Class and module level fixtures are implemented in TestSuite. When
the test suite encounters a test from a new class then tearDownClass()
from the previous class (if there is one) is called, followed by
setUpClass() from the new class.
Similarly if a test is from a different module from the previous test then
tearDownModule from the previous module is run, followed by
setUpModule from the new module.
After all the tests have run the final tearDownClass and
tearDownModule are run.
Note that shared fixtures do not play well with [potential] features like test
parallelization and they break test isolation. They should be used with care.
The default ordering of tests created by the unittest test loaders is to group
all tests from the same modules and classes together. This will lead to
setUpClass / setUpModule (etc) being called exactly once per class and
module. If you randomize the order, so that tests from different modules and
classes are adjacent to each other, then these shared fixture functions may be
called multiple times in a single test run.
Shared fixtures are not intended to work with suites with non-standard
ordering. A BaseTestSuite still exists for frameworks that don’t want to
support shared fixtures.
If there are any exceptions raised during one of the shared fixture functions
the test is reported as an error. Because there is no corresponding test
instance an _ErrorHolder object (that has the same interface as a
TestCase) is created to represent the error. If you are just using
the standard unittest test runner then this detail doesn’t matter, but if you
are a framework author it may be relevant.
If you want the setUpClass and tearDownClass on base classes called
then you must call up to them yourself. The implementations in
TestCase are empty.
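A sketch of the cooperative pattern; the class name and the
open_shared_resource() helper are illustrative:
class BaseFixture(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        super().setUpClass()                   # safe: the TestCase implementation is empty
        cls.resource = open_shared_resource()  # hypothetical shared fixture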
If an exception is raised during a setUpClass then the tests in the class
are not run and the tearDownClass is not run. Skipped classes will not
have setUpClass or tearDownClass run. If the exception is a
SkipTest exception then the class will be reported as having been skipped
instead of as an error.
If an exception is raised in a setUpModule then none of the tests in the
module will be run and the tearDownModule will not be run. If the exception is a
SkipTest exception then the module will be reported as having been skipped
instead of as an error.
The -c/--catch command-line option to unittest,
along with the catchbreak parameter to unittest.main(), provides
more friendly handling of control-C during a test run. With catch break
behavior enabled, control-C will allow the currently running test to complete,
and the test run will then end and report all the results so far. A second
control-C will raise a KeyboardInterrupt in the usual way.
The control-c handling signal handler attempts to remain compatible with code or
tests that install their own signal.SIGINT handler. If the unittest
handler is called but isn’t the installed signal.SIGINT handler,
i.e. it has been replaced by the system under test and delegated to, then it
calls the default handler. This will normally be the expected behavior by code
that replaces an installed handler and delegates to it. For individual tests
that need unittest control-c handling disabled the removeHandler()
decorator can be used.
There are a few utility functions for framework authors to enable control-c
handling functionality within test frameworks.
Install the control-c handler. When a signal.SIGINT is received
(usually in response to the user pressing control-c) all registered results
have stop() called.
Register a TestResult object for control-c handling. Registering a
result stores a weak reference to it, so it doesn’t prevent the result from
being garbage collected.
Registering a TestResult object has no side-effects if control-c
handling is not enabled, so test frameworks can unconditionally register
all results they create independently of whether or not handling is enabled.
When called without arguments this function removes the control-c handler
if it has been installed. This function can also be used as a test decorator
to temporarily remove the handler whilst the test is being executed:
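@unittest.removeHandler
def test_signal_handling(self):
    ...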
2to3 is a Python program that reads Python 2.x source code and applies a series
of fixers to transform it into valid Python 3.x code. The standard library
contains a rich set of fixers that will handle almost all code. 2to3’s
supporting library, lib2to3, is, however, a flexible and generic library,
so it is possible to write your own fixers for 2to3. lib2to3 could also be
adapted to custom applications in which Python code needs to be edited
automatically.
2to3 will usually be installed with the Python interpreter as a script. It is
also located in the Tools/scripts directory of the Python root.
2to3’s basic arguments are a list of files or directories to transform. The
directories are recursively traversed for Python sources.
Here is a sample Python 2.x source file, example.py:
def greet(name):
    print "Hello, {0}!".format(name)
print "What's your name?"
name = raw_input()
greet(name)
It can be converted to Python 3.x code via 2to3 on the command line:
$ 2to3 example.py
A diff against the original source file is printed. 2to3 can also write the
needed modifications right back to the source file. (A backup of the original
file is made unless -n is also given.) Writing the changes back is
enabled with the -w flag:
$ 2to3 -w example.py
After transformation, example.py looks like this:
def greet(name):
    print("Hello, {0}!".format(name))
print("What's your name?")
name = input()
greet(name)
Comments and exact indentation are preserved throughout the translation process.
By default, 2to3 runs a set of predefined fixers. The
-l flag lists all available fixers. An explicit set of fixers to run
can be given with -f. Likewise the -x explicitly disables a
fixer. The following example runs only the imports and has_key fixers:
$ 2to3 -f imports -f has_key example.py
This command runs every fixer except the apply fixer:
$ 2to3 -x apply example.py
Some fixers are explicit, meaning they aren’t run by default and must be
listed on the command line to be run. Here, in addition to the default fixers,
the idioms fixer is run:
$ 2to3 -f all -f idioms example.py
Notice how passing all enables all default fixers.
Sometimes 2to3 will find a place in your source code that needs to be changed,
but 2to3 cannot fix automatically. In this case, 2to3 will print a warning
beneath the diff for a file. You should address the warning in order to have
compliant 3.x code.
2to3 can also refactor doctests. To enable this mode, use the -d
flag. Note that only doctests will be refactored. This also doesn’t require
the module to be valid Python; for example, doctest-like examples in a reST
document could also be refactored with this option.
The -v option enables output of more information on the translation
process.
Since some print statements can be parsed as function calls or statements, 2to3
cannot always read files containing the print function. When 2to3 detects the
presence of the from __future__ import print_function compiler directive, it
modifies its internal grammar to interpret print() as a function. This
change can also be enabled manually with the -p flag. Use
-p to run fixers on code that already has had its print statements
converted.
Each step of transforming code is encapsulated in a fixer. The command 2to3 -l lists them. As documented above, each can be turned on
and off individually. They are described here in more detail.
This optional fixer performs several transformations that make Python code
more idiomatic. Type comparisons like type(x) is SomeClass and
type(x) == SomeClass are converted to isinstance(x, SomeClass).
while 1 becomes while True. This fixer also tries to make use of
sorted() in appropriate places. For example, this block
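L = list(some_iterable)
L.sort()

is changed to

L = sorted(some_iterable)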
Removes imports of itertools.ifilter(), itertools.izip(), and
itertools.imap(). Imports of itertools.ifilterfalse() are also
changed to itertools.filterfalse().
Changes usage of itertools.ifilter(), itertools.izip(), and
itertools.imap() to their built-in equivalents.
itertools.ifilterfalse() is changed to itertools.filterfalse().
Converts calls to various functions in the operator module to other,
but equivalent, function calls. When needed, the appropriate import
statements are added, e.g. import collections. The following mappings
are made:
Converts raise E, V to raise E(V), and raise E, V, T to raise E(V).with_traceback(T). If E is a tuple, the translation will be
incorrect because substituting tuples for exceptions has been removed in 3.0.
The test package is meant for internal use by Python only. It is
documented for the benefit of the core developers of Python. Any use of
this package outside of Python’s standard library is discouraged as code
mentioned here can change or be removed without notice between releases of
Python.
The test package contains all regression tests for Python as well as the
modules test.support and test.regrtest.
test.support is used to enhance your tests while
test.regrtest drives the testing suite.
Each module in the test package whose name starts with test_ is a
testing suite for a specific module or feature. All new tests should be written
using the unittest or doctest module. Some older tests are
written using a “traditional” testing style that compares output printed to
sys.stdout; this style of test is considered deprecated.
It is preferred that tests that use the unittest module follow a few
guidelines. One is to name the test module by starting it with test_ and
ending it with the name of the module being tested. The test methods in the
test module should start with test_ and end with a description of what the
method is testing. This is needed so that the methods are recognized by the
test driver as test methods. Also, no documentation string for the method
should be included. A comment (such as # Tests function returns only True
or False) should be used to provide documentation for test methods. This is
done because documentation strings get printed out if they exist and thus
what test is being run is not stated.
A basic boilerplate is often used:
import unittest
from test import support

class MyTestCase1(unittest.TestCase):

    # Only use setUp() and tearDown() if necessary

    def setUp(self):
        ... code to execute in preparation for tests ...

    def tearDown(self):
        ... code to execute to clean up after tests ...

    def test_feature_one(self):
        # Test feature one.
        ... testing code ...

    def test_feature_two(self):
        # Test feature two.
        ... testing code ...

    ... more test methods ...

class MyTestCase2(unittest.TestCase):
    ... same structure as MyTestCase1 ...

... more test classes ...

def test_main():
    support.run_unittest(MyTestCase1,
                         MyTestCase2,
                         ... list other tests ...
                        )

if __name__ == '__main__':
    test_main()
This boilerplate code allows the testing suite to be run by test.regrtest
as well as on its own as a script.
The goal for regression testing is to try to break code. This leads to a few
guidelines to be followed:
The testing suite should exercise all classes, functions, and constants. This
includes not just the external API that is to be presented to the outside
world but also “private” code.
Whitebox testing (examining the code being tested when the tests are being
written) is preferred. Blackbox testing (testing only the published user
interface) is not complete enough to make sure all boundary and edge cases
are tested.
Make sure all possible values are tested including invalid ones. This makes
sure that not only all valid values are acceptable but also that improper
values are handled correctly.
Exhaust as many code paths as possible. Test where branching occurs and thus
tailor input to make sure as many different paths through the code are taken.
Add an explicit test for any bugs discovered for the tested code. This will
make sure that the error does not crop up again if the code is changed in the
future.
Make sure to clean up after your tests (such as close and remove all temporary
files).
If a test is dependent on a specific condition of the operating system then
verify the condition already exists before attempting the test.
Import as few modules as possible and do it as soon as possible. This
minimizes external dependencies of tests and also minimizes possible anomalous
behavior from side-effects of importing a module.
Try to maximize code reuse. On occasion, tests will vary by something as small
as what type of input is used. Minimize code duplication by subclassing a
basic test class with a class that specifies the input:
The test package can be run as a script to drive Python’s regression
test suite, thanks to the -m option: python -m test. Under
the hood, it uses test.regrtest; the call python -m
test.regrtest used in previous Python versions still works.
Running the script by itself automatically starts running all regression
tests in the test package. It does this by finding all modules in the
package whose name starts with test_, importing them, and executing the
function test_main() if present. The names of tests to execute may also
be passed to the script. Specifying a single regression test (python
-m test test_spam) will minimize output, printing only
whether the test passed or failed.
Running test directly allows you to set which resources are available
for tests to use. You do this with the -u command-line
option. Run python -m test -uall to turn on all
resources; specifying all as an option for -u enables all
possible resources. If all but one resource is desired (a more common case), a
comma-separated list of resources that are not desired may be listed after
all. The command python -m test -uall,-audio,-largefile
will run test with all resources except the audio and
largefile resources. For a list of all resources and more command-line
options, run python -m test -h.
Some other ways to execute the regression tests depend on what platform the
tests are being executed on. On Unix, you can run make test at the
top-level directory where Python was built. On Windows,
executing rt.bat from your PCBuild directory will run all
regression tests.
The test.support module provides support for Python’s regression
test suite.
Note
test.support is not a public module. It is documented here to help
Python developers write tests. The API of this module is subject to change
without backwards compatibility concerns between releases.
True when verbose output is enabled. Should be checked when more
detailed information is desired about a running test. verbose is set by
test.regrtest.
Raise ResourceDenied if resource is not available. msg is the
argument to ResourceDenied if it is raised. Always returns
True if called by a function whose __name__ is '__main__'.
Used when tests are executed by test.regrtest.
Return the path to the file named filename. If no match is found,
filename is returned. This does not equal a failure, since it could be the
path to the file.
Execute unittest.TestCase subclasses passed to the function. The
function scans the classes for methods starting with the prefix test_
and executes the tests individually.
It is also legal to pass strings as parameters; these should be keys in
sys.modules. Each associated module will be scanned by
unittest.TestLoader.loadTestsFromModule(). This is usually seen in the
following test_main() function:
def test_main():
    support.run_unittest(__name__)
This will run all tests defined in the named module.
A convenience wrapper for warnings.catch_warnings() that makes it
easier to test that a warning was correctly raised. It is approximately
equivalent to calling warnings.catch_warnings(record=True) with
warnings.simplefilter() set to always and with the option to
automatically validate the results that are recorded.
check_warnings accepts 2-tuples of the form ("message regexp",
WarningCategory) as positional arguments. If one or more filters are
it checks to make sure the warnings are as expected: each specified filter
must match at least one of the warnings raised by the enclosed code or the
test fails, and if any warnings are raised that do not match any of the
specified filters the test fails. To disable the first of these checks,
set quiet to True.
If no arguments are specified, it defaults to:
check_warnings(("",Warning),quiet=True)
In this case all warnings are caught and no errors are raised.
On entry to the context manager, a WarningRecorder instance is
returned. The underlying warnings list from
catch_warnings() is available via the recorder object’s
warnings attribute. As a convenience, the attributes of the object
representing the most recent warning can also be accessed directly through
the recorder object (see example below). If no warning has been raised,
then any of the attributes that would otherwise be expected on an object
representing a warning will return None.
The recorder object also has a reset() method, which clears the
warnings list.
The context manager is designed to be used like this:
with check_warnings(("assertion is always true", SyntaxWarning),
                    ("", UserWarning)):
    exec('assert(False, "Hey!")')
    warnings.warn(UserWarning("Hide me!"))
In this case if either warning was not raised, or some other warning was
raised, check_warnings() would raise an error.
When a test needs to look more deeply into the warnings, rather than
just checking whether or not they occurred, code like this can be used:
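with check_warnings(quiet=True) as w:
    warnings.warn("foo")
    assert str(w.args[0]) == "foo"
    warnings.warn("bar")
    assert str(w.args[0]) == "bar"
    assert str(w.warnings[0].args[0]) == "foo"
    assert str(w.warnings[1].args[0]) == "bar"
    w.reset()
    assert len(w.warnings) == 0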
This is a context manager that runs the with statement body using
an io.StringIO object as sys.stdout. That object can be
retrieved using the as clause of the with statement.
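A short sketch of typical usage, assuming the context manager is
test.support.captured_stdout():
from test.support import captured_stdout

with captured_stdout() as s:
    print("hello")
assert s.getvalue() == "hello\n"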
This function imports and returns a fresh copy of the named Python module
by removing the named module from sys.modules before doing the import.
Note that unlike reload(), the original module is not affected by
this operation.
fresh is an iterable of additional module names that are also removed
from the sys.modules cache before doing the import.
blocked is an iterable of module names that are replaced with 0
in the module cache during the import to ensure that attempts to import
them raise ImportError.
The named module and any modules named in the fresh and blocked
parameters are saved before starting the import and then reinserted into
sys.modules when the fresh import is complete.
Module and package deprecation messages are suppressed during this import
if deprecated is True.
This function will raise unittest.SkipTest if the named module
cannot be imported.
Example use:
# Get copies of the warnings module for testing without affecting the
# version being used by the rest of the test suite. One copy uses the
# C implementation, the other is forced to use the pure Python fallback
# implementation.
py_warnings = import_fresh_module('warnings', blocked=['_warnings'])
c_warnings = import_fresh_module('warnings', fresh=['_warnings'])
New in version 3.1.
The test.support module defines the following classes:
class test.support.TransientResource(exc, **kwargs)
Instances are a context manager that raises ResourceDenied if the
specified exception type is raised. Any keyword arguments are treated as
attribute/value pairs to be compared against any exception raised within the
with statement. Only if all pairs match properly against
attributes on the exception is ResourceDenied raised.
Class used to temporarily set or unset environment variables. Instances can
be used as a context manager and have a complete dictionary interface for
querying/modifying the underlying os.environ. After exit from the
context manager all changes to environment variables done through this
instance will be rolled back.
Changed in version 3.1: Added dictionary interface.
These libraries help you with Python development: the debugger enables you to
step through code, analyze stack frames and set breakpoints etc., and the
profilers run code and give you a detailed breakdown of execution times,
allowing you to identify bottlenecks in your programs.
class bdb.Breakpoint(file, line, temporary=0, cond=None, funcname=None)
This class implements temporary breakpoints, ignore counts, disabling and
(re-)enabling, and conditionals.
Breakpoints are indexed by number through a list called bpbynumber
and by (file,line) pairs through bplist. The former points to a
single instance of class Breakpoint. The latter points to a list of
such instances since there may be more than one breakpoint per line.
When creating a breakpoint, its associated filename should be in canonical
form. If a funcname is defined, a breakpoint hit will be counted when the
first line of that function is executed. A conditional breakpoint always
counts a hit.
Delete the breakpoint from the list associated to a file/line. If it is
the last breakpoint in that position, it also deletes the entry for the
file/line.
The Bdb class acts as a generic Python debugger base class.
This class takes care of the details of the trace facility; a derived class
should implement user interaction. The standard debugger class
(pdb.Pdb) is an example.
The skip argument, if given, must be an iterable of glob-style
module name patterns. The debugger will not step into frames that
originate in a module that matches one of these patterns. Whether a
frame is considered to originate in a certain module is determined
by the __name__ in the frame globals.
New in version 3.1: The skip argument.
The following methods of Bdb normally don’t need to be overridden.
Auxiliary method for getting a filename in a canonical form, that is, as a
case-normalized (on case-insensitive filesystems) absolute path, stripped
of surrounding angle brackets.
This function is installed as the trace function of debugged frames. Its
return value is the new trace function (in most cases, that is, itself).
The default implementation decides how to dispatch a frame, depending on
the type of event (passed as a string) that is about to be executed.
event can be one of the following:
"line": A new line of code is going to be executed.
"call": A function is about to be called, or another code block
entered.
"return": A function or other code block is about to return.
"exception": An exception has occurred.
"c_call": A C function is about to be called.
"c_return": A C function has returned.
"c_exception": A C function has raised an exception.
For the Python events, specialized functions (see below) are called. For
the C events, no action is taken.
The arg parameter depends on the previous event.
See the documentation for sys.settrace() for more information on the
trace function. For more information on code and frame objects, refer to
The standard type hierarchy.
If the debugger should stop on the current line, invoke the
user_line() method (which should be overridden in subclasses).
Raise a BdbQuit exception if the Bdb.quitting flag is set
(which can be set from user_line()). Return a reference to the
trace_dispatch() method for further tracing in that scope.
If the debugger should stop on this function call, invoke the
user_call() method (which should be overridden in subclasses).
Raise a BdbQuit exception if the Bdb.quitting flag is set
(which can be set from user_call()). Return a reference to the
trace_dispatch() method for further tracing in that scope.
If the debugger should stop on this function return, invoke the
user_return() method (which should be overridden in subclasses).
Raise a BdbQuit exception if the Bdb.quitting flag is set
(which can be set from user_return()). Return a reference to the
trace_dispatch() method for further tracing in that scope.
If the debugger should stop at this exception, invokes the
user_exception() method (which should be overridden in subclasses).
Raise a BdbQuit exception if the Bdb.quitting flag is set
(which can be set from user_exception()). Return a reference to the
trace_dispatch() method for further tracing in that scope.
Normally derived classes don’t override the following methods, but they may
if they want to redefine the definition of stopping and breakpoints.
This method checks if there is a breakpoint in the filename and line
belonging to frame or, at least, in the current function. If the
breakpoint is a temporary one, this method deletes it.
Set the quitting attribute to True. This raises BdbQuit in
the next call to one of the dispatch_*() methods.
Derived classes and clients can call the following methods to manipulate
breakpoints. These methods return a string containing an error message if
something went wrong, or None if all is well.
Set a new breakpoint. If the lineno line doesn’t exist for the
filename passed as argument, return an error message. The filename
should be in canonical form, as described in the canonic() method.
Return a breakpoint specified by the given number. If arg is a string,
it will be converted to a number. If arg is a non-numeric string, or if
the given breakpoint never existed or has been deleted, ValueError is
raised.
Check whether we should break here, depending on the way the breakpoint b
was set.
If it was set via line number, it checks if b.line is the same as the one
in the frame also passed as argument. If the breakpoint was set via function
name, we have to check we are in the right frame (the right function) and if
we are in its first executable line.
Determine if there is an effective (active) breakpoint at this line of code.
Return a tuple of the breakpoint and a boolean that indicates if it is ok
to delete a temporary breakpoint. Return (None, None) if there is no
matching breakpoint.
The module pdb defines an interactive source code debugger for Python
programs. It supports setting (conditional) breakpoints and single stepping at
the source line level, inspection of stack frames, source code listing, and
evaluation of arbitrary Python code in the context of any stack frame. It also
supports post-mortem debugging and can be called under program control.
The debugger is extensible – it is actually defined as the class Pdb.
This is currently undocumented but easily understood by reading the source. The
extension interface uses the modules bdb and cmd.
The debugger’s prompt is (Pdb). Typical usage to run a program under control
of the debugger is:
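>>> import pdb
>>> import mymodule
>>> pdb.run('mymodule.test()')
> <string>(0)?()
(Pdb) continue
> <string>(1)?()
(Pdb) continue
NameError: 'spam'
> <string>(1)?()
(Pdb)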
pdb.py can also be invoked as a script to debug other scripts. For
example:
python3 -m pdb myscript.py
When invoked as a script, pdb will automatically enter post-mortem debugging if
the program being debugged exits abnormally. After post-mortem debugging (or
after normal exit of the program), pdb will restart the program. Automatic
restarting preserves pdb’s state (such as breakpoints) and in most cases is more
useful than quitting the debugger upon the program’s exit.
New in version 3.2: pdb.py now accepts a -c option that executes commands as if given
in a .pdbrc file; see Debugger Commands.
The typical usage to break into the debugger from a running program is to
insert
import pdb; pdb.set_trace()
at the location you want to break into the debugger. You can then step through
the code following this statement, and continue running without the debugger
using the continue command.
The typical usage to inspect a crashed program is:
>>> import pdb
>>> import mymodule
>>> mymodule.test()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "./mymodule.py", line 4, in test
    test2()
  File "./mymodule.py", line 3, in test2
    print(spam)
NameError: spam
>>> pdb.pm()
> ./mymodule.py(3)test2()
-> print(spam)
(Pdb)
The module defines the following functions; each enters the debugger in a
slightly different way:
Execute the statement (given as a string or a code object) under debugger
control. The debugger prompt appears before any code is executed; you can
set breakpoints and type continue, or you can step through the
statement using step or next (all these commands are
explained below). The optional globals and locals arguments specify the
environment in which the code is executed; by default the dictionary of the
module __main__ is used. (See the explanation of the built-in
exec() or eval() functions.)
Evaluate the expression (given as a string or a code object) under debugger
control. When runeval() returns, it returns the value of the
expression. Otherwise this function is similar to run().
Call the function (a function or method object, not a string) with the
given arguments. When runcall() returns, it returns whatever the
function call returned. The debugger prompt appears as soon as the function
is entered.
Enter the debugger at the calling stack frame. This is useful to hard-code a
breakpoint at a given point in a program, even if the code is not otherwise
being debugged (e.g. when an assertion fails).
Enter post-mortem debugging of the given traceback object. If no
traceback is given, it uses the one of the exception that is currently
being handled (an exception must be being handled if the default is to be
used).
Enter post-mortem debugging of the traceback found in
sys.last_traceback.
The run* functions and set_trace() are aliases for instantiating the
Pdb class and calling the method of the same name. If you want to
access further features, you have to do this yourself:
class pdb.Pdb(completekey='tab', stdin=None, stdout=None, skip=None, nosigint=False)
The completekey, stdin and stdout arguments are passed to the
underlying cmd.Cmd class; see the description there.
The skip argument, if given, must be an iterable of glob-style module name
patterns. The debugger will not step into frames that originate in a module
that matches one of these patterns. [1]
By default, Pdb sets a handler for the SIGINT signal (which is sent when the
user presses Ctrl-C on the console) when you give a continue command.
This allows you to break into the debugger again by pressing Ctrl-C. If you
want Pdb not to touch the SIGINT handler, set nosigint to true.
Example call to enable tracing with skip:
import pdb; pdb.Pdb(skip=['django.*']).set_trace()
New in version 3.1: The skip argument.
New in version 3.2: The nosigint argument. Previously, a SIGINT handler was never set by
Pdb.
The commands recognized by the debugger are listed below. Most commands can be
abbreviated to one or two letters as indicated; e.g. h(elp) means that
either h or help can be used to enter the help command (but not he
or hel, nor H or Help or HELP). Arguments to commands must be
separated by whitespace (spaces or tabs). Optional arguments are enclosed in
square brackets ([]) in the command syntax; the square brackets must not be
typed. Alternatives in the command syntax are separated by a vertical bar
(|).
Entering a blank line repeats the last command entered. Exception: if the last
command was a list command, the next 11 lines are listed.
Commands that the debugger doesn’t recognize are assumed to be Python statements
and are executed in the context of the program being debugged. Python
statements can also be prefixed with an exclamation point (!). This is a
powerful way to inspect the program being debugged; it is even possible to
change a variable or call a function. When an exception occurs in such a
statement, the exception name is printed but the debugger’s state is not
changed.
The debugger supports aliases. Aliases can have parameters, which allows a
certain level of adaptability to the context under examination.
Multiple commands may be entered on a single line, separated by ;;. (A
single ; is not used as it is the separator for multiple commands in a line
that is passed to the Python parser.) No intelligence is applied to separating
the commands; the input is split at the first ;; pair, even if it is in the
middle of a quoted string.
If a file .pdbrc exists in the user’s home directory or in the current
directory, it is read in and executed as if it had been typed at the debugger
prompt. This is particularly useful for aliases. If both files exist, the one
in the home directory is read first and aliases defined there can be overridden
by the local file.
Changed in version 3.2: .pdbrc can now contain commands that continue debugging, such as
continue or next. Previously, these commands had no
effect.
Without argument, print the list of available commands. With a command as
argument, print help about that command. help pdb displays the full
documentation (the docstring of the pdb module). Since the command
argument must be an identifier, help exec must be entered to get help on
the ! command.
With a lineno argument, set a break there in the current file. With a
function argument, set a break at the first executable statement within
that function. The line number may be prefixed with a filename and a colon,
to specify a breakpoint in another file (probably one that hasn’t been loaded
yet). The file is searched on sys.path. Note that each breakpoint
is assigned a number to which all the other breakpoint commands refer.
If a second argument is present, it is an expression which must evaluate to
true before the breakpoint is honored.
Without argument, list all breaks, including for each breakpoint, the number
of times that breakpoint has been hit, the current ignore count, and the
associated condition if any.
With a filename:lineno argument, clear all the breakpoints at this line.
With a space separated list of breakpoint numbers, clear those breakpoints.
Without argument, clear all breaks (but first ask confirmation).
Disable the breakpoints given as a space separated list of breakpoint
numbers. Disabling a breakpoint means it cannot cause the program to stop
execution, but unlike clearing a breakpoint, it remains in the list of
breakpoints and can be (re-)enabled.
Set the ignore count for the given breakpoint number. If count is omitted,
the ignore count is set to 0. A breakpoint becomes active when the ignore
count is zero. When non-zero, the count is decremented each time the
breakpoint is reached and the breakpoint is not disabled and any associated
condition evaluates to true.
Set a new condition for the breakpoint, an expression which must evaluate
to true before the breakpoint is honored. If condition is absent, any
existing condition is removed; i.e., the breakpoint is made unconditional.
Specify a list of commands for breakpoint number bpnumber. The commands
themselves appear on the following lines. Type a line containing just
end to terminate the commands. An example:
(Pdb) commands 1
(com) print some_variable
(com) end
(Pdb)
To remove all commands from a breakpoint, type commands and follow it
immediately with end; that is, give no commands.
With no bpnumber argument, commands refers to the last breakpoint set.
You can use breakpoint commands to start your program up again. Simply use
the continue command, or step, or any other command that resumes execution.
Specifying any command resuming execution (currently continue, step, next,
return, jump, quit and their abbreviations) terminates the command list (as if
that command was immediately followed by end). This is because any time you
resume execution (even with a simple next or step), you may encounter another
breakpoint, which could have its own command list, leading to ambiguities about
which list to execute.
If you use the ‘silent’ command in the command list, the usual message about
stopping at a breakpoint is not printed. This may be desirable for breakpoints
that are to print a specific message and then continue. If none of the other
commands print anything, you see no sign that the breakpoint was reached.
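As an illustrative sketch, the following command list prints a message and
resumes execution without the usual stop notice; it assumes breakpoint 1
exists and that a variable x is visible there:
(Pdb) commands 1
(com) silent
(com) print("x is now", x)
(com) continue
(com) end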
Continue execution until the next line in the current function is reached or
it returns. (The difference between next and step is
that step stops inside a called function, while next
executes called functions at (nearly) full speed, only stopping at the next
line in the current function.)
Without argument, continue execution until the line with a number greater
than the current one is reached.
With a line number, continue execution until a line with a number greater or
equal to that is reached. In both cases, also stop when the current frame
returns.
Changed in version 3.2: Allow giving an explicit line number.
Set the next line that will be executed. Only available in the bottom-most
frame. This lets you jump back and execute code again, or jump forward to
skip code that you don’t want to run.
It should be noted that not all jumps are allowed – for instance it is not
possible to jump into the middle of a for loop or out of a
finally clause.
List source code for the current file. Without arguments, list 11 lines
around the current line or continue the previous listing. With . as
argument, list 11 lines around the current line. With one argument,
list 11 lines around that line. With two arguments, list the given range;
if the second argument is less than the first, it is interpreted as a count.
The current line in the current frame is indicated by ->. If an
exception is being debugged, the line where the exception was originally
raised or propagated is indicated by >>, if it differs from the current
line.
Create an alias called name that executes command. The command must
not be enclosed in quotes. Replaceable parameters can be indicated by
%1, %2, and so on, while %* is replaced by all the parameters.
If no command is given, the current alias for name is shown. If no
arguments are given, all aliases are listed.
Aliases may be nested and can contain anything that can be legally typed at
the pdb prompt. Note that internal pdb commands can be overridden by
aliases. Such a command is then hidden until the alias is removed. Aliasing
is recursively applied to the first word of the command line; all other words
in the line are left alone.
As an example, here are two useful aliases (especially when placed in the
.pdbrc file):
# Print instance variables (usage "pi classInst")
alias pi for k in %1.__dict__.keys(): print("%1.", k, "=", %1.__dict__[k])
# Print instance variables in self
alias ps pi self
Execute the (one-line) statement in the context of the current stack frame.
The exclamation point can be omitted unless the first word of the statement
resembles a debugger command. To set a global variable, you can prefix the
assignment command with a global statement on the same line,
e.g.:
(Pdb) global list_options; list_options = ['-l']
(Pdb)
Restart the debugged Python program. If an argument is supplied, it is split
with shlex and the result is used as the new sys.argv.
History, breakpoints, actions and debugger options are preserved.
restart is an alias for run.
A profiler is a program that describes the run time performance of a
program, providing a variety of statistics. This documentation describes the
profiler functionality provided in the modules cProfile, profile
and pstats. This profiler provides deterministic profiling of
Python programs. It also provides a series of report generation tools to allow
users to rapidly examine the results of a profile operation.
The Python standard library provides two different profilers:
cProfile is recommended for most users; it’s a C extension with
reasonable overhead that makes it suitable for profiling long-running
programs. Based on lsprof, contributed by Brett Rosen and Ted
Czotter.
profile, a pure Python module whose interface is imitated by
cProfile. Adds significant overhead to profiled programs. If you’re
trying to extend the profiler in some way, the task might be easier with this
module.
The profile and cProfile modules export the same interface, so
they are mostly interchangeable; cProfile has a much lower overhead but
is newer and might not be available on all systems. cProfile is really a
compatibility layer on top of the internal _lsprof module.
Note
The profiler modules are designed to provide an execution profile for a given
program, not for benchmarking purposes (for that, there is timeit for
reasonably accurate results). This particularly applies to benchmarking
Python code against C code: the profilers introduce overhead for Python code,
but not for C-level functions, and so the C code would seem faster than any
Python one.
This section is provided for users that “don’t want to read the manual.” It
provides a very brief overview, and allows a user to rapidly perform profiling
on an existing application.
To profile an application with a main entry point of foo(), you would add
the following to your module:
import cProfile
cProfile.run('foo()')
(Use profile instead of cProfile if the latter is not available on
your system.)
The above action would cause foo() to be run, and a series of informative
lines (the profile) to be printed. The above approach is most useful when
working with the interpreter. If you would like to save the results of a
profile into a file for later examination, you can supply a file name as the
second argument to the run() function:
import cProfile
cProfile.run('foo()', 'fooprof')
The file cProfile.py can also be invoked as a script to profile another
script. For example:
python -m cProfile myscript.py
cProfile.py accepts two optional arguments on the command line:
cProfile.py [-o output_file] [-s sort_order]
-s applies only when the profile is written to standard output, i.e. when
-o is not supplied.
Look in the Stats documentation for valid sort values.
When you wish to review the profile, you should use the methods in the
pstats module. Typically you would load the statistics data as follows:
import pstats
p = pstats.Stats('fooprof')
The class Stats (the above code just created an instance of this class)
has a variety of methods for manipulating and printing the data that was just
read into p. When you ran cProfile.run() above, what was printed was
the result of three method calls:
p.strip_dirs().sort_stats(-1).print_stats()
The first method removed the extraneous path from all the module names. The
second method sorted all the entries according to the standard module/line/name
string that is printed. The third method printed out all the statistics. You
might try the following sort calls:
p.sort_stats('name')
p.print_stats()
The first call will actually sort the list by function name, and the second call
will print out the statistics. The following are some interesting calls to
experiment with:
p.sort_stats('cumulative').print_stats(10)
This sorts the profile by cumulative time in a function, and then only prints
the ten most significant lines. If you want to understand what algorithms are
taking time, the above line is what you would use.
If you were looking to see what functions were looping a lot, and taking a lot
of time, you would do:
p.sort_stats('time').print_stats(10)
to sort according to time spent within each function, and then print the
statistics for the top ten functions.
You might also try:
p.sort_stats('file').print_stats('__init__')
This will sort all the statistics by file name, and then print out statistics
for only the class init methods (since they are spelled with __init__ in
them). As one final example, you could try:
p.sort_stats('time', 'cum').print_stats(.5, 'init')
This line sorts statistics with a primary key of time, and a secondary key of
cumulative time, and then prints out some of the statistics. To be specific, the
list is first culled down to 50% (re: .5) of its original size, then only
lines containing init are maintained, and that sub-sub-list is printed.
If you wondered what functions called the above functions, you could now (p
is still sorted according to the last criteria) do:
p.print_callers(.5, 'init')
and you would get a list of callers for each of the listed functions.
If you want more functionality, you’re going to have to read the manual, or
guess what the following functions do:
p.print_callees()
p.add('fooprof')
Invoked as a script, the pstats module is a statistics browser for
reading and examining profile dumps. It has a simple line-oriented interface
(implemented using cmd) and interactive help.
Deterministic profiling is meant to reflect the fact that all function
call, function return, and exception events are monitored, and precise
timings are made for the intervals between these events (during which time the
user’s code is executing). In contrast, statistical profiling (which is
not done by this module) randomly samples the effective instruction pointer, and
deduces where time is being spent. The latter technique traditionally involves
less overhead (as the code does not need to be instrumented), but provides only
relative indications of where time is being spent.
In Python, since there is an interpreter active during execution, the presence
of instrumented code is not required to do deterministic profiling. Python
automatically provides a hook (optional callback) for each event. In
addition, the interpreted nature of Python tends to add so much overhead to
execution, that deterministic profiling tends to only add small processing
overhead in typical applications. The result is that deterministic profiling is
not that expensive, yet provides extensive run time statistics about the
execution of a Python program.
Call count statistics can be used to identify bugs in code (surprising counts),
and to identify possible inline-expansion points (high call counts). Internal
time statistics can be used to identify “hot loops” that should be carefully
optimized. Cumulative time statistics should be used to identify high level
errors in the selection of algorithms. Note that the unusual handling of
cumulative times in this profiler allows statistics for recursive
implementations of algorithms to be directly compared to iterative
implementations.
The primary entry point for the profiler is the global function
profile.run() (resp. cProfile.run()). It is typically used to create
any profile information. The reports are formatted and printed using methods of
the class pstats.Stats. The following is a description of all of these
standard entry points and functions. For a more in-depth view of some of the
code, consider reading the later section on Profiler Extensions, which includes
discussion of how to derive “better” profilers from the classes presented, or
reading the source code for these modules.
This function takes a single argument that can be passed to the exec()
function, and an optional file name. In all cases this routine attempts to
exec() its first argument, and gather profiling statistics from the
execution. If no file name is present, then this function automatically
prints a simple profiling report, sorted by the standard name string
(file/line/function-name) that is presented in each line. The following is a
typical output from such a call:
2706 function calls (2004 primitive calls) in 4.504 CPU seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
2 0.006 0.003 0.953 0.477 pobject.py:75(save_objects)
43/3 0.533 0.012 0.749 0.250 pobject.py:99(evaluate)
...
The first line indicates that 2706 calls were monitored. Of those calls, 2004
were primitive. We define primitive to mean that the call was not
induced via recursion. The next line: Ordered by: standard name, indicates
that the text string in the far right column was used to sort the output. The
column headings include:
ncalls
    the number of calls
tottime
    the total time spent in the given function (excluding time spent in
    calls to sub-functions)
percall
    the quotient of tottime divided by ncalls
cumtime
    the total time spent in this and all subfunctions (from invocation
    till exit); this figure is accurate even for recursive functions
percall
    the quotient of cumtime divided by primitive calls
filename:lineno(function)
    the respective data of each function
When there are two numbers in the first column (for example, 43/3), then the
latter is the number of primitive calls, and the former is the actual number of
calls. Note that when the function does not recurse, these two values are the
same, and only the single figure is printed.
If sort is given, it can be one of 'stdname' (sort by filename:lineno),
'calls' (sort by number of calls), 'time' (sort by total time) or
'cumulative' (sort by cumulative time). The default is 'stdname'.
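As a minimal sketch, assuming a function foo() has been defined as in the
quick-start example above, the report can be sorted directly by passing the
sort argument:
import cProfile
cProfile.run('foo()', sort='cumulative')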
This function is similar to run(), with added arguments to supply the
globals and locals dictionaries for the command string.
Analysis of the profiler data is done using the pstats.Stats class.
class pstats.Stats(*filenames, stream=sys.stdout)
This class constructor creates an instance of a “statistics object” from a
filename (or set of filenames). Stats objects are manipulated by
methods, in order to print useful reports. You may specify an alternate output
stream by giving the keyword argument, stream.
The file selected by the above constructor must have been created by the
corresponding version of profile or cProfile. To be specific,
there is no file compatibility guaranteed with future versions of this
profiler, and there is no compatibility with files produced by other profilers.
If several files are provided, all the statistics for identical functions will
be coalesced, so that an overall view of several processes can be considered in
a single report. If additional files need to be combined with data in an
existing Stats object, the add() method can be used.
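A short sketch of combining several dumps into one report; run1.prof,
run2.prof and run3.prof are hypothetical file names assumed to have been
created beforehand with cProfile.run():
import pstats

p = pstats.Stats('run1.prof', 'run2.prof')   # identical functions coalesced
p.add('run3.prof')                           # merge in yet another dump
p.strip_dirs().sort_stats('cumulative').print_stats(10)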
This method for the Stats class removes all leading path information
from file names. It is very useful in reducing the size of the printout to fit
within (close to) 80 columns. This method modifies the object, and the stripped
information is lost. After performing a strip operation, the object is
considered to have its entries in a “random” order, as it was just after object
initialization and loading. If strip_dirs() causes two function names to
be indistinguishable (they are on the same line of the same filename, and have
the same function name), then the statistics for these two entries are
accumulated into a single entry.
This method of the Stats class accumulates additional profiling
information into the current profiling object. Its arguments should refer to
filenames created by the corresponding version of profile.run() or
cProfile.run(). Statistics for identically named (re: file, line, name)
functions are automatically accumulated into single function statistics.
Save the data loaded into the Stats object to a file named filename.
The file is created if it does not exist, and is overwritten if it already
exists. This is equivalent to the method of the same name on the
profile.Profile and cProfile.Profile classes.
This method modifies the Stats object by sorting it according to the
supplied criteria. The argument is typically a string identifying the basis of
a sort (example: 'time' or 'name').
When more than one key is provided, then additional keys are used as secondary
criteria when there is equality in all keys selected before them. For example,
sort_stats('name', 'file') will sort all the entries according to their
function name, and resolve all ties (identical function names) by sorting by
file name.
Abbreviations can be used for any key names, as long as the abbreviation is
unambiguous. The following are the keys currently defined:
Valid Arg       Meaning
'calls'         call count
'cumulative'    cumulative time
'file'          file name
'module'        file name
'pcalls'        primitive call count
'line'          line number
'name'          function name
'nfl'           name/file/line
'stdname'       standard name
'time'          internal time
Note that all sorts on statistics are in descending order (placing most time
consuming items first), whereas name, file, and line number searches are in
ascending order (alphabetical). The subtle distinction between 'nfl' and
'stdname' is that the standard name is a sort of the name as printed, which
means that the embedded line numbers get compared in an odd way. For example,
lines 3, 20, and 40 would (if the file names were the same) appear in the string
order 20, 3 and 40. In contrast, 'nfl' does a numeric compare of the line
numbers. In fact, sort_stats('nfl') is the same as sort_stats('name', 'file', 'line').
For backward-compatibility reasons, the numeric arguments -1, 0, 1,
and 2 are permitted. They are interpreted as 'stdname', 'calls',
'time', and 'cumulative' respectively. If this old style format
(numeric) is used, only one sort key (the numeric key) will be used, and
additional arguments will be silently ignored.
This method for the Stats class reverses the ordering of the basic list
within the object. Note that by default ascending vs descending order is
properly selected based on the sort key of choice.
This method for the Stats class prints out a report as described in the
profile.run() definition.
The order of the printing is based on the last sort_stats() operation done
on the object (subject to caveats in add() and strip_dirs()).
The arguments provided (if any) can be used to limit the list down to the
significant entries. Initially, the list is taken to be the complete set of
profiled functions. Each restriction is either an integer (to select a count of
lines), or a decimal fraction between 0.0 and 1.0 inclusive (to select a
percentage of lines), or a regular expression (to pattern match the standard
name that is printed; as of Python 1.5b1, this uses the Perl-style regular
expression syntax defined by the re module). If several restrictions are
provided, then they are applied sequentially. For example:
print_stats(.1, 'foo:')
would first limit the printing to the first 10% of the list, and then only print
functions that were part of filename .*foo:. In contrast, the
command:
print_stats('foo:', .1)
would limit the list to all functions having file names .*foo:, and
then proceed to only print the first 10% of them.
This method for the Stats class prints a list of all functions that
called each function in the profiled database. The ordering is identical to
that provided by print_stats(), and the definition of the restricting
argument is also identical. Each caller is reported on its own line. The
format differs slightly depending on the profiler that produced the stats:
With profile, a number is shown in parentheses after each caller to
show how many times this specific call was made. For convenience, a second
non-parenthesized number repeats the cumulative time spent in the function
at the right.
With cProfile, each caller is preceded by three numbers: the number of
times this specific call was made, and the total and cumulative times spent in
the current function while it was invoked by this specific caller.
This method for the Stats class prints a list of all functions that were
called by the indicated function. Aside from this reversal of direction of
calls (re: called vs was called by), the arguments and ordering are identical to
the print_callers() method.
One limitation has to do with accuracy of timing information. There is a
fundamental problem with deterministic profilers involving accuracy. The most
obvious restriction is that the underlying “clock” is only ticking at a rate
(typically) of about .001 seconds. Hence no measurements will be more accurate
than the underlying clock. If enough measurements are taken, then the “error”
will tend to average out. Unfortunately, removing this first error induces a
second source of error.
The second problem is that it “takes a while” from when an event is dispatched
until the profiler’s call to get the time actually gets the state of the
clock. Similarly, there is a certain lag when exiting the profiler event
handler from the time that the clock’s value was obtained (and then squirreled
away), until the user’s code is once again executing. As a result, functions
that are called many times, or call many functions, will typically accumulate
this error. The error that accumulates in this fashion is typically less than
the accuracy of the clock (less than one clock tick), but it can accumulate
and become very significant.
The problem is more important with profile than with the lower-overhead
cProfile. For this reason, profile provides a means of
calibrating itself for a given platform so that this error can be
probabilistically (on the average) removed. After the profiler is calibrated, it
will be more accurate (in a least square sense), but it will sometimes produce
negative numbers (when call counts are exceptionally low, and the gods of
probability work against you :-)). Do not be alarmed by negative numbers in
the profile. They should only appear if you have calibrated your profiler,
and the results are actually better than without calibration.
The profiler of the profile module subtracts a constant from each event
handling time to compensate for the overhead of calling the time function, and
socking away the results. By default, the constant is 0. The following
procedure can be used to obtain a better constant for a given platform (see
discussion in section Limitations above).
The method executes the number of Python calls given by the argument, directly
and again under the profiler, measuring the time for both. It then computes the
hidden overhead per profiler event, and returns that as a float. For example,
on an 800 MHz Pentium running Windows 2000, and using Python’s time.clock() as
the timer, the magical number is about 12.5e-6.
The object of this exercise is to get a fairly consistent result. If your
computer is very fast, or your timer function has poor resolution, you might
have to pass 100000, or even 1000000, to get consistent results.
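A minimal sketch of the calibration loop, repeating the measurement to
check that the returned constant is consistent:
import profile

pr = profile.Profile()
for i in range(5):
    print(pr.calibrate(10000))   # the values should agree closely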
When you have a consistent answer, there are three ways you can use it:
import profile

# 1. Apply computed bias to all Profile instances created hereafter.
profile.Profile.bias = your_computed_bias

# 2. Apply computed bias to a specific Profile instance.
pr = profile.Profile()
pr.bias = your_computed_bias

# 3. Specify computed bias in instance constructor.
pr = profile.Profile(bias=your_computed_bias)
If you have a choice, you are better off choosing a smaller constant, and then
your results will “less often” show up as negative in profile statistics.
The Profile classes of both modules, profile and cProfile,
were written so that derived classes could be developed to extend the profiler.
The details are not described here, as doing this successfully requires an
expert understanding of how the Profile class works internally. Study
the source code of the module carefully if you want to pursue this.
If all you want to do is change how current time is determined (for example, to
force use of wall-clock time or elapsed process time), pass the timing function
you want to the Profile class constructor:
pr = profile.Profile(your_time_func)
The resulting profiler will then call your_time_func().
profile.Profile
your_time_func() should return a single number, or a list of numbers whose
sum is the current time (like what os.times() returns). If the function
returns a single time number, or the list of returned numbers has length 2, then
you will get an especially fast version of the dispatch routine.
Be warned that you should calibrate the profiler class for the timer function
that you choose. For most machines, a timer that returns a lone integer value
will provide the best results in terms of low overhead during profiling.
(os.times() is pretty bad, as it returns a tuple of floating point
values). If you want to substitute a better timer in the cleanest fashion,
derive a class and hardwire a replacement dispatch method that best handles your
timer call, along with the appropriate calibration constant.
cProfile.Profile
your_time_func() should return a single number. If it returns
integers, you can also invoke the class constructor with a second argument
specifying the real duration of one unit of time. For example, if
your_integer_time_func() returns times measured in thousands of seconds,
you would construct the Profile instance as follows:
pr = profile.Profile(your_integer_time_func, 0.001)
As the cProfile.Profile class cannot be calibrated, custom timer
functions should be used with care and should be as fast as possible. For the
best results with a custom timer, it might be necessary to hard-code it in the C
source of the internal _lsprof module.
timeit — Measure execution time of small code snippets
This module provides a simple way to time small bits of Python code. It has both
command line as well as callable interfaces. It avoids a number of common traps
for measuring execution times. See also Tim Peters’ introduction to the
“Algorithms” chapter in the Python Cookbook, published by O’Reilly.
The module defines the following public class:
class timeit.Timer(stmt='pass', setup='pass', timer=<timer function>)
Class for timing execution speed of small code snippets.
The constructor takes a statement to be timed, an additional statement used for
setup, and a timer function. Both statements default to 'pass'; the timer
function is platform-dependent (see the module doc string). stmt and setup
may also contain multiple statements separated by ; or newlines, as long as
they don’t contain multi-line string literals.
To measure the execution time of the first statement, use the timeit()
method. The repeat() method is a convenience to call timeit()
multiple times and return a list of results.
The stmt and setup parameters can also take objects that are callable
without arguments. This will embed calls to them in a timer function that
will then be executed by timeit(). Note that the timing overhead is a
little larger in this case because of the extra function calls.
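A small sketch passing a callable instead of a statement string (the test
function here is a made-up example):
import timeit

def test():
    # the statement under test; must be callable without arguments
    return "-".join(map(str, range(100)))

t = timeit.Timer(test)
print(t.timeit(number=10000))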
t = Timer(...)       # outside the try/except
try:
    t.timeit(...)    # or t.repeat(...)
except:
    t.print_exc()
The advantage over the standard traceback is that source lines in the compiled
template will be displayed. The optional file argument directs where the
traceback is sent; it defaults to sys.stderr.
This is a convenience function that calls the timeit() repeatedly,
returning a list of results. The first argument specifies how many times to
call timeit(). The second argument specifies the number argument for
timeit().
Note
It’s tempting to calculate mean and standard deviation from the result vector
and report these. However, this is not very useful. In a typical case, the
lowest value gives a lower bound for how fast your machine can run the given
code snippet; higher values in the result vector are typically not caused by
variability in Python’s speed, but by other processes interfering with your
timing accuracy. So the min() of the result is probably the only number
you should be interested in. After that, you should look at the entire vector
and apply common sense rather than statistics.
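For example, taking the minimum of the result vector as suggested above:
import timeit

t = timeit.Timer('"-".join(map(str, range(100)))')
print(min(t.repeat(repeat=3, number=10000)))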
Time number executions of the main statement. This executes the setup
statement once, and then returns the time it takes to execute the main statement
a number of times, measured in seconds as a float. The argument is the number
of times through the loop, defaulting to one million. The main statement, the
setup statement and the timer function to be used are passed to the constructor.
Note
By default, timeit() temporarily turns off garbage collection
during the timing. The advantage of this approach is that it makes
independent timings more comparable. The disadvantage is that GC may be
an important component of the performance of the function being measured.
If so, GC can be re-enabled as the first statement in the setup string.
For example:
timeit.Timer('for i in range(10): oct(i)', 'gc.enable()').timeit()
The module also defines two convenience functions:
Create a Timer instance with the given statement, setup code and timer
function and run its repeat() method with the given repeat count and
number executions.
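A short sketch using the repeat() convenience function described above:
import timeit

print(timeit.repeat('"-".join(map(str, range(100)))', repeat=3, number=10000))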
A multi-line statement may be given by specifying each line as a separate
statement argument; indented lines are possible by enclosing an argument in
quotes and using leading spaces. Multiple -s options are treated
similarly.
If -n is not given, a suitable number of loops is calculated by trying
successive powers of 10 until the total time is at least 0.2 seconds.
The default timer function is platform dependent. On Windows,
time.clock() has microsecond granularity but time.time()'s
granularity is 1/60th of a second; on Unix, time.clock() has 1/100th of a
second granularity and time.time() is much more precise. On either
platform, the default timer functions measure wall clock time, not the CPU time.
This means that other processes running on the same computer may interfere with
the timing. The best thing to do when accurate timing is necessary is to repeat
the timing a few times and use the best time. The -r option is good
for this; the default of 3 repetitions is probably enough in most cases. On
Unix, you can use time.clock() to measure CPU time.
Note
There is a certain baseline overhead associated with executing a pass statement.
The code here doesn’t try to hide it, but you should be aware of it. The
baseline overhead can be measured by invoking the program without arguments.
The baseline overhead differs between Python versions! Also, to fairly compare
older Python versions to Python 2.3, you may want to use Python’s -O
option for the older versions to avoid timing SET_LINENO instructions.
Here are two example sessions (one using the command line, one using the module
interface) that compare the cost of using hasattr() vs.
try/except to test for missing and present object
attributes.
$ python -m timeit 'try:' ' str.__bool__' 'except AttributeError:' ' pass'
100000 loops, best of 3: 15.7 usec per loop
$ python -m timeit 'if hasattr(str, "__bool__"): pass'
100000 loops, best of 3: 4.26 usec per loop
$ python -m timeit 'try:' ' int.__bool__' 'except AttributeError:' ' pass'
1000000 loops, best of 3: 1.43 usec per loop
$ python -m timeit 'if hasattr(int, "__bool__"): pass'
100000 loops, best of 3: 2.23 usec per loop
The trace module allows you to trace program execution, generate
annotated statement coverage listings, print caller/callee relationships and
list functions executed during a program run. It can be used in another program
or from the command line.
At least one of the following options must be specified when invoking
trace. The --listfuncs option is mutually exclusive with
the --trace and --counts options. When
--listfuncs is provided, neither --counts nor
--trace are accepted, and vice versa.
Produce a set of annotated listing files upon program completion that shows
how many times each statement was executed. See also
--coverdir, --file and
--no-report below.
Do not generate annotated listings. This is useful if you intend to make
several runs with --count, and then produce a single set of
annotated listings at the end.
class trace.Trace(count=1, trace=1, countfuncs=0, countcallers=0, ignoremods=(), ignoredirs=(), infile=None, outfile=None, timing=False)
Create an object to trace execution of a single statement or expression. All
parameters are optional. count enables counting of line numbers. trace
enables line execution tracing. countfuncs enables listing of the
functions called during the run. countcallers enables call relationship
tracking. ignoremods is a list of modules or packages to ignore.
ignoredirs is a list of directories whose modules or packages should be
ignored. infile is the name of the file from which to read stored count
information. outfile is the name of the file in which to write updated
count information. timing enables a timestamp relative to when tracing was
started to be displayed.
Execute the command and gather statistics from the execution with
the current tracing parameters. cmd must be a string or code object,
suitable for passing into exec().
Execute the command and gather statistics from the execution with the
current tracing parameters, in the defined global and local
environments. If not defined, globals and locals default to empty
dictionaries.
Return a CoverageResults object that contains the cumulative
results of all previous calls to run, runctx and runfunc
for the given Trace instance. Does not reset the accumulated
trace results.
Write coverage results. Set show_missing to show lines that had no
hits. Set summary to include in the output the coverage summary per
module. coverdir specifies the directory into which the coverage
result files will be output. If None, the results for each source
file are placed in its directory.
A simple example demonstrating the use of the programmatic interface:
import sys
import trace

# create a Trace object, telling it what to ignore, and whether to
# do tracing or line-counting or both.
tracer = trace.Trace(
    ignoredirs=[sys.prefix, sys.exec_prefix],
    trace=0,
    count=1)

# run the new command using the given tracer
tracer.run('main()')

# make a report, placing output in /tmp
r = tracer.results()
r.write_results(show_missing=True, coverdir="/tmp")
The modules described in this chapter provide a wide range of services related
to the Python interpreter and its interaction with its environment. Here’s an
overview:
This module provides access to some variables used or maintained by the
interpreter and to functions that interact strongly with the interpreter. It is
always available.
The list of command line arguments passed to a Python script. argv[0] is the
script name (it is operating system dependent whether this is a full pathname or
not). If the command was executed using the -c command line option to
the interpreter, argv[0] is set to the string '-c'. If no script name
was passed to the Python interpreter, argv[0] is the empty string.
To loop over the standard input, or the list of files given on the
command line, see the fileinput module.
An indicator of the native byte order. This will have the value 'big' on
big-endian (most-significant byte first) platforms, and 'little' on
little-endian (least-significant byte first) platforms.
A tuple of strings giving the names of all modules that are compiled into this
Python interpreter. (This information is not available in any other way —
modules.keys() only lists the imported modules.)
Call func(*args), while tracing is enabled. The tracing state is saved,
and restored afterwards. This is intended to be called from a debugger from
a checkpoint, to recursively debug some other code.
Clear the internal type cache. The type cache is used to speed up attribute
and method lookups. Use the function only to drop unnecessary references
during reference leak debugging.
This function should be used for internal and specialized purposes only.
Return a dictionary mapping each thread’s identifier to the topmost stack frame
currently active in that thread at the time the function is called. Note that
functions in the traceback module can build the call stack given such a
frame.
This is most useful for debugging deadlock: this function does not require the
deadlocked threads’ cooperation, and such threads’ call stacks are frozen for as
long as they remain deadlocked. The frame returned for a non-deadlocked thread
may bear no relationship to that thread’s current activity by the time calling
code examines the frame.
This function should be used for internal and specialized purposes only.
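A minimal sketch that dumps the stack of every thread, using the
traceback module as mentioned above:
import sys
import traceback

def dump_all_threads():
    for thread_id, frame in sys._current_frames().items():
        print("Thread %d:" % thread_id)
        traceback.print_stack(frame)

dump_all_threads()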
If value is not None, this function prints repr(value) to
sys.stdout, and saves value in builtins._. If repr(value) is
not encodable to sys.stdout.encoding with sys.stdout.errors error
handler (which is probably 'strict'), encode it to
sys.stdout.encoding with 'backslashreplace' error handler.
sys.displayhook is called on the result of evaluating an expression
entered in an interactive Python session. The display of these values can be
customized by assigning another one-argument function to sys.displayhook.
Pseudo-code:
def displayhook(value):
    if value is None:
        return
    # Set '_' to None to avoid recursion
    builtins._ = None
    text = repr(value)
    try:
        sys.stdout.write(text)
    except UnicodeEncodeError:
        bytes = text.encode(sys.stdout.encoding, 'backslashreplace')
        if hasattr(sys.stdout, 'buffer'):
            sys.stdout.buffer.write(bytes)
        else:
            text = bytes.decode(sys.stdout.encoding, 'strict')
            sys.stdout.write(text)
    sys.stdout.write("\n")
    builtins._ = value
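As an illustration, a minimal hypothetical replacement hook that tags each
result while preserving the '_' convention:
import builtins
import sys

def my_displayhook(value):
    if value is None:
        return
    builtins._ = None            # avoid recursion through '_'
    print("->", repr(value))
    builtins._ = value

sys.displayhook = my_displayhook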
Changed in version 3.2: Use 'backslashreplace' error handler on UnicodeEncodeError.
This function prints out a given traceback and exception to sys.stderr.
When an exception is raised and uncaught, the interpreter calls
sys.excepthook with three arguments, the exception class, exception
instance, and a traceback object. In an interactive session this happens just
before control is returned to the prompt; in a Python program this happens just
before the program exits. The handling of such top-level exceptions can be
customized by assigning another three-argument function to sys.excepthook.
These objects contain the original values of displayhook and excepthook
at the start of the program. They are saved so that displayhook and
excepthook can be restored in case they happen to get replaced with broken
objects.
This function returns a tuple of three values that give information about the
exception that is currently being handled. The information returned is specific
both to the current thread and to the current stack frame. If the current stack
frame is not handling an exception, the information is taken from the calling
stack frame, or its caller, and so on until a stack frame is found that is
handling an exception. Here, “handling an exception” is defined as “executing
an except clause.” For any stack frame, only information about the exception
being currently handled is accessible.
If no exception is being handled anywhere on the stack, a tuple containing
three None values is returned. Otherwise, the values returned are
(type, value, traceback). Their meaning is: type gets the type of the
exception being handled (a subclass of BaseException); value gets
the exception instance (an instance of the exception type); traceback gets
a traceback object (see the Reference Manual) which encapsulates the call
stack at the point where the exception originally occurred.
Warning
Assigning the traceback return value to a local variable in a function
that is handling an exception will cause a circular reference. Since most
functions don’t need access to the traceback, the best solution is to use
something like exctype, value = sys.exc_info()[:2] to extract only the
exception type and value. If you do need the traceback, make sure to
delete it after use (best done with a try
... finally statement) or to call exc_info() in a
function that does not itself handle an exception.
Such cycles are normally automatically reclaimed when garbage collection
is enabled and they become unreachable, but it remains more efficient to
avoid creating cycles.
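For example, extracting only the type and value as recommended:
import sys

try:
    1 / 0
except ZeroDivisionError:
    exctype, value = sys.exc_info()[:2]   # no traceback, so no cycle
    print(exctype.__name__, value)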
A string giving the site-specific directory prefix where the platform-dependent
Python files are installed; by default, this is also '/usr/local'. This can
be set at build time with the --exec-prefix argument to the
configure script. Specifically, all configuration files (e.g. the
pyconfig.h header file) are installed in the directory
exec_prefix + '/lib/pythonversion/config', and shared library modules are
installed in exec_prefix + '/lib/pythonversion/lib-dynload', where version
is equal to version[:3].
Exit from Python. This is implemented by raising the SystemExit
exception, so cleanup actions specified by finally clauses of try
statements are honored, and it is possible to intercept the exit attempt at
an outer level.
The optional argument arg can be an integer giving the exit status
(defaulting to zero), or another type of object. If it is an integer, zero
is considered “successful termination” and any nonzero value is considered
“abnormal termination” by shells and the like. Most systems require it to be
in the range 0-127, and produce undefined results otherwise. Some systems
have a convention for assigning specific meanings to specific exit codes, but
these are generally underdeveloped; Unix programs generally use 2 for command
line syntax errors and 1 for all other kind of errors. If another type of
object is passed, None is equivalent to passing zero, and any other
object is printed to stderr and results in an exit code of 1. In
particular, sys.exit("some error message") is a quick way to exit a
program when an error occurs.
Since exit() ultimately “only” raises an exception, it will only exit
the process when called from the main thread, and the exception is not
intercepted.
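A typical sketch (the script name and message are hypothetical):
import sys

if len(sys.argv) < 2:
    sys.exit("usage: myscript.py FILENAME")   # message to stderr, exit code 1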
A structseq holding information about the float type. It contains low level
information about the precision and internal representation. The values
correspond to the various floating-point constants defined in the standard
header file float.h for the ‘C’ programming language; see section
5.2.4.2.2 of the 1999 ISO/IEC C standard [C99], ‘Characteristics of
floating types’, for details.
attribute    float.h macro    explanation
epsilon      DBL_EPSILON      difference between 1 and the least value
                              greater than 1 that is representable as a float
dig          DBL_DIG          maximum number of decimal digits that can be
                              faithfully represented in a float; see below
mant_dig     DBL_MANT_DIG     float precision: the number of base-radix
                              digits in the significand of a float
min_exp      DBL_MIN_EXP      minimum integer e such that radix**(e-1) is
                              a normalized float
min_10_exp   DBL_MIN_10_EXP   minimum integer e such that 10**e is a
                              normalized float
radix        FLT_RADIX        radix of exponent representation
rounds       FLT_ROUNDS       constant representing the rounding mode used
                              for arithmetic operations
The attribute sys.float_info.dig needs further explanation. If
s is any string representing a decimal number with at most
sys.float_info.dig significant digits, then converting s to a
float and back again will recover a string representing the same decimal
value:
>>> import sys
>>> sys.float_info.dig
15
>>> s = '3.14159265358979'    # decimal string with 15 significant digits
>>> format(float(s), '.15g')  # convert to float and back -> same value
'3.14159265358979'
But for strings with more than sys.float_info.dig significant digits,
this isn’t always true:
>>> s = '9876543211234567'    # 16 significant digits is too many!
>>> format(float(s), '.16g')  # conversion changes value
'9876543211234568'
A string indicating how the repr() function behaves for
floats. If the string has value 'short' then for a finite
float x, repr(x) aims to produce a short string with the
property that float(repr(x))==x. This is the usual behaviour
in Python 3.1 and later. Otherwise, float_repr_style has value
'legacy' and repr(x) behaves in the same way as it did in
versions of Python prior to 3.1.
Return the current value of the flags that are used for dlopen() calls.
The flag constants are defined in the ctypes and DLFCN modules.
Availability: Unix.
Return the name of the encoding used to convert Unicode filenames into
system file names. The result value depends on the operating system:
On Mac OS X, the encoding is 'utf-8'.
On Unix, the encoding is the user’s preference according to the result of
nl_langinfo(CODESET), or 'utf-8' if nl_langinfo(CODESET) failed.
On Windows NT+, file names are Unicode natively, so no conversion is
performed. getfilesystemencoding() still returns 'mbcs', as
this is the encoding that applications should use when they explicitly
want to convert Unicode strings to byte strings that are equivalent when
used as file names.
On Windows 9x, the encoding is 'mbcs'.
Changed in version 3.2: On Unix, use 'utf-8' instead of None if nl_langinfo(CODESET)
failed. getfilesystemencoding() result cannot be None.
Return the reference count of the object. The count returned is generally one
higher than you might expect, because it includes the (temporary) reference as
an argument to getrefcount().
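For instance (the exact count may vary, but the temporary argument
reference explains why it is one higher than expected):
>>> import sys
>>> x = []
>>> sys.getrefcount(x)
2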
Return the current value of the recursion limit, the maximum depth of the Python
interpreter stack. This limit prevents infinite recursion from causing an
overflow of the C stack and crashing Python. It can be set by
setrecursionlimit().
Return the size of an object in bytes. The object can be any type of
object. All built-in objects will return correct results, but this
does not have to hold true for third-party extensions as it is implementation
specific.
If given, default will be returned if the object does not provide means to
retrieve the size. Otherwise a TypeError will be raised.
getsizeof() calls the object’s __sizeof__ method and adds an
additional garbage collector overhead if the object is managed by the garbage
collector.
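A small sketch; the exact numbers are implementation specific:
import sys

print(sys.getsizeof(b''))        # size of an empty bytes object
print(sys.getsizeof([1, 2, 3]))  # size of the list itself, not of the
                                 # integers it refers to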
Return a frame object from the call stack. If optional integer depth is
given, return the frame object that many calls below the top of the stack. If
that is deeper than the call stack, ValueError is raised. The default
for depth is zero, returning the frame at the top of the call stack.
CPython implementation detail: This function should be used for internal and specialized purposes only.
It is not guaranteed to exist in all implementations of Python.
CPython implementation detail: The gettrace() function is intended only for implementing debuggers,
profilers, coverage tools and the like. Its behavior is part of the
implementation platform, rather than part of the language definition, and
thus may not be available in all Python implementations.
Return a named tuple describing the Windows version
currently running. The named elements are major, minor,
build, platform, service_pack, service_pack_minor,
service_pack_major, suite_mask, and product_type.
service_pack contains a string while all other values are
integers. The components can also be accessed by name, so
sys.getwindowsversion()[0] is equivalent to
sys.getwindowsversion().major. For compatibility with prior
versions, only the first 5 elements are retrievable by indexing.
platform may be one of the following values:
Constant                         Platform
0 (VER_PLATFORM_WIN32s)          Win32s on Windows 3.1
1 (VER_PLATFORM_WIN32_WINDOWS)   Windows 95/98/ME
2 (VER_PLATFORM_WIN32_NT)        Windows NT/2000/XP/x64
3 (VER_PLATFORM_WIN32_CE)        Windows CE
product_type may be one of the following values:
Constant                       Meaning
1 (VER_NT_WORKSTATION)         The system is a workstation.
2 (VER_NT_DOMAIN_CONTROLLER)   The system is a domain controller.
3 (VER_NT_SERVER)              The system is a server, but not a domain
                               controller.
This function wraps the Win32 GetVersionEx() function; see the
Microsoft documentation on OSVERSIONINFOEX() for more information
about these fields.
Availability: Windows.
Changed in version 3.2: Changed to a named tuple and added service_pack_minor,
service_pack_major, suite_mask, and product_type.
The version number encoded as a single integer. This is guaranteed to increase
with each version, including proper support for non-production releases. For
example, to test that the Python interpreter is at least version 1.5.2, use:
if sys.hexversion >= 0x010502F0:
    # use some advanced feature
    ...
else:
    # use an alternative implementation or warn the user
    ...
This is called hexversion since it only really looks meaningful when viewed
as the result of passing it to the built-in hex() function. The
struct sequence sys.version_info may be used for a more human-friendly
encoding of the same information.
The hexversion is a 32-bit number with the following layout:
Bits (big endian order)   Meaning
1-8                       PY_MAJOR_VERSION (the 2 in 2.1.0a3)
9-16                      PY_MINOR_VERSION (the 1 in 2.1.0a3)
17-24                     PY_MICRO_VERSION (the 0 in 2.1.0a3)
25-28                     PY_RELEASE_LEVEL (0xA for alpha, 0xB for beta,
                          0xC for release candidate and 0xF for final)
29-32                     PY_RELEASE_SERIAL (the 3 in 2.1.0a3, zero for
                          final releases)
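A sketch decoding the fields according to the layout above:
import sys

major  = (sys.hexversion >> 24) & 0xff
minor  = (sys.hexversion >> 16) & 0xff
micro  = (sys.hexversion >> 8) & 0xff
level  = (sys.hexversion >> 4) & 0xf    # 0xA, 0xB, 0xC or 0xF
serial = sys.hexversion & 0xf
print(major, minor, micro, hex(level), serial)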
Enter string in the table of “interned” strings and return the interned string
– which is string itself or a copy. Interning strings is useful to gain a
little performance on dictionary lookup – if the keys in a dictionary are
interned, and the lookup key is interned, the key comparisons (after hashing)
can be done by a pointer compare instead of a string compare. Normally, the
names used in Python programs are automatically interned, and the dictionaries
used to hold module, class or instance attributes have interned keys.
Interned strings are not immortal; you must keep a reference to the return
value of intern() around to benefit from it.
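For example:
>>> import sys
>>> a = sys.intern('some-longish-dictionary-key')
>>> b = sys.intern('some-longish-dictionary-key')
>>> a is b
True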
These three variables are not always defined; they are set when an exception is
not handled and the interpreter prints an error message and a stack traceback.
Their intended use is to allow an interactive user to import a debugger module
and engage in post-mortem debugging without having to re-execute the command
that caused the error. (Typical use is import pdb; pdb.pm() to enter the
post-mortem debugger; see pdb module for
more information.)
The meaning of the variables is the same as that of the return values from
exc_info() above.
An integer giving the maximum value a variable of type Py_ssize_t can
take. It’s usually 2**31-1 on a 32-bit platform and 2**63-1 on a
64-bit platform.
An integer giving the largest supported code point for a Unicode character. The
value of this depends on the configuration option that specifies whether Unicode
characters are stored as UCS-2 or UCS-4.
A list of finder objects that have their find_module()
methods called to see if one of the objects can find the module to be
imported. The find_module() method is called at least with the
absolute name of the module being imported. If the module to be imported is
contained in package then the parent package’s __path__ attribute
is passed in as a second argument. The method returns None if
the module cannot be found, else returns a loader.
This is a dictionary that maps module names to modules which have already been
loaded. This can be manipulated to force reloading of modules and other tricks.
A list of strings that specifies the search path for modules. Initialized from
the environment variable PYTHONPATH, plus an installation-dependent
default.
As initialized upon program startup, the first item of this list, path[0],
is the directory containing the script that was used to invoke the Python
interpreter. If the script directory is not available (e.g. if the interpreter
is invoked interactively or if the script is read from standard input),
path[0] is the empty string, which directs Python to search modules in the
current directory first. Notice that the script directory is inserted before
the entries inserted as a result of PYTHONPATH.
A program is free to modify this list for its own purposes.
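For example, a program may prepend a directory (a hypothetical path here)
so that its own modules are found first:
import sys

sys.path.insert(0, '/opt/myapp/modules')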
See also
Module site This describes how to use .pth files to extend
sys.path.
A list of callables that take a path argument and try to create a
finder for that path. If a finder can be created, it is returned by the
callable; otherwise, ImportError is raised.
A dictionary acting as a cache for finder objects. The keys are
paths that have been passed to sys.path_hooks and the values are
the finders that are found. If a path is a valid file system path but no
explicit finder is found on sys.path_hooks then None is
stored to represent that the implicit default finder should be used. If the
path is not an existing path then imp.NullImporter is set.
This string contains a platform identifier that can be used to append
platform-specific components to sys.path, for instance.
For most Unix systems, this is the lowercased OS name as returned by
uname -s with the first part of the version as returned by uname -r
appended,
e.g. 'sunos5', at the time when Python was built. Unless you want to
test for a specific system version, it is therefore recommended to use the
following idiom:
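if sys.platform.startswith('freebsd'):
    # FreeBSD-specific code here...
elif sys.platform.startswith('linux'):
    # Linux-specific code here...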
Changed in version 3.2.2: Since lots of code checks for sys.platform == 'linux2', and there is
no essential change between Linux 2.x and 3.x, sys.platform is always
set to 'linux2', even on Linux 3.x. In Python 3.3 and later, the
value will always be set to 'linux', so it is recommended to always
use the startswith idiom presented above.
A string giving the site-specific directory prefix where the platform
independent Python files are installed; by default, this is the string
'/usr/local'. This can be set at build time with the --prefix
argument to the configure script. The main collection of Python
library modules is installed in the directory prefix + '/lib/pythonversion'
while the platform independent header files (all except pyconfig.h) are
stored in prefix + '/include/pythonversion', where version is equal to
version[:3].
Strings specifying the primary and secondary prompt of the interpreter. These
are only defined if the interpreter is in interactive mode. Their initial
values in this case are '>>>' and '...'. If a non-string object is
assigned to either variable, its str() is re-evaluated each time the
interpreter prepares to read a new interactive command; this can be used to
implement a dynamic prompt.
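As an illustration, a hypothetical dynamic prompt object whose str() is
re-evaluated for each new command:
import sys

class Prompt:
    def __init__(self):
        self.count = 0
    def __str__(self):
        self.count += 1
        return "[%d] >>> " % self.count

sys.ps1 = Prompt()   # takes effect only in an interactive session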
If this is true, Python won’t try to write .pyc or .pyo files on the
import of source modules. This value is initially set to True or False
depending on the -B command line option and the PYTHONDONTWRITEBYTECODE
environment variable, but you can set it yourself to control bytecode file
generation.
Set the interpreter’s “check interval”. This integer value determines how often
the interpreter checks for periodic things such as thread switches and signal
handlers. The default is 100, meaning the check is performed every 100
Python virtual instructions. Setting it to a larger value may increase
performance for programs using threads. Setting it to a value <= 0 checks
every virtual instruction, maximizing responsiveness as well as overhead.
Deprecated since version 3.2: This function doesn’t have an effect anymore, as the internal logic for
thread switching and asynchronous tasks has been rewritten. Use
setswitchinterval() instead.
Set the flags used by the interpreter for dlopen() calls, such as when
the interpreter loads extension modules. Among other things, this will enable a
lazy resolving of symbols when importing a module, if called as
sys.setdlopenflags(0). To share symbols across extension modules, call as
sys.setdlopenflags(ctypes.RTLD_GLOBAL). Symbolic names for the
flag values can be found either in the ctypes module or in the DLFCN
module. If DLFCN is not available, it can be generated from
/usr/include/dlfcn.h using the h2py script. Availability:
Unix.
Set the system’s profile function, which allows you to implement a Python source
code profiler in Python. See chapter The Python Profilers for more information on the
Python profiler. The system’s profile function is called similarly to the
system’s trace function (see settrace()), but it isn’t called for each
executed line of code (only on call and return, but the return event is reported
even when an exception has been set). The function is thread-specific, but
there is no way for the profiler to know about context switches between threads,
so it does not make sense to use this in the presence of multiple threads. Also,
its return value is not used, so it can simply return None.
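A minimal sketch of a profile function that reports call and return
events:
import sys

def profiler(frame, event, arg):
    # invoked on call and return events, not for each executed line
    if event in ('call', 'return'):
        print(event, frame.f_code.co_name)

def f():
    return 42

sys.setprofile(profiler)
f()
sys.setprofile(None)    # disable the profile function again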
Set the maximum depth of the Python interpreter stack to limit. This limit
prevents infinite recursion from causing an overflow of the C stack and crashing
Python.
The highest possible limit is platform-dependent. A user may need to set the
limit higher when they have a program that requires deep recursion and a platform
that supports a higher limit. This should be done with care, because a too-high
limit can lead to a crash.
Set the interpreter’s thread switch interval (in seconds). This floating-point
value determines the ideal duration of the “timeslices” allocated to
concurrently running Python threads. Please note that the actual value
can be higher, especially if long-running internal functions or methods
are used. Also, which thread becomes scheduled at the end of the interval
is the operating system’s decision. The interpreter doesn’t have its
own scheduler.
Set the system’s trace function, which allows you to implement a Python
source code debugger in Python. The function is thread-specific; for a
debugger to support multiple threads, it must be registered using
settrace() for each thread being debugged.
Trace functions should have three arguments: frame, event, and
arg. frame is the current stack frame. event is a string: 'call',
'line', 'return', 'exception', 'c_call', 'c_return', or
'c_exception'. arg depends on the event type.
The trace function is invoked (with event set to 'call') whenever a new
local scope is entered; it should return a reference to a local trace
function to be used in that scope, or None if the scope shouldn’t be traced.
The local trace function should return a reference to itself (or to another
function for further tracing in that scope), or None to turn off tracing
in that scope.
The events have the following meaning:
'call'
A function is called (or some other code block entered). The
global trace function is called; arg is None; the return value
specifies the local trace function.
'line'
The interpreter is about to execute a new line of code or re-execute the
condition of a loop. The local trace function is called; arg is
None; the return value specifies the new local trace function. See
Objects/lnotab_notes.txt for a detailed explanation of how this
works.
'return'
A function (or other code block) is about to return. The local trace
function is called; arg is the value that will be returned, or None
if the event is caused by an exception being raised. The trace function’s
return value is ignored.
'exception'
An exception has occurred. The local trace function is called; arg is a
tuple (exception,value,traceback); the return value specifies the
new local trace function.
'c_call'
A C function is about to be called. This may be an extension function or
a built-in. arg is the C function object.
'c_return'
A C function has returned. arg is the C function object.
'c_exception'
A C function has raised an exception. arg is the C function object.
Note that as an exception is propagated down the chain of callers, an
'exception' event is generated at each level.
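As a minimal sketch (the function name trace_calls is illustrative), a global trace function that reports each 'call' event and declines line-by-line tracing by returning None:
import sys

def trace_calls(frame, event, arg):
    # Called with event 'call' whenever a new local scope is entered.
    if event == 'call':
        print('entering', frame.f_code.co_name)
    # Returning None means the new scope gets no local trace function.
    return None

sys.settrace(trace_calls)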
For more information on code and frame objects, refer to The standard type hierarchy.
CPython implementation detail: The settrace() function is intended only for implementing debuggers,
profilers, coverage tools and the like. Its behavior is part of the
implementation platform, rather than part of the language definition, and
thus may not be available in all Python implementations.
Activate dumping of VM measurements using the Pentium timestamp counter, if
on_flag is true. Deactivate these dumps if on_flag is off. The function is
available only if Python was compiled with --with-tsc. To understand
the output of this dump, read Python/ceval.c in the Python sources.
CPython implementation detail: This function is intimately bound to CPython implementation details and
thus not likely to be implemented elsewhere.
File objects corresponding to the interpreter’s standard
input, output and error streams. stdin is used for all interpreter input
except for scripts but including calls to input(). stdout is used
for the output of print() and expression statements and for the
prompts of input(). The interpreter’s own prompts
and (almost all of) its error messages go to stderr. stdout and
stderr needn’t be built-in file objects: any object is acceptable as long
as it has a write() method that takes a string argument. (Changing these
objects doesn’t affect the standard I/O streams of processes executed by
os.popen(), os.system() or the exec*() family of functions in
the os module.)
The standard streams are in text mode by default. To write or read binary
data to these, use the underlying binary buffer. For example, to write bytes
to stdout, use sys.stdout.buffer.write(b'abc'). Using
io.TextIOBase.detach(), streams can be made binary by default. The
following function (a minimal sketch) sets stdin and stdout to binary:
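import sys

def make_streams_binary():
    # Sketch: detach the text wrappers, leaving the binary buffers.
    sys.stdin = sys.stdin.detach()
    sys.stdout = sys.stdout.detach()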
These objects contain the original values of stdin, stderr and
stdout at the start of the program. They are used during finalization,
and can be useful for printing to the actual standard stream even if the
sys.std* object has been redirected.
It can also be used to restore the actual files to known working file objects
in case they have been overwritten with a broken object. However, the
preferred way to do this is to explicitly save the previous stream before
replacing it, and restore the saved object.
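For example, a sketch of this save-and-restore pattern (the redirection to a StringIO buffer is illustrative):
import io
import sys

saved = sys.stdout            # save the previous stream first
sys.stdout = io.StringIO()    # redirect print() output to a buffer
print('captured')
captured = sys.stdout.getvalue()
sys.stdout = saved            # restore the saved object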
Note
Under some conditions stdin, stdout and stderr as well as the
original values __stdin__, __stdout__ and __stderr__ can be
None. It is usually the case for Windows GUI apps that aren’t connected
to a console and Python apps started with pythonw.
A triple (repo, branch, version) representing the Subversion information of the
Python interpreter. repo is the name of the repository, 'CPython'.
branch is a string of one of the forms 'trunk', 'branches/name' or
'tags/name'. version is the output of svnversion, if the interpreter
was built from a Subversion checkout; it contains the revision number (range)
and possibly a trailing ‘M’ if there were local modifications. If the tree was
exported (or svnversion was not available), it is the revision of
Include/patchlevel.h if the branch is a tag. Otherwise, it is None.
Deprecated since version 3.2.1: Python is now developed using
Mercurial. In recent Python 3.2 bugfix releases, subversion
therefore contains placeholder information. It is removed in Python
3.3.
When this variable is set to an integer value, it determines the maximum number
of levels of traceback information printed when an unhandled exception occurs.
The default is 1000. When set to 0 or less, all traceback information
is suppressed and only the exception type and value are printed.
A string containing the version number of the Python interpreter plus additional
information on the build number and compiler used. This string is displayed
when the interactive interpreter is started. Do not extract version information
out of it, rather, use version_info and the functions provided by the
platform module.
A tuple containing the five components of the version number: major, minor,
micro, releaselevel, and serial. All values except releaselevel are
integers; the release level is 'alpha', 'beta', 'candidate', or
'final'. The version_info value corresponding to the Python version 2.0
is (2,0,0,'final',0). The components can also be accessed by name,
so sys.version_info[0] is equivalent to sys.version_info.major
and so on.
Changed in version 3.1: Added named component attributes.
This is an implementation detail of the warnings framework; do not modify this
value. Refer to the warnings module for more information on the warnings
framework.
The version number used to form registry keys on Windows platforms. This is
stored as string resource 1000 in the Python DLL. The value is normally the
first three characters of version. It is provided in the sys
module for informational purposes; modifying this value has no effect on the
registry keys used by Python. Availability: Windows.
A dictionary of the various implementation-specific flags passed through
the -X command-line option. Option names are either mapped to
their values, if given explicitly, or to True. Example:
$ ./python -Xa=b -Xc
Python 3.2a3+ (py3k, Oct 16 2010, 20:14:50)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys._xoptions
{'a': 'b', 'c': True}
CPython implementation detail: This is a CPython-specific way of accessing options passed through
-X. Other implementations may export them through other
means, or not at all.
The sysconfig module provides access to Python’s configuration
information like the list of installation paths and the configuration variables
relevant for the current platform.
A Python distribution contains a Makefile and a pyconfig.h
header file that are necessary to build both the Python binary itself and
third-party C extensions compiled using distutils.
Python uses an installation scheme that differs depending on the platform and on
the installation options. These schemes are stored in sysconfig under
unique identifiers based on the value returned by os.name.
Every new component that is installed using distutils or a
Distutils-based system will follow the same scheme to copy its file in the right
places.
Python currently supports seven schemes:
posix_prefix: scheme for Posix platforms like Linux or Mac OS X. This is
the default scheme used when Python or a component is installed.
posix_home: scheme for Posix platforms used when a home option is used
upon installation. This scheme is used when a component is installed through
Distutils with a specific home prefix.
posix_user: scheme for Posix platforms used when a component is installed
through Distutils and the user option is used. This scheme defines paths
located under the user home directory.
nt: scheme for NT platforms like Windows.
nt_user: scheme for NT platforms, when the user option is used.
os2: scheme for OS/2 platforms.
os2_home: scheme for OS/2 platforms, when the user option is used.
Each scheme is itself composed of a series of paths and each path has a unique
identifier. Python currently uses eight paths:
stdlib: directory containing the standard Python library files that are not
platform-specific.
platstdlib: directory containing the standard Python library files that are
platform-specific.
platlib: directory for site-specific, platform-specific files.
purelib: directory for site-specific, non-platform-specific files.
include: directory for non-platform-specific header files.
platinclude: directory for platform-specific header files.
scripts: directory for script files.
data: directory for data files.
sysconfig provides some functions to determine these paths.
Return an installation path corresponding to the path name, from the
install scheme named scheme.
name has to be a value from the list returned by get_path_names().
sysconfig stores installation paths corresponding to each path name,
for each platform, with variables to be expanded. For instance the stdlib
path for the nt scheme is: {base}/Lib.
get_path() will use the variables returned by get_config_vars()
to expand the path. All variables have default values for each platform so
one may call this function and get the default value.
If scheme is provided, it must be a value from the list returned by
get_path_names(). Otherwise, the default scheme for the current
platform is used.
If vars is provided, it must be a dictionary of variables that will update
the dictionary returned by get_config_vars().
If expand is set to False, the path will not be expanded using the
variables.
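For example, a short interactive sketch (the expanded path shown is platform-dependent; the unexpanded nt value matches the {base}/Lib template described above):
>>> import sysconfig
>>> sysconfig.get_path('stdlib')                     # default scheme, expanded
'/usr/local/lib/python3.2'
>>> sysconfig.get_path('stdlib', 'nt', expand=False)
'{base}/Lib'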
Return a string that identifies the current platform.
This is used mainly to distinguish platform-specific build directories and
platform-specific built distributions. Typically includes the OS name and
version and the architecture (as supplied by os.uname()), although the
exact information included depends on the OS; e.g. for IRIX the architecture
isn’t particularly important (IRIX only runs on SGI hardware), but for Linux
the kernel version isn’t particularly important.
Examples of returned values:
linux-i586
linux-alpha (?)
solaris-2.6-sun4u
irix-5.3
irix64-6.2
Windows will return one of:
win-amd64 (64bit Windows on AMD64 (aka x86_64, Intel64, EM64T, etc))
win-ia64 (64bit Windows on Itanium)
win32 (all others - specifically, sys.platform is returned)
Mac OS X can return:
macosx-10.6-ppc
macosx-10.4-ppc64
macosx-10.3-i386
macosx-10.4-fat
For other non-POSIX platforms, currently just returns sys.platform.
Parse a config.h-style file; fp is a file-like object pointing to the config.h-like file.
A dictionary containing name/value pairs is returned. If an optional
dictionary is passed in as the second argument, it is used instead of a new
dictionary, and updated with the values read in the file.
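A brief usage sketch, pairing this function with sysconfig.get_config_h_filename() to locate pyconfig.h:
import sysconfig

# Parse pyconfig.h into a dictionary of name/value pairs.
with open(sysconfig.get_config_h_filename()) as fp:
    config_vars = sysconfig.parse_config_h(fp)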
This module provides direct access to all ‘built-in’ identifiers of Python; for
example, builtins.open is the full name for the built-in function
open(). See Built-in Functions and Built-in Constants for
documentation.
This module is not normally accessed explicitly by most applications, but can be
useful in modules that provide objects with the same name as a built-in value,
but in which the built-in of that name is also needed. For example, in a module
that wants to implement an open() function that wraps the built-in
open(), this module can be used directly:
import builtins

def open(path):
    f = builtins.open(path, 'r')
    return UpperCaser(f)

class UpperCaser:
    '''Wrapper around a file that converts output to upper-case.'''

    def __init__(self, f):
        self._f = f

    def read(self, count=-1):
        return self._f.read(count).upper()

    # ...
As an implementation detail, most modules have the name __builtins__ made
available as part of their globals. The value of __builtins__ is normally
either this module or the value of this module’s __dict__ attribute.
Since this is an implementation detail, it may not be used by alternate
implementations of Python.
This module represents the (otherwise anonymous) scope in which the
interpreter’s main program executes — commands read either from standard
input, from a script file, or from an interactive prompt. It is this
environment in which the idiomatic “conditional script” stanza causes a script
to run:
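def main():
    # main() here is illustrative; any top-level entry point works.
    print('running as the main program')

if __name__ == '__main__':
    # True only when this module is the interpreter's main program,
    # not when it is imported by another module.
    main()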
Warning messages are typically issued in situations where it is useful to alert
the user of some condition in a program, where that condition (normally) doesn’t
warrant raising an exception and terminating the program. For example, one
might want to issue a warning when a program uses an obsolete module.
Python programmers issue warnings by calling the warn() function defined
in this module. (C programmers use PyErr_WarnEx(); see
Exception Handling for details).
Warning messages are normally written to sys.stderr, but their disposition
can be changed flexibly, from ignoring all warnings to turning them into
exceptions. The disposition of warnings can vary based on the warning category
(see below), the text of the warning message, and the source location where it
is issued. Repetitions of a particular warning for the same source location are
typically suppressed.
There are two stages in warning control: first, each time a warning is issued, a
determination is made whether a message should be issued or not; next, if a
message is to be issued, it is formatted and printed using a user-settable hook.
The determination whether to issue a warning message is controlled by the
warning filter, which is a sequence of matching rules and actions. Rules can be
added to the filter by calling filterwarnings() and reset to its default
state by calling resetwarnings().
The printing of warning messages is done by calling showwarning(), which
may be overridden; the default implementation of this function formats the
message by calling formatwarning(), which is also available for use by
custom implementations.
See also
logging.captureWarnings() allows you to handle all warnings with
the standard logging infrastructure.
There are a number of built-in exceptions that represent warning categories.
This categorization is useful to be able to filter out groups of warnings. The
following warnings category classes are currently defined:
Base category for warnings related to
resource usage.
While these are technically built-in exceptions, they are documented here,
because conceptually they belong to the warnings mechanism.
User code can define additional warning categories by subclassing one of the
standard warning categories. A warning category must always be a subclass of
the Warning class.
The warnings filter controls whether warnings are ignored, displayed, or turned
into errors (raising an exception).
Conceptually, the warnings filter maintains an ordered list of filter
specifications; any specific warning is matched against each filter
specification in the list in turn until a match is found; the match determines
the disposition of the match. Each entry is a tuple of the form (action,
message, category, module, lineno), where:
action is one of the following strings:
"error"      turn matching warnings into exceptions
"ignore"     never print matching warnings
"always"     always print matching warnings
"default"    print the first occurrence of matching warnings
             for each location where the warning is issued
"module"     print the first occurrence of matching warnings
             for each module where the warning is issued
"once"       print only the first occurrence of matching
             warnings, regardless of location
message is a string containing a regular expression that the warning message
must match (the match is compiled to always be case-insensitive).
category is a class (a subclass of Warning) of which the warning
category must be a subclass in order to match.
module is a string containing a regular expression that the module name must
match (the match is compiled to be case-sensitive).
lineno is an integer that the line number where the warning occurred must
match, or 0 to match all line numbers.
Since the Warning class is derived from the built-in Exception
class, to turn a warning into an error we simply raise category(message).
The warnings filter is initialized by -W options passed to the Python
interpreter command line. The interpreter saves the arguments for all
-W options without interpretation in sys.warnoptions; the
warnings module parses these when it is first imported (invalid options
are ignored, after printing a message to sys.stderr).
BytesWarning is ignored unless the -b option is given once or
twice; in this case this warning is either printed (-b) or turned into an
exception (-bb).
ResourceWarning is ignored unless Python was built in debug mode.
If you are using code that you know will raise a warning, such as a deprecated
function, but do not want to see the warning, then it is possible to suppress
the warning using the catch_warnings context manager:
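import warnings

def fxn():
    # Stand-in (illustrative) for code known to raise a warning.
    warnings.warn("deprecated", DeprecationWarning)

with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    fxn()  # the DeprecationWarning is suppressed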
While within the context manager all warnings will simply be ignored. This
allows you to use known-deprecated code without having to see the warning while
not suppressing the warning for other code that might not be aware of its use
of deprecated code. Note: this can only be guaranteed in a single-threaded
application. If two or more threads use the catch_warnings context
manager at the same time, the behavior is undefined.
To test warnings raised by code, use the catch_warnings context
manager. With it you can temporarily mutate the warnings filter to facilitate
your testing. For instance, do the following to capture all raised warnings to
check:
import warnings

def fxn():
    warnings.warn("deprecated", DeprecationWarning)

with warnings.catch_warnings(record=True) as w:
    # Cause all warnings to always be triggered.
    warnings.simplefilter("always")
    # Trigger a warning.
    fxn()
    # Verify some things
    assert len(w) == 1
    assert issubclass(w[-1].category, DeprecationWarning)
    assert "deprecated" in str(w[-1].message)
One can also cause all warnings to be exceptions by using error instead of
always. One thing to be aware of is that if a warning has already been
raised because of a once/default rule, then no matter what filters are
set the warning will not be seen again unless the warnings registry related to
the warning has been cleared.
Once the context manager exits, the warnings filter is restored to its state
when the context was entered. This prevents tests from changing the warnings
filter in unexpected ways between tests and leading to indeterminate test
results. The showwarning() function in the module is also restored to
its original value. Note: this can only be guaranteed in a single-threaded
application. If two or more threads use the catch_warnings context
manager at the same time, the behavior is undefined.
When testing multiple operations that raise the same kind of warning, it
is important to test them in a manner that confirms each operation is raising
a new warning (e.g. set warnings to be raised as exceptions and check the
operations raise exceptions, check that the length of the warning list
continues to increase after each operation, or else delete the previous
entries from the warnings list before each new operation).
Warnings that are only of interest to the developer are ignored by default. As
such you should make sure to test your code with typically ignored warnings
made visible. You can do this from the command-line by passing -Wd
to the interpreter (this is shorthand for -Wdefault). This enables
default handling for all warnings, including those that are ignored by default.
To change what action is taken for encountered warnings you simply change what
argument is passed to -W, e.g. -Werror. See the
-W flag for more details on what is possible.
To programmatically do the same as -Wd, use:
warnings.simplefilter('default')
Make sure to execute this code as soon as possible. This prevents the
registering of what warnings have been raised from unexpectedly influencing how
future warnings are treated.
Having certain warnings ignored by default is done to prevent a user from
seeing warnings that are only of interest to the developer. As you do not
necessarily have control over what interpreter a user uses to run their code,
it is possible that a new version of Python will be released between your
release cycles. The new interpreter release could trigger new warnings in your
code that were not there in an older interpreter, e.g.
DeprecationWarning for a module that you are using. While you as a
developer want to be notified that your code is using a deprecated module, to a
user this information is essentially noise and provides no benefit to them.
The unittest module has also been updated to use the 'default'
filter while running tests.
Issue a warning, or maybe ignore it or raise an exception. The category
argument, if given, must be a warning category class (see above); it defaults to
UserWarning. Alternatively message can be a Warning instance,
in which case category will be ignored and message.__class__ will be used.
In this case the message text will be str(message). This function raises an
exception if the particular warning issued is changed into an error by the
warnings filter (see above). The stacklevel argument can be used by wrapper
functions written in Python, like this:
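import warnings

def deprecation(message):
    # stacklevel=2 attributes the warning to deprecation()'s caller.
    warnings.warn(message, DeprecationWarning, stacklevel=2)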
This makes the warning refer to deprecation()‘s caller, rather than to the
source of deprecation() itself (since the latter would defeat the purpose
of the warning message).
This is a low-level interface to the functionality of warn(), passing in
explicitly the message, category, filename and line number, and optionally the
module name and the registry (which should be the __warningregistry__
dictionary of the module). The module name defaults to the filename with
.py stripped; if no registry is passed, the warning is never suppressed.
message must be a string and category a subclass of Warning or
message may be a Warning instance, in which case category will be
ignored.
module_globals, if supplied, should be the global namespace in use by the code
for which the warning is issued. (This argument is used to support displaying
source for modules found in zipfiles or other non-filesystem import
sources).
Write a warning to a file. The default implementation calls
formatwarning(message,category,filename,lineno,line) and writes the
resulting string to file, which defaults to sys.stderr. You may replace
this function with an alternative implementation by assigning to
warnings.showwarning.
line is a line of source code to be included in the warning
message; if line is not supplied, showwarning() will
try to read the line specified by filename and lineno.
Format a warning the standard way. This returns a string which may contain
embedded newlines and ends in a newline. line is a line of source code to
be included in the warning message; if line is not supplied,
formatwarning() will try to read the line specified by filename and
lineno.
Insert an entry into the list of warnings filter specifications. The entry is inserted at the front by default; if
append is true, it is inserted at the end. This checks the types of the
arguments, compiles the message and module regular expressions, and
inserts them as a tuple in the list of warnings filters. Entries closer to
the front of the list override entries later in the list, if both match a
particular warning. Omitted arguments default to a value that matches
everything.
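For example, a hedged sketch (the module name legacy is hypothetical) that turns matching DeprecationWarnings into errors:
import warnings

# Escalate DeprecationWarnings issued from a module named "legacy".
warnings.filterwarnings('error',
                        category=DeprecationWarning,
                        module=r'legacy')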
Insert a simple entry into the list of warnings filter specifications. The meaning of the function parameters is as for
filterwarnings(), but regular expressions are not needed as the filter
inserted always matches any message in any module as long as the category and
line number match.
Reset the warnings filter. This discards the effect of all previous calls to
filterwarnings(), including that of the -W command line options
and calls to simplefilter().
class warnings.catch_warnings(*, record=False, module=None)
A context manager that copies and, upon exit, restores the warnings filter
and the showwarning() function.
If the record argument is False (the default) the context manager
returns None on entry. If record is True, a list is
returned that is progressively populated with objects as seen by a custom
showwarning() function (which also suppresses output to sys.stdout).
Each object in the list has attributes with the same names as the arguments to
showwarning().
The module argument takes a module that will be used instead of the
module returned when you import warnings whose filter will be
protected. This argument exists primarily for testing the warnings
module itself.
Note
The catch_warnings manager works by replacing and
then later restoring the module’s
showwarning() function and internal list of filter
specifications. This means the context manager is modifying
global state and therefore is not thread-safe.
This function is a decorator that can be used to define a factory
function for with statement context managers, without needing to
create a class or separate __enter__() and __exit__() methods.
A simple example (this is not recommended as a real way of generating HTML!):
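from contextlib import contextmanager

@contextmanager
def tag(name):
    # Sketch: emit an opening tag, run the block, emit a closing tag.
    print("<%s>" % name)
    yield
    print("</%s>" % name)

>>> with tag("h1"):
...     print("foo")
...
<h1>
foo
</h1>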
The function being decorated must return a generator-iterator when
called. This iterator must yield exactly one value, which will be bound to
the targets in the with statement’s as clause, if any.
At the point where the generator yields, the block nested in the with
statement is executed. The generator is then resumed after the block is exited.
If an unhandled exception occurs in the block, it is reraised inside the
generator at the point where the yield occurred. Thus, you can use a
try...except...finally statement to trap
the error (if any), or ensure that some cleanup takes place. If an exception is
trapped merely in order to log it or to perform some action (rather than to
suppress it entirely), the generator must reraise that exception. Otherwise the
generator context manager will indicate to the with statement that
the exception has been handled, and execution will resume with the statement
immediately following the with statement.
contextmanager() uses ContextDecorator so the context managers
it creates can be used as decorators as well as in with statements.
When used as a decorator, a new generator instance is implicitly created on
each function call (this allows the otherwise “one-shot” context managers
created by contextmanager() to meet the requirement that context
managers support multiple invocations in order to be used as decorators).
A base class that enables a context manager to also be used as a decorator.
Context managers inheriting from ContextDecorator have to implement
__enter__ and __exit__ as normal. __exit__ retains its optional
exception handling even when used as a decorator.
ContextDecorator is used by contextmanager(), so you get this
functionality automatically.
Example of ContextDecorator:
from contextlib import ContextDecorator

class mycontext(ContextDecorator):
    def __enter__(self):
        print('Starting')
        return self

    def __exit__(self, *exc):
        print('Finishing')
        return False

>>> @mycontext()
... def function():
...     print('The bit in the middle')
...
>>> function()
Starting
The bit in the middle
Finishing

>>> with mycontext():
...     print('The bit in the middle')
...
Starting
The bit in the middle
Finishing
This change is just syntactic sugar for any construct of the following form:
def f():
    with cm():
        # Do stuff

ContextDecorator lets you instead write:

@cm()
def f():
    # Do stuff
It makes it clear that the cm applies to the whole function, rather than
just a piece of it (and saving an indentation level is nice, too).
Existing context managers that already have a base class can be extended by
using ContextDecorator as a mixin class:
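from contextlib import ContextDecorator

class mycontext(ContextBaseClass, ContextDecorator):
    # ContextBaseClass stands for the existing base class (illustrative).
    def __enter__(self):
        return self

    def __exit__(self, *exc):
        return False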
As the decorated function must be able to be called multiple times, the
underlying context manager must support use in multiple with
statements. If this is not the case, then the original construct with the
explicit with statement inside the function should be used.
This module provides the infrastructure for defining abstract base
classes (ABCs) in Python, as outlined in PEP 3119; see the PEP for why this
was added to Python. (See also PEP 3141 and the numbers module
regarding a type hierarchy for numbers based on ABCs.)
The collections module has some concrete classes that derive from
ABCs; these can, of course, be further derived. In addition the
collections module has some ABCs that can be used to test whether
a class or instance provides a particular interface, for example, is it
hashable or a mapping.
Metaclass for defining Abstract Base Classes (ABCs).
Use this metaclass to create an ABC. An ABC can be subclassed directly, and
then acts as a mix-in class. You can also register unrelated concrete
classes (even built-in classes) and unrelated ABCs as “virtual subclasses” –
these and their descendants will be considered subclasses of the registering
ABC by the built-in issubclass() function, but the registering ABC
won’t show up in their MRO (Method Resolution Order) nor will method
implementations defined by the registering ABC be callable (not even via
super()). [1]
Classes created with a metaclass of ABCMeta have the following method:
Check whether subclass is considered a subclass of this ABC. This means
that you can customize the behavior of issubclass further without the
need to call register() on every class you want to consider a
subclass of the ABC. (This class method is called from the
__subclasscheck__() method of the ABC.)
This method should return True, False or NotImplemented. If
it returns True, the subclass is considered a subclass of this ABC.
If it returns False, the subclass is not considered a subclass of
this ABC, even if it would normally be one. If it returns
NotImplemented, the subclass check is continued with the usual
mechanism.
For a demonstration of these concepts, look at this example ABC definition:
from abc import ABCMeta, abstractmethod

class Foo:
    def __getitem__(self, index):
        ...
    def __len__(self):
        ...
    def get_iterator(self):
        return iter(self)

class MyIterable(metaclass=ABCMeta):

    @abstractmethod
    def __iter__(self):
        while False:
            yield None

    def get_iterator(self):
        return self.__iter__()

    @classmethod
    def __subclasshook__(cls, C):
        if cls is MyIterable:
            if any("__iter__" in B.__dict__ for B in C.__mro__):
                return True
        return NotImplemented

MyIterable.register(Foo)
The ABC MyIterable defines the standard iterable method,
__iter__(), as an abstract method. The implementation given here can
still be called from subclasses. The get_iterator() method is also
part of the MyIterable abstract base class, but it does not have to be
overridden in non-abstract derived classes.
The __subclasshook__() class method defined here says that any class
that has an __iter__() method in its __dict__ (or in that of
one of its base classes, accessed via the __mro__ list) is
considered a MyIterable too.
Finally, the last line makes Foo a virtual subclass of MyIterable,
even though it does not define an __iter__() method (it uses the
old-style iterable protocol, defined in terms of __len__() and
__getitem__()). Note that this will not make get_iterator
available as a method of Foo, so it is provided separately.
Using this decorator requires that the class’s metaclass is ABCMeta or
is derived from it.
A class that has a metaclass derived from ABCMeta
cannot be instantiated unless all of its abstract methods and
properties are overridden.
The abstract methods can be called using any of the normal ‘super’ call
mechanisms.
Dynamically adding abstract methods to a class, or attempting to modify the
abstraction status of a method or class once it is created, are not
supported. The abstractmethod() only affects subclasses derived using
regular inheritance; “virtual subclasses” registered with the ABC’s
register() method are not affected.
Usage:
class C(metaclass=ABCMeta):
    @abstractmethod
    def my_abstract_method(self, ...):
        ...
Note
Unlike Java abstract methods, these abstract
methods may have an implementation. This implementation can be
called via the super() mechanism from the class that
overrides it. This could be useful as an end-point for a
super-call in a framework that uses cooperative
multiple-inheritance.
A subclass of the built-in property(), indicating an abstract property.
Using this function requires that the class’s metaclass is ABCMeta or
is derived from it.
A class that has a metaclass derived from ABCMeta cannot be
instantiated unless all of its abstract methods and properties are overridden.
The abstract properties can be called using any of the normal
‘super’ call mechanisms.
Usage:
class C(metaclass=ABCMeta):
    @abstractproperty
    def my_abstract_property(self):
        ...
This defines a read-only property; you can also define a read-write abstract
property using the ‘long’ form of property declaration:
class C(metaclass=ABCMeta):
    def getx(self): ...
    def setx(self, value): ...
    x = abstractproperty(getx, setx)
The atexit module defines functions to register and unregister cleanup
functions. Functions thus registered are automatically executed upon normal
interpreter termination. The order in which the functions are called is not
defined; if you have cleanup operations that depend on each other, you should
wrap them in a function and register that one. This keeps atexit simple.
Note: the functions registered via this module are not called when the program
is killed by a signal not handled by Python, when a Python fatal internal error
is detected, or when os._exit() is called.
Register func as a function to be executed at termination. Any optional
arguments that are to be passed to func must be passed as arguments to
register().
At normal program termination (for instance, if sys.exit() is called or
the main module’s execution completes), all functions registered are called in
last in, first out order. The assumption is that lower level modules will
normally be imported before higher level modules and thus must be cleaned up
later.
If an exception is raised during execution of the exit handlers, a traceback is
printed (unless SystemExit is raised) and the exception information is
saved. After all exit handlers have had a chance to run the last exception to
be raised is re-raised.
This function returns func which makes it possible to use it as a decorator
without binding the original name to None.
Remove a function func from the list of functions to be run at interpreter-
shutdown. After calling unregister(), func is guaranteed not to be
called when the interpreter shuts down.
The following simple example demonstrates how a module can initialize a counter
from a file when it is imported and save the counter’s updated value
automatically when the program terminates without relying on the application
making an explicit call into this module at termination.
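A sketch of such a module (the counter file path is illustrative):
try:
    with open('/tmp/counter') as infile:
        _count = int(infile.read())
except IOError:
    _count = 0

def incrcounter(n):
    global _count
    _count = _count + n

def savecounter():
    # Runs automatically at normal interpreter termination.
    with open('/tmp/counter', 'w') as outfile:
        outfile.write('%d' % _count)

import atexit
atexit.register(savecounter)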
Positional and keyword arguments may also be passed to register() to be
passed along to the registered function when it is called:
def goodbye(name, adjective):
    print('Goodbye, %s, it was %s to meet you.' % (name, adjective))

import atexit
atexit.register(goodbye, 'Donny', 'nice')

# or:
atexit.register(goodbye, adjective='nice', name='Donny')
This module provides a standard interface to extract, format and print stack
traces of Python programs. It exactly mimics the behavior of the Python
interpreter when it prints a stack trace. This is useful when you want to print
stack traces under program control, such as in a “wrapper” around the
interpreter.
The module uses traceback objects — this is the object type that is stored in
the sys.last_traceback variable and returned as the third item from
sys.exc_info().
Print up to limit stack trace entries from traceback. If limit is omitted
or None, all entries are printed. If file is omitted or None, the
output goes to sys.stderr; otherwise it should be an open file or file-like
object to receive the output.
Print exception information and up to limit stack trace entries from
traceback to file. This differs from print_tb() in the following
ways:
if traceback is not None, it prints a header Traceback (most recent call last):
it prints the exception type and value after the stack trace
if type is SyntaxError and value has the appropriate format, it
prints the line where the syntax error occurred with a caret indicating the
approximate position of the error.
If chain is true (the default), then chained exceptions (the
__cause__ or __context__ attributes of the exception) will be
printed as well, like the interpreter itself does when printing an unhandled
exception.
This is a shorthand for print_exception(sys.last_type, sys.last_value,
sys.last_traceback, limit, file). In general it will work only after
an exception has reached an interactive prompt (see sys.last_type).
This function prints a stack trace from its invocation point. The optional f
argument can be used to specify an alternate stack frame to start. The optional
limit and file arguments have the same meaning as for
print_exception().
Return a list of up to limit “pre-processed” stack trace entries extracted
from the traceback object traceback. It is useful for alternate formatting of
stack traces. If limit is omitted or None, all entries are extracted. A
“pre-processed” stack trace entry is a quadruple (filename, line number,
function name, text) representing the information that is usually printed
for a stack trace. The text is a string with leading and trailing whitespace
stripped; if the source is not available it is None.
Extract the raw traceback from the current stack frame. The return value has
the same format as for extract_tb(). The optional f and limit
arguments have the same meaning as for print_stack().
Given a list of tuples as returned by extract_tb() or
extract_stack(), return a list of strings ready for printing. Each string
in the resulting list corresponds to the item with the same index in the
argument list. Each string ends in a newline; the strings may contain internal
newlines as well, for those items whose source text line is not None.
Format the exception part of a traceback. The arguments are the exception type
and value such as given by sys.last_type and sys.last_value. The return
value is a list of strings, each ending in a newline. Normally, the list
contains a single string; however, for SyntaxError exceptions, it
contains several lines that (when printed) display detailed information about
where the syntax error occurred. The message indicating which exception
occurred is always the last string in the list.
Format a stack trace and the exception information. The arguments have the
same meaning as the corresponding arguments to print_exception(). The
return value is a list of strings, each ending in a newline and some containing
internal newlines. When these lines are concatenated and printed, exactly the
same text is printed as does print_exception().
This simple example implements a basic read-eval-print loop, similar to (but
less useful than) the standard Python interactive interpreter loop. For a more
complete implementation of the interpreter loop, refer to the code
module.
import sys, traceback

def run_user_code(envdir):
    source = input(">>> ")
    try:
        exec(source, envdir)
    except:
        print("Exception in user code:")
        print("-" * 60)
        traceback.print_exc(file=sys.stdout)
        print("-" * 60)

envdir = {}
while True:
    run_user_code(envdir)
The following example demonstrates the different ways to print and format the
exception and traceback:
import sys, traceback

def lumberjack():
    bright_side_of_death()

def bright_side_of_death():
    return tuple()[0]

try:
    lumberjack()
except IndexError:
    exc_type, exc_value, exc_traceback = sys.exc_info()
    print("*** print_tb:")
    traceback.print_tb(exc_traceback, limit=1, file=sys.stdout)
    print("*** print_exception:")
    traceback.print_exception(exc_type, exc_value, exc_traceback,
                              limit=2, file=sys.stdout)
    print("*** print_exc:")
    traceback.print_exc()
    print("*** format_exc, first and last line:")
    formatted_lines = traceback.format_exc().splitlines()
    print(formatted_lines[0])
    print(formatted_lines[-1])
    print("*** format_exception:")
    print(repr(traceback.format_exception(exc_type, exc_value,
                                          exc_traceback)))
    print("*** extract_tb:")
    print(repr(traceback.extract_tb(exc_traceback)))
    print("*** format_tb:")
    print(repr(traceback.format_tb(exc_traceback)))
    print("*** tb_lineno:", exc_traceback.tb_lineno)
The output for the example would look similar to this:
*** print_tb:
File "<doctest...>", line 10, in <module>
lumberjack()
*** print_exception:
Traceback (most recent call last):
File "<doctest...>", line 10, in <module>
lumberjack()
File "<doctest...>", line 4, in lumberjack
bright_side_of_death()
IndexError: tuple index out of range
*** print_exc:
Traceback (most recent call last):
File "<doctest...>", line 10, in <module>
lumberjack()
File "<doctest...>", line 4, in lumberjack
bright_side_of_death()
IndexError: tuple index out of range
*** format_exc, first and last line:
Traceback (most recent call last):
IndexError: tuple index out of range
*** format_exception:
['Traceback (most recent call last):\n',
' File "<doctest...>", line 10, in <module>\n lumberjack()\n',
' File "<doctest...>", line 4, in lumberjack\n bright_side_of_death()\n',
' File "<doctest...>", line 7, in bright_side_of_death\n return tuple()[0]\n',
'IndexError: tuple index out of range\n']
*** extract_tb:
[('<doctest...>', 10, '<module>', 'lumberjack()'),
('<doctest...>', 4, 'lumberjack', 'bright_side_of_death()'),
('<doctest...>', 7, 'bright_side_of_death', 'return tuple()[0]')]
*** format_tb:
[' File "<doctest...>", line 10, in <module>\n lumberjack()\n',
' File "<doctest...>", line 4, in lumberjack\n bright_side_of_death()\n',
' File "<doctest...>", line 7, in bright_side_of_death\n return tuple()[0]\n']
*** tb_lineno: 10
The following example shows the different ways to print and format the stack:
>>> import traceback
>>> def another_function():
...     lumberstack()
...
>>> def lumberstack():
...     traceback.print_stack()
...     print(repr(traceback.extract_stack()))
...     print(repr(traceback.format_stack()))
...
>>> another_function()
  File "<doctest>", line 10, in <module>
    another_function()
  File "<doctest>", line 3, in another_function
    lumberstack()
  File "<doctest>", line 6, in lumberstack
    traceback.print_stack()
[('<doctest>', 10, '<module>', 'another_function()'),
 ('<doctest>', 3, 'another_function', 'lumberstack()'),
 ('<doctest>', 7, 'lumberstack', 'print(repr(traceback.extract_stack()))')]
['  File "<doctest>", line 10, in <module>\n    another_function()\n',
 '  File "<doctest>", line 3, in another_function\n    lumberstack()\n',
 '  File "<doctest>", line 8, in lumberstack\n    print(repr(traceback.format_stack()))\n']
This last example demonstrates the final few formatting functions:
>>> import traceback
>>> traceback.format_list([('spam.py', 3, '<module>', 'spam.eggs()'),
...                        ('eggs.py', 42, 'eggs', 'return "bacon"')])
['  File "spam.py", line 3, in <module>\n    spam.eggs()\n',
 '  File "eggs.py", line 42, in eggs\n    return "bacon"\n']
>>> an_error = IndexError('tuple index out of range')
>>> traceback.format_exception_only(type(an_error), an_error)
['IndexError: tuple index out of range\n']
__future__ is a real module, and serves three purposes:
To avoid confusing existing tools that analyze import statements and expect to
find the modules they’re importing.
To ensure that future statements run under releases prior to
2.1 at least yield runtime exceptions (the import of __future__ will
fail, because there was no module of that name prior to 2.1).
To document when incompatible changes were introduced, and when they will be
— or were — made mandatory. This is a form of executable documentation, and
can be inspected programmatically via importing __future__ and examining
its contents.
Each feature description in __future__ is an instance of class
_Feature, created as _Feature(OptionalRelease, MandatoryRelease,
CompilerFlag), where, normally, OptionalRelease is less than
MandatoryRelease, and both are 5-tuples of the same form as
sys.version_info:
(PY_MAJOR_VERSION,    # the 2 in 2.1.0a3; an int
 PY_MINOR_VERSION,    # the 1; an int
 PY_MICRO_VERSION,    # the 0; an int
 PY_RELEASE_LEVEL,    # "alpha", "beta", "candidate" or "final"; string
 PY_RELEASE_SERIAL)   # the 3; an int
OptionalRelease records the first release in which the feature was accepted.
In the case of a MandatoryRelease that has not yet occurred,
MandatoryRelease predicts the release in which the feature will become part of
the language.
Else MandatoryRelease records when the feature became part of the language; in
releases at or after that, modules no longer need a future statement to use the
feature in question, but may continue to use such imports.
MandatoryRelease may also be None, meaning that a planned feature got
dropped.
Instances of class _Feature have two corresponding methods,
getOptionalRelease() and getMandatoryRelease().
CompilerFlag is the (bitfield) flag that should be passed in the fourth
argument to the built-in function compile() to enable the feature in
dynamically compiled code. This flag is stored in the compiler_flag
attribute on _Feature instances.
No feature description will ever be deleted from __future__. Since its
introduction in Python 2.1 the following features have found their way into the
language using this mechanism:
This module provides an interface to the optional garbage collector. It
provides the ability to disable the collector, tune the collection frequency,
and set debugging options. It also provides access to unreachable objects that
the collector found but cannot free. Since the collector supplements the
reference counting already used in Python, you can disable the collector if you
are sure your program does not create reference cycles. Automatic collection
can be disabled by calling gc.disable(). To debug a leaking program call
gc.set_debug(gc.DEBUG_LEAK). Notice that this includes
gc.DEBUG_SAVEALL, causing garbage-collected objects to be saved in
gc.garbage for inspection.
With no arguments, run a full collection. The optional argument generation
may be an integer specifying which generation to collect (from 0 to 2). A
ValueError is raised if the generation number is invalid. The number of
unreachable objects found is returned.
The free lists maintained for a number of built-in types are cleared
whenever a full collection or collection of the highest generation (2)
is run. Not all items in some free lists may be freed due to the
particular implementation, in particular float.
Set the garbage collection debugging flags. Debugging information will be
written to sys.stderr. See below for a list of debugging flags which can be
combined using bit operations to control debugging.
Set the garbage collection thresholds (the collection frequency). Setting
threshold0 to zero disables collection.
The GC classifies objects into three generations depending on how many
collection sweeps they have survived. New objects are placed in the youngest
generation (generation 0). If an object survives a collection it is moved
into the next older generation. Since generation 2 is the oldest
generation, objects in that generation remain there after a collection. In
order to decide when to run, the collector keeps track of the number of object
allocations and deallocations since the last collection. When the number of
allocations minus the number of deallocations exceeds threshold0, collection
starts. Initially only generation 0 is examined. If generation 0 has
been examined more than threshold1 times since generation 1 has been
examined, then generation 1 is examined as well. Similarly, threshold2
controls the number of collections of generation 1 before collecting
generation 2.
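For example (the threshold values shown are the usual defaults, but may vary between builds):
>>> import gc
>>> gc.get_threshold()
(700, 10, 10)
>>> gc.set_threshold(700, 15, 15)   # examine older generations less often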
Return the list of objects that directly refer to any of objs. This function
will only locate those containers which support garbage collection; extension
types which do refer to other objects but do not support garbage collection will
not be found.
Note that objects which have already been dereferenced, but which live in cycles
and have not yet been collected by the garbage collector can be listed among the
resulting referrers. To get only currently live objects, call collect()
before calling get_referrers().
Care must be taken when using objects returned by get_referrers() because
some of them could still be under construction and hence in a temporarily
invalid state. Avoid using get_referrers() for any purpose other than
debugging.
Return a list of objects directly referred to by any of the arguments. The
referents returned are those objects visited by the arguments’ C-level
tp_traverse methods (if any), and may not be all objects actually
directly reachable. tp_traverse methods are supported only by objects
that support garbage collection, and are only required to visit objects that may
be involved in a cycle. So, for example, if an integer is directly reachable
from an argument, that integer object may or may not appear in the result list.
Returns True if the object is currently tracked by the garbage collector,
False otherwise. As a general rule, instances of atomic types aren’t
tracked and instances of non-atomic types (containers, user-defined
objects...) are. However, some type-specific optimizations can be present
in order to suppress the garbage collector footprint of simple instances
(e.g. dicts containing only atomic keys and values):
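>>> import gc
>>> gc.is_tracked(0)
False
>>> gc.is_tracked("a")
False
>>> gc.is_tracked([])
True
>>> gc.is_tracked({})
False
>>> gc.is_tracked({"a": 1})
False
>>> gc.is_tracked({"a": []})
True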
A list of objects which the collector found to be unreachable but could not be
freed (uncollectable objects). By default, this list contains only objects with
__del__() methods. Objects that have __del__() methods and are
part of a reference cycle cause the entire reference cycle to be uncollectable,
including objects not necessarily in the cycle but reachable only from it.
Python doesn’t collect such cycles automatically because, in general, it isn’t
possible for Python to guess a safe order in which to run the __del__()
methods. If you know a safe order, you can force the issue by examining the
garbage list, and explicitly breaking cycles due to your objects within the
list. Note that these objects are kept alive even so by virtue of being in the
garbage list, so they should be removed from garbage too. For example,
after breaking cycles, do del gc.garbage[:] to empty the list. It’s
generally better to avoid the issue by not creating cycles containing objects
with __del__() methods, and garbage can be examined in that case to
verify that no such cycles are being created.
If DEBUG_SAVEALL is set, then all unreachable objects will be added
to this list rather than freed.
Changed in version 3.2: If this list is non-empty at interpreter shutdown, a
ResourceWarning is emitted, which is silent by default. If
DEBUG_UNCOLLECTABLE is set, in addition all uncollectable objects
are printed.
The following constants are provided for use with set_debug():
Print information of uncollectable objects found (objects which are not
reachable but cannot be freed by the collector). These objects will be added
to the garbage list.
Changed in version 3.2: Also print the contents of the garbage list at interpreter
shutdown, if it isn’t empty.
The debugging flags necessary for the collector to print information about a
leaking program (equal to DEBUG_COLLECTABLE|DEBUG_UNCOLLECTABLE|DEBUG_SAVEALL).
The inspect module provides several useful functions to help get
information about live objects such as modules, classes, methods, functions,
tracebacks, frame objects, and code objects. For example, it can help you
examine the contents of a class, retrieve the source code of a method, extract
and format the argument list for a function, or get all the information you need
to display a detailed traceback.
There are four main kinds of services provided by this module: type checking,
getting source code, inspecting classes and functions, and examining the
interpreter stack.
The getmembers() function retrieves the members of an object such as a
class or module. The sixteen functions whose names begin with “is” are mainly
provided as convenient choices for the second argument to getmembers().
They also help you determine when you can expect to find the following special
attributes:
Type      Attribute    Description

module    __doc__      documentation string
          __file__     filename (missing for built-in modules)

class     __doc__      documentation string
          __module__   name of module in which this class was defined

method    __doc__      documentation string
          __name__     name with which this method was defined
          __func__     function object containing implementation of the method
Return all the members of an object in a list of (name, value) pairs sorted by
name. If the optional predicate argument is supplied, only members for which
the predicate returns a true value are included.
Note
getmembers() does not return metaclass attributes when the argument
is a class (this behavior is inherited from the dir() function).
Returns a named tuple ModuleInfo(name, suffix, mode, module_type)
of values that describe how Python will interpret the file identified by
path if it is a module, or None if it would not be identified as a
module. In that tuple, name is the name of the module without the name of
any enclosing package, suffix is the trailing part of the file name (which
may not be a dot-delimited extension), mode is the open() mode that
would be used ('r' or 'rb'), and module_type is an integer giving
the type of the module. module_type will have a value which can be
compared to the constants defined in the imp module; see the
documentation for that module for more information on module types.
Return the name of the module named by the file path, without including the
names of enclosing packages. This uses the same algorithm as the interpreter
uses when searching for modules. If the name cannot be matched according to the
interpreter’s rules, None is returned.
Return true if the object is a method descriptor, but not if ismethod(),
isclass(), isfunction() or isbuiltin() are true.
This, for example, is true of int.__add__. An object passing this test
has a __get__ attribute but not a __set__ attribute, but
beyond that the set of attributes varies. __name__ is usually
sensible, and __doc__ often is.
Methods implemented via descriptors that also pass one of the other tests
return false from the ismethoddescriptor() test, simply because the
other tests promise more – you can, e.g., count on having the
__func__ attribute (etc) when an object passes ismethod().
Return true if the object is a data descriptor. Data descriptors have both a __get__ and a __set__ attribute.
Examples are properties (defined in Python), getsets, and members. The
latter two are defined in C and there are more specific tests available for
those types, which are robust across Python implementations. Typically, data
descriptors will also have __name__ and __doc__ attributes
(properties, getsets, and members have both of these attributes), but this is
not guaranteed.
CPython implementation detail: getsets are attributes defined in extension modules via
PyGetSetDef structures. For Python implementations without such
types, this method will always return False.
CPython implementation detail: Member descriptors are attributes defined in extension modules via
PyMemberDef structures. For Python implementations without such
types, this method will always return False.
Return in a single string any lines of comments immediately preceding the
object’s source code (for a class, function, or method), or at the top of the
Python source file (if the object is a module).
Return the name of the (text or binary) file in which an object was defined.
This will fail with a TypeError if the object is a built-in module,
class, or function.
Return the name of the Python source file in which an object was defined. This
will fail with a TypeError if the object is a built-in module, class, or
function.
Return a list of source lines and starting line number for an object. The
argument may be a module, class, method, function, traceback, frame, or code
object. The source code is returned as a list of the lines corresponding to the
object and the line number indicates where in the original source file the first
line of code was found. An IOError is raised if the source code cannot
be retrieved.
Return the text of the source code for an object. The argument may be a module,
class, method, function, traceback, frame, or code object. The source code is
returned as a single string. An IOError is raised if the source code
cannot be retrieved.
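As an illustration (assuming the target object is implemented in Python source):

import inspect

# print the source of a pure-Python standard library function
print(inspect.getsource(inspect.getmembers))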
Clean up indentation from docstrings that are indented to line up with blocks
of code. Any whitespace that can be uniformly removed from the second line
onwards is removed. Also, all tabs are expanded to spaces.
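A small sketch of the effect:

>>> import inspect
>>> def f():
...     """Summary line.
...     Indented body line.
...     """
>>> print(inspect.cleandoc(f.__doc__))
Summary line.
Indented body line.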
Arrange the given list of classes into a hierarchy of nested lists. Where a
nested list appears, it contains classes derived from the class whose entry
immediately precedes the list. Each entry is a 2-tuple containing a class and a
tuple of its base classes. If the unique argument is true, exactly one entry
appears in the returned structure for each class in the given list. Otherwise,
classes using multiple inheritance and their descendants will appear multiple
times.
Get the names and default values of a Python function’s arguments. A
named tuple ArgSpec(args, varargs, keywords, defaults) is
returned. args is a list of the argument names. varargs and keywords
are the names of the * and ** arguments or None. defaults is a
tuple of default argument values or None if there are no default arguments;
if this tuple has n elements, they correspond to the last n elements
listed in args.
Deprecated since version 3.0: Use getfullargspec() instead, which provides information about
keyword-only arguments and annotations.
args is a list of the argument names. varargs and varkw are the names
of the * and ** arguments or None. defaults is an n-tuple of
the default values of the last n arguments. kwonlyargs is a list of
keyword-only argument names. kwonlydefaults is a dictionary mapping names
from kwonlyargs to defaults. annotations is a dictionary mapping argument
names to annotations.
The first four items in the tuple correspond to getargspec().
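A brief sketch using the field names described above:

>>> import inspect
>>> def f(a, b=1, *args, flag=False, **kwargs): pass
>>> spec = inspect.getfullargspec(f)
>>> spec.args, spec.varargs, spec.kwonlyargs, spec.kwonlydefaults
(['a', 'b'], 'args', ['flag'], {'flag': False})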
Get information about arguments passed into a particular frame. A
named tuple ArgInfo(args, varargs, keywords, locals) is
returned. args is a list of the argument names. varargs and keywords
are the names of the * and ** arguments or None. locals is the
locals dictionary of the given frame.
Format a pretty argument spec from the four values returned by
getargspec(). The format* arguments are the corresponding optional
formatting functions that are called to turn names and values into strings.
Format a pretty argument spec from the four values returned by
getargvalues(). The format* arguments are the corresponding optional
formatting functions that are called to turn names and values into strings.
Return a tuple of class cls’s base classes, including cls, in method resolution
order. No class appears more than once in this tuple. Note that the method
resolution order depends on cls’s type. Unless a very peculiar user-defined
metatype is in use, cls will be the first element of the tuple.
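For example, a minimal sketch:

>>> import inspect
>>> class Base: pass
>>> class Derived(Base): pass
>>> [c.__name__ for c in inspect.getmro(Derived)]
['Derived', 'Base', 'object']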
Bind the args and kwds to the argument names of the Python function or
method func, as if it were called with them. For bound methods, also bind the
first argument (typically named self) to the associated instance. A dict
is returned, mapping the argument names (including the names of the * and
** arguments, if any) to their values from args and kwds. If func is
invoked incorrectly, i.e. whenever func(*args, **kwds) would raise an
exception because of an incompatible signature, an exception of the same type
and with the same or a similar message is raised.
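For example (a minimal sketch; dict ordering and the exact error wording vary by version):

>>> from inspect import getcallargs
>>> def f(a, b=1, *pos, **named):
...     pass
>>> getcallargs(f, 1, 2, 3) == {'a': 1, 'b': 2, 'pos': (3,), 'named': {}}
True
>>> getcallargs(f)
Traceback (most recent call last):
  ...
TypeError: f() takes at least 1 argument (0 given)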
When the following functions return “frame records,” each record is a tuple of
six items: the frame object, the filename, the line number of the current line,
the function name, a list of lines of context from the source code, and the
index of the current line within that list.
Note
Keeping references to frame objects, as found in the first element of the frame
records these functions return, can cause your program to create reference
cycles. Once a reference cycle has been created, the lifespan of all objects
which can be accessed from the objects which form the cycle can become much
longer even if Python’s optional cycle detector is enabled. If such cycles must
be created, it is important to ensure they are explicitly broken to avoid the
delayed destruction of objects and increased memory consumption which occurs.
Though the cycle detector will catch these, destruction of the frames (and local
variables) can be made deterministic by removing the cycle in a
finally clause. This is also important if the cycle detector was
disabled when Python was compiled or using gc.disable(). For example:
def handle_stackframe_without_leak():
    frame = inspect.currentframe()
    try:
        # do something with the frame
        pass
    finally:
        del frame
The optional context argument supported by most of these functions specifies
the number of lines of context to return, which are centered around the current
line.
Get a list of frame records for a frame and all outer frames. These frames
represent the calls that led to the creation of frame. The first entry in the
returned list represents frame; the last entry represents the outermost call
on frame's stack.
Get a list of frame records for a traceback’s frame and all inner frames. These
frames represent calls made as a consequence of frame. The first entry in the
list represents traceback; the last entry represents where the exception was
raised.
Return the frame object for the caller’s stack frame.
CPython implementation detail: This function relies on Python stack frame support in the interpreter,
which isn’t guaranteed to exist in all implementations of Python. If
running in an implementation without Python stack frame support this
function returns None.
Return a list of frame records for the caller’s stack. The first entry in the
returned list represents the caller; the last entry represents the outermost
call on the stack.
Return a list of frame records for the stack between the current frame and the
frame in which an exception currently being handled was raised in. The first
entry in the list represents the caller; the last entry represents where the
exception was raised.
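For instance, a small sketch that uses stack() to recover the caller's name:

import inspect

def whoami():
    # each frame record is (frame, filename, lineno, function, context, index)
    return inspect.stack()[1][3]

def outer():
    return whoami()

print(outer())   # prints 'outer'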
Both getattr() and hasattr() can trigger code execution when
fetching or checking for the existence of attributes. Descriptors, like
properties, will be invoked and __getattr__() and __getattribute__()
may be called.
For cases where you want passive introspection, like documentation tools, this
can be inconvenient. getattr_static has the same signature as getattr()
but avoids executing code when it fetches attributes.
Retrieve attributes without triggering dynamic lookup via the
descriptor protocol, __getattr__ or __getattribute__.
Note: this function may not be able to retrieve all attributes
that getattr can fetch (like dynamically created attributes)
and may find attributes that getattr can’t (like descriptors
that raise AttributeError). It can also return descriptor objects
instead of instance members.
If the instance __dict__ is shadowed by another member (for example a
property) then this function will be unable to find instance members.
New in version 3.2.
getattr_static does not resolve descriptors, for example slot descriptors or
getset descriptors on objects implemented in C. The descriptor object
is returned instead of the underlying attribute.
You can handle these with code like the following. Note that
for arbitrary getset descriptors invoking these may trigger
code execution:
# example code for resolving the builtin descriptor types
class _foo:
    __slots__ = ['foo']

slot_descriptor = type(_foo.foo)
getset_descriptor = type(type(open(__file__)).name)
wrapper_descriptor = type(str.__dict__['__add__'])
descriptor_types = (slot_descriptor, getset_descriptor, wrapper_descriptor)

result = getattr_static(some_object, 'foo')
if type(result) in descriptor_types:
    try:
        result = result.__get__()
    except AttributeError:
        # descriptors can raise AttributeError to
        # indicate there is no underlying value
        # in which case the descriptor itself will
        # have to do
        pass
When implementing coroutine schedulers and for other advanced uses of
generators, it is useful to determine whether a generator is currently
executing, is waiting to start or resume execution, or has already
terminated. getgeneratorstate() allows the current state of a
generator to be determined easily.
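For example, a minimal sketch of the states a generator moves through:

>>> from inspect import getgeneratorstate
>>> def gen():
...     yield 'a'
>>> g = gen()
>>> getgeneratorstate(g)
'GEN_CREATED'
>>> next(g)
'a'
>>> getgeneratorstate(g)
'GEN_SUSPENDED'
>>> g.close()
>>> getgeneratorstate(g)
'GEN_CLOSED'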
This module is automatically imported during initialization. The automatic
import can be suppressed using the interpreter’s -S option.
Importing this module will append site-specific paths to the module search path
and add a few builtins.
It starts by constructing up to four directories from a head and a tail part.
For the head part, it uses sys.prefix and sys.exec_prefix; empty heads
are skipped. For the tail part, it uses the empty string and then
lib/site-packages (on Windows) or
lib/pythonX.Y/site-packages and then lib/site-python (on
Unix and Macintosh). For each of the distinct head-tail combinations, it sees
if it refers to an existing directory, and if so, adds it to sys.path and
also inspects the newly added path for configuration files.
A path configuration file is a file whose name has the form name.pth
and exists in one of the four directories mentioned above; its contents are
additional items (one per line) to be added to sys.path. Non-existing items
are never added to sys.path, and no check is made that the item refers to a
directory rather than a file. No item is added to sys.path more than
once. Blank lines and lines beginning with # are skipped. Lines starting
with import (followed by space or tab) are executed.
For example, suppose sys.prefix and sys.exec_prefix are set to
/usr/local. The Python X.Y library is then installed in
/usr/local/lib/pythonX.Y. Suppose this has
a subdirectory /usr/local/lib/pythonX.Y/site-packages with three
subsubdirectories, foo, bar and spam, and two path
configuration files, foo.pth and bar.pth. Assume
foo.pth contains the following:
# foo package configuration
foo
bar
bletch
and bar.pth contains:
# bar package configuration
bar
Then the following version-specific directories are added to
sys.path, in this order:

/usr/local/lib/pythonX.Y/site-packages/bar
/usr/local/lib/pythonX.Y/site-packages/foo
Note that bletch is omitted because it doesn’t exist; the bar
directory precedes the foo directory because bar.pth comes
alphabetically before foo.pth; and spam is omitted because it is
not mentioned in either path configuration file.
After these path manipulations, an attempt is made to import a module named
sitecustomize, which can perform arbitrary site-specific customizations.
It is typically created by a system administrator in the site-packages
directory. If this import fails with an ImportError exception, it is
silently ignored.
After this, an attempt is made to import a module named usercustomize,
which can perform arbitrary user-specific customizations, if
ENABLE_USER_SITE is true. This file is intended to be created in the
user site-packages directory (see below), which is part of sys.path unless
disabled by -s. An ImportError will be silently ignored.
Note that for some non-Unix systems, sys.prefix and sys.exec_prefix are
empty, and the path manipulations are skipped; however the import of
sitecustomize and usercustomize is still attempted.
Flag showing the status of the user site-packages directory. True means
that it is enabled and was added to sys.path. False means that it
was disabled by user request (with -s or
PYTHONNOUSERSITE). None means it was disabled for security
reasons (mismatch between user or group id and effective id) or by an
administrator.
Path to the user site-packages for the running Python. Can be None if
getusersitepackages() hasn’t been called yet. Default value is
~/.local/lib/pythonX.Y/site-packages for UNIX and non-framework Mac
OS X builds, ~/Library/Python/X.Y/lib/python/site-packages for Mac
framework builds, and %APPDATA%\Python\PythonXY\site-packages
on Windows. This directory is a site directory, which means that
.pth files in it will be processed.
Path to the base directory for the user site-packages. Can be None if
getuserbase() hasn’t been called yet. Default value is
~/.local for UNIX and Mac OS X non-framework builds,
~/Library/Python/X.Y for Mac framework builds, and
%APPDATA%\Python for Windows. This value is used by Distutils to
compute the installation directories for scripts, data files, Python modules,
etc. for the user installation scheme. See
also PYTHONUSERBASE.
Return the path of the user-specific site-packages directory,
USER_SITE. If it is not initialized yet, this function will also set
it, respecting PYTHONNOUSERSITE and USER_BASE.
New in version 3.2.
The site module also provides a way to get the user directories from the
command line:
$ python3 -m site --user-site
/home/user/.local/lib/python3.3/site-packages
If it is called without arguments, it will print the contents of
sys.path on the standard output, followed by the value of
USER_BASE and whether the directory exists, then the same thing for
USER_SITE, and finally the value of ENABLE_USER_SITE.
The --user-base and --user-site options print the path to the user base
directory and to the user site-packages directory, respectively.
If both options are given, user base and user site will be printed (always in
this order), separated by os.pathsep.
If any option is given, the script will exit with one of these values: 0 if
the user site-packages directory is enabled, 1 if it was disabled by the
user, 2 if it is disabled for security reasons or by an administrator, and a
value greater than 2 if there is an error.
The fpectl module is not built by default, and its usage is discouraged
and may be dangerous except in the hands of experts. See the section
Limitations and other considerations for more details.
Most computers carry out floating point operations in conformance with the
so-called IEEE-754 standard. On any real computer, some floating point
operations produce results that cannot be expressed as a normal floating point
value. For example, try
>>> import math
>>> math.exp(1000)
inf
>>> math.exp(1000) / math.exp(1000)
nan
(The example above will work on many platforms. DEC Alpha may be one exception.)
“Inf” is a special, non-numeric value in IEEE-754 that stands for “infinity”,
and “nan” means “not a number.” Note that, other than the non-numeric results,
nothing special happened when you asked Python to carry out those calculations.
That is in fact the default behaviour prescribed in the IEEE-754 standard, and
if it works for you, stop reading now.
In some circumstances, it would be better to raise an exception and stop
processing at the point where the faulty operation was attempted. The
fpectl module is for use in that situation. It provides control over
floating point units from several hardware manufacturers, allowing the user to
turn on the generation of SIGFPE whenever any of the IEEE-754
exceptions Division by Zero, Overflow, or Invalid Operation occurs. In tandem
with a pair of wrapper macros that are inserted into the C code comprising your
python system, SIGFPE is trapped and converted into the Python
FloatingPointError exception.
The fpectl module defines the following functions and may raise the given
exception:
After turnon_sigfpe() has been executed, a floating point operation that
raises one of the IEEE-754 exceptions Division by Zero, Overflow, or Invalid
operation will in turn raise this standard Python exception.
The following example demonstrates how to start up and test operation of the
fpectl module.
>>> import fpectl
>>> import fpetest
>>> fpectl.turnon_sigfpe()
>>> fpetest.test()
overflow PASS
FloatingPointError: Overflow
div by 0 PASS
FloatingPointError: Division by zero
[ more output from test elided ]
>>> import math
>>> math.exp(1000)
Traceback (most recent call last):
File "<stdin>", line 1, in ?
FloatingPointError: in math_1
Setting up a given processor to trap IEEE-754 floating point errors currently
requires custom code on a per-architecture basis. You may have to modify
fpectl to control your particular hardware.
Conversion of an IEEE-754 exception to a Python exception requires that the
wrapper macros PyFPE_START_PROTECT and PyFPE_END_PROTECT be inserted
into your code in an appropriate fashion. Python itself has been modified to
support the fpectl module, but many other codes of interest to numerical
analysts have not.
Some files in the source distribution may be interesting in learning more about
how this module operates. The include file Include/pyfpe.h discusses the
implementation of this module at some length. Modules/fpetestmodule.c
gives several examples of use. Many additional examples can be found in
Objects/floatobject.c.
distutils — Building and installing Python modules
The distutils package provides support for building and installing
additional modules into a Python installation. The new modules may be either
100%-pure Python, or may be extension modules written in C, or may be
collections of Python packages which include modules coded in both Python and C.
This package is discussed in two separate chapters:
The manual for developers and packagers of Python modules. This describes
how to prepare distutils-based packages so that they may be
easily installed into an existing Python installation.
An “administrators” manual which includes information on installing
modules into an existing Python installation. You do not need to be a
Python programmer to read this manual.
The modules described in this chapter allow writing interfaces similar to
Python’s interactive interpreter. If you want a Python interpreter that
supports some special feature in addition to the Python language, you should
look at the code module. (The codeop module is lower-level, used
to support compiling a possibly-incomplete chunk of Python code.)
The full list of modules described in this chapter is:
The code module provides facilities to implement read-eval-print loops in
Python. Two classes and convenience functions are included which can be used to
build applications which provide an interactive interpreter prompt.
This class deals with parsing and interpreter state (the user’s namespace); it
does not deal with input buffering or prompting or input file naming (the
filename is always passed in explicitly). The optional locals argument
specifies the dictionary in which code will be executed; it defaults to a newly
created dictionary with key '__name__' set to '__console__' and key
'__doc__' set to None.
class code.InteractiveConsole(locals=None, filename="<console>")
Closely emulate the behavior of the interactive Python interpreter. This class
builds on InteractiveInterpreter and adds prompting using the familiar
sys.ps1 and sys.ps2, and input buffering.
Convenience function to run a read-eval-print loop. This creates a new instance
of InteractiveConsole and sets readfunc to be used as the
raw_input() method, if provided. If local is provided, it is passed to
the InteractiveConsole constructor for use as the default namespace for
the interpreter loop. The interact() method of the instance is then run
with banner passed as the banner to use, if provided. The console object is
discarded after use.
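For example, a minimal sketch that starts a console with a preset namespace:

import code

# the mapping passed as local becomes the interpreter's namespace;
# leave the console with exit() or an EOF (Ctrl-D, or Ctrl-Z on Windows)
code.interact(banner='demo console', local={'answer': 42})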
This function is useful for programs that want to emulate Python’s interpreter
main loop (a.k.a. the read-eval-print loop). The tricky part is to determine
when the user has entered an incomplete command that can be completed by
entering more text (as opposed to a complete command or a syntax error). This
function almost always makes the same decision as the real interpreter main
loop.
source is the source string; filename is the optional filename from which
source was read, defaulting to '<input>'; and symbol is the optional
grammar start symbol, which should be either 'single' (the default) or
'eval'.
Returns a code object (the same as compile(source, filename, symbol)) if the
command is complete and valid; None if the command is incomplete; raises
SyntaxError if the command is complete and contains a syntax error, or
raises OverflowError or ValueError if the command contains an
invalid literal.
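A brief sketch of the first two outcomes (a complete but invalid command,
such as 'x +', raises SyntaxError instead):

>>> from code import compile_command
>>> compile_command('x = 1') is None   # complete and valid: a code object
False
>>> compile_command('if x:') is None   # incomplete: None
True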
Compile and run some source in the interpreter. Arguments are the same as for
compile_command(); the default for filename is '<input>', and for
symbol is 'single'. One of several things can happen:
The input is complete; compile_command() returned a code object. The
code is executed by calling runcode() (which also handles run-time
exceptions, except for SystemExit). runsource() returns False.
The return value can be used to decide whether to use sys.ps1 or sys.ps2
to prompt the next line.
Execute a code object. When an exception occurs, showtraceback() is called
to display a traceback. All exceptions are caught except SystemExit,
which is allowed to propagate.
A note about KeyboardInterrupt: this exception may occur elsewhere in
this code, and may not always be caught. The caller should be prepared to deal
with it.
Display the syntax error that just occurred. This does not display a stack
trace because there isn’t one for syntax errors. If filename is given, it is
stuffed into the exception instead of the default filename provided by Python’s
parser, because it always uses '<string>' when reading from a string. The
output is written by the write() method.
Display the exception that just occurred. We remove the first stack item
because it is within the interpreter object implementation. The output is
written by the write() method.
The InteractiveConsole class is a subclass of
InteractiveInterpreter, and so offers all the methods of the
interpreter objects as well as the following additions.
Closely emulate the interactive Python console. The optional banner argument
specifies the banner to print before the first interaction; by default it prints a
banner similar to the one printed by the standard Python interpreter, followed
by the class name of the console object in parentheses (so as not to confuse
this with the real interpreter – since it’s so close!).
Push a line of source text to the interpreter. The line should not have a
trailing newline; it may have internal newlines. The line is appended to a
buffer and the interpreter’s runsource() method is called with the
concatenated contents of the buffer as source. If this indicates that the
command was executed or invalid, the buffer is reset; otherwise, the command is
incomplete, and the buffer is left as it was after the line was appended. The
return value is True if more input is required, False if the line was
dealt with in some way (this is the same as runsource()).
Write a prompt and read a line. The returned line does not include the trailing
newline. When the user enters the EOF key sequence, EOFError is raised.
The base implementation reads from sys.stdin; a subclass may replace this
with a different implementation.
The codeop module provides utilities upon which the Python
read-eval-print loop can be emulated, as is done in the code module. As
a result, you probably don’t want to use the module directly; if you want to
include such a loop in your program you probably want to use the code
module instead.
There are two parts to this job:
Being able to tell if a line of input completes a Python statement: in
short, telling whether to print '>>>' or '...' next.
Remembering which future statements the user has entered, so subsequent
input can be compiled with these in effect.
The codeop module provides a way of doing each of these things, and a way
of doing them both.
Tries to compile source, which should be a string of Python code and return a
code object if source is valid Python code. In that case, the filename
attribute of the code object will be filename, which defaults to
'<input>'. Returns None if source is not valid Python code, but is a
prefix of valid Python code.
If there is a problem with source, an exception will be raised.
SyntaxError is raised if there is invalid Python syntax, and
OverflowError or ValueError if there is an invalid literal.
The symbol argument determines whether source is compiled as a statement
('single', the default) or as an expression ('eval'). Any
other value will cause ValueError to be raised.
Note
It is possible (but not likely) that the parser stops parsing with a
successful outcome before reaching the end of the source; in this case,
trailing symbols may be ignored instead of causing an error. For example,
a backslash followed by two newlines may be followed by arbitrary garbage.
This will be fixed once the API for the parser is better.
Instances of this class have __call__() methods identical in signature to
the built-in function compile(), but with the difference that if the
instance compiles program text containing a __future__ statement, the
instance ‘remembers’ and compiles all subsequent program texts with the
statement in force.
Instances of this class have __call__() methods identical in signature to
compile_command(); the difference is that if the instance compiles program
text containing a __future__ statement, the instance ‘remembers’ and
compiles all subsequent program texts with the statement in force.
Return a list of 3-element tuples, each describing a particular type of
module. Each triple has the form (suffix, mode, type), where suffix is
a string to be appended to the module name to form the filename to search
for, mode is the mode string to pass to the built-in open() function
to open the file (this can be 'r' for text files or 'rb' for binary
files), and type is the file type, which has one of the values
PY_SOURCE, PY_COMPILED, or C_EXTENSION, described
below.
Try to find the module name. If path is omitted or None, the list of
directory names given by sys.path is searched, but first a few special
places are searched: the function tries to find a built-in module with the
given name (C_BUILTIN), then a frozen module (PY_FROZEN),
and on some systems some other places are looked in as well (on Windows, it
looks in the registry which may point to a specific file).
Otherwise, path must be a list of directory names; each directory is
searched for files with any of the suffixes returned by get_suffixes()
above. Invalid names in the list are silently ignored (but all list items
must be strings).
If the search is successful, the return value is a 3-element tuple (file, pathname, description):
file is an open file object positioned at the beginning, pathname
is the pathname of the file found, and description is a 3-element tuple as
contained in the list returned by get_suffixes() describing the kind of
module found.
If the module does not live in a file, the returned file is None,
pathname is the empty string, and the description tuple contains empty
strings for its suffix and mode; the module type is indicated as given in
parentheses above. If the search is unsuccessful, ImportError is
raised. Other exceptions indicate problems with the arguments or
environment.
If the module is a package, file is None, pathname is the package
path and the last item in the description tuple is PKG_DIRECTORY.
This function does not handle hierarchical module names (names containing
dots). In order to find P.M, that is, submodule M of package P, use
find_module() and load_module() to find and load package P, and
then use find_module() with the path argument set to P.__path__.
When P itself has a dotted name, apply this recipe recursively.
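A sketch of that recipe for a hypothetical package P with submodule M:

import imp

file, pathname, description = imp.find_module('P')
try:
    P = imp.load_module('P', file, pathname, description)
finally:
    if file:               # for a package, file is None
        file.close()

file, pathname, description = imp.find_module('M', P.__path__)
try:
    M = imp.load_module('P.M', file, pathname, description)
finally:
    if file:
        file.close()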
Load a module that was previously found by find_module() (or by an
otherwise conducted search yielding compatible results). This function does
more than importing the module: if the module was already imported, it will
reload the module! The name argument indicates the full
module name (including the package name, if this is a submodule of a
package). The file argument is an open file, and pathname is the
corresponding file name; these can be None and '', respectively, when
the module is a package or not being loaded from a file. The description
argument is a tuple, as would be returned by get_suffixes(), describing
what kind of module must be loaded.
If the load is successful, the return value is the module object; otherwise,
an exception (usually ImportError) is raised.
Important: the caller is responsible for closing the file argument, if
it was not None, even when an exception is raised. This is best done
using a try ... finally statement.
Return True if the import lock is currently held, else False. On
platforms without threads, always return False.
On platforms with threads, a thread executing an import holds an internal lock
until the import is complete. This lock blocks other threads from doing an
import until the original import completes, which in turn prevents other threads
from seeing incomplete module objects constructed by the original thread while
in the process of completing its import (and the imports, if any, triggered by
that).
Acquire the interpreter’s import lock for the current thread. This lock should
be used by import hooks to ensure thread-safety when importing modules.
Once a thread has acquired the import lock, the same thread may acquire it
again without blocking; the thread must release it once for each time it has
acquired it.
On platforms without threads, this function does nothing.
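A minimal sketch of the usual acquire/release pairing in an import hook:

import imp

imp.acquire_lock()
try:
    pass   # perform import-related work here
finally:
    imp.release_lock()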
Reload a previously imported module. The argument must be a module object, so
it must have been successfully imported before. This is useful if you have
edited the module source file using an external editor and want to try out the
new version without leaving the Python interpreter. The return value is the
module object (the same as the module argument).
When reload(module) is executed:
Python modules’ code is recompiled and the module-level code reexecuted,
defining a new set of objects which are bound to names in the module’s
dictionary. The init function of extension modules is not called a second
time.
As with all other objects in Python the old objects are only reclaimed after
their reference counts drop to zero.
The names in the module namespace are updated to point to any new or changed
objects.
Other references to the old objects (such as names external to the module) are
not rebound to refer to the new objects and must be updated in each namespace
where they occur if that is desired.
There are a number of other caveats:
If a module is syntactically correct but its initialization fails, the first
import statement for it does not bind its name locally, but does
store a (partially initialized) module object in sys.modules. To reload the
module you must first import it again (this will bind the name to the
partially initialized module object) before you can reload() it.
When a module is reloaded, its dictionary (containing the module’s global
variables) is retained. Redefinitions of names will override the old
definitions, so this is generally not a problem. If the new version of a module
does not define a name that was defined by the old version, the old definition
remains. This feature can be used to the module’s advantage if it maintains a
global table or cache of objects — with a try statement it can test
for the table’s presence and skip its initialization if desired:
try:
    cache
except NameError:
    cache = {}
It is legal though generally not very useful to reload built-in or dynamically
loaded modules, except for sys, __main__ and builtins.
In many cases, however, extension modules are not designed to be initialized
more than once, and may fail in arbitrary ways when reloaded.
If a module imports objects from another module using from ...
import ..., calling reload() for the other module does not
redefine the objects imported from it — one way around this is to re-execute
the from statement, another is to use import and qualified
names (module.name) instead.
If a module instantiates instances of a class, reloading the module that defines
the class does not affect the method definitions of the instances — they
continue to use the old class definition. The same is true for derived classes.
The following functions are conveniences for handling PEP 3147 byte-compiled
file paths.
Return the PEP 3147 path to the byte-compiled file associated with the
source path. For example, if path is /foo/bar/baz.py the return
value would be /foo/bar/__pycache__/baz.cpython-32.pyc for Python 3.2.
The cpython-32 string comes from the current magic tag (see
get_tag()). The returned path will end in .pyc when
__debug__ is True or .pyo for an optimized Python
(i.e. __debug__ is False). By passing in True or False for
debug_override you can override the system’s value for __debug__ for
extension selection.
Given the path to a PEP 3147 file name, return the associated source code
file path. For example, if path is
/foo/bar/__pycache__/baz.cpython-32.pyc the returned path would be
/foo/bar/baz.py. path need not exist, however if it does not conform
to PEP 3147 format, a ValueError is raised.
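For example (the tag in the file name depends on the running interpreter):

>>> import imp
>>> imp.cache_from_source('/foo/bar/baz.py')
'/foo/bar/__pycache__/baz.cpython-32.pyc'
>>> imp.source_from_cache('/foo/bar/__pycache__/baz.cpython-32.pyc')
'/foo/bar/baz.py'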
The NullImporter type is a PEP 302 import hook that handles
non-directory path strings by failing to find any modules. Calling this type
with an existing directory or empty string raises ImportError.
Otherwise, a NullImporter instance is returned.
Python adds instances of this type to sys.path_importer_cache for any path
entries that are not directories and are not handled by any other path hooks on
sys.path_hooks. Instances have only one method, find_module(), which
always returns None, indicating that the requested module could not be
found.
The following function emulates what was the standard import statement up to
Python 1.4 (no hierarchical module names). (This implementation wouldn’t work
in that version, since find_module() has been extended and
load_module() has been added in 1.4.)
import imp
import sys

def __import__(name, globals=None, locals=None, fromlist=None):
    # Fast path: see if the module has already been imported.
    try:
        return sys.modules[name]
    except KeyError:
        pass

    # If any of the following calls raises an exception,
    # there's a problem we can't handle -- let the caller handle it.
    fp, pathname, description = imp.find_module(name)

    try:
        return imp.load_module(name, fp, pathname, description)
    finally:
        # Since we may exit via an exception, close fp explicitly.
        if fp:
            fp.close()
This module adds the ability to import Python modules (*.py,
*.py[co]) and packages from ZIP-format archives. It is usually not
needed to use the zipimport module explicitly; it is automatically used
by the built-in import mechanism for sys.path items that are paths
to ZIP archives.
Typically, sys.path is a list of directory names as strings. This module
also allows an item of sys.path to be a string naming a ZIP file archive.
The ZIP archive can contain a subdirectory structure to support package imports,
and a path within the archive can be specified to only import from a
subdirectory. For example, the path /tmp/example.zip/lib/ would only
import from the lib/ subdirectory within the archive.
Any files may be present in the ZIP archive, but only files .py and
.py[co] are available for import. ZIP import of dynamic modules
(.pyd, .so) is disallowed. Note that if an archive only contains
.py files, Python will not attempt to modify the archive by adding the
corresponding .pyc or .pyo file, meaning that if a ZIP archive
doesn’t contain .pyc files, importing may be rather slow.
ZIP archives with an archive comment are currently not supported.
Written by James C. Ahlstrom, who also provided an implementation. Python 2.3
follows the specification in PEP 273, but uses an implementation written by Just
van Rossum that uses the import hooks described in PEP 302.
Create a new zipimporter instance. archivepath must be a path to a ZIP
file, or to a specific path within a ZIP file. For example, an archivepath
of foo/bar.zip/lib will look for modules in the lib directory
inside the ZIP file foo/bar.zip (provided that it exists).
ZipImportError is raised if archivepath doesn’t point to a valid ZIP
archive.
Search for a module specified by fullname. fullname must be the fully
qualified (dotted) module name. It returns the zipimporter instance itself
if the module was found, or None if it wasn’t. The optional
path argument is ignored—it’s there for compatibility with the
importer protocol.
Return the source code for the specified module. Raise
ZipImportError if the module couldn’t be found, return
None if the archive does contain the module, but has no source
for it.
Load the module specified by fullname. fullname must be the fully
qualified (dotted) module name. It returns the imported module, or raises
ZipImportError if it wasn’t found.
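For example, a minimal sketch (example.zip and mymodule are hypothetical):

import zipimport

importer = zipimport.zipimporter('example.zip')
if importer.find_module('mymodule') is not None:
    mymodule = importer.load_module('mymodule')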
Extend the search path for the modules which comprise a package. Intended
use is to place the following code in a package’s __init__.py:
from pkgutil import extend_path
__path__ = extend_path(__path__, __name__)
This will add to the package’s __path__ all subdirectories of directories
on sys.path named after the package. This is useful if one wants to
distribute different parts of a single logical package as multiple
directories.
It also looks for *.pkg files beginning where * matches the
name argument. This feature is similar to *.pth files (see the
site module for more information), except that it doesn’t special-case
lines starting with import. A *.pkg file is trusted at face
value: apart from checking for duplicates, all entries found in a
*.pkg file are added to the path, regardless of whether they exist
on the filesystem. (This is a feature.)
If the input path is not a list (as is the case for frozen packages) it is
returned unchanged. The input path is not modified; an extended copy is
returned. Items are only appended to the copy at the end.
It is assumed that sys.path is a sequence. Items of sys.path
that are not strings referring to existing directories are ignored. Unicode
items on sys.path that cause errors when used as filenames may cause
this function to raise an exception (in line with os.path.isdir()
behavior).
PEP 302 Importer that wraps Python’s “classic” import algorithm.
If dirname is a string, a PEP 302 importer is created that searches that
directory. If dirname is None, a PEP 302 importer is created that
searches the current sys.path, plus any modules that are frozen or
built-in.
If fullname contains dots, path must be the containing package’s
__path__. Returns None if the module cannot be found or imported.
This function uses iter_importers(), and is thus subject to the same
limitations regarding platform-specific special import locations such as the
Windows registry.
Retrieve a PEP 302 importer for the given path_item.
The returned importer is cached in sys.path_importer_cache if it was
newly created by a path hook.
If there is no importer, a wrapper around the basic import machinery is
returned. This wrapper is never inserted into the importer cache (None
is inserted instead).
The cache (or part of it) can be cleared manually if a rescan of
sys.path_hooks is necessary.
If the module or package is accessible via the normal import mechanism, a
wrapper around the relevant part of that machinery is returned. Returns
None if the module cannot be found or imported. If the named module is
not already imported, its containing package (if any) is imported, in order
to establish the package __path__.
This function uses iter_importers(), and is thus subject to the same
limitations regarding platform-specific special import locations such as the
Windows registry.
Yield PEP 302 importers for the given module name.
If fullname contains a ‘.’, the importers will be for the package containing
fullname, otherwise they will be importers for sys.meta_path,
sys.path, and Python’s “classic” import machinery, in that order. If
the named module is in a package, that package is imported as a side effect
of invoking this function.
Non-PEP 302 mechanisms (e.g. the Windows registry) used by the standard
import machinery to find files in alternative locations are partially
supported, but are searched after sys.path. Normally, these
locations are searched before sys.path, preventing sys.path
entries from shadowing them.
For this to cause a visible difference in behaviour, there must be a module
or package name that is accessible via both sys.path and one of the
non-PEP 302 file system mechanisms. In this case, the emulation will find
the former version, while the builtin import mechanism will find the latter.
Items of the following types can be affected by this discrepancy:
imp.C_EXTENSION, imp.PY_SOURCE, imp.PY_COMPILED,
imp.PKG_DIRECTORY.
Yields (module_loader, name, ispkg) for all modules recursively on
path, or, if path is None, all accessible modules.
path should be either None or a list of paths to look for modules in.
prefix is a string to output on the front of every module name on output.
Note that this function must import all packages (not all modules!) on
the given path, in order to access the __path__ attribute to find
submodules.
onerror is a function which gets called with one argument (the name of the
package which was being imported) if any exception occurs while trying to
import a package. If no onerror function is supplied, ImportErrors
are caught and ignored, while all other exceptions are propagated,
terminating the search.
Examples:
from pkgutil import walk_packages
import ctypes

# list all modules python can access (consume the generator)
list(walk_packages())

# list all submodules of ctypes
list(walk_packages(ctypes.__path__, ctypes.__name__ + '.'))
This is a wrapper for the PEP 302 loader get_data() API. The
package argument should be the name of a package, in standard module format
(foo.bar). The resource argument should be in the form of a relative
filename, using / as the path separator. The parent directory name
.. is not allowed, and nor is a rooted name (starting with a /).
The function returns a binary string that is the contents of the specified
resource.
For packages located in the filesystem, which have already been imported,
this is the rough equivalent of:
d = os.path.dirname(sys.modules[package].__file__)
data = open(os.path.join(d, resource), 'rb').read()
If the package cannot be located or loaded, or it uses a PEP 302 loader
which does not support get_data(), then None is returned.
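For example, a minimal sketch (the package and resource names are hypothetical):

from pkgutil import get_data

data = get_data('foo.bar', 'templates/index.html')
if data is not None:
    text = data.decode('utf-8')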
This module provides a ModuleFinder class that can be used to determine
the set of modules imported by a script. modulefinder.py can also be run as
a script, giving the filename of a Python script as its argument, after which a
report of the imported modules will be printed.
Allows specifying that the module named oldname is in fact the package named
newname.
class modulefinder.ModuleFinder(path=None, debug=0, excludes=[], replace_paths=[])
This class provides run_script() and report() methods to determine
the set of modules imported by a script. path can be a list of directories to
search for modules; if not specified, sys.path is used. debug sets the
debugging level; higher values make the class print debugging messages about
what it’s doing. excludes is a list of module names to exclude from the
analysis. replace_paths is a list of (oldpath,newpath) tuples that will
be replaced in module paths.
Print a report to standard output that lists the modules imported by the
script and their paths, as well as modules that are missing or seem to be
missing.
A script that reports on the modules imported by bacon.py:
from modulefinder import ModuleFinder
finder = ModuleFinder()
finder.run_script('bacon.py')
print('Loaded modules:')
for name, mod in finder.modules.items():
print('%s: ' % name, end='')
print(','.join(list(mod.globalnames.keys())[:3]))
print('-'*50)
print('Modules not imported:')
print('\n'.join(finder.badmodules.keys()))
Sample output (may vary depending on the architecture):
The runpy module is used to locate and run Python modules without
importing them first. Its main use is to implement the -m command
line switch that allows scripts to be located using the Python module
namespace rather than the filesystem.
Execute the code of the specified module and return the resulting module
globals dictionary. The module’s code is first located using the standard
import mechanism (refer to PEP 302 for details) and then executed in a
fresh module namespace.
If the supplied module name refers to a package rather than a normal
module, then that package is imported and the __main__ submodule within
that package is then executed and the resulting module globals dictionary
returned.
The optional dictionary argument init_globals may be used to pre-populate
the module’s globals dictionary before the code is executed. The supplied
dictionary will not be modified. If any of the special global variables
below are defined in the supplied dictionary, those definitions are
overridden by run_module().
The special global variables __name__, __file__, __cached__,
__loader__
and __package__ are set in the globals dictionary before the module
code is executed (Note that this is a minimal set of variables - other
variables may be set implicitly as an interpreter implementation detail).
__name__ is set to run_name if this optional argument is not
None, to mod_name+'.__main__' if the named module is a
package and to the mod_name argument otherwise.
__file__ is set to the name provided by the module loader. If the
loader does not make filename information available, this variable is set
to None.
__cached__ will be set to None.
__loader__ is set to the PEP 302 module loader used to retrieve the
code for the module (This loader may be a wrapper around the standard
import mechanism).
__package__ is set to mod_name if the named module is a package and
to mod_name.rpartition('.')[0] otherwise.
If the argument alter_sys is supplied and evaluates to True,
then sys.argv[0] is updated with the value of __file__ and
sys.modules[__name__] is updated with a temporary module object for the
module being executed. Both sys.argv[0] and sys.modules[__name__]
are restored to their original values before the function returns.
Note that this manipulation of sys is not thread-safe. Other threads
may see the partially initialised module, as well as the altered list of
arguments. It is recommended that the sys module be left alone when
invoking this function from threaded code.
Changed in version 3.1: Added ability to execute packages by looking for a __main__ submodule.
Changed in version 3.2: Added __cached__ global variable (see PEP 3147).
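For instance, a minimal sketch that runs the standard-library module this as
if invoked with python -m this:

import runpy

module_globals = runpy.run_module('this', run_name='__main__')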
Execute the code at the named filesystem location and return the resulting
module globals dictionary. As with a script name supplied to the CPython
command line, the supplied path may refer to a Python source file, a
compiled bytecode file or a valid sys.path entry containing a __main__
module (e.g. a zipfile containing a top-level __main__.py file).
For a simple script, the specified code is simply executed in a fresh
module namespace. For a valid sys.path entry (typically a zipfile or
directory), the entry is first added to the beginning of sys.path. The
function then looks for and executes a __main__ module using the
updated path. Note that there is no special protection against invoking
an existing __main__ entry located elsewhere on sys.path if
there is no such module at the specified location.
The optional dictionary argument init_globals may be used to pre-populate
the module’s globals dictionary before the code is executed. The supplied
dictionary will not be modified. If any of the special global variables
below are defined in the supplied dictionary, those definitions are
overridden by run_path().
The special global variables __name__, __file__, __loader__
and __package__ are set in the globals dictionary before the module
code is executed (Note that this is a minimal set of variables - other
variables may be set implicitly as an interpreter implementation detail).
__name__ is set to run_name if this optional argument is not
None and to '<run_path>' otherwise.
__file__ is set to the name provided by the module loader. If the
loader does not make filename information available, this variable is set
to None. For a simple script, this will be set to file_path.
__loader__ is set to the PEP 302 module loader used to retrieve the
code for the module (This loader may be a wrapper around the standard
import mechanism). For a simple script, this will be set to None.
__package__ is set to __name__.rpartition('.')[0].
A number of alterations are also made to the sys module. Firstly,
sys.path may be altered as described above. sys.argv[0] is updated
with the value of file_path and sys.modules[__name__] is updated
with a temporary module object for the module being executed. All
modifications to items in sys are reverted before the function
returns.
Note that, unlike run_module(), the alterations made to sys
are not optional in this function as these adjustments are essential to
allowing the execution of sys.path entries. As the thread-safety
limitations still apply, use of this function in threaded code should be
either serialised with the import lock or delegated to a separate process.
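A minimal sketch (scripts/tool.py is a hypothetical path):

import runpy

result_globals = runpy.run_path('scripts/tool.py', run_name='__main__')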
The purpose of the importlib package is two-fold. One is to provide an
implementation of the import statement (and thus, by extension, the
__import__() function) in Python source code. This provides an
implementation of import which is portable to any Python
interpreter. This also provides a reference implementation which is easier to
comprehend than one implemented in a programming language other than Python.
Two, the components to implement import are exposed in this
package, making it easier for users to create their own custom objects (known
generically as an importer) to participate in the import process.
Details on custom importers can be found in PEP 302.
Import a module. The name argument specifies what module to
import in absolute or relative terms
(e.g. either pkg.mod or ..mod). If the name is
specified in relative terms, then the package argument must be set to
the name of the package which is to act as the anchor for resolving the
package name (e.g. import_module('..mod','pkg.subpkg') will import
pkg.mod).
The import_module() function acts as a simplifying wrapper around
importlib.__import__(). This means all semantics of the function are
derived from importlib.__import__(), including requiring the package
from which an import is occurring to have been previously imported
(i.e., package must already be imported). The most important difference
is that import_module() returns the most nested package or module
that was imported (e.g. pkg.mod), while __import__() returns the
top-level package or module (e.g. pkg).
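For example, a brief sketch (the relative form uses hypothetical package names):

import importlib

os_path = importlib.import_module('os.path')   # returns the most nested module

# a relative import requires the anchor package to be imported first:
# importlib.import_module('..mod', 'pkg.subpkg')   # would import pkg.mod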
importlib.abc – Abstract base classes related to import
The importlib.abc module contains all of the core abstract base classes
used by import. Some subclasses of the core abstract base classes
are also provided to help in implementing the core ABCs.
An abstract method for finding a loader for the specified
module. If the finder is found on sys.meta_path and the
module to be searched for is a subpackage or module then path will
be the value of __path__ from the parent package. If a loader
cannot be found, None is returned.
An abstract method for loading a module. If the module cannot be
loaded, ImportError is raised, otherwise the loaded module is
returned.
If the requested module already exists in sys.modules, that
module should be used and reloaded.
Otherwise the loader should create a new module and insert it into
sys.modules before any loading begins, to prevent recursion
from the import. If the loader inserted a module and the load fails, it
must be removed by the loader from sys.modules; modules already
in sys.modules before the loader began execution should be left
alone. The importlib.util.module_for_loader() decorator handles
all of these details.
The loader should set several attributes on the module.
(Note that some of these attributes can change when a module is
reloaded.)
__name__
The name of the module.
__file__
The path to where the module data is stored (not set for built-in
modules).
__path__
A list of strings specifying the search path within a
package. This attribute is not set on modules.
__package__
The parent package for the module/package. If the module is
top-level then it has a value of the empty string. The
importlib.util.set_package() decorator can handle the details
for __package__.
__loader__
The loader used to load the module.
(This is not set by the built-in import machinery,
but it should be set whenever a loader is used.)
An abstract method to return the bytes for the data located at path.
Loaders that have a file-like storage back-end
that allows storing arbitrary data
can implement this abstract method to give direct access
to the data stored. IOError is to be raised if the path cannot
be found. The path is expected to be constructed using a module’s
__file__ attribute or an item from a package’s __path__.
An abstract method to return the code object for a module.
None is returned if the module does not have a code object
(e.g. built-in module). ImportError is raised if the loader cannot
find the requested module.
An abstract method to return the source of a module. It is returned as
a text string with universal newlines. Returns None if no
source is available (e.g. a built-in module). Raises ImportError
if the loader cannot find the module specified.
An abstract method to return a true value if the module is a package, a
false value otherwise. ImportError is raised if the
loader cannot find the module.
An abstract base class which inherits from InspectLoader that,
when implemented, helps a module to be executed as a script. The ABC
represents an optional PEP 302 protocol.
An abstract base class for implementing source (and optionally bytecode)
file loading. The class inherits from both ResourceLoader and
ExecutionLoader, requiring the implementation of
ResourceLoader.get_data() and ExecutionLoader.get_filename(); the latter
should only return the path to the source file (sourceless loading is
not supported).
The abstract methods defined by this class are to add optional bytecode
file support. Not implementing these optional methods causes the loader to
only work with source code. Implementing the methods allows the loader to
work with source and bytecode files; it does not allow for sourceless
loading where only bytecode is provided. Bytecode files are an
optimization to speed up loading by removing the parsing step of Python’s
compiler, and so no bytecode-specific API is exposed.
Optional abstract method which writes the specified bytes to a file
path. Any intermediate directories which do not exist are to be created
automatically.
When writing to the path fails because the path is read-only
(errno.EACCES), do not propagate the exception.
Concrete implementation of InspectLoader.is_package(). A module
is determined to be a package if its file path is a file named
__init__ when the file extension is removed.
An abstract base class inheriting from
ExecutionLoader and
ResourceLoader designed to ease the loading of
Python source modules (bytecode is not handled; see
SourceLoader for a source/bytecode ABC). A subclass
implementing this ABC will only need to worry about exposing how the source
code is stored; all other details for loading Python source code will be
handled by the concrete implementations of key methods.
Deprecated since version 3.2: This class has been deprecated in favor of SourceLoader and is
slated for removal in Python 3.4. See below for how to create a
subclass that is compatible with Python 3.1 onwards.
If compatibility with Python 3.1 is required, then use the following idiom
to implement a subclass that will work with Python 3.1 onwards (make sure
to implement ExecutionLoader.get_filename()):
import os

try:
    from importlib.abc import SourceLoader
except ImportError:
    from importlib.abc import PyLoader as SourceLoader

class CustomLoader(SourceLoader):
    def get_filename(self, fullname):
        """Return the path to the source file."""
        # Implement ...

    def source_path(self, fullname):
        """Implement source_path in terms of get_filename."""
        try:
            return self.get_filename(fullname)
        except ImportError:
            return None

    def is_package(self, fullname):
        """Implement is_package by looking for an __init__ file
        name as returned by get_filename."""
        filename = os.path.basename(self.get_filename(fullname))
        return os.path.splitext(filename)[0] == '__init__'
An abstract method that returns the path to the source code for a
module. Should return None if there is no source code.
Raises ImportError if the loader knows it cannot handle the
module.
A concrete implementation of importlib.abc.Loader.load_module()
that loads Python source code. All needed information comes from the
abstract methods required by this ABC. The only pertinent assumption
made by this method is that when loading a package
__path__ is set to [os.path.dirname(__file__)].
A concrete implementation of
importlib.abc.InspectLoader.get_code() that creates code objects
from Python source code, by requesting the source code (using
source_path() and get_data()) and compiling it with the
built-in compile() function.
An abstract base class inheriting from PyLoader.
This ABC is meant to help in creating loaders that support both Python
source and bytecode.
Deprecated since version 3.2: This class has been deprecated in favor of SourceLoader and to
properly support PEP 3147. If compatibility is required with
Python 3.1, implement both SourceLoader and PyLoader;
instructions on how to do so are included in the documentation for
PyLoader. Do note that this solution will not support
sourceless/bytecode-only loading; only source and bytecode loading.
An abstract method which returns the modification time for the source
code of the specified module. The modification time should be an
integer. If there is no source code, return None. If the
module cannot be found then ImportError is raised.
An abstract method which returns the path to the bytecode for the
specified module, if it exists. It returns None
if no bytecode exists (yet).
Raises ImportError if the loader knows it cannot handle the
module.
An abstract method which has the loader write bytecode for future
use. If the bytecode is written, return True. Return
False if the bytecode could not be written. This method
should not be called if sys.dont_write_bytecode is true.
The bytecode argument should be a bytes string or bytes array.
This class does not perfectly mirror the semantics of import in
terms of sys.path. No implicit path hooks are assumed for
simplification of the class and its semantics.
Only class methods are defined by this class to alleviate the need for
instantiation.
Class method that attempts to find a loader for the module
specified by fullname on sys.path or, if defined, on
path. For each path entry that is searched,
sys.path_importer_cache is checked. If a non-false object is
found then it is used as the finder to look for the module
being searched for. If no entry is found in
sys.path_importer_cache, then sys.path_hooks is
searched for a finder for the path entry and, if found, is stored in
sys.path_importer_cache along with being queried about the
module. If no finder is ever found then None is returned.
A decorator for a loader method,
to handle selecting the proper
module object to load with. The decorated method is expected to have a call
signature taking two positional arguments
(e.g. load_module(self, module)) for which the second argument
will be the module object to be used by the loader.
Note that the decorator
will not work on static methods because of the assumption of two
arguments.
The decorated method will take in the name of the module to be loaded
as expected for a loader. If the module is not found in
sys.modules then a new one is constructed with its
__name__ attribute set. Otherwise the module found in
sys.modules will be passed into the method. If an
exception is raised by the decorated method and a module was added to
sys.modules it will be removed to prevent a partially initialized
module from being left in sys.modules. If the module was already
in sys.modules then it is left alone.
Use of this decorator handles all the details of which module object a
loader should initialize as specified by PEP 302.
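As a rough sketch under stated assumptions (the DummyLoader class and its hard-coded module body are hypothetical, not part of the library), a method decorated this way only needs to populate the module object it is handed:

import importlib.abc
import importlib.util

class DummyLoader(importlib.abc.Loader):
    """Hypothetical loader whose modules all share one fixed body."""

    @importlib.util.module_for_loader
    def load_module(self, module):
        # 'module' is the object the decorator selected: either a fresh
        # module already placed in sys.modules, or an existing one on reload.
        exec("answer = 42", module.__dict__)
        return module

Calling DummyLoader().load_module('dummy') would then return a module whose answer attribute is 42, with all of the sys.modules bookkeeping handled by the decorator.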
A decorator for a loader method,
to set the __loader__
attribute on loaded modules. If the attribute is already set the decorator
does nothing. It is assumed that the first positional argument to the
wrapped method is what __loader__ should be set to.
A decorator for a loader to set the __package__
attribute on the module returned by the loader. If __package__ is
set and has a value other than None it will not be changed.
Note that the module returned by the loader is what has the attribute
set on and not the module found in sys.modules.
Reliance on this decorator is discouraged when it is possible to set
__package__ before the module's code is executed. Setting it
beforehand allows the attribute to be used at the global level of the
module during initialization.
Python provides a number of modules to assist in working with the Python
language. These modules support tokenizing, parsing, syntax analysis, bytecode
disassembly, and various other facilities.
The parser module provides an interface to Python’s internal parser and
byte-code compiler. The primary purpose for this interface is to allow Python
code to edit the parse tree of a Python expression and create executable code
from this. This is better than trying to parse and modify an arbitrary Python
code fragment as a string because parsing is performed in a manner identical to
the code forming the application. It is also faster.
Note
From Python 2.5 onward, it’s much more convenient to cut in at the Abstract
Syntax Tree (AST) generation and compilation stage, using the ast
module.
There are a few things to note about this module which are important to making
use of the data structures created. This is not a tutorial on editing the parse
trees for Python code, but some examples of using the parser module are
presented.
Most importantly, a good understanding of the Python grammar processed by the
internal parser is required. For full information on the language syntax, refer
to The Python Language Reference. The parser
itself is created from a grammar specification defined in the file
Grammar/Grammar in the standard Python distribution. The parse trees
stored in the ST objects created by this module are the actual output from the
internal parser when created by the expr() or suite() functions,
described below. The ST objects created by sequence2st() faithfully
simulate those structures. Be aware that the values of the sequences which are
considered “correct” will vary from one version of Python to another as the
formal grammar for the language is revised. However, transporting code from one
Python version to another as source text will always allow correct parse trees
to be created in the target version, with the only restriction being that
migrating to an older version of the interpreter will not support more recent
language constructs. The parse trees are not typically compatible from one
version to another, whereas source code has always been forward-compatible.
Each element of the sequences returned by st2list() or st2tuple()
has a simple form. Sequences representing non-terminal elements in the grammar
always have a length greater than one. The first element is an integer which
identifies a production in the grammar. These integers are given symbolic names
in the C header file Include/graminit.h and the Python module
symbol. Each additional element of the sequence represents a component
of the production as recognized in the input string: these are always sequences
which have the same form as the parent. An important aspect of this structure
which should be noted is that keywords used to identify the parent node type,
such as the keyword if in an if_stmt, are included in the
node tree without any special treatment. For example, the if keyword
is represented by the tuple (1, 'if'), where 1 is the numeric value
associated with all NAME tokens, including variable and function names
defined by the user. In an alternate form returned when line number information
is requested, the same token might be represented as (1, 'if', 12), where
the 12 represents the line number at which the terminal symbol was found.
Terminal elements are represented in much the same way, but without any child
elements and the addition of the source text which was identified. The example
of the if keyword above is representative. The various types of
terminal symbols are defined in the C header file Include/token.h and
the Python module token.
The ST objects are not required to support the functionality of this module,
but are provided for three purposes: to allow an application to amortize the
cost of processing complex parse trees, to provide a parse tree representation
which conserves memory space when compared to the Python list or tuple
representation, and to ease the creation of additional modules in C which
manipulate parse trees. A simple “wrapper” class may be created in Python to
hide the use of ST objects.
The parser module defines functions for a few distinct purposes. The
most important purposes are to create ST objects and to convert ST objects to
other representations such as parse trees and compiled code objects, but there
are also functions which serve to query the type of parse tree represented by an
ST object.
ST objects may be created from source code or from a parse tree. When creating
an ST object from source, different functions are used to create the 'eval'
and 'exec' forms.
The expr() function parses the parameter source as if it were an input
to compile(source, 'file.py', 'eval'). If the parse succeeds, an ST object
is created to hold the internal parse tree representation, otherwise an
appropriate exception is raised.
The suite() function parses the parameter source as if it were an input
to compile(source, 'file.py', 'exec'). If the parse succeeds, an ST object
is created to hold the internal parse tree representation, otherwise an
appropriate exception is raised.
This function accepts a parse tree represented as a sequence and builds an
internal representation if possible. If it can validate that the tree conforms
to the Python grammar and all nodes are valid node types in the host version of
Python, an ST object is created from the internal representation and returned
to the caller. If there is a problem creating the internal representation, or
if the tree cannot be validated, a ParserError exception is raised. An
ST object created this way should not be assumed to compile correctly; normal
exceptions raised by compilation may still be initiated when the ST object is
passed to compilest(). This may indicate problems not related to syntax
(such as a MemoryError exception), but may also be due to constructs such
as the result of parsing del f(0), which escapes the Python parser but is
checked by the bytecode compiler.
Sequences representing terminal tokens may be represented as either two-element
lists of the form (1, 'name') or as three-element lists of the form
(1, 'name', 56). If the third element is present, it is assumed to be a valid
line number. The line number may be specified for any subset of the terminal
symbols in the input tree.
ST objects, regardless of the input used to create them, may be converted to
parse trees represented as list- or tuple- trees, or may be compiled into
executable code objects. Parse trees may be extracted with or without line
numbering information.
This function accepts an ST object from the caller in st and returns a
Python list representing the equivalent parse tree. The resulting list
representation can be used for inspection or the creation of a new parse tree in
list form. This function does not fail so long as memory is available to build
the list representation. If the parse tree will only be used for inspection,
st2tuple() should be used instead to reduce memory consumption and
fragmentation. When the list representation is required, this function is
significantly faster than retrieving a tuple representation and converting that
to nested lists.
If line_info is true, line number information will be included for all
terminal tokens as a third element of the list representing the token. Note
that the line number provided specifies the line on which the token ends.
This information is omitted if the flag is false or omitted.
This function accepts an ST object from the caller in st and returns a
Python tuple representing the equivalent parse tree. Other than returning a
tuple instead of a list, this function is identical to st2list().
If line_info is true, line number information will be included for all
terminal tokens as a third element of the list representing the token. This
information is omitted if the flag is false or omitted.
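For instance, the root of the list returned for an expression maps to the
eval_input production (the nested numeric values themselves vary from one
Python version to another, so only the symbolic lookup is shown):

>>> import parser, symbol
>>> st = parser.expr('a + 5')
>>> tree = parser.st2list(st)
>>> symbol.sym_name[tree[0]]
'eval_input'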
The Python byte compiler can be invoked on an ST object to produce code objects
which can be used as part of a call to the built-in exec() or eval()
functions. This function provides the interface to the compiler, passing the
internal parse tree from st to the parser, using the source file name
specified by the filename parameter. The default value supplied for filename
indicates that the source was an ST object.
Compiling an ST object may result in exceptions related to compilation; an
example would be a SyntaxError caused by the parse tree for del f(0):
this statement is considered legal within the formal grammar for Python but is
not a legal language construct. The SyntaxError raised for this
condition is actually generated by the Python byte-compiler normally, which is
why it can be raised at this point by the parser module. Most causes of
compilation failure can be diagnosed programmatically by inspection of the parse
tree.
Two functions are provided which allow an application to determine if an ST was
created as an expression or a suite. Neither of these functions can be used to
determine if an ST was created from source code via expr() or
suite() or from a parse tree via sequence2st().
When st represents an 'eval' form, this function returns true, otherwise
it returns false. This is useful, since code objects normally cannot be queried
for this information using existing built-in functions. Note that the code
objects created by compilest() cannot be queried like this either, and
are identical to those created by the built-in compile() function.
This function mirrors isexpr() in that it reports whether an ST object
represents an 'exec' form, commonly known as a “suite.” It is not safe to
assume that this function is equivalent to not isexpr(st), as additional
syntactic fragments may be supported in the future.
The parser module defines a single exception, but may also pass other built-in
exceptions from other portions of the Python runtime environment. See each
function for information about the exceptions it can raise.
Exception raised when a failure occurs within the parser module. This is
generally produced for validation failures rather than the built-in
SyntaxError raised during normal parsing. The exception argument is
either a string describing the reason of the failure or a tuple containing a
sequence causing the failure from a parse tree passed to sequence2st()
and an explanatory string. Calls to sequence2st() need to be able to
handle either type of exception, while calls to other functions in the module
will only need to be aware of the simple string values.
Note that the functions compilest(), expr(), and suite() may
raise exceptions which are normally raised by the parsing and compilation
process. These include the built-in exceptions MemoryError,
OverflowError, SyntaxError, and SystemError. In these
cases, these exceptions carry all the meaning normally associated with them.
Refer to the descriptions of each function for detailed information.
While many useful operations may take place between parsing and bytecode
generation, the simplest operation is to do nothing. For this purpose, using
the parser module to produce an intermediate data structure is equivalent
to the code
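In the simplest case, that code is just a call to the built-in compile():

>>> code = compile('a + 5', 'file.py', 'eval')
>>> a = 5
>>> eval(code)
10

The equivalent operation using the parser module is slightly longer, but keeps
the intermediate parse tree available as an ST object:

>>> import parser
>>> st = parser.expr('a + 5')
>>> code = st.compile('file.py')
>>> a = 5
>>> eval(code)
10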
The ast module helps Python applications to process trees of the Python
abstract syntax grammar. The abstract syntax itself might change with each
Python release; this module helps to find out programmatically what the current
grammar looks like.
An abstract syntax tree can be generated by passing ast.PyCF_ONLY_AST as
a flag to the compile() built-in function, or using the parse()
helper provided in this module. The result will be a tree of objects whose
classes all inherit from ast.AST. An abstract syntax tree can be
compiled into a Python code object using the built-in compile() function.
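A minimal round trip looks like this:

>>> import ast
>>> tree = ast.parse('x = 1 + 2', mode='exec')
>>> code = compile(tree, '<ast>', 'exec')
>>> namespace = {}
>>> exec(code, namespace)
>>> namespace['x']
3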
This is the base of all AST node classes. The actual node classes are
derived from the Parser/Python.asdl file, which is reproduced
below. They are defined in the _ast C
module and re-exported in ast.
There is one class defined for each left-hand side symbol in the abstract
grammar (for example, ast.stmt or ast.expr). In addition,
there is one class defined for each constructor on the right-hand side; these
classes inherit from the classes for the left-hand side trees. For example,
ast.BinOp inherits from ast.expr. For production rules
with alternatives (aka “sums”), the left-hand side class is abstract: only
instances of specific constructor nodes are ever created.
Each concrete class has an attribute _fields which gives the names
of all child nodes.
Each instance of a concrete class has one attribute for each child node,
of the type as defined in the grammar. For example, ast.BinOp
instances have an attribute left of type ast.expr.
If these attributes are marked as optional in the grammar (using a
question mark), the value might be None. If the attributes can have
zero-or-more values (marked with an asterisk), the values are represented
as Python lists. All possible attributes must be present and have valid
values when compiling an AST with compile().
Instances of ast.expr and ast.stmt subclasses have
lineno and col_offset attributes. The lineno is
the line number of source text (1-indexed so the first line is line 1) and
the col_offset is the UTF-8 byte offset of the first token that
generated the node. The UTF-8 offset is recorded because the parser uses
UTF-8 internally.
The constructor of a class ast.T parses its arguments as follows:
If there are positional arguments, there must be as many as there are items
in T._fields; they will be assigned as attributes of these names.
If there are keyword arguments, they will set the attributes of the same
names to the given values.
For example, to create and populate an ast.UnaryOp node, you could
use
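node = ast.UnaryOp(ast.USub(), ast.Num(n=5, lineno=0, col_offset=0),
                   lineno=0, col_offset=0)

Every field of the UnaryOp node and its children is supplied here, including
the lineno and col_offset attributes that compile() requires.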
Safely evaluate an expression node or a string containing a Python
expression. The string or node provided may only consist of the following
Python literal structures: strings, bytes, numbers, tuples, lists, dicts,
sets, booleans, and None.
This can be used for safely evaluating strings containing Python expressions
from untrusted sources without the need to parse the values oneself.
Changed in version 3.2: Now allows bytes and set literals.
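For example:

>>> import ast
>>> ast.literal_eval("[1, 2, {'a': (3, 4)}]")
[1, 2, {'a': (3, 4)}]
>>> ast.literal_eval("{'x': b'raw'}")
{'x': b'raw'}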
Return the docstring of the given node (which must be a
FunctionDef, ClassDef or Module node), or None
if it has no docstring. If clean is true, clean up the docstring’s
indentation with inspect.cleandoc().
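For example:

>>> import ast
>>> tree = ast.parse('"""Frobnicate the knob."""\nx = 1')
>>> ast.get_docstring(tree)
'Frobnicate the knob.'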
When you compile a node tree with compile(), the compiler expects
lineno and col_offset attributes for every node that supports
them. This is rather tedious to fill in for generated nodes, so this helper
adds these attributes recursively where not already set, by setting them to
the values of the parent node. It works recursively starting at node.
Recursively yield all descendant nodes in the tree starting at node
(including node itself), in no specified order. This is useful if you only
want to modify nodes in place and don’t care about the context.
A node visitor base class that walks the abstract syntax tree and calls a
visitor function for every node found. This function may return a value
which is forwarded by the visit() method.
This class is meant to be subclassed, with the subclass adding visitor
methods.
Visit a node. The default implementation calls the method called
self.visit_classname where classname is the name of the node
class, or generic_visit() if that method doesn’t exist.
This visitor calls visit() on all children of the node.
Note that child nodes of nodes that have a custom visitor method won’t be
visited unless the visitor calls generic_visit() or visits them
itself.
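A small sketch of a subclass that prints every name referenced in a piece of
source code:

import ast

class NameLister(ast.NodeVisitor):
    def visit_Name(self, node):
        print(node.id)
        # Without this call, the children of any node handled by a custom
        # visitor method would be skipped.
        self.generic_visit(node)

NameLister().visit(ast.parse('a + b * c'))   # prints a, b and c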
Don’t use the NodeVisitor if you want to apply changes to nodes
during traversal. For this a special visitor exists
(NodeTransformer) that allows modifications.
A NodeVisitor subclass that walks the abstract syntax tree and
allows modification of nodes.
The NodeTransformer will walk the AST and use the return value of
the visitor methods to replace or remove the old node. If the return value
of the visitor method is None, the node will be removed from its
location, otherwise it is replaced with the return value. The return value
may be the original node in which case no replacement takes place.
Here is an example transformer that rewrites all occurrences of name lookups
(foo) to data['foo']:
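A sketch along the lines of the standard example (ast.copy_location keeps the
replacement node's source location attributes consistent with the original):

import ast

class RewriteName(ast.NodeTransformer):
    def visit_Name(self, node):
        return ast.copy_location(
            ast.Subscript(value=ast.Name(id='data', ctx=ast.Load()),
                          slice=ast.Index(value=ast.Str(s=node.id)),
                          ctx=node.ctx),
            node)

tree = RewriteName().visit(ast.parse('foo'))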
Keep in mind that if the node you’re operating on has child nodes you must
either transform the child nodes yourself or call the generic_visit()
method for the node first.
For nodes that were part of a collection of statements (that applies to all
statement nodes), the visitor may also return a list of nodes rather than
just a single node.
Return a formatted dump of the tree in node. This is mainly useful for
debugging purposes. The returned string will show the names and the values
for fields. This makes the code impossible to evaluate, so if evaluation is
wanted, annotate_fields must be set to False. Attributes such as line
numbers and column offsets are not dumped by default. If this is wanted,
include_attributes can be set to True.
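For example (the exact layout of the string differs between Python versions):

>>> import ast
>>> ast.dump(ast.parse('x = 1'))
"Module(body=[Assign(targets=[Name(id='x', ctx=Store())], value=Num(n=1))])"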
symtable — Access to the compiler’s symbol tables
Symbol tables are generated by the compiler from AST just before bytecode is
generated. The symbol table is responsible for calculating the scope of every
identifier in the code. symtable provides an interface to examine these
tables.
Return the toplevel SymbolTable for the Python source code.
filename is the name of the file containing the code. compile_type is
like the mode argument to compile().
Return the table’s name. This is the name of the class if the table is
for a class, the name of the function if the table is for a function, or
'top' if the table is global (get_type() returns 'module').
Note that a single name can be bound to multiple objects. If the result
is True, the name may also be bound to other objects, like an int or
list, that do not introduce a new namespace.
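For example:

>>> import symtable
>>> table = symtable.symtable('def spam(x): return x', '<string>', 'exec')
>>> table.get_type(), table.get_name()
('module', 'top')
>>> [t.get_name() for t in table.get_children()]
['spam']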
This module provides constants which represent the numeric values of internal
nodes of the parse tree. Unlike most Python constants, these use lower-case
names. Refer to the file Grammar/Grammar in the Python distribution for
the definitions of the names in the context of the language grammar. The
specific numeric values which the names map to may change between Python
versions.
This module also provides one additional data object:
Dictionary mapping the numeric values of the constants defined in this module
back to name strings, allowing more human-readable representation of parse trees
to be generated.
This module provides constants which represent the numeric values of leaf nodes
of the parse tree (terminal tokens). Refer to the file Grammar/Grammar
in the Python distribution for the definitions of the names in the context of
the language grammar. The specific numeric values which the names map to may
change between Python versions.
The module also provides a mapping from numeric codes to names and some
functions. The functions mirror definitions in the Python C header files.
Dictionary mapping the numeric values of the constants defined in this module
back to name strings, allowing more human-readable representation of parse trees
to be generated.
Sequence containing all the keywords defined for the interpreter. If any
keywords are defined to only be active when particular __future__
statements are in effect, these will be included as well.
The tokenize module provides a lexical scanner for Python source code,
implemented in Python. The scanner in this module returns comments as tokens
as well, making it useful for implementing “pretty-printers,” including
colorizers for on-screen displays.
The tokenize() generator requires one argument, readline, which
must be a callable object which provides the same interface as the
io.IOBase.readline() method of file objects. Each call to the
function should return one line of input as bytes.
The generator produces 5-tuples with these members: the token type; the
token string; a 2-tuple (srow,scol) of ints specifying the row and
column where the token begins in the source; a 2-tuple (erow,ecol) of
ints specifying the row and column where the token ends in the source; and
the line on which the token was found. The line passed (the last tuple item)
is the logical line; continuation lines are included. The 5-tuple is
returned as a named tuple with the field names
type, string, start, end, line.
Changed in version 3.1: Added support for named tuples.
tokenize() determines the source encoding of the file by looking for a
UTF-8 BOM or encoding cookie, according to PEP 263.
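A minimal invocation, feeding the generator from an in-memory buffer:

from io import BytesIO
from tokenize import tokenize

# Yields ENCODING, NUMBER, OP, NUMBER, NEWLINE and ENDMARKER tokens.
for tok in tokenize(BytesIO(b'1 + 2\n').readline):
    print(tok.type, repr(tok.string))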
All constants from the token module are also exported from
tokenize, as are three additional token type values:
Token value used to indicate a non-terminating newline. The NEWLINE token
indicates the end of a logical line of Python code; NL tokens are generated
when a logical line of code is continued over multiple physical lines.
Token value that indicates the encoding used to decode the source bytes
into text. The first token returned by tokenize() will always be an
ENCODING token.
Another function is provided to reverse the tokenization process. This is
useful for creating tools that tokenize a script, modify the token stream, and
write back the modified script.
Converts tokens back into Python source code. The iterable must return
sequences with at least two elements, the token type and the token string.
Any additional sequence elements are ignored.
The reconstructed script is returned as a single string. The result is
guaranteed to tokenize back to match the input so that the conversion is
lossless and round-trips are assured. The guarantee applies only to the
token type and token string as the spacing between tokens (column
positions) may change.
It returns bytes, encoded using the ENCODING token, which is the first
token sequence output by tokenize().
tokenize() needs to detect the encoding of source files it tokenizes. The
function it uses to do this is available:
The detect_encoding() function is used to detect the encoding that
should be used to decode a Python source file. It requires one argument,
readline, in the same way as the tokenize() generator.
It will call readline a maximum of twice, and return the encoding used
(as a string) and a list of any lines (not decoded from bytes) it has read
in.
It detects the encoding from the presence of a UTF-8 BOM or an encoding
cookie as specified in PEP 263. If both a BOM and a cookie are present,
but disagree, a SyntaxError will be raised. Note that if the BOM is found,
'utf-8-sig' will be returned as an encoding.
If no encoding is specified, then the default of 'utf-8' will be
returned.
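For example:

>>> from io import BytesIO
>>> from tokenize import detect_encoding
>>> detect_encoding(BytesIO(b'# -*- coding: utf-8 -*-\nx = 1\n').readline)
('utf-8', [b'# -*- coding: utf-8 -*-\n'])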
Use open() to open Python source files: it uses
detect_encoding() to detect the file encoding.
Open a file in read only mode using the encoding detected by
detect_encoding().
New in version 3.2.
Example of a script rewriter that transforms float literals into Decimal
objects:
from tokenize import tokenize, untokenize, NUMBER, STRING, NAME, OP
from io import BytesIO

def decistmt(s):
    """Substitute Decimals for floats in a string of statements.

    >>> from decimal import Decimal
    >>> s = 'print(+21.3e-5*-.1234/81.7)'
    >>> decistmt(s)
    "print (+Decimal ('21.3e-5')*-Decimal ('.1234')/Decimal ('81.7'))"

    The format of the exponent is inherited from the platform C library.
    Known cases are "e-007" (Windows) and "e-07" (not Windows).  Since
    we're only showing 12 digits, and the 13th isn't close to 5, the
    rest of the output should be platform-independent.

    >>> exec(s) #doctest: +ELLIPSIS
    -3.21716034272e-0...7

    Output from calculations with Decimal should be identical across all
    platforms.

    >>> exec(decistmt(s))
    -3.217160342717258261933904529E-7
    """
    result = []
    g = tokenize(BytesIO(s.encode('utf-8')).readline)  # tokenize the string
    for toknum, tokval, _, _, _ in g:
        if toknum == NUMBER and '.' in tokval:  # replace NUMBER tokens
            result.extend([
                (NAME, 'Decimal'),
                (OP, '('),
                (STRING, repr(tokval)),
                (OP, ')')
            ])
        else:
            result.append((toknum, tokval))
    return untokenize(result).decode('utf-8')
For the time being this module is intended to be called as a script. However it
is possible to import it into an IDE and use the function check()
described below.
Note
The API provided by this module is likely to change in future releases; such
changes may not be backward compatible.
If file_or_dir is a directory and not a symbolic link, then recursively
descend the directory tree named by file_or_dir, checking all .py
files along the way. If file_or_dir is an ordinary Python source file, it
is checked for whitespace related problems. The diagnostic messages are
written to standard output using the print() function.
Flag indicating whether to print only the filenames of files containing
whitespace related problems. This is set to true by the -q option if called
as a script.
The pyclbr module can be used to determine some limited information
about the classes, methods and top-level functions defined in a module. The
information provided is sufficient to implement a traditional three-pane
class browser. The information is extracted from the source code rather
than by importing the module, so this module is safe to use with untrusted
code. This restriction makes it impossible to use this module with modules
not implemented in Python, including all standard and optional extension
modules.
Read a module and return a dictionary mapping class names to class
descriptor objects. The parameter module should be the name of a
module as a string; it may be the name of a module within a package. The
path parameter should be a sequence, and is used to augment the value
of sys.path, which is used to locate module source code.
Like readmodule(), but the returned dictionary, in addition to
mapping class names to class descriptor objects, also maps top-level
function names to function descriptor objects. Moreover, if the module
being read is a package, the key '__path__' in the returned
dictionary has as its value a list which contains the package search
path.
A list of Class objects which describe the immediate base
classes of the class being described. Classes which are named as
superclasses but which are not discoverable by readmodule() are
listed as a string with the class name instead of as Class
objects.
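A short sketch of browsing a standard module (the classes reported depend on
that version's source for the module):

import pyclbr

descriptors = pyclbr.readmodule('queue')     # maps class name -> Class object
for name, cls in sorted(descriptors.items()):
    print(name, cls.lineno)                  # e.g. Queue and its subclasses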
The py_compile module provides a function to generate a byte-code file
from a source file, and another function used when the module source file is
invoked as a script.
Though not often needed, this function can be useful when installing modules for
shared use, especially if some of the users may not have permission to write the
byte-code cache files in the directory containing the source code.
Compile a source file to byte-code and write out the byte-code cache file.
The source code is loaded from the file name file. The byte-code is
written to cfile, which defaults to the PEP 3147 path, ending in
.pyc (.pyo if optimization is enabled in the current interpreter).
For example, if file is /foo/bar/baz.py, cfile will default to
/foo/bar/__pycache__/baz.cpython-32.pyc for Python 3.2. If dfile is
specified, it is used as the name of the source file in error messages
instead of file. If doraise is true, a PyCompileError is raised
when an error is encountered while compiling file. If doraise is false
(the default), an error string is written to sys.stderr, but no exception
is raised. This function returns the path to the byte-compiled file, i.e.
whatever cfile value was used.
optimize controls the optimization level and is passed to the built-in
compile() function. The default of -1 selects the optimization
level of the current interpreter.
Changed in version 3.2: Changed default value of cfile to be PEP 3147-compliant. Previous
default was file + 'c' ('o' if optimization was enabled).
Also added the optimize parameter.
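A minimal sketch, assuming a source file example.py exists in the current
directory:

import py_compile

# Returns the cfile that was written (the PEP 3147 location by default).
cache_path = py_compile.compile('example.py', doraise=True)
print(cache_path)    # e.g. __pycache__/example.cpython-32.pyc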
Compile several source files. The files named in args (or on the command
line, if args is None) are compiled and the resulting bytecode is
cached in the normal manner. This function does not search a directory
structure to locate source files; it only compiles files named explicitly.
If '-' is the only parameter in args, the list of files is taken from
standard input.
Changed in version 3.2: Added support for '-'.
When this module is run as a script, the main() function is used to compile all the
files named on the command line. The exit status is nonzero if one of the files
could not be compiled.
This module provides some utility functions to support installing Python
libraries. These functions compile Python source files in a directory tree.
This module can be used to create the cached byte-code files at library
installation time, which makes them available for use even by users who don’t
have write permission to the library directories.
This module can work as a script (using python -m compileall) to
compile Python sources.
[directory|file]...
Positional arguments are files to compile or directories that contain
source files, traversed recursively. If no argument is given, behave as if
the command line was -l <directories from sys.path>.
Directory prepended to the path to each file being compiled. This will
appear in compilation time tracebacks, and is also compiled in to the
byte-code file, where it will be used in tracebacks and other messages in
cases where the source file does not exist at the time the byte-code file is
executed.
Write the byte-code files to their legacy locations and names, which may
overwrite byte-code files created by another version of Python. The default
is to write files to their PEP 3147 locations and names, which allows
byte-code files from multiple versions of Python to coexist.
Changed in version 3.2: Added the -i, -b and -h options.
There is no command-line option to control the optimization level used by the
compile() function, because the Python interpreter itself already
provides the option: python -O -m compileall.
Recursively descend the directory tree named by dir, compiling all .py
files along the way.
The maxlevels parameter is used to limit the depth of the recursion; it
defaults to 10.
If ddir is given, it is prepended to the path to each file being compiled
for use in compilation time tracebacks, and is also compiled in to the
byte-code file, where it will be used in tracebacks and other messages in
cases where the source file does not exist at the time the byte-code file is
executed.
If force is true, modules are re-compiled even if the timestamps are up to
date.
If rx is given, its search method is called on the complete path to each
file considered for compilation, and if it returns a true value, the file
is skipped.
If quiet is true, nothing is printed to the standard output unless errors
occur.
If legacy is true, byte-code files are written to their legacy locations
and names, which may overwrite byte-code files created by another version of
Python. The default is to write files to their PEP 3147 locations and
names, which allows byte-code files from multiple versions of Python to
coexist.
optimize specifies the optimization level for the compiler. It is passed to
the built-in compile() function.
Changed in version 3.2: Added the legacy and optimize parameters.
If ddir is given, it is prepended to the path to the file being compiled
for use in compilation time tracebacks, and is also compiled in to the
byte-code file, where it will be used in tracebacks and other messages in
cases where the source file does not exist at the time the byte-code file is
executed.
If rx is given, its search method is passed the full path name to the
file being compiled, and if it returns a true value, the file is not
compiled and True is returned.
If quiet is true, nothing is printed to the standard output unless errors
occur.
If legacy is true, byte-code files are written to their legacy locations
and names, which may overwrite byte-code files created by another version of
Python. The default is to write files to their PEP 3147 locations and
names, which allows byte-code files from multiple versions of Python to
coexist.
optimize specifies the optimization level for the compiler. It is passed to
the built-in compile() function.
Byte-compile all the .py files found along sys.path. If
skip_curdir is true (the default), the current directory is not included
in the search. All other parameters are passed to the compile_dir()
function. Note that unlike the other compile functions, maxlevels
defaults to 0.
Changed in version 3.2: Added the legacy and optimize parameters.
To force a recompile of all the .py files in the Lib/
subdirectory and all its subdirectories:
import compileall
compileall.compile_dir('Lib/', force=True)
# Perform same compilation, excluding files in .svn directories.
import re
compileall.compile_dir('Lib/', rx=re.compile('/[.]svn'), force=True)
The dis module supports the analysis of CPython bytecode by
disassembling it. The CPython bytecode which this module takes as an
input is defined in the file Include/opcode.h and used by the compiler
and the interpreter.
CPython implementation detail: Bytecode is an implementation detail of the CPython interpreter. No
guarantees are made that bytecode will not be added, removed, or changed
between versions of Python. Use of this module should not be considered to
work across Python VMs or Python releases.
Example: Given the function myfunc():
def myfunc(alist):
    return len(alist)
the following command can be used to get the disassembly of myfunc():
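Under CPython 3.2 the output resembles the following (the exact offsets and
opcode choices are version-specific):

>>> import dis
>>> dis.dis(myfunc)
  2           0 LOAD_GLOBAL              0 (len)
              3 LOAD_FAST                0 (alist)
              6 CALL_FUNCTION            1
              9 RETURN_VALUE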
Return a formatted multi-line string with detailed code object information
for the supplied function, method, source code string or code object.
Note that the exact contents of code info strings are highly implementation
dependent and they may change arbitrarily across Python VMs or Python
releases.
Disassemble the x object. x can denote either a module, a class, a
method, a function, a code object, a string of source code or a byte sequence
of raw bytecode. For a module, it disassembles all functions. For a class,
it disassembles all methods. For a code object or sequence of raw bytecode,
it prints one line per bytecode instruction. Strings are first compiled to
code objects with the compile() built-in function before being
disassembled. If no object is provided, this function disassembles the last
traceback.
This generator function uses the co_firstlineno and co_lnotab
attributes of the code object code to find the offsets which are starts of
lines in the source code. They are generated as (offset, lineno) pairs.
Binary operations remove the top of the stack (TOS) and the second top-most
stack item (TOS1) from the stack. They perform the operation, and put the
result back on the stack.
In-place operations are like binary operations, in that they remove TOS and
TOS1, and push the result back on the stack, but the operation is done in-place
when TOS1 supports it, and the resulting TOS may be (but does not have to be)
the original TOS1.
Implements the expression statement for the interactive mode. TOS is removed
from the stack and printed. In non-interactive mode, an expression statement is
terminated with POP_TOP.
Calls dict.setitem(TOS1[-i], TOS, TOS1). Used to implement dict
comprehensions.
For all of the SET_ADD, LIST_APPEND and MAP_ADD instructions, while the
added value or key/value pair is popped off, the container object remains on
the stack so that it is available for further iterations of the loop.
Loads all symbols not starting with '_' directly from the module TOS to the
local namespace. The module is popped after loading all names. This opcode
implements from module import *.
Removes one block from the block stack. The popped block must be an exception
handler block, as implicitly created when entering an except handler.
In addition to popping extraneous values from the frame stack, the
last three popped values are used to restore the exception state.
Terminates a finally clause. The interpreter recalls whether the
exception has to be re-raised, or whether the function returns, and continues
with the outer-next block.
This opcode performs several operations before a with block starts. First,
it loads __exit__() from the context manager and pushes it onto
the stack for later use by WITH_CLEANUP. Then,
__enter__() is called, and a finally block pointing to delta
is pushed. Finally, the result of calling the enter method is pushed onto
the stack. The next opcode will either ignore it (POP_TOP), or
store it in (a) variable(s) (STORE_FAST, STORE_NAME, or
UNPACK_SEQUENCE).
Cleans up the stack when a with statement block exits. TOS is
the context manager’s __exit__() bound method. Below TOS are 1–3
values indicating how/why the finally clause was entered:
SECOND = None
(SECOND, THIRD) = (WHY_{RETURN,CONTINUE}), retval
SECOND = WHY_*; no retval below it
(SECOND, THIRD, FOURTH) = exc_info()
In the last case, TOS(SECOND,THIRD,FOURTH) is called, otherwise
TOS(None,None,None). In addition, TOS is removed from the stack.
If the stack represents an exception, and the function call returns
a ‘true’ value, this information is “zapped” and replaced with a single
WHY_SILENCED to prevent END_FINALLY from re-raising the exception.
(But non-local gotos will still be resumed.)
Implements name = TOS. namei is the index of name in the attribute
co_names of the code object. The compiler tries to use STORE_FAST
or STORE_GLOBAL if possible.
Implements assignment with a starred target: Unpacks an iterable in TOS into
individual values, where the total number of values can be smaller than the
number of items in the iterable: one of the new values will be a list of all
leftover items.
The low byte of counts is the number of values before the list value, the
high byte of counts the number of values after it. The resulting values
are put onto the stack right-to-left.
Imports the module co_names[namei]. TOS and TOS1 are popped and provide
the fromlist and level arguments of __import__(). The module
object is pushed onto the stack. The current namespace is not affected:
for a proper import statement, a subsequent STORE_FAST instruction
modifies the namespace.
Loads the attribute co_names[namei] from the module found in TOS. The
resulting object is pushed onto the stack, to be subsequently stored by a
STORE_FAST instruction.
TOS is an iterator. Call its __next__() method. If this
yields a new value, push it on the stack (leaving the iterator below it). If
the iterator indicates it is exhausted TOS is popped, and the byte code
counter is incremented by delta.
Pushes a reference to the cell contained in slot i of the cell and free
variable storage. The name of the variable is co_cellvars[i] if i is
less than the length of co_cellvars. Otherwise it is co_freevars[i-len(co_cellvars)].
Raises an exception. argc indicates the number of parameters to the raise
statement, ranging from 0 to 3. The handler will find the traceback as TOS2,
the parameter as TOS1, and the exception as TOS.
Calls a function. The low byte of argc indicates the number of positional
parameters, the high byte the number of keyword parameters. On the stack, the
opcode finds the keyword parameters first. For each keyword argument, the value
is on top of the key. Below the keyword parameters, the positional parameters
are on the stack, with the right-most parameter on top. Below the parameters,
the function object to call is on the stack. Pops all function arguments, and
the function itself off the stack, and pushes the return value.
Pushes a new function object on the stack. TOS is the code associated with the
function. The function object is defined to have argc default parameters,
which are found below TOS.
Creates a new function object, sets its __closure__ slot, and pushes it on
the stack. TOS is the code associated with the function, TOS1 the tuple
containing cells for the closure’s free variables. The function also has
argc default parameters, which are found below the cells.
Pushes a slice object on the stack. argc must be 2 or 3. If it is 2,
slice(TOS1,TOS) is pushed; if it is 3, slice(TOS2,TOS1,TOS) is
pushed. See the slice() built-in function for more information.
Prefixes any opcode which has an argument too big to fit into the default two
bytes. ext holds two additional bytes which, taken together with the
subsequent opcode’s argument, comprise a four-byte argument, ext being the two
most-significant bytes.
Calls a function. argc is interpreted as in CALL_FUNCTION. The top element
on the stack contains the variable argument list, followed by keyword and
positional arguments.
Calls a function. argc is interpreted as in CALL_FUNCTION. The top element
on the stack contains the keyword arguments dictionary, followed by explicit
keyword and positional arguments.
Calls a function. argc is interpreted as in CALL_FUNCTION. The top
element on the stack contains the keyword arguments dictionary, followed by the
variable-arguments tuple, followed by explicit keyword and positional arguments.
This is not really an opcode. It identifies the dividing line between opcodes
which don’t take arguments (< HAVE_ARGUMENT) and those which do (>= HAVE_ARGUMENT).
This module contains various constants relating to the intimate details of the
pickle module, some lengthy comments about the implementation, and a
few useful functions for analyzing pickled data. The contents of this module
are useful for Python core developers who are working on the pickle;
ordinary users of the pickle module probably won’t find the
pickletools module relevant.
When invoked from the command line, python -m pickletools will
disassemble the contents of one or more pickle files. Note that if
you want to see the Python object stored in the pickle rather than the
details of the pickle format, you may want to use -m pickle instead.
However, when the pickle file that you want to examine comes from an
untrusted source, -m pickletools is a safer option because it does
not execute pickle bytecode.
For example, with a tuple (1, 2) pickled in file x.pickle:
$ python -m pickle x.pickle
(1, 2)
$ python -m pickletools x.pickle
0: \x80 PROTO 3
2: K BININT1 1
4: K BININT1 2
6: \x86 TUPLE2
7: q BINPUT 0
9: . STOP
highest protocol among opcodes = 2
Outputs a symbolic disassembly of the pickle to the file-like
object out, defaulting to sys.stdout. pickle can be a
string or a file-like object. memo can be a Python dictionary
that will be used as the pickle’s memo; it can be used to perform
disassemblies across multiple pickles created by the same
pickler. Successive levels, indicated by MARK opcodes in the
stream, are indented by indentlevel spaces. If a nonzero value
is given to annotate, each opcode in the output is annotated with
a short description. The value of annotate is used as a hint for
the column where annotation should start.
Provides an iterator over all of the opcodes in a pickle, returning a
sequence of (opcode,arg,pos) triples. opcode is an instance of an
OpcodeInfo class; arg is the decoded value, as a Python object, of
the opcode’s argument; pos is the position at which this opcode is located.
pickle can be a string or a file-like object.
Returns a new equivalent pickle string after eliminating unused PUT
opcodes. The optimized pickle is shorter, takes less transmission time,
requires less storage space, and unpickles more efficiently.
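For example, a pickle that memoizes values it never looks up again shrinks
when optimized:

>>> import pickle, pickletools
>>> p = pickle.dumps([1, 2, 3])
>>> len(pickletools.optimize(p)) <= len(p)
True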
This module supports two interface definitions, each with multiple
implementations: The formatter interface, and the writer interface which is
required by the formatter interface.
Formatter objects transform an abstract flow of formatting events into specific
output events on writer objects. Formatters manage several stack structures to
allow various properties of a writer object to be changed and restored; writers
need not be able to handle relative changes nor any sort of “change back”
operation. Specific writer properties which may be controlled via formatter
objects are horizontal alignment, font, and left margin indentations. A
mechanism is provided which supports providing arbitrary, non-exclusive style
settings to a writer as well. Additional interfaces facilitate formatting
events which are not reversible, such as paragraph separation.
Writer objects encapsulate device interfaces. Abstract devices, such as file
formats, are supported as well as physical devices. The provided
implementations all work with abstract devices. The interface makes available
mechanisms for setting the properties which formatter objects manage and
inserting data into the output.
Interfaces to create formatters are dependent on the specific formatter class
being instantiated. The interfaces described below are the required interfaces
which all formatters must support once initialized.
Value which can be used in the font specification passed to the push_font()
method described below, or as the new value to any other push_property()
method. Pushing the AS_IS value allows the corresponding pop_property()
method to be called without having to track whether the property was changed.
The following attributes are defined for formatter instance objects:
Insert a horizontal rule in the output. A hard break is inserted if there is
data in the current paragraph, but the logical paragraph is not broken. The
arguments and keywords are passed on to the writer’s send_line_break()
method.
Provide data which should be formatted with collapsed whitespace. Whitespace
from preceding and successive calls to add_flowing_data() is considered as
well when the whitespace collapse is performed. The data which is passed to
this method is expected to be word-wrapped by the output device. Note that any
word-wrapping still must be performed by the writer object due to the need to
rely on device and font information.
Provide data which should be passed to the writer unchanged. Whitespace,
including newline and tab characters, are considered legal in the value of
data.
Insert a label which should be placed to the left of the current left margin.
This should be used for constructing bulleted or numbered lists. If the
format value is a string, it is interpreted as a format specification for
counter, which should be an integer. The result of this formatting becomes the
value of the label; if format is not a string it is used as the label value
directly. The label value is passed as the only argument to the writer’s
send_label_data() method. Interpretation of non-string label values is
dependent on the associated writer.
Format specifications are strings which, in combination with a counter value,
are used to compute label values. Each character in the format string is copied
to the label value, with some characters recognized to indicate a transform on
the counter value. Specifically, the character '1' represents the counter
value formatter as an Arabic number, the characters 'A' and 'a'
represent alphabetic representations of the counter value in upper and lower
case, respectively, and 'I' and 'i' represent the counter value in Roman
numerals, in upper and lower case. Note that the alphabetic and roman
transforms require that the counter value be greater than zero.
Send any pending whitespace buffered from a previous call to
add_flowing_data() to the associated writer object. This should be called
before any direct manipulation of the writer object.
Push a new alignment setting onto the alignment stack. This may be
AS_IS if no change is desired. If the alignment value is changed from
the previous setting, the writer’s new_alignment() method is called with
the align value.
Change some or all font properties of the writer object. Properties which are
not set to AS_IS are set to the values passed in while others are
maintained at their current settings. The writer’s new_font() method is
called with the fully resolved font specification.
Increase the number of left margin indentations by one, associating the logical
tag margin with the new indentation. The initial margin level is 0.
Changed values of the logical tag must be true values; false values other than
AS_IS are not sufficient to change the margin.
Push any number of arbitrary style specifications. All styles are pushed onto
the styles stack in order. A tuple representing the entire stack, including
AS_IS values, is passed to the writer’s new_styles() method.
Pop the last n style specifications passed to push_style(). A tuple
representing the revised stack, including AS_IS values, is passed to
the writer’s new_styles() method.
Inform the formatter that data has been added to the current paragraph
out-of-band. This should be used when the writer has been manipulated
directly. The optional flag argument can be set to false if the writer
manipulations produced a hard line break at the end of the output.
Two implementations of formatter objects are provided by this module. Most
applications may use one of these classes without modification or subclassing.
A formatter which does nothing. If writer is omitted, a NullWriter
instance is created. No methods of the writer are called by
NullFormatter instances. Implementations should inherit from this
class if implementing a writer interface but don’t need to inherit any
implementation.
The standard formatter. This implementation has demonstrated wide applicability
to many writers, and may be used directly in most circumstances. It has been
used to implement a full-featured World Wide Web browser.
Interfaces to create writers are dependent on the specific writer class being
instantiated. The interfaces described below are the required interfaces which
all writers must support once initialized. Note that while most applications can
use the AbstractFormatter class as a formatter, the writer must
typically be provided by the application.
Set the alignment style. The align value can be any object, but by convention
is a string or None, where None indicates that the writer’s “preferred”
alignment should be used. Conventional align values are 'left',
'center', 'right', and 'justify'.
Set the font style. The value of font will be None, indicating that the
device’s default font should be used, or a tuple of the form (size,
italic, bold, teletype). size will be a string indicating the size of
font that should be used; specific strings and their interpretation must be
defined by the application. The italic, bold, and teletype values are
Boolean values specifying which of those font attributes should be used.
Set the margin level to the integer level and the logical tag to margin.
Interpretation of the logical tag is at the writer’s discretion; the only
restriction on the value of the logical tag is that it not be a false value for
non-zero values of level.
Set additional styles. The styles value is a tuple of arbitrary values; the
value AS_IS should be ignored. The styles tuple may be interpreted
either as a set or as a stack depending on the requirements of the application
and writer implementation.
Produce a paragraph separation of at least blankline blank lines, or the
equivalent. The blankline value will be an integer. Note that the
implementation will receive a call to send_line_break() before this call
if a line break is needed; this method should not include ending the last line
of the paragraph. It is only responsible for vertical spacing between
paragraphs.
Display a horizontal rule on the output device. The arguments to this method
are entirely application- and writer-specific, and should be interpreted with
care. The method implementation may assume that a line break has already been
issued via send_line_break().
Output character data which may be word-wrapped and re-flowed as needed. Within
any sequence of calls to this method, the writer may assume that spans of
multiple whitespace characters have been collapsed to single space characters.
Output character data which has already been formatted for display. Generally,
this should be interpreted to mean that line breaks indicated by newline
characters should be preserved and no new line breaks should be introduced. The
data may contain embedded newline and tab characters, unlike data provided to
the send_flowing_data() interface.
Set data to the left of the current left margin, if possible. The value of
data is not restricted; treatment of non-string values is entirely
application- and writer-dependent. This method will only be called at the
beginning of a line.
Three implementations of the writer object interface are provided as examples by
this module. Most applications will need to derive new writer classes from the
NullWriter class.
A writer which only provides the interface definition; no actions are taken on
any methods. This should be the base class for all writers which do not need to
inherit any implementation methods.
A writer which can be used in debugging formatters, but not much else. Each
method simply announces itself by printing its name and arguments on standard
output.
Simple writer class which writes output on the file object passed
in as file or, if file is omitted, on standard output. The output is
simply word-wrapped to the number of columns specified by maxcol. This
class is suitable for reflowing a sequence of paragraphs.
This chapter describes modules that are only available on MS Windows platforms.
msilib — Read and write Microsoft Installer files
The msilib module supports the creation of Microsoft Installer (.msi) files.
Because these files often contain an embedded “cabinet” file (.cab), it also
exposes an API to create CAB files. Support for reading .cab files is
currently not implemented; read support for the .msi database is possible.
This package aims to provide complete access to all tables in an .msi file;
it is therefore a fairly low-level API. Two primary applications of this
package are the distutils command bdist_msi, and the creation of the
Python installer package itself (although that currently uses a different
version of msilib).
The package contents can be roughly split into four parts: low-level CAB
routines, low-level MSI routines, higher-level MSI routines, and standard table
structures.
Create a new CAB file named cabname. files must be a list of tuples, each
containing the name of the file on disk, and the name of the file inside the CAB
file.
The files are added to the CAB file in the order they appear in the list. All
files are added into a single CAB file, using the MSZIP compression algorithm.
Callbacks to Python for the various steps of MSI creation are currently not
exposed.
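For instance, a minimal sketch of packing two files into a CAB archive (the
file names here are hypothetical):

import msilib

# Each tuple is (name of the file on disk, name stored inside the CAB file).
msilib.FCICreate('python.cab', [
    ('build/app.exe', 'app.exe'),
    ('build/README.txt', 'README.txt'),
])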
Return a new database object by calling MsiOpenDatabase. path is the file
name of the MSI file; persist can be one of the constants
MSIDBOPEN_CREATEDIRECT, MSIDBOPEN_CREATE, MSIDBOPEN_DIRECT,
MSIDBOPEN_READONLY, or MSIDBOPEN_TRANSACT, and may include the flag
MSIDBOPEN_PATCHFILE. See the Microsoft documentation for the meaning of
these flags; depending on the flags, an existing database is opened, or a new
one created.
Add all records to the table named table in database.
The table argument must be one of the predefined tables in the MSI schema,
e.g. 'Feature', 'File', 'Component', 'Dialog', 'Control',
etc.
records should be a list of tuples, each one containing all fields of a
record according to the schema of the table. For optional fields,
None can be passed.
Field values can be ints, strings, or instances of the Binary class.
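As a brief illustration, a sketch of creating a new database and adding a
record to it (the file name and property value are hypothetical):

import msilib

db = msilib.OpenDatabase('example.msi', msilib.MSIDBOPEN_CREATE)
# One record for the Property table: (Property, Value).
msilib.add_data(db, 'Property', [('ProductName', 'Example Product')])
db.Commit()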
Add all table content from module to database. module must contain an
attribute tables listing all tables for which content should be added, and one
attribute per table that has the actual content.
This is typically used to install the sequence tables.
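For example, the standard sequence tables shipped with the package can be
installed into the database created above:

from msilib import sequence
import msilib

msilib.add_tables(db, sequence)   # db as created in the previous example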
Execute the SQL query of the view, through MSIViewExecute(). If
params is not None, it is a record describing actual values of the
parameter tokens in the query.
Modify the view, by calling MsiViewModify(). kind can be one of
MSIMODIFY_SEEK, MSIMODIFY_REFRESH, MSIMODIFY_INSERT,
MSIMODIFY_UPDATE, MSIMODIFY_ASSIGN, MSIMODIFY_REPLACE,
MSIMODIFY_MERGE, MSIMODIFY_DELETE, MSIMODIFY_INSERT_TEMPORARY,
MSIMODIFY_VALIDATE, MSIMODIFY_VALIDATE_NEW,
MSIMODIFY_VALIDATE_FIELD, or MSIMODIFY_VALIDATE_DELETE.
Return a property of the summary, through MsiSummaryInfoGetProperty().
field is the name of the property, and can be one of the constants
PID_CODEPAGE, PID_TITLE, PID_SUBJECT, PID_AUTHOR,
PID_KEYWORDS, PID_COMMENTS, PID_TEMPLATE, PID_LASTAUTHOR,
PID_REVNUMBER, PID_LASTPRINTED, PID_CREATE_DTM,
PID_LASTSAVE_DTM, PID_PAGECOUNT, PID_WORDCOUNT, PID_CHARCOUNT,
PID_APPNAME, or PID_SECURITY.
Set a property through MsiSummaryInfoSetProperty(). field can have the
same values as in GetProperty(), value is the new value of the property.
Possible value types are integer and string.
The class CAB represents a CAB file. During MSI construction, files
will be added simultaneously to the Files table, and to a CAB file. Then,
when all files have been added, the CAB file can be written, then added to the
MSI file.
class msilib.Directory(database, cab, basedir, physical, logical, default[, componentflags])
Create a new directory in the Directory table. There is a current component at
each point in time for the directory, which is either explicitly created through
start_component(), or implicitly when files are added for the first time.
Files are added into the current component, and into the cab file. To create a
directory, a base directory object needs to be specified (can be None), the
path to the physical directory, and a logical directory name. default
specifies the DefaultDir slot in the directory table. componentflags specifies
the default flags that new components get.
Add an entry to the Component table, and make this component the current
component for this directory. If no component name is given, the directory
name is used. If no feature is given, the current feature is used. If no
flags are given, the directory’s default flags are used. If no keyfile
is given, the KeyPath is left null in the Component table.
Add a file to the current component of the directory, starting a new one
if there is no current component. By default, the file name in the source
and the file table will be identical. If the src file is specified, it
is interpreted relative to the current directory. Optionally, a version
and a language can be specified for the entry in the File table.
class msilib.Feature(db, id, title, desc, display, level=1, parent=None, directory=None, attributes=0)
Add a new record to the Feature table, using the values id, parent.id,
title, desc, display, level, directory, and attributes. The
resulting feature object can be passed to the start_component() method of
Directory.
Make this feature the current feature of msilib. New components are
automatically added to the default feature, unless a feature is explicitly
specified.
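A short sketch of typical usage, continuing with the database from the
examples above (the id and title are hypothetical):

import msilib

feature = msilib.Feature(db, 'default', 'Default Feature', 'Everything', 1,
                         directory='TARGETDIR')
feature.set_current()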
msilib provides several classes that wrap the GUI tables in an MSI
database. However, no standard user interface is provided; use bdist_msi
to create MSI files with a user-interface for installing Python packages.
Add a radio button named name to the group, at the coordinates x, y,
width, height, and with the label text. If value is None, it
defaults to name.
class msilib.Dialog(db, name, x, y, w, h, attr, title, first, default, cancel)
Return a new Dialog object. An entry in the Dialog table is made,
with the specified coordinates, dialog attributes, title, name of the first,
default, and cancel controls.
This is the standard MSI schema for MSI 2.0, with the tables variable
providing a list of table definitions, and _Validation_records providing the
data for MSI validation.
This module contains table contents for the standard sequence tables:
AdminExecuteSequence, AdminUISequence, AdvtExecuteSequence,
InstallExecuteSequence, and InstallUISequence.
This module contains definitions for the UIText and ActionText tables, for the
standard installer actions.
msvcrt — Useful routines from the MS VC++ runtime
These functions provide access to some useful capabilities on Windows platforms.
Some higher-level modules use these functions to build the Windows
implementations of their services. For example, the getpass module uses
this in the implementation of the getpass() function.
Further documentation on these functions can be found in the Platform API
documentation.
The module implements both the normal and wide char variants of the console I/O
API. The normal API deals only with ASCII characters and is of limited use
for internationalized applications. The wide char API should be used wherever
possible.
Lock part of a file based on file descriptor fd from the C runtime. Raises
IOError on failure. The locked region of the file extends from the
current file position for nbytes bytes, and may continue beyond the end of the
file. mode must be one of the LK_* constants listed below. Multiple
regions in a file may be locked at the same time, but may not overlap. Adjacent
regions are not merged; they must be unlocked individually.
Locks the specified bytes. If the bytes cannot be locked, the program
immediately tries again after 1 second. If, after 10 attempts, the bytes cannot
be locked, IOError is raised.
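A minimal sketch of locking and unlocking the first bytes of a file (the file
name is hypothetical):

import msvcrt

f = open('data.bin', 'r+b')
try:
    f.seek(0)
    msvcrt.locking(f.fileno(), msvcrt.LK_NBLCK, 1024)  # non-blocking lock
    # ... work with the locked region ...
    f.seek(0)
    msvcrt.locking(f.fileno(), msvcrt.LK_UNLCK, 1024)  # unlock the same region
finally:
    f.close()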
Create a C runtime file descriptor from the file handle handle. The flags
parameter should be a bitwise OR of os.O_APPEND, os.O_RDONLY,
and os.O_TEXT. The returned file descriptor may be used as a parameter
to os.fdopen() to create a file object.
Read a keypress and return the resulting character as a byte string.
Nothing is echoed to the console. This call will block if a keypress
is not already available, but will not wait for Enter to be
pressed. If the pressed key was a special function key, this will
return '\000' or '\xe0'; the next call will return the keycode.
The Control-C keypress cannot be read with this function.
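For example, a small sketch that reads one keypress, following the convention
described above for special function keys:

import msvcrt

ch = msvcrt.getch()            # blocks until a key is pressed; returns bytes
if ch in (b'\x00', b'\xe0'):   # special function key: fetch the keycode
    ch = msvcrt.getch()
print(ch)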
These functions expose the Windows registry API to Python. Instead of using an
integer as the registry handle, a handle object is used
to ensure that the handles are closed correctly, even if the programmer neglects
to explicitly close them.
Creates or opens the specified key, returning a
handle object.
key is an already open key, or one of the predefined
HKEY_* constants.
sub_key is a string that names the key this method opens or creates.
res is a reserved integer, and must be zero. The default is zero.
sam is an integer that specifies an access mask that describes the desired
security access for the key. Default is KEY_ALL_ACCESS. See
Access Rights for other allowed values.
If key is one of the predefined keys, sub_key may be None. In that
case, the handle returned is the same key handle passed in to the function.
If the key already exists, this function opens the existing key.
The return value is the handle of the opened key. If the function fails, a
WindowsError exception is raised.
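A short sketch of creating a key and storing a string value in it (the subkey
path and value are hypothetical):

import winreg

with winreg.CreateKeyEx(winreg.HKEY_CURRENT_USER, r'Software\ExampleApp',
                        0, winreg.KEY_ALL_ACCESS) as key:
    winreg.SetValueEx(key, 'Version', 0, winreg.REG_SZ, '1.0')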
The DeleteKeyEx() function is implemented with the RegDeleteKeyEx
Windows API function, which is specific to 64-bit versions of Windows.
See the RegDeleteKeyEx documentation.
key is an already open key, or one of the predefined
HKEY_* constants.
sub_key is a string that must be a subkey of the key identified by the
key parameter. This value must not be None, and the key may not have
subkeys.
res is a reserved integer, and must be zero. The default is zero.
sam is an integer that specifies an access mask that describes the desired
security access for the key. Default is KEY_ALL_ACCESS. See
Access Rights for other allowed values.
This method can not delete keys with subkeys.
If the method succeeds, the entire key, including all of its values, is
removed. If the method fails, a WindowsError exception is raised.
Enumerates subkeys of an open registry key, returning a string.
key is an already open key, or one of the predefined
HKEY_* constants.
index is an integer that identifies the index of the key to retrieve.
The function retrieves the name of one subkey each time it is called. It is
typically called repeatedly until a WindowsError exception is
raised, indicating that no more subkeys are available.
Enumerates values of an open registry key, returning a tuple.
key is an already open key, or one of the predefined
HKEY_* constants.
index is an integer that identifies the index of the value to retrieve.
The function retrieves the name of one value each time it is called. It is
typically called repeatedly, until a WindowsError exception is
raised, indicating that no more values are available.
The result is a tuple of 3 items:

Index   Meaning
0       A string that identifies the value name
1       An object that holds the value data, and whose type depends
        on the underlying registry type
2       An integer that identifies the type of the value data (see the
        table in the docs for SetValueEx())
Writes all the attributes of a key to the registry.
key is an already open key, or one of the predefined
HKEY_* constants.
It is not necessary to call FlushKey() to change a key. Registry changes are
flushed to disk by the registry using its lazy flusher. Registry changes are
also flushed to disk at system shutdown. Unlike CloseKey(), the
FlushKey() method returns only when all the data has been written to the
registry. An application should only call FlushKey() if it requires
absolute certainty that registry changes are on disk.
Note
If you don’t know whether a FlushKey() call is required, it probably
isn’t.
sub_key is a string that identifies the subkey to load.
file_name is the name of the file to load registry data from. This file must
have been created with the SaveKey() function. Under the file allocation
table (FAT) file system, the filename may not have an extension.
A call to LoadKey() fails if the calling process does not have the
SE_RESTORE_PRIVILEGE privilege. Note that privileges are different
from permissions – see the RegLoadKey documentation for
more details.
If key is a handle returned by ConnectRegistry(), then the path
specified in file_name is relative to the remote computer.
key is an already open key, or one of the predefined
HKEY_* constants.
sub_key is a string that identifies the sub_key to open.
res is a reserved integer, and must be zero. The default is zero.
sam is an integer that specifies an access mask that describes the desired
security access for the key. Default is KEY_READ. See Access
Rights for other allowed values.
Retrieves the unnamed value for a key, as a string.
key is an already open key, or one of the predefined
HKEY_* constants.
sub_key is a string that holds the name of the subkey with which the value is
associated. If this parameter is None or empty, the function retrieves the
value set by the SetValue() method for the key identified by key.
Values in the registry have name, type, and data components. This method
retrieves the data for a key’s first value that has a NULL name. But the
underlying API call doesn’t return the type, so always use
QueryValueEx() if possible.
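Since QueryValueEx() also returns the value's type, it is usually the better
choice. A minimal sketch (the key path and value name are hypothetical):

import winreg

with winreg.OpenKey(winreg.HKEY_CURRENT_USER, r'Software\ExampleApp') as key:
    value, vtype = winreg.QueryValueEx(key, 'Version')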
Saves the specified key, and all its subkeys to the specified file.
key is an already open key, or one of the predefined
HKEY_* constants.
file_name is the name of the file to save registry data to. This file
cannot already exist. If this filename includes an extension, it cannot be
used on file allocation table (FAT) file systems by the LoadKey()
method.
If key represents a key on a remote computer, the path described by
file_name is relative to the remote computer. The caller of this method must
possess the SeBackupPrivilege security privilege. Note that
privileges are different than permissions – see the
Conflicts Between User Rights and Permissions documentation
for more details.
This function passes NULL for security_attributes to the API.
key is an already open key, or one of the predefined
HKEY_* constants.
sub_key is a string that names the subkey with which the value is associated.
type is an integer that specifies the type of the data. Currently this must be
REG_SZ, meaning only strings are supported. Use the SetValueEx()
function for support for other data types.
value is a string that specifies the new value.
If the key specified by the sub_key parameter does not exist, the SetValue
function creates it.
Value lengths are limited by available memory. Long values (more than 2048
bytes) should be stored as files with the filenames stored in the configuration
registry. This helps the registry perform efficiently.
The key identified by the key parameter must have been opened with
KEY_SET_VALUE access.
Stores data in the value field of an open registry key.
key is an already open key, or one of the predefined
HKEY_* constants.
value_name is a string that names the subkey with which the value is
associated.
type is an integer that specifies the type of the data. See
Value Types for the available types.
reserved can be anything – zero is always passed to the API.
value is a string that specifies the new value.
This method can also set additional value and type information for the specified
key. The key identified by the key parameter must have been opened with
KEY_SET_VALUE access.
Value lengths are limited by available memory. Long values (more than 2048
bytes) should be stored as files with the filenames stored in the configuration
registry. This helps the registry perform efficiently.
Disables registry reflection for 32-bit processes running on a 64-bit
operating system.
key is an already open key, or one of the predefined HKEY_* constants.
Will generally raise NotImplementedError if executed on a 32-bit operating
system.
If the key is not on the reflection list, the function succeeds but has no
effect. Disabling reflection for a key does not affect reflection of any
subkeys.
HKEY_CLASSES_ROOT
Registry entries subordinate to this key define types (or classes) of
documents and the properties associated with those types. Shell and
COM applications use the information stored under this key.

HKEY_CURRENT_USER
Registry entries subordinate to this key define the preferences of
the current user. These preferences include the settings of
environment variables, data about program groups, colors, printers,
network connections, and application preferences.

HKEY_LOCAL_MACHINE
Registry entries subordinate to this key define the physical state
of the computer, including data about the bus type, system memory,
and installed hardware and software.

HKEY_USERS
Registry entries subordinate to this key define the default user
configuration for new users on the local computer and the user
configuration for the current user.

HKEY_PERFORMANCE_DATA
Registry entries subordinate to this key allow you to access
performance data. The data is not actually stored in the registry;
the registry functions cause the system to collect the data from
its source.
This object wraps a Windows HKEY object, automatically closing it when the
object is destroyed. To guarantee cleanup, you can call either the
Close() method on the object, or the CloseKey() function.
All registry functions in this module return one of these objects.
All registry functions in this module which accept a handle object also accept
an integer, however, use of the handle object is encouraged.
Handle objects provide semantics for __bool__() – thus

if handle:
    print("Yes")

will print Yes if the handle is currently valid (has not been closed or
detached).
Handle objects also support comparison semantics: two handle objects compare
equal if they both reference the same underlying Windows handle value.
Handle objects can be converted to an integer (e.g., using the built-in
int() function), in which case the underlying Windows handle value is
returned. You can also use the Detach() method to return the
integer handle, and also disconnect the Windows handle from the handle object.
Detaches the Windows handle from the handle object.
The result is an integer that holds the value of the handle before it is
detached. If the handle is already detached or closed, this will return
zero.
After calling this function, the handle is effectively invalidated, but the
handle is not closed. You would call this function when you need the
underlying Win32 handle to exist beyond the lifetime of the handle object.
Beep the PC’s speaker. The frequency parameter specifies frequency, in hertz,
of the sound, and must be in the range 37 through 32,767. The duration
parameter specifies the number of milliseconds the sound should last. If the
system is not able to beep the speaker, RuntimeError is raised.
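For instance, a one-line sketch:

import winsound

winsound.Beep(440, 500)   # 440 Hz (concert A) for half a second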
Call the underlying PlaySound() function from the Platform API. The
sound parameter may be a filename, audio data as a string, or None. Its
interpretation depends on the value of flags, which can be a bitwise ORed
combination of the constants described below. If the sound parameter is
None, any currently playing waveform sound is stopped. If the system
indicates an error, RuntimeError is raised.
Call the underlying MessageBeep() function from the Platform API. This
plays a sound as specified in the registry. The type argument specifies which
sound to play; possible values are -1, MB_ICONASTERISK,
MB_ICONEXCLAMATION, MB_ICONHAND, MB_ICONQUESTION, and MB_OK, all
described below. The value -1 produces a “simple beep”; this is the final
fallback if a sound cannot be played otherwise.
The sound parameter is a sound association name from the registry. If the
registry contains no such name, play the system default sound unless
SND_NODEFAULT is also specified. If no default sound is registered,
raise RuntimeError. Do not use with SND_FILENAME.
All Win32 systems support at least the following; most systems support many
more:
import winsound
# Play Windows exit sound.
winsound.PlaySound("SystemExit", winsound.SND_ALIAS)
# Probably play Windows default sound, if any is registered (because
# "*" probably isn't the registered name of any sound).
winsound.PlaySound("*", winsound.SND_ALIAS)
The modules described in this chapter provide interfaces to features that are
unique to the Unix operating system, or in some cases to some or many variants
of it. Here’s an overview:
This module provides access to operating system functionality that is
standardized by the C Standard and the POSIX standard (a thinly disguised Unix
interface).
Do not import this module directly. Instead, import the module os,
which provides a portable version of this interface. On Unix, the os
module provides a superset of the posix interface. On non-Unix operating
systems the posix module is not available, but a subset is always
available through the os interface. Once os is imported, there is
no performance penalty in using it instead of posix. In addition,
os provides some additional functionality, such as automatically calling
putenv() when an entry in os.environ is changed.
Errors are reported as exceptions; the usual exceptions are given for type
errors, while errors reported by the system calls raise OSError.
Several operating systems (including AIX, HP-UX, Irix and Solaris) provide
support for files that are larger than 2 GB from a C programming model where
int and long are 32-bit values. This is typically accomplished
by defining the relevant size and offset types as 64-bit values. Such files are
sometimes referred to as large files.
Large file support is enabled in Python when the size of an off_t is
larger than a long and the long long type is available and is
at least as large as an off_t.
It may be necessary to configure and compile Python with certain compiler flags
to enable this mode. For example, it is enabled by default with recent versions
of Irix, but with Solaris 2.6 and 2.7 you need to do something like:

CFLAGS="`getconf LFS_CFLAGS`" OPT="-g -O2 $CFLAGS" ./configure
A dictionary representing the string environment at the time the interpreter
was started. Keys and values are bytes on Unix and str on Windows. For
example, environ[b'HOME'] (environ['HOME'] on Windows) is the
pathname of your home directory, equivalent to getenv("HOME") in C.
Modifying this dictionary does not affect the string environment passed on by
execv(), popen() or system(); if you need to change the
environment, pass environ to execve() or add variable assignments and
export statements to the command string for system() or popen().
Changed in version 3.2: On Unix, keys and values are bytes.
Note
The os module provides an alternate implementation of environ
which updates the environment on modification. Note also that updating
os.environ will render this dictionary obsolete. Use of the
os module version of this is recommended over direct access to the
posix module.
This module provides access to the Unix user account and password database. It
is available on all Unix versions.
Password database entries are reported as a tuple-like object, whose attributes
correspond to the members of the passwd structure (Attribute field below,
see <pwd.h>):
Index   Attribute   Meaning
0       pw_name     Login name
1       pw_passwd   Optional encrypted password
2       pw_uid      Numerical user ID
3       pw_gid      Numerical group ID
4       pw_gecos    User name or comment field
5       pw_dir      User home directory
6       pw_shell    User command interpreter
The uid and gid items are integers, all others are strings. KeyError is
raised if the entry asked for cannot be found.
Note
In traditional Unix the field pw_passwd usually contains a password
encrypted with a DES derived algorithm (see module crypt). However most
modern unices use a so-called shadow password system. On those unices the
pw_passwd field only contains an asterisk ('*') or the letter 'x'
where the encrypted password is stored in a file /etc/shadow which is
not world readable. Whether the pw_passwd field contains anything useful is
system-dependent. If available, the spwd module should be used where
access to the encrypted password is required.
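A brief sketch of typical lookups (the login name here is hypothetical):

import pwd

entry = pwd.getpwnam('alice')        # raises KeyError if no such user
print(entry.pw_uid, entry.pw_dir, entry.pw_shell)

for entry in pwd.getpwall():         # iterate over all password entries
    print(entry.pw_name)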
This module provides access to the Unix shadow password database. It is
available on various Unix versions.
You must have enough privileges to access the shadow password database (this
usually means you have to be root).
Shadow password database entries are reported as a tuple-like object, whose
attributes correspond to the members of the spwd structure (Attribute field
below, see <shadow.h>):
Index   Attribute   Meaning
0       sp_nam      Login name
1       sp_pwd      Encrypted password
2       sp_lstchg   Date of last change
3       sp_min      Minimal number of days between changes
4       sp_max      Maximum number of days between changes
5       sp_warn     Number of days before password expires to warn user about it
6       sp_inact    Number of days after password expires until account is blocked
7       sp_expire   Number of days since 1970-01-01 until account is disabled
8       sp_flag     Reserved
The sp_nam and sp_pwd items are strings, all others are integers.
KeyError is raised if the entry asked for cannot be found.
This module provides access to the Unix group database. It is available on all
Unix versions.
Group database entries are reported as a tuple-like object, whose attributes
correspond to the members of the group structure (Attribute field below, see
<grp.h>):
Index   Attribute   Meaning
0       gr_name     the name of the group
1       gr_passwd   the (encrypted) group password; often empty
2       gr_gid      the numerical group ID
3       gr_mem      all the group members' user names
The gid is an integer, name and password are strings, and the member list is a
list of strings. (Note that most users are not explicitly listed as members of
the group they are in according to the password database. Check both databases
to get complete membership information. Also note that a gr_name that
starts with a + or - is likely to be a YP/NIS reference and may not be
accessible via getgrnam() or getgrgid().)
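A brief sketch (the group name here is hypothetical):

import grp

g = grp.getgrnam('staff')     # raises KeyError if the group does not exist
print(g.gr_gid, g.gr_mem)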
This module implements an interface to the crypt(3) routine, which is
a one-way hash function based upon a modified DES algorithm; see the Unix man
page for further details. Possible uses include allowing Python scripts to
accept typed passwords from the user, or attempting to crack Unix passwords with
a dictionary.
Notice that the behavior of this module depends on the actual implementation of
the crypt(3) routine in the running system. Therefore, any
extensions available on the current implementation will also be available on
this module.
word will usually be a user’s password as typed at a prompt or in a graphical
interface. salt is usually a random two-character string which will be used
to perturb the DES algorithm in one of 4096 ways. The characters in salt must
be in the set [./a-zA-Z0-9]. Returns the hashed password as a string, which
will be composed of characters from the same alphabet as the salt (the first two
characters represent the salt itself).
Since a few crypt(3) extensions allow different values, with
different sizes in the salt, it is recommended to use the full crypted
password as the salt when checking a password.
A simple example illustrating typical use:
import crypt, getpass, pwd

def login():
    username = input('Python login: ')
    cryptedpasswd = pwd.getpwnam(username)[1]
    if cryptedpasswd:
        if cryptedpasswd in ('x', '*'):
            raise NotImplementedError('no support for shadow passwords')
        cleartext = getpass.getpass()
        return crypt.crypt(cleartext, cryptedpasswd) == cryptedpasswd
    else:
        return True
This module provides an interface to the POSIX calls for tty I/O control. For a
complete description of these calls, see the POSIX or Unix manual pages. It is
only available for those Unix versions that support POSIX termios style tty
I/O control (and then only if configured at installation time).
All functions in this module take a file descriptor fd as their first
argument. This can be an integer file descriptor, such as returned by
sys.stdin.fileno(), or a file object, such as sys.stdin itself.
This module also defines all the constants needed to work with the functions
provided here; these have the same name as their counterparts in C. Please
refer to your system documentation for more information on using these terminal
control interfaces.
Return a list containing the tty attributes for file descriptor fd, as
follows: [iflag, oflag, cflag, lflag, ispeed, ospeed, cc] where cc is a
list of the tty special characters (each a string of length 1, except the
items with indices VMIN and VTIME, which are integers when
these fields are defined). The interpretation of the flags and the speeds as
well as the indexing in the cc array must be done using the symbolic
constants defined in the termios module.
Set the tty attributes for file descriptor fd from the attributes, which is
a list like the one returned by tcgetattr(). The when argument
determines when the attributes are changed: TCSANOW to change
immediately, TCSADRAIN to change after transmitting all queued output,
or TCSAFLUSH to change after transmitting all queued output and
discarding all queued input.
Discard queued data on file descriptor fd. The queue selector specifies
which queue: TCIFLUSH for the input queue, TCOFLUSH for the
output queue, or TCIOFLUSH for both queues.
Suspend or resume input or output on file descriptor fd. The action
argument can be TCOOFF to suspend output, TCOON to restart
output, TCIOFF to suspend input, or TCION to restart input.
Here’s a function that prompts for a password with echoing turned off. Note the
technique using a separate tcgetattr() call and a try ...
finally statement to ensure that the old tty attributes are restored
exactly no matter what happens:
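def getpass(prompt='Password: '):
    import termios, sys
    fd = sys.stdin.fileno()
    old = termios.tcgetattr(fd)
    new = termios.tcgetattr(fd)
    new[3] = new[3] & ~termios.ECHO          # index 3 is the lflags field
    try:
        termios.tcsetattr(fd, termios.TCSADRAIN, new)
        passwd = input(prompt)
    finally:
        termios.tcsetattr(fd, termios.TCSADRAIN, old)
    return passwd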
The pty module defines operations for handling the pseudo-terminal
concept: starting another process and being able to write to and read from its
controlling terminal programmatically.
Because pseudo-terminal handling is highly platform dependent, there is code to
do it only for Linux. (The Linux code is supposed to work on other platforms,
but hasn’t been tested yet.)
Fork. Connect the child’s controlling terminal to a pseudo-terminal. Return
value is (pid, fd). Note that the child gets pid 0, and the fd is
invalid. The parent’s return value is the pid of the child, and fd is a
file descriptor connected to the child’s controlling terminal (and also to the
child’s standard input and output).
Open a new pseudo-terminal pair, using os.openpty() if possible, or
emulation code for generic Unix systems. Return a pair of file descriptors
(master, slave), for the master and the slave end, respectively.
Spawn a process, and connect its controlling terminal with the current
process’s standard I/O. This is often used to baffle programs which insist on
reading from the controlling terminal.
The functions master_read and stdin_read should be functions which read from
a file descriptor. The defaults try to read 1024 bytes each time they are
called.
The following program acts like the Unix command script(1), using a
pseudo-terminal to record all input and output of a terminal session in a
“typescript”.
import sys, os, time, getopt
import pty

mode = 'wb'
shell = 'sh'
filename = 'typescript'
if 'SHELL' in os.environ:
    shell = os.environ['SHELL']

try:
    opts, args = getopt.getopt(sys.argv[1:], 'ap')
except getopt.error as msg:
    print('%s: %s' % (sys.argv[0], msg))
    sys.exit(2)

for opt, arg in opts:
    # option -a: append to typescript file
    if opt == '-a':
        mode = 'ab'
    # option -p: use a Python shell as the terminal command
    elif opt == '-p':
        shell = sys.executable

if args:
    filename = args[0]

script = open(filename, mode)

def read(fd):
    data = os.read(fd, 1024)
    script.write(data)
    return data

sys.stdout.write('Script started, file is %s\n' % filename)
script.write(('Script started on %s\n' % time.asctime()).encode())
pty.spawn(shell, read)
script.write(('Script done on %s\n' % time.asctime()).encode())
sys.stdout.write('Script done, file is %s\n' % filename)
This module performs file control and I/O control on file descriptors. It is an
interface to the fcntl() and ioctl() Unix routines.
All functions in this module take a file descriptor fd as their first
argument. This can be an integer file descriptor, such as returned by
sys.stdin.fileno(), or an io.IOBase object, such as sys.stdin
itself, which provides a fileno() that returns a genuine file descriptor.
Perform the requested operation on file descriptor fd (file objects providing
a fileno() method are accepted as well). The operation is defined by op
and is operating system dependent. These codes are also found in the
fcntl module. The argument arg is optional, and defaults to the integer
value 0. When present, it can either be an integer value, or a string.
With the argument missing or an integer value, the return value of this function
is the integer return value of the C fcntl() call. When the argument is
a string it represents a binary structure, e.g. created by struct.pack().
The binary data is copied to a buffer whose address is passed to the C
fcntl() call. The return value after a successful call is the contents
of the buffer, converted to a string object. The length of the returned string
will be the same as the length of the arg argument. This is limited to 1024
bytes. If the information returned in the buffer by the operating system is
larger than 1024 bytes, this is most likely to result in a segmentation
violation or a more subtle data corruption.
This function is identical to the fcntl() function, except that the
argument handling is even more complicated.
The op parameter is limited to values that can fit in 32 bits.
The parameter arg can be one of an integer, absent (treated identically to the
integer 0), an object supporting the read-only buffer interface (most likely
a plain Python string) or an object supporting the read-write buffer interface.
In all but the last case, behaviour is as for the fcntl() function.
If a mutable buffer is passed, then the behaviour is determined by the value of
the mutate_flag parameter.
If it is false, the buffer’s mutability is ignored and behaviour is as for a
read-only buffer, except that the 1024 byte limit mentioned above is avoided –
so long as the buffer you pass is as least as long as what the operating system
wants to put there, things should work.
If mutate_flag is true (the default), then the buffer is (in effect) passed
to the underlying ioctl() system call, the latter’s return code is
passed back to the calling Python, and the buffer’s new contents reflect the
action of the ioctl(). This is a slight simplification, because if the
supplied buffer is less than 1024 bytes long it is first copied into a static
buffer 1024 bytes long which is then passed to ioctl() and copied back
into the supplied buffer.
Perform the lock operation op on file descriptor fd (file objects providing
a fileno() method are accepted as well). See the Unix manual
flock(2) for details. (On some systems, this function is emulated
using fcntl().)
This is essentially a wrapper around the fcntl() locking calls. fd is
the file descriptor of the file to lock or unlock, and operation is one of the
following values:
LOCK_UN – unlock
LOCK_SH – acquire a shared lock
LOCK_EX – acquire an exclusive lock
When operation is LOCK_SH or LOCK_EX, it can also be
bitwise ORed with LOCK_NB to avoid blocking on lock acquisition.
If LOCK_NB is used and the lock cannot be acquired, an
IOError will be raised and the exception will have an errno
attribute set to EACCES or EAGAIN (depending on the
operating system; for portability, check for both values). On at least some
systems, LOCK_EX can only be used if the file descriptor refers to a
file opened for writing.
length is the number of bytes to lock, start is the byte offset at which the
lock starts, relative to whence, and whence is as with fileobj.seek(),
specifically:
0 – relative to the start of the file (SEEK_SET)
1 – relative to the current buffer position (SEEK_CUR)
2 – relative to the end of the file (SEEK_END)
The default for start is 0, which means to start at the beginning of the file.
The default for length is 0 which means to lock to the end of the file. The
default for whence is also 0.
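Examples (all on an SVR4-compliant system; the file name here is
hypothetical):

import struct, fcntl, os

f = open('/tmp/spam.txt', 'wb')

# First example: an integer argument.
rv = fcntl.fcntl(f, fcntl.F_SETFL, os.O_NDELAY)

# Second example: a string argument holding a binary structure.
lockdata = struct.pack('hhllhh', fcntl.F_WRLCK, 0, 0, 0, 0, 0)
rv = fcntl.fcntl(f, fcntl.F_SETLKW, lockdata)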
Note that in the first example the return value variable rv will hold an
integer value; in the second example it will hold a string value. The structure
layout for the lockdata variable is system dependent; therefore using the
flock() call may be better.
If the locking flags O_SHLOCK and O_EXLOCK are present
in the os module (on BSD only), the os.open() function
provides an alternative to the lockf() and flock() functions.
If flag is true, turn debugging on. Otherwise, turn debugging off. When
debugging is on, commands to be executed are printed, and the shell is given a
'set -x' command to be more verbose.
Append a new action at the end. The cmd variable must be a valid Bourne shell
command. The kind variable consists of two letters.
The first letter can be either of '-' (which means the command reads its
standard input), 'f' (which means the command reads a given file on the
command line) or '.' (which means the command reads no input, and hence
must be first.)
Similarly, the second letter can be either of '-' (which means the command
writes to standard output), 'f' (which means the command writes a file on
the command line) or '.' (which means the command does not write anything,
and hence must be last.)
Resource usage can be limited using the setrlimit() function described
below. Each resource is controlled by a pair of limits: a soft limit and a hard
limit. The soft limit is the current limit, and may be lowered or raised by a
process over time. The soft limit can never exceed the hard limit. The hard
limit can be lowered to any value greater than the soft limit, but not raised.
(Only processes with the effective UID of the super-user can raise a hard
limit.)
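For example, a minimal sketch of raising this process's soft limit on open
files up to its hard limit (RLIMIT_NOFILE availability is platform-dependent):

import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))   # needs no privilege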
The specific resources that can be limited are system dependent. They are
described in the getrlimit(2) man page. The resources listed below
are supported when the underlying operating system supports them; resources
which cannot be checked or controlled by the operating system are not defined in
this module for those platforms.
Returns a tuple (soft, hard) with the current soft and hard limits of
resource. Raises ValueError if an invalid resource is specified, or
error if the underlying system call fails unexpectedly.
Sets new limits of consumption of resource. The limits argument must be a
tuple (soft, hard) of two integers describing the new limits. A value of
-1 can be used to specify the maximum possible upper limit.
Raises ValueError if an invalid resource is specified, if the new soft
limit exceeds the hard limit, or if a process tries to raise its hard limit
(unless the process has an effective UID of super-user). Can also raise
error if the underlying system call fails.
These symbols define resources whose consumption can be controlled using the
setrlimit() and getrlimit() functions described above. The values of
these symbols are exactly the constants used by C programs.
The Unix man page for getrlimit(2) lists the available resources.
Note that not all systems use the same symbol or same value to denote the same
resource. This module does not attempt to mask platform differences — symbols
not defined for a platform will not be available from this module on that
platform.
The maximum size (in bytes) of a core file that the current process can create.
This may result in the creation of a partial core file if a larger core would be
required to contain the entire process image.
The maximum amount of processor time (in seconds) that a process can use. If
this limit is exceeded, a SIGXCPU signal is sent to the process. (See
the signal module documentation for information about how to catch this
signal and do something useful, e.g. flush open files to disk.)
This function returns an object that describes the resources consumed by either
the current process or its children, as specified by the who parameter. The
who parameter should be specified using one of the RUSAGE_*
constants described below.
The fields of the return value each describe how a particular system resource
has been used, e.g. amount of time spent running in user mode or number of times
the process was swapped out of main memory. Some values are dependent on the
clock tick interval, e.g. the amount of memory the process is using.
For backward compatibility, the return value is also accessible as a tuple of 16
elements.
The fields ru_utime and ru_stime of the return value are
floating point values representing the amount of time spent executing in user
mode and the amount of time spent executing in system mode, respectively. The
remaining values are integers. Consult the getrusage(2) man page for
detailed information about these values. A brief summary is presented here:
Index   Field         Resource
0       ru_utime      time in user mode (float)
1       ru_stime      time in system mode (float)
2       ru_maxrss     maximum resident set size
3       ru_ixrss      shared memory size
4       ru_idrss      unshared memory size
5       ru_isrss      unshared stack size
6       ru_minflt     page faults not requiring I/O
7       ru_majflt     page faults requiring I/O
8       ru_nswap      number of swap outs
9       ru_inblock    block input operations
10      ru_oublock    block output operations
11      ru_msgsnd     messages sent
12      ru_msgrcv     messages received
13      ru_nsignals   signals received
14      ru_nvcsw      voluntary context switches
15      ru_nivcsw     involuntary context switches
This function will raise a ValueError if an invalid who parameter is
specified. It may also raise an error exception in unusual circumstances.
Returns the number of bytes in a system page. (This need not be the same as the
hardware page size.) This function is useful for determining the number of bytes
of memory a process is using. The third element of the tuple returned by
getrusage() describes memory usage in pages; multiplying by page size
produces number of bytes.
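A hedged sketch combining the two (note that some systems report ru_maxrss in
kilobytes rather than pages, so the arithmetic below is platform-dependent):

import resource

usage = resource.getrusage(resource.RUSAGE_SELF)
print('user time: %.3f s' % usage.ru_utime)
print('max RSS:   %d bytes (if reported in pages)'
      % (usage[2] * resource.getpagesize()))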
The following RUSAGE_* symbols are passed to the getrusage()
function to specify which processes the information should be provided for.
Return the match for key in map mapname, or raise an error
(nis.error) if there is none. Both should be strings; key is 8-bit
clean. The return value is an arbitrary array of bytes (it may contain NULL and
other joys).
Note that mapname is first checked to see whether it is an alias for another name.
The domain argument allows overriding the NIS domain used for the lookup. If
unspecified, lookup is in the default NIS domain.
Return a dictionary mapping key to value such that
match(key, mapname) == value. Note that both keys and values of the dictionary
are arbitrary arrays of bytes.
Note that mapname is first checked to see whether it is an alias for another name.
The domain argument allows overriding the NIS domain used for the lookup. If
unspecified, lookup is in the default NIS domain.
This module provides an interface to the Unix syslog library routines.
Refer to the Unix manual pages for a detailed description of the syslog
facility.
This module wraps the system syslog family of routines. A pure Python
library that can speak to a syslog server is available in the
logging.handlers module as SysLogHandler.
Send the string message to the system logger. A trailing newline is added
if necessary. Each message is tagged with a priority composed of a
facility and a level. The optional priority argument, which defaults
to LOG_INFO, determines the message priority. If the facility is
not encoded in priority using logical-or (LOG_INFO|LOG_USER), the
value given in the openlog() call is used.
If openlog() has not been called prior to the call to syslog(),
openlog() will be called with no arguments.
Logging options of subsequent syslog() calls can be set by calling
openlog(). syslog() will call openlog() with no arguments
if the log is not currently open.
The optional ident keyword argument is a string which is prepended to every
message, and defaults to sys.argv[0] with leading path components
stripped. The optional logopt keyword argument (default is 0) is a bit
field – see below for possible values to combine. The optional facility
keyword argument (default is LOG_USER) sets the default facility for
messages which do not have a facility explicitly encoded.
Changed in version 3.2: In previous versions, keyword arguments were not allowed, and ident was
required. The default for ident was dependent on the system libraries,
and often was python instead of the name of the Python program file.
Reset the syslog module values and call the system library closelog().
This causes the module to behave as it does when initially imported. For
example, openlog() will be called on the first syslog() call (if
openlog() hasn’t already been called), and ident and other
openlog() parameters are reset to defaults.
Set the priority mask to maskpri and return the previous mask value. Calls
to syslog() with a priority level not set in maskpri are ignored.
The default is to log all priorities. The function LOG_MASK(pri)
calculates the mask for the individual priority pri. The function
LOG_UPTO(pri) calculates the mask for all priorities up to and including
pri.
import syslog

syslog.syslog('Processing started')
if error:
    syslog.syslog(syslog.LOG_ERR, 'Processing started')
An example of setting some log options; these would include the process ID in
logged messages, and write the messages to the destination facility used for
mail logging:
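syslog.openlog(logoption=syslog.LOG_PID, facility=syslog.LOG_MAIL)
syslog.syslog('E-mail processing initiated...')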
Here’s a quick listing of modules that are currently undocumented, but that
should be documented. Feel free to contribute documentation for them! (Send
via email to docs@python.org.)
The idea and original contents for this chapter were taken from a posting by
Fredrik Lundh; the specific contents of this chapter have been substantially
revised.
It is quite easy to add new built-in modules to Python, if you know how to
program in C. Such extension modules can do two things that can’t be
done directly in Python: they can implement new built-in object types, and they
can call C library functions and system calls.
To support extensions, the Python API (Application Programmers Interface)
defines a set of functions, macros and variables that provide access to most
aspects of the Python run-time system. The Python API is incorporated in a C
source file by including the header "Python.h".
The compilation of an extension module depends on its intended use as well as on
your system setup; details are given in later chapters.
Do note that if your use case is calling C library functions or system calls,
you should consider using the ctypes module rather than writing custom
C code. Not only does ctypes let you write Python code to interface
with C code, but it is more portable between implementations of Python than
writing and compiling an extension module which typically ties you to CPython.
Let’s create an extension module called spam (the favorite food of Monty
Python fans...) and let’s say we want to create a Python interface to the C
library function system(). [1] This function takes a null-terminated
character string as argument and returns an integer. We want this function to
be callable from Python as follows:
>>> import spam
>>> status = spam.system("ls -l")
Begin by creating a file spammodule.c. (Historically, if a module is
called spam, the C file containing its implementation is called
spammodule.c; if the module name is very long, like spammify, the
module name can be just spammify.c.)
The first line of our file can be:
#include <Python.h>
which pulls in the Python API (you can add a comment describing the purpose of
the module and a copyright notice if you like).
Note
Since Python may define some pre-processor definitions which affect the standard
headers on some systems, you must include Python.h before any standard
headers are included.
All user-visible symbols defined by Python.h have a prefix of Py or
PY, except those defined in standard header files. For convenience, and
since they are used extensively by the Python interpreter, "Python.h"
includes a few standard header files: <stdio.h>, <string.h>,
<errno.h>, and <stdlib.h>. If the latter header file does not exist on
your system, it declares the functions malloc(), free() and
realloc() directly.
The next thing we add to our module file is the C function that will be called
when the Python expression spam.system(string) is evaluated (we’ll see
shortly how it ends up being called):
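static PyObject *
spam_system(PyObject *self, PyObject *args)
{
    const char *command;
    int sts;

    if (!PyArg_ParseTuple(args, "s", &command))
        return NULL;
    sts = system(command);
    return PyLong_FromLong(sts);
}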
There is a straightforward translation from the argument list in Python (for
example, the single expression "ls -l") to the arguments passed to the C
function. The C function always has two arguments, conventionally named self
and args.
The self argument points to the module object for module-level functions;
for a method it would point to the object instance.
The args argument will be a pointer to a Python tuple object containing the
arguments. Each item of the tuple corresponds to an argument in the call’s
argument list. The arguments are Python objects — in order to do anything
with them in our C function we have to convert them to C values. The function
PyArg_ParseTuple() in the Python API checks the argument types and
converts them to C values. It uses a template string to determine the required
types of the arguments as well as the types of the C variables into which to
store the converted values. More about this later.
PyArg_ParseTuple() returns true (nonzero) if all arguments have the right
type and its components have been stored in the variables whose addresses are
passed. It returns false (zero) if an invalid argument list was passed. In the
latter case it also raises an appropriate exception so the calling function can
return NULL immediately (as we saw in the example).
An important convention throughout the Python interpreter is the following: when
a function fails, it should set an exception condition and return an error value
(usually a NULL pointer). Exceptions are stored in a static global variable
inside the interpreter; if this variable is NULL no exception has occurred. A
second global variable stores the “associated value” of the exception (the
second argument to raise). A third variable contains the stack
traceback in case the error originated in Python code. These three variables
are the C equivalents of the result in Python of sys.exc_info() (see the
section on module sys in the Python Library Reference). It is important
to know about them to understand how errors are passed around.
The Python API defines a number of functions to set various types of exceptions.
The most common one is PyErr_SetString(). Its arguments are an exception
object and a C string. The exception object is usually a predefined object like
PyExc_ZeroDivisionError. The C string indicates the cause of the error
and is converted to a Python string object and stored as the “associated value”
of the exception.
Another useful function is PyErr_SetFromErrno(), which only takes an
exception argument and constructs the associated value by inspection of the
global variable errno. The most general function is
PyErr_SetObject(), which takes two object arguments, the exception and
its associated value. You don’t need to Py_INCREF() the objects passed
to any of these functions.
You can test non-destructively whether an exception has been set with
PyErr_Occurred(). This returns the current exception object, or NULL
if no exception has occurred. You normally don’t need to call
PyErr_Occurred() to see whether an error occurred in a function call,
since you should be able to tell from the return value.
When a function f that calls another function g detects that the latter
fails, f should itself return an error value (usually NULL or -1). It
should not call one of the PyErr_*() functions — one has already
been called by g. f's caller is then supposed to also return an error
indication to its caller, again without calling PyErr_*(), and so on
— the most detailed cause of the error was already reported by the function
that first detected it. Once the error reaches the Python interpreter’s main
loop, this aborts the currently executing Python code and tries to find an
exception handler specified by the Python programmer.
(There are situations where a module can actually give a more detailed error
message by calling another PyErr_*() function, and in such cases it is
fine to do so. As a general rule, however, this is not necessary, and can cause
information about the cause of the error to be lost: most operations can fail
for a variety of reasons.)
To ignore an exception set by a function call that failed, the exception
condition must be cleared explicitly by calling PyErr_Clear(). The only
time C code should call PyErr_Clear() is if it doesn’t want to pass the
error on to the interpreter but wants to handle it completely by itself
(possibly by trying something else, or pretending nothing went wrong).
Every failing malloc() call must be turned into an exception — the
direct caller of malloc() (or realloc()) must call
PyErr_NoMemory() and return a failure indicator itself. All the
object-creating functions (for example, PyLong_FromLong()) already do
this, so this note is only relevant to those who call malloc() directly.
Also note that, with the important exception of PyArg_ParseTuple() and
friends, functions that return an integer status usually return a positive value
or zero for success and -1 for failure, like Unix system calls.
Finally, be careful to clean up garbage (by making Py_XDECREF() or
Py_DECREF() calls for objects you have already created) when you return
an error indicator!
The choice of which exception to raise is entirely yours. There are predeclared
C objects corresponding to all built-in Python exceptions, such as
PyExc_ZeroDivisionError, which you can use directly. Of course, you
should choose exceptions wisely — don’t use PyExc_TypeError to mean
that a file couldn’t be opened (that should probably be PyExc_IOError).
If something’s wrong with the argument list, the PyArg_ParseTuple()
function usually raises PyExc_TypeError. If you have an argument whose
value must be in a particular range or must satisfy other conditions,
PyExc_ValueError is appropriate.
You can also define a new exception that is unique to your module. For this, you
usually declare a static object variable at the beginning of your file:
static PyObject *SpamError;
and initialize it in your module’s initialization function (PyInit_spam())
with an exception object (leaving out the error checking for now):
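PyMODINIT_FUNC
PyInit_spam(void)
{
    PyObject *m;

    m = PyModule_Create(&spammodule);
    if (m == NULL)
        return NULL;

    SpamError = PyErr_NewException("spam.error", NULL, NULL);
    Py_INCREF(SpamError);
    PyModule_AddObject(m, "error", SpamError);
    return m;
}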
Note that the Python name for the exception object is spam.error. The
PyErr_NewException() function creates a class whose base class is
Exception (unless another class is passed in instead of NULL),
described in Built-in Exceptions.
Note also that the SpamError variable retains a reference to the newly
created exception class; this is intentional! Since the exception could be
removed from the module by external code, an owned reference to the class is
needed to ensure that it will not be discarded, causing SpamError to
become a dangling pointer. Should it become a dangling pointer, C code which
raises the exception could cause a core dump or other unintended side effects.
We discuss the use of PyMODINIT_FUNC as a function return type later in this
sample.
The spam.error exception can be raised in your extension module using a
call to PyErr_SetString() as shown below:
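static PyObject *
spam_system(PyObject *self, PyObject *args)
{
    const char *command;
    int sts;

    if (!PyArg_ParseTuple(args, "s", &command))
        return NULL;
    sts = system(command);
    if (sts < 0) {
        PyErr_SetString(SpamError, "System command failed");
        return NULL;
    }
    return PyLong_FromLong(sts);
}

Going back to the example function, you should now be able to understand this
statement:

if (!PyArg_ParseTuple(args, "s", &command))
    return NULL;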
It returns NULL (the error indicator for functions returning object pointers)
if an error is detected in the argument list, relying on the exception set by
PyArg_ParseTuple(). Otherwise the string value of the argument has been
copied to the local variable command. This is a pointer assignment and
you are not supposed to modify the string to which it points (so in Standard C,
the variable command should properly be declared as const char *command).
The next statement is a call to the Unix function system(), passing it
the string we just got from PyArg_ParseTuple():
sts = system(command);
Our spam.system() function must return the value of sts as a
Python object. This is done using the function PyLong_FromLong().
return PyLong_FromLong(sts);
In this case, it will return an integer object. (Yes, even integers are objects
on the heap in Python!)
If you have a C function that returns no useful argument (a function returning
void), the corresponding Python function must return None. You
need this idiom to do so (which is implemented by the Py_RETURN_NONE
macro):
Py_INCREF(Py_None);
return Py_None;
Py_None is the C name for the special Python object None. It is a
genuine Python object rather than a NULL pointer, which means “error” in most
contexts, as we have seen.
The Module’s Method Table and Initialization Function
I promised to show how spam_system() is called from Python programs.
First, we need to list its name and address in a “method table”:
static PyMethodDef SpamMethods[] = {
    ...
    {"system",  spam_system, METH_VARARGS,
     "Execute a shell command."},
    ...
    {NULL, NULL, 0, NULL}        /* Sentinel */
};
Note the third entry (METH_VARARGS). This is a flag telling the interpreter
the calling convention to be used for the C function. It should normally always
be METH_VARARGS or METH_VARARGS|METH_KEYWORDS; a value of 0 means
that an obsolete variant of PyArg_ParseTuple() is used.
When using only METH_VARARGS, the function should expect the Python-level
parameters to be passed in as a tuple acceptable for parsing via
PyArg_ParseTuple(); more information on this function is provided below.
The METH_KEYWORDS bit may be set in the third field if keyword
arguments should be passed to the function. In this case, the C function should
accept a third PyObject * parameter which will be a dictionary of keywords.
Use PyArg_ParseTupleAndKeywords() to parse the arguments to such a
function.
The method table must be referenced in the module definition structure:
static struct PyModuleDef spammodule = {
    PyModuleDef_HEAD_INIT,
    "spam",   /* name of module */
    spam_doc, /* module documentation, may be NULL */
    -1,       /* size of per-interpreter state of the module,
                 or -1 if the module keeps state in global variables. */
    SpamMethods
};
This structure, in turn, must be passed to the interpreter in the module’s
initialization function. The initialization function must be named
PyInit_name(), where name is the name of the module, and should be the
only non-static item defined in the module file:
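For the spam module, that function can be as simple as:

PyMODINIT_FUNC
PyInit_spam(void)
{
    return PyModule_Create(&spammodule);
}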
Note that PyMODINIT_FUNC declares the function as PyObject* return type,
declares any special linkage declarations required by the platform, and for C++
declares the function as extern "C".
When the Python program imports module spam for the first time,
PyInit_spam() is called. (See below for comments about embedding Python.)
It calls PyModule_Create(), which returns a module object, and
inserts built-in function objects into the newly created module based upon the
table (an array of PyMethodDef structures) found in the module definition.
PyModule_Create() returns a pointer to the module object
that it creates. It may abort with a fatal error for
certain errors, or return NULL if the module could not be initialized
satisfactorily. The init function must return the module object to its caller,
so that it then gets inserted into sys.modules.
When embedding Python, the PyInit_spam() function is not called
automatically unless there’s an entry in the PyImport_Inittab table.
To add the module to the initialization table, use PyImport_AppendInittab(),
optionally followed by an import of the module:
int
main(int argc, char *argv[])
{
    /* Add a built-in module, before Py_Initialize */
    PyImport_AppendInittab("spam", PyInit_spam);

    /* Pass argv[0] to the Python interpreter */
    Py_SetProgramName(argv[0]);

    /* Initialize the Python interpreter.  Required. */
    Py_Initialize();

    /* Optionally import the module; alternatively,
       import can be deferred until the embedded script
       imports it. */
    PyImport_ImportModule("spam");

    ...
}
An example may be found in the file Demo/embed/demo.c in the Python
source distribution.
Note
Removing entries from sys.modules or importing compiled modules into
multiple interpreters within a process (or following a fork() without an
intervening exec()) can create problems for some extension modules.
Extension module authors should exercise caution when initializing internal data
structures.
A more substantial example module is included in the Python source distribution
as Modules/xxmodule.c. This file may be used as a template or simply
read as an example.
There are two more things to do before you can use your new extension: compiling
and linking it with the Python system. If you use dynamic loading, the details
may depend on the style of dynamic loading your system uses; see the chapters
about building extension modules (chapter Building C and C++ Extensions with distutils) and additional
information that pertains only to building on Windows (chapter
Building C and C++ Extensions on Windows) for more information about this.
If you can’t use dynamic loading, or if you want to make your module a permanent
part of the Python interpreter, you will have to change the configuration setup
and rebuild the interpreter. Luckily, this is very simple on Unix: just place
your file (spammodule.c for example) in the Modules/ directory
of an unpacked source distribution, add a line to the file
Modules/Setup.local describing your file:
spam spammodule.o
and rebuild the interpreter by running make in the toplevel
directory. You can also run make in the Modules/
subdirectory, but then you must first rebuild Makefile there by running
‘make Makefile’. (This is necessary each time you change the
Setup file.)
If your module requires additional libraries to link with, these can be listed
on the line in the configuration file as well, for instance:

spam spammodule.o -lX11
So far we have concentrated on making C functions callable from Python. The
reverse is also useful: calling Python functions from C. This is especially the
case for libraries that support so-called “callback” functions. If a C
interface makes use of callbacks, the equivalent Python often needs to provide a
callback mechanism to the Python programmer; the implementation will require
calling the Python callback functions from a C callback. Other uses are also
imaginable.
Fortunately, the Python interpreter is easily called recursively, and there is a
standard interface to call a Python function. (I won’t dwell on how to call the
Python parser with a particular string as input — if you’re interested, have a
look at the implementation of the -c command line option in
Modules/main.c from the Python source code.)
Calling a Python function is easy. First, the Python program must somehow pass
you the Python function object. You should provide a function (or some other
interface) to do this. When this function is called, save a pointer to the
Python function object (be careful to Py_INCREF() it!) in a global
variable — or wherever you see fit. For example, the following function might
be part of a module definition:
static PyObject *my_callback = NULL;

static PyObject *
my_set_callback(PyObject *dummy, PyObject *args)
{
    PyObject *result = NULL;
    PyObject *temp;

    if (PyArg_ParseTuple(args, "O:set_callback", &temp)) {
        if (!PyCallable_Check(temp)) {
            PyErr_SetString(PyExc_TypeError, "parameter must be callable");
            return NULL;
        }
        Py_XINCREF(temp);         /* Add a reference to new callback */
        Py_XDECREF(my_callback);  /* Dispose of previous callback */
        my_callback = temp;       /* Remember new callback */
        /* Boilerplate to return "None" */
        Py_INCREF(Py_None);
        result = Py_None;
    }
    return result;
}
The macros Py_XINCREF() and Py_XDECREF() increment/decrement the
reference count of an object and are safe in the presence of NULL pointers
(but note that temp will not be NULL in this context). More info on them
in section Reference Counts.
Later, when it is time to call the function, you call the C function
PyObject_CallObject(). This function has two arguments, both pointers to
arbitrary Python objects: the Python function, and the argument list. The
argument list must always be a tuple object, whose length is the number of
arguments. To call the Python function with no arguments, pass in NULL, or
an empty tuple; to call it with one argument, pass a singleton tuple.
Py_BuildValue() returns a tuple when its format string consists of zero
or more format codes between parentheses. For example:
int arg;
PyObject *arglist;
PyObject *result;
...
arg = 123;
...
/* Time to call the callback */
arglist = Py_BuildValue("(i)", arg);
result = PyObject_CallObject(my_callback, arglist);
Py_DECREF(arglist);
PyObject_CallObject() returns a Python object pointer: this is the return
value of the Python function. PyObject_CallObject() is
“reference-count-neutral” with respect to its arguments. In the example a new
tuple was created to serve as the argument list, which is Py_DECREF()-ed immediately after the call.
The return value of PyObject_CallObject() is “new”: either it is a brand
new object, or it is an existing object whose reference count has been
incremented. So, unless you want to save it in a global variable, you should
somehow Py_DECREF() the result, even (especially!) if you are not
interested in its value.
Before you do this, however, it is important to check that the return value
isn’t NULL. If it is, the Python function terminated by raising an exception.
If the C code that called PyObject_CallObject() is called from Python, it
should now return an error indication to its Python caller, so the interpreter
can print a stack trace, or the calling Python code can handle the exception.
If this is not possible or desirable, the exception should be cleared by calling
PyErr_Clear(). For example:
if (result == NULL)
    return NULL; /* Pass error back */
...use result...
Py_DECREF(result);
Depending on the desired interface to the Python callback function, you may also
have to provide an argument list to PyObject_CallObject(). In some cases
the argument list is also provided by the Python program, through the same
interface that specified the callback function. It can then be saved and used
in the same manner as the function object. In other cases, you may have to
construct a new tuple to pass as the argument list. The simplest way to do this
is to call Py_BuildValue(). For example, if you want to pass an integral
event code, you might use the following code:
PyObject *arglist;
...
arglist = Py_BuildValue("(l)", eventcode);
result = PyObject_CallObject(my_callback, arglist);
Py_DECREF(arglist);
if (result == NULL)
    return NULL; /* Pass error back */
/* Here maybe use the result */
Py_DECREF(result);
Note the placement of Py_DECREF(arglist) immediately after the call, before
the error check! Also note that strictly speaking this code is not complete:
Py_BuildValue() may run out of memory, and this should be checked.
You may also call a function with keyword arguments by using
PyObject_Call(), which supports arguments and keyword arguments. As in
the above example, we use Py_BuildValue() to construct the dictionary.
PyObject *dict;
PyObject *empty;
...
dict = Py_BuildValue("{s:i}", "name", val);
empty = PyTuple_New(0);   /* PyObject_Call requires a real (possibly
                             empty) tuple for the positional arguments,
                             not NULL */
result = PyObject_Call(my_callback, empty, dict);
Py_DECREF(empty);
Py_DECREF(dict);
if (result == NULL)
    return NULL; /* Pass error back */
/* Here maybe use the result */
Py_DECREF(result);
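The next paragraphs document PyArg_ParseTuple() itself, which is declared as follows:

int PyArg_ParseTuple(PyObject *arg, const char *format, ...);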
The arg argument must be a tuple object containing an argument list passed
from Python to a C function. The format argument must be a format string,
whose syntax is explained in Parsing arguments and building values in the Python/C API Reference
Manual. The remaining arguments must be addresses of variables whose type is
determined by the format string.
Note that while PyArg_ParseTuple() checks that the Python arguments have
the required types, it cannot check the validity of the addresses of C variables
passed to the call: if you make mistakes there, your code will probably crash or
at least overwrite random bits in memory. So be careful!
Note that any Python object references which are provided to the caller are
borrowed references; do not decrement their reference count!
Some example calls:
#define PY_SSIZE_T_CLEAN  /* Make "s#" use Py_ssize_t rather than int. */
#include <Python.h>

int ok;
int i, j;
long k, l;
const char *s;
Py_ssize_t size;

ok = PyArg_ParseTuple(args, "");  /* No arguments */
    /* Python call: f() */

ok = PyArg_ParseTuple(args, "s", &s);  /* A string */
    /* Possible Python call: f('whoops!') */

ok = PyArg_ParseTuple(args, "lls", &k, &l, &s);  /* Two longs and a string */
    /* Possible Python call: f(1, 2, 'three') */

ok = PyArg_ParseTuple(args, "(ii)s#", &i, &j, &s, &size);
    /* A pair of ints and a string, whose size is also returned */
    /* Possible Python call: f((1, 2), 'three') */

{
    const char *file;
    const char *mode = "r";
    int bufsize = 0;
    ok = PyArg_ParseTuple(args, "s|si", &file, &mode, &bufsize);
    /* A string, and optionally another string and an integer */
    /* Possible Python calls:
       f('spam')
       f('spam', 'w')
       f('spam', 'wb', 100000) */
}

{
    int left, top, right, bottom, h, v;
    ok = PyArg_ParseTuple(args, "((ii)(ii))(ii)",
             &left, &top, &right, &bottom, &h, &v);
    /* A rectangle and a point */
    /* Possible Python call:
       f(((0, 0), (400, 300)), (10, 10)) */
}

{
    Py_complex c;
    ok = PyArg_ParseTuple(args, "D:myfunction", &c);
    /* a complex, also providing a function name for errors */
    /* Possible Python call: myfunction(1+2j) */
}
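The keyword-aware variant discussed next, PyArg_ParseTupleAndKeywords(), is declared as:

int PyArg_ParseTupleAndKeywords(PyObject *arg, PyObject *kwdict,
                                const char *format, char *kwlist[], ...);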
The arg and format parameters are identical to those of the
PyArg_ParseTuple() function. The kwdict parameter is the dictionary of
keywords received as the third parameter from the Python runtime. The kwlist
parameter is a NULL-terminated list of strings which identify the parameters;
the names are matched with the type information from format from left to
right. On success, PyArg_ParseTupleAndKeywords() returns true, otherwise
it returns false and raises an appropriate exception.
Note
Nested tuples cannot be parsed when using keyword arguments! Keyword parameters
passed in which are not present in the kwlist will cause TypeError to
be raised.
Here is an example module which uses keywords, based on an example by Geoff
Philbrick (philbrick@hks.com):
#include "Python.h"staticPyObject*keywdarg_parrot(PyObject*self,PyObject*args,PyObject*keywds){intvoltage;char*state="a stiff";char*action="voom";char*type="Norwegian Blue";staticchar*kwlist[]={"voltage","state","action","type",NULL};if(!PyArg_ParseTupleAndKeywords(args,keywds,"i|sss",kwlist,&voltage,&state,&action,&type))returnNULL;printf("-- This parrot wouldn't %s if you put %i Volts through it.\n",action,voltage);printf("-- Lovely plumage, the %s -- It's %s!\n",type,state);Py_INCREF(Py_None);returnPy_None;}staticPyMethodDefkeywdarg_methods[]={/* The cast of the function is necessary since PyCFunction values * only take two PyObject* parameters, and keywdarg_parrot() takes * three. */{"parrot",(PyCFunction)keywdarg_parrot,METH_VARARGS|METH_KEYWORDS,"Print a lovely skit to standard output."},{NULL,NULL,0,NULL}/* sentinel */};
voidinitkeywdarg(void){/* Create the module and add the functions */Py_InitModule("keywdarg",keywdarg_methods);}
This function is the counterpart to PyArg_ParseTuple(). It is declared
as follows:
PyObject *Py_BuildValue(char *format, ...);
It recognizes a set of format units similar to the ones recognized by
PyArg_ParseTuple(), but the arguments (which are input to the function,
not output) must not be pointers, just values. It returns a new Python object,
suitable for returning from a C function called from Python.
One difference with PyArg_ParseTuple(): while the latter requires its
first argument to be a tuple (since Python argument lists are always represented
as tuples internally), Py_BuildValue() does not always build a tuple. It
builds a tuple only if its format string contains two or more format units. If
the format string is empty, it returns None; if it contains exactly one
format unit, it returns whatever object is described by that format unit. To
force it to return a tuple of size 0 or one, parenthesize the format string.
Examples (to the left the call, to the right the resulting Python value):
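Some representative pairs:

Py_BuildValue("")                        None
Py_BuildValue("i", 123)                  123
Py_BuildValue("iii", 123, 456, 789)      (123, 456, 789)
Py_BuildValue("s", "hello")              'hello'
Py_BuildValue("ss", "hello", "world")    ('hello', 'world')
Py_BuildValue("s#", "hello", 4)          'hell'
Py_BuildValue("()")                      ()
Py_BuildValue("(i)", 123)                (123,)
Py_BuildValue("(ii)", 123, 456)          (123, 456)
Py_BuildValue("[i,i]", 123, 456)         [123, 456]
Py_BuildValue("{s:i,s:i}", "abc", 123, "def", 456)
                                         {'abc': 123, 'def': 456}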
In languages like C or C++, the programmer is responsible for dynamic allocation
and deallocation of memory on the heap. In C, this is done using the functions
malloc() and free(). In C++, the operators new and
delete are used with essentially the same meaning and we’ll restrict
the following discussion to the C case.
Every block of memory allocated with malloc() should eventually be
returned to the pool of available memory by exactly one call to free().
It is important to call free() at the right time. If a block’s address
is forgotten but free() is not called for it, the memory it occupies
cannot be reused until the program terminates. This is called a memory
leak. On the other hand, if a program calls free() for a block and then
continues to use the block, it creates a conflict with re-use of the block
through another malloc() call. This is called using freed memory.
It has the same bad consequences as referencing uninitialized data — core
dumps, wrong results, mysterious crashes.
Common causes of memory leaks are unusual paths through the code. For instance,
a function may allocate a block of memory, do some calculation, and then free
the block again. Now a change in the requirements for the function may add a
test to the calculation that detects an error condition and can return
prematurely from the function. It’s easy to forget to free the allocated memory
block when taking this premature exit, especially when it is added later to the
code. Such leaks, once introduced, often go undetected for a long time: the
error exit is taken only in a small fraction of all calls, and most modern
machines have plenty of virtual memory, so the leak only becomes apparent in a
long-running process that uses the leaking function frequently. Therefore, it’s
important to prevent leaks from happening by having a coding convention or
strategy that minimizes this kind of error.
Since Python makes heavy use of malloc() and free(), it needs a
strategy to avoid memory leaks as well as the use of freed memory. The chosen
method is called reference counting. The principle is simple: every
object contains a counter, which is incremented when a reference to the object
is stored somewhere, and which is decremented when a reference to it is deleted.
When the counter reaches zero, the last reference to the object has been deleted
and the object is freed.
An alternative strategy is called automatic garbage collection.
(Sometimes, reference counting is also referred to as a garbage collection
strategy, hence my use of “automatic” to distinguish the two.) The big
advantage of automatic garbage collection is that the user doesn’t need to call
free() explicitly. (Another claimed advantage is an improvement in speed
or memory usage — this is no hard fact however.) The disadvantage is that for
C, there is no truly portable automatic garbage collector, while reference
counting can be implemented portably (as long as the functions malloc()
and free() are available — which the C Standard guarantees). Maybe some
day a sufficiently portable automatic garbage collector will be available for C.
Until then, we’ll have to live with reference counts.
While Python uses the traditional reference counting implementation, it also
offers a cycle detector that works to detect reference cycles. This allows
applications to not worry about creating direct or indirect circular references;
these are the weakness of garbage collection implemented using only reference
counting. Reference cycles consist of objects which contain (possibly indirect)
references to themselves, so that each object in the cycle has a reference count
which is non-zero. Typical reference counting implementations are not able to
reclaim the memory belonging to any objects in a reference cycle, or referenced
from the objects in the cycle, even though there are no further references to
the cycle itself.
The cycle detector is able to detect garbage cycles and can reclaim them so long
as there are no finalizers implemented in Python (__del__() methods).
When there are such finalizers, the detector exposes the cycles through the
gc module (specifically, the
garbage variable in that module). The gc module also exposes a way
to run the detector (the collect() function), as well as configuration
interfaces and the ability to disable the detector at runtime. The cycle
detector is considered an optional component; though it is included by default,
it can be disabled at build time using the --without-cycle-gc option
to the configure script on Unix platforms (including Mac OS X). If
the cycle detector is disabled in this way, the gc module will not be
available.
There are two macros, Py_INCREF(x) and Py_DECREF(x), which handle the
incrementing and decrementing of the reference count. Py_DECREF() also
frees the object when the count reaches zero. For flexibility, it doesn’t call
free() directly — rather, it makes a call through a function pointer in
the object’s type object. For this purpose (and others), every object
also contains a pointer to its type object.
The big question now remains: when to use Py_INCREF(x) and Py_DECREF(x)?
Let’s first introduce some terms. Nobody “owns” an object; however, you can
own a reference to an object. An object’s reference count is now defined
as the number of owned references to it. The owner of a reference is
responsible for calling Py_DECREF() when the reference is no longer
needed. Ownership of a reference can be transferred. There are three ways to
dispose of an owned reference: pass it on, store it, or call Py_DECREF().
Forgetting to dispose of an owned reference creates a memory leak.
It is also possible to borrow[2] a reference to an object. The
borrower of a reference should not call Py_DECREF(). The borrower must
not hold on to the object longer than the owner from which it was borrowed.
Using a borrowed reference after the owner has disposed of it risks using freed
memory and should be avoided completely. [3]
The advantage of borrowing over owning a reference is that you don’t need to
take care of disposing of the reference on all possible paths through the code
— in other words, with a borrowed reference you don’t run the risk of leaking
when a premature exit is taken. The disadvantage of borrowing over owning is
that there are some subtle situations where in seemingly correct code a borrowed
reference can be used after the owner from which it was borrowed has in fact
disposed of it.
A borrowed reference can be changed into an owned reference by calling
Py_INCREF(). This does not affect the status of the owner from which the
reference was borrowed — it creates a new owned reference, and gives full
owner responsibilities (the new owner must dispose of the reference properly, as
well as the previous owner).
Whenever an object reference is passed into or out of a function, it is part of
the function’s interface specification whether ownership is transferred with the
reference or not.
Most functions that return a reference to an object pass on ownership with the
reference. In particular, all functions whose purpose is to create a new
object, such as PyLong_FromLong() and Py_BuildValue(), pass
ownership to the receiver. Even if the object is not actually new, you still
receive ownership of a new reference to that object. For instance,
PyLong_FromLong() maintains a cache of popular values and can return a
reference to a cached item.
The function PyImport_AddModule() also returns a borrowed reference, even
though it may actually create the object it returns: this is possible because an
owned reference to the object is stored in sys.modules.
When you pass an object reference into another function, in general, the
function borrows the reference from you — if it needs to store it, it will use
Py_INCREF() to become an independent owner. There are exactly two
important exceptions to this rule: PyTuple_SetItem() and
PyList_SetItem(). These functions take over ownership of the item passed
to them — even if they fail! (Note that PyDict_SetItem() and friends
don’t take over ownership — they are “normal.”)
When a C function is called from Python, it borrows references to its arguments
from the caller. The caller owns a reference to the object, so the borrowed
reference’s lifetime is guaranteed until the function returns. Only when such a
borrowed reference must be stored or passed on must it be turned into an owned
reference by calling Py_INCREF().
The object reference returned from a C function that is called from Python must
be an owned reference — ownership is transferred from the function to its
caller.
There are a few situations where seemingly harmless use of a borrowed reference
can lead to problems. These all have to do with implicit invocations of the
interpreter, which can cause the owner of a reference to dispose of it.
The first and most important case to know about is using Py_DECREF() on
an unrelated object while borrowing a reference to a list item. For instance:
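A sketch of the offending function (using PyLong_FromLong(), as in the rest of this chapter):

void
bug(PyObject *list)
{
    PyObject *item = PyList_GetItem(list, 0);

    PyList_SetItem(list, 1, PyLong_FromLong(0L));
    PyObject_Print(item, stdout, 0); /* BUG! */
}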
This function first borrows a reference to list[0], then replaces
list[1] with the value 0, and finally prints the borrowed reference.
Looks harmless, right? But it’s not!
Let’s follow the control flow into PyList_SetItem(). The list owns
references to all its items, so when item 1 is replaced, it has to dispose of
the original item 1. Now let’s suppose the original item 1 was an instance of a
user-defined class, and let’s further suppose that the class defined a
__del__() method. If this class instance has a reference count of 1,
disposing of it will call its __del__() method.
Since it is written in Python, the __del__() method can execute arbitrary
Python code. Could it perhaps do something to invalidate the reference to
item in bug()? You bet! Assuming that the list passed into
bug() is accessible to the __del__() method, it could execute a
statement to the effect of del list[0], and assuming this was the last
reference to that object, it would free the memory associated with it, thereby
invalidating item.
The solution, once you know the source of the problem, is easy: temporarily
increment the reference count. The correct version of the function reads:
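A sketch of the corrected version:

void
no_bug(PyObject *list)
{
    PyObject *item = PyList_GetItem(list, 0);

    Py_INCREF(item);   /* protect the borrowed reference */
    PyList_SetItem(list, 1, PyLong_FromLong(0L));
    PyObject_Print(item, stdout, 0);
    Py_DECREF(item);
}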
This is a true story. An older version of Python contained variants of this bug
and someone spent a considerable amount of time in a C debugger to figure out
why his __del__() methods would fail...
The second case of problems with a borrowed reference is a variant involving
threads. Normally, multiple threads in the Python interpreter can’t get in each
other’s way, because there is a global lock protecting Python’s entire object
space. However, it is possible to temporarily release this lock using the macro
Py_BEGIN_ALLOW_THREADS, and to re-acquire it using
Py_END_ALLOW_THREADS. This is common around blocking I/O calls, to
let other threads use the processor while waiting for the I/O to complete.
Obviously, the following function has the same problem as the previous one:
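A sketch of the threaded variant:

void
bug2(PyObject *list)
{
    PyObject *item = PyList_GetItem(list, 0);

    Py_BEGIN_ALLOW_THREADS
    /* ...some blocking I/O call... */
    Py_END_ALLOW_THREADS

    PyObject_Print(item, stdout, 0); /* BUG! */
}

While the lock is released, any other thread may run Python code that disposes of the owner's reference to item, with the same fatal result as before.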
In general, functions that take object references as arguments do not expect you
to pass them NULL pointers, and will dump core (or cause later core dumps) if
you do so. Functions that return object references generally return NULL only
to indicate that an exception occurred. The reason for not testing for NULL
arguments is that functions often pass the objects they receive on to other
function — if each function were to test for NULL, there would be a lot of
redundant tests and the code would run more slowly.
It is better to test for NULL only at the “source”: when a pointer that may be
NULL is received, for example, from malloc() or from a function that
may raise an exception.
The macros for checking for a particular object type (of the form
Py<type>_Check(), for example PyList_Check()) don’t
check for NULL pointers — again, there is much code that calls several of
these in a row to test an object against various different expected types, and
this would generate redundant tests. There are no variants with NULL
checking.
The C function calling mechanism guarantees that the argument list passed to C
functions (args in the examples) is never NULL — in fact it guarantees
that it is always a tuple. [4]
It is a severe error to ever let a NULL pointer “escape” to the Python user.
It is possible to write extension modules in C++. Some restrictions apply. If
the main program (the Python interpreter) is compiled and linked by the C
compiler, global or static objects with constructors cannot be used. This is
not a problem if the main program is linked by the C++ compiler. Functions that
will be called by the Python interpreter (in particular, module initialization
functions) have to be declared using extern "C". It is unnecessary to
enclose the Python header files in extern "C" { ... } — they use this form
already if the symbol __cplusplus is defined (all recent C++ compilers
define this symbol).
Many extension modules just provide new functions and types to be used from
Python, but sometimes the code in an extension module can be useful for other
extension modules. For example, an extension module could implement a type
“collection” which works like lists without order. Just like the standard Python
list type has a C API which permits extension modules to create and manipulate
lists, this new collection type should have a set of C functions for direct
manipulation from other extension modules.
At first sight this seems easy: just write the functions (without declaring them
static, of course), provide an appropriate header file, and document
the C API. And in fact this would work if all extension modules were always
linked statically with the Python interpreter. When modules are used as shared
libraries, however, the symbols defined in one module may not be visible to
another module. The details of visibility depend on the operating system; some
systems use one global namespace for the Python interpreter and all extension
modules (Windows, for example), whereas others require an explicit list of
imported symbols at module link time (AIX is one example), or offer a choice of
different strategies (most Unices). And even if symbols are globally visible,
the module whose functions one wishes to call might not have been loaded yet!
Portability therefore requires not to make any assumptions about symbol
visibility. This means that all symbols in extension modules should be declared
static, except for the module’s initialization function, in order to
avoid name clashes with other extension modules (as discussed in section
The Module’s Method Table and Initialization Function). And it means that symbols that should be accessible from
other extension modules must be exported in a different way.
Python provides a special mechanism to pass C-level information (pointers) from
one extension module to another one: Capsules. A Capsule is a Python data type
which stores a pointer (void*). Capsules can only be created and
accessed via their C API, but they can be passed around like any other Python
object. In particular, they can be assigned to a name in an extension module’s
namespace. Other extension modules can then import this module, retrieve the
value of this name, and then retrieve the pointer from the Capsule.
There are many ways in which Capsules can be used to export the C API of an
extension module. Each function could get its own Capsule, or all C API pointers
could be stored in an array whose address is published in a Capsule. And the
various tasks of storing and retrieving the pointers can be distributed in
different ways between the module providing the code and the client modules.
Whichever method you choose, it’s important to name your Capsules properly.
The function PyCapsule_New() takes a name parameter
(const char *); you’re permitted to pass in a NULL name, but
we strongly encourage you to specify a name. Properly named Capsules provide
a degree of runtime type-safety; there is no feasible way to tell one unnamed
Capsule from another.
In particular, Capsules used to expose C APIs should be given a name following
this convention:
modulename.attributename
The convenience function PyCapsule_Import() makes it easy to
load a C API provided via a Capsule, but only if the Capsule’s name
matches this convention. This behavior gives C API users a high degree
of certainty that the Capsule they load contains the correct C API.
The following example demonstrates an approach that puts most of the burden on
the writer of the exporting module, which is appropriate for commonly used
library modules. It stores all C API pointers (just one in the example!) in an
array of void pointers which becomes the value of a Capsule. The header
file corresponding to the module provides a macro that takes care of importing
the module and retrieving its C API pointers; client modules only have to call
this macro before accessing the C API.
The exporting module is a modification of the spam module from section
A Simple Example. The function spam.system() does not call
the C library function system() directly, but a function
PySpam_System(), which would of course do something more complicated in
reality (such as adding “spam” to every command). This function
PySpam_System() is also exported to other extension modules.
The function PySpam_System() is a plain C function, declared
static like everything else:
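A minimal definition that simply forwards to the C library:

static int
PySpam_System(const char *command)
{
    return system(command);
}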
In the beginning of the module, right after the line
#include "Python.h"
two more lines must be added:
#define SPAM_MODULE
#include "spammodule.h"
The #define is used to tell the header file that it is being included in the
exporting module, not a client module. Finally, the module’s initialization
function must take care of initializing the C API pointer array:
PyMODINIT_FUNC
PyInit_spam(void)
{
    PyObject *m;
    static void *PySpam_API[PySpam_API_pointers];
    PyObject *c_api_object;

    m = PyModule_Create(&spammodule);
    if (m == NULL)
        return NULL;

    /* Initialize the C API pointer array */
    PySpam_API[PySpam_System_NUM] = (void *)PySpam_System;

    /* Create a Capsule containing the API pointer array's address */
    c_api_object = PyCapsule_New((void *)PySpam_API, "spam._C_API", NULL);

    if (c_api_object != NULL)
        PyModule_AddObject(m, "_C_API", c_api_object);
    return m;
}
Note that PySpam_API is declared static; otherwise the pointer
array would disappear when PyInit_spam() terminates!
The bulk of the work is in the header file spammodule.h, which looks
like this:
#ifndef Py_SPAMMODULE_H
#define Py_SPAMMODULE_H
#ifdef __cplusplus
extern "C" {
#endif

/* Header file for spammodule */

/* C API functions */
#define PySpam_System_NUM 0
#define PySpam_System_RETURN int
#define PySpam_System_PROTO (const char *command)

/* Total number of C API pointers */
#define PySpam_API_pointers 1


#ifdef SPAM_MODULE
/* This section is used when compiling spammodule.c */

static PySpam_System_RETURN PySpam_System PySpam_System_PROTO;

#else
/* This section is used in modules that use spammodule's API */

static void **PySpam_API;

#define PySpam_System \
 (*(PySpam_System_RETURN (*)PySpam_System_PROTO) PySpam_API[PySpam_System_NUM])

/* Return -1 on error, 0 on success.
 * PyCapsule_Import will set an exception if there's an error.
 */
static int
import_spam(void)
{
    PySpam_API = (void **)PyCapsule_Import("spam._C_API", 0);
    return (PySpam_API != NULL) ? 0 : -1;
}

#endif

#ifdef __cplusplus
}
#endif

#endif /* !defined(Py_SPAMMODULE_H) */
All that a client module must do in order to have access to the function
PySpam_System() is to call the function (or rather macro)
import_spam() in its initialization function:
PyMODINIT_FUNC
PyInit_client(void)
{
    PyObject *m;

    m = PyModule_Create(&clientmodule);
    if (m == NULL)
        return NULL;
    if (import_spam() < 0)
        return NULL;
    /* additional initialization can happen here */
    return m;
}
The main disadvantage of this approach is that the file spammodule.h is
rather complicated. However, the basic structure is the same for each function
that is exported, so it has to be learned only once.
Finally it should be mentioned that Capsules offer additional functionality,
which is especially useful for memory allocation and deallocation of the pointer
stored in a Capsule. The details are described in the Python/C API Reference
Manual in the section Capsules and in the implementation of Capsules (files
Include/pycapsule.h and Objects/pycapsule.c in the Python source
code distribution).
[3] Checking that the reference count is at least 1 does not work — the
reference count itself could be in freed memory and may thus be reused for
another object!
As mentioned in the last chapter, Python allows the writer of an extension
module to define new types that can be manipulated from Python code, much like
strings and lists in core Python.
This is not hard; the code for all extension types follows a pattern, but there
are some details that you need to understand before you can get started.
The Python runtime sees all Python objects as variables of type
PyObject*. A PyObject is not a very magnificent object - it
just contains the refcount and a pointer to the object’s “type object”. This is
where the action is; the type object determines which (C) functions get called
when, for instance, an attribute gets looked up on an object or it is multiplied
by another object. These C functions are called “type methods”.
So, if you want to define a new object type, you need to create a new type
object.
This sort of thing can only be explained by example, so here’s a minimal, but
complete, module that defines a new type:
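A sketch of such a module, assembled from the pieces discussed below (the fuller listings later in this chapter follow the same layout):

#include <Python.h>

typedef struct {
    PyObject_HEAD
    /* Type-specific fields go here. */
} noddy_NoddyObject;

static PyTypeObject noddy_NoddyType = {
    PyVarObject_HEAD_INIT(NULL, 0)
    "noddy.Noddy",             /* tp_name */
    sizeof(noddy_NoddyObject), /* tp_basicsize */
    0,                         /* tp_itemsize */
    0,                         /* tp_dealloc */
    0,                         /* tp_print */
    0,                         /* tp_getattr */
    0,                         /* tp_setattr */
    0,                         /* tp_reserved */
    0,                         /* tp_repr */
    0,                         /* tp_as_number */
    0,                         /* tp_as_sequence */
    0,                         /* tp_as_mapping */
    0,                         /* tp_hash */
    0,                         /* tp_call */
    0,                         /* tp_str */
    0,                         /* tp_getattro */
    0,                         /* tp_setattro */
    0,                         /* tp_as_buffer */
    Py_TPFLAGS_DEFAULT,        /* tp_flags */
    "Noddy objects",           /* tp_doc */
};

static PyModuleDef noddymodule = {
    PyModuleDef_HEAD_INIT,
    "noddy",
    "Example module that creates an extension type.",
    -1,
    NULL, NULL, NULL, NULL, NULL
};

PyMODINIT_FUNC
PyInit_noddy(void)
{
    PyObject *m;

    noddy_NoddyType.tp_new = PyType_GenericNew;
    if (PyType_Ready(&noddy_NoddyType) < 0)
        return NULL;

    m = PyModule_Create(&noddymodule);
    if (m == NULL)
        return NULL;

    Py_INCREF(&noddy_NoddyType);
    PyModule_AddObject(m, "Noddy", (PyObject *)&noddy_NoddyType);
    return m;
}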
Now that’s quite a bit to take in at once, but hopefully bits will seem familiar
from the last chapter.
The first bit that will be new is:
typedef struct {
    PyObject_HEAD
} noddy_NoddyObject;
This is what a Noddy object will contain—in this case, nothing more than every
Python object contains, namely a refcount and a pointer to a type object. These
are the fields the PyObject_HEAD macro brings in. The reason for the macro
is to standardize the layout and to enable special debugging fields in debug
builds. Note that there is no semicolon after the PyObject_HEAD macro; one
is included in the macro definition. Be wary of adding one by accident; it’s
easy to do from habit, and your compiler might not complain, but someone else’s
probably will! (On Windows, MSVC is known to call this an error and refuse to
compile the code.)
For contrast, let’s take a look at the corresponding definition for standard
Python floats:
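The float object carries one extra field after the standard header (this matches the definition in CPython's floatobject.h):

typedef struct {
    PyObject_HEAD
    double ob_fval;
} PyFloatObject;

The second new thing in the sketch above is the static initializer for the type object itself, noddy_NoddyType, which the following paragraphs pick apart.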
Now if you go and look up the definition of PyTypeObject in
object.h you’ll see that it has many more fields than the definition
above. The remaining fields will be filled with zeros by the C compiler, and
it’s common practice to not specify them explicitly unless you need them.
This is so important that we’re going to pick the top of it apart still
further:
PyVarObject_HEAD_INIT(NULL, 0)
This line is a bit of a wart; what we’d like to write is:
PyVarObject_HEAD_INIT(&PyType_Type, 0)
as the type of a type object is “type”, but this isn’t strictly conforming C and
some compilers complain. Fortunately, this member will be filled in for us by
PyType_Ready().
"noddy.Noddy",/* tp_name */
The name of our type. This will appear in the default textual representation of
our objects and in some error messages, for example:
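For instance, the default repr of an instance embeds this dotted name (the address shown is illustrative only):

>>> import noddy
>>> noddy.Noddy()
<noddy.Noddy object at 0x7f3f8e3b5a60>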
Note that the name is a dotted name that includes both the module name and the
name of the type within the module. The module in this case is noddy and
the type is Noddy, so we set the type name to noddy.Noddy.
sizeof(noddy_NoddyObject), /* tp_basicsize */
This is so that Python knows how much memory to allocate when you call
PyObject_New().
Note
If you want your type to be subclassable from Python, and your type has the same
tp_basicsize as its base type, you may have problems with multiple
inheritance. A Python subclass of your type will have to list your type first
in its __bases__, or else it will not be able to call your type’s
__new__() method without getting an error. You can avoid this problem by
ensuring that your type has a larger value for tp_basicsize than its
base type does. Most of the time, this will be true anyway, because either your
base type will be object, or else you will be adding data members to
your base type, and therefore increasing its size.
0,                         /* tp_itemsize */
This has to do with variable length objects like lists and strings. Ignore this
for now.
Skipping a number of type methods that we don’t provide, we set the class flags
to Py_TPFLAGS_DEFAULT.
Py_TPFLAGS_DEFAULT,        /* tp_flags */
All types should include this constant in their flags. It enables all of the
members defined by the current version of Python.
We provide a doc string for the type in tp_doc.
"Noddy objects",/* tp_doc */
Now we get into the type methods, the things that make your objects different
from the others. We aren’t going to implement any of these in this version of
the module. We’ll expand this example later to have more interesting behavior.
For now, all we want to be able to do is to create new Noddy objects.
To enable object creation, we have to provide a tp_new implementation.
In this case, we can just use the default implementation provided by the API
function PyType_GenericNew(). We’d like to just assign this to the
tp_new slot, but we can’t, for portability’s sake. On some platforms or
compilers, we can’t statically initialize a structure member with a function
defined in another C module, so, instead, we’ll assign the tp_new slot
in the module initialization function just before calling
PyType_Ready():
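In the sketch above this is:

noddy_NoddyType.tp_new = PyType_GenericNew;
if (PyType_Ready(&noddy_NoddyType) < 0)
    return NULL;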
Building the module with a short distutils setup.py script (sketched after
this paragraph) by typing python setup.py build at a shell should produce a
file noddy.so in a subdirectory; move to that directory and fire up Python
— you should be able to import noddy and
play around with Noddy objects.
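The setup.py script referred to above can be as small as this (a sketch, using the distutils machinery described earlier in this document):

from distutils.core import setup, Extension
setup(name="noddy", version="1.0",
      ext_modules=[Extension("noddy", ["noddy.c"])])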
That wasn’t so hard, was it?
Of course, the current Noddy type is pretty uninteresting. It has no data and
doesn’t do anything. It can’t even be subclassed.
Let’s extend the basic example to add some data and methods. Let’s also make
the type usable as a base class. We’ll create a new module, noddy2, that
adds these capabilities:
#include <Python.h>#include "structmember.h"typedefstruct{PyObject_HEADPyObject*first;/* first name */PyObject*last;/* last name */intnumber;}Noddy;staticvoidNoddy_dealloc(Noddy*self){Py_XDECREF(self->first);Py_XDECREF(self->last);Py_TYPE(self)->tp_free((PyObject*)self);}staticPyObject*Noddy_new(PyTypeObject*type,PyObject*args,PyObject*kwds){Noddy*self;self=(Noddy*)type->tp_alloc(type,0);if(self!=NULL){self->first=PyUnicode_FromString("");if(self->first==NULL){Py_DECREF(self);returnNULL;}self->last=PyUnicode_FromString("");if(self->last==NULL){Py_DECREF(self);returnNULL;}self->number=0;}return(PyObject*)self;}staticintNoddy_init(Noddy*self,PyObject*args,PyObject*kwds){PyObject*first=NULL,*last=NULL,*tmp;staticchar*kwlist[]={"first","last","number",NULL};if(!PyArg_ParseTupleAndKeywords(args,kwds,"|OOi",kwlist,&first,&last,&self->number))return-1;if(first){tmp=self->first;Py_INCREF(first);self->first=first;Py_XDECREF(tmp);}if(last){tmp=self->last;Py_INCREF(last);self->last=last;Py_XDECREF(tmp);}return0;}staticPyMemberDefNoddy_members[]={{"first",T_OBJECT_EX,offsetof(Noddy,first),0,"first name"},{"last",T_OBJECT_EX,offsetof(Noddy,last),0,"last name"},{"number",T_INT,offsetof(Noddy,number),0,"noddy number"},{NULL}/* Sentinel */};staticPyObject*Noddy_name(Noddy*self){staticPyObject*format=NULL;PyObject*args,*result;if(format==NULL){format=PyUnicode_FromString("%s %s");if(format==NULL)returnNULL;}if(self->first==NULL){PyErr_SetString(PyExc_AttributeError,"first");returnNULL;}if(self->last==NULL){PyErr_SetString(PyExc_AttributeError,"last");returnNULL;}args=Py_BuildValue("OO",self->first,self->last);if(args==NULL)returnNULL;result=PyUnicode_Format(format,args);Py_DECREF(args);returnresult;}staticPyMethodDefNoddy_methods[]={{"name",(PyCFunction)Noddy_name,METH_NOARGS,"Return the name, combining the first and last name"},{NULL}/* Sentinel */};staticPyTypeObjectNoddyType={PyVarObject_HEAD_INIT(NULL,0)"noddy.Noddy",/* tp_name */sizeof(Noddy),/* tp_basicsize */0,/* tp_itemsize */(destructor)Noddy_dealloc,/* tp_dealloc */0,/* tp_print */0,/* tp_getattr */0,/* tp_setattr */0,/* tp_reserved */0,/* tp_repr */0,/* tp_as_number */0,/* tp_as_sequence */0,/* tp_as_mapping */0,/* tp_hash */0,/* tp_call */0,/* tp_str */0,/* tp_getattro */0,/* tp_setattro */0,/* tp_as_buffer */Py_TPFLAGS_DEFAULT|Py_TPFLAGS_BASETYPE,/* tp_flags */"Noddy objects",/* tp_doc */0,/* tp_traverse */0,/* tp_clear */0,/* tp_richcompare */0,/* tp_weaklistoffset */0,/* tp_iter */0,/* tp_iternext */Noddy_methods,/* tp_methods */Noddy_members,/* tp_members */0,/* tp_getset */0,/* tp_base */0,/* tp_dict */0,/* tp_descr_get */0,/* tp_descr_set */0,/* tp_dictoffset */(initproc)Noddy_init,/* tp_init */0,/* tp_alloc */Noddy_new,/* tp_new */};staticPyModuleDefnoddy2module={PyModuleDef_HEAD_INIT,"noddy2","Example module that creates an extension type.",-1,NULL,NULL,NULL,NULL,NULL};PyMODINIT_FUNCPyInit_noddy2(void){PyObject*m;if(PyType_Ready(&NoddyType)<0)returnNULL;m=PyModule_Create(&noddy2module);if(m==NULL)returnNULL;Py_INCREF(&NoddyType);PyModule_AddObject(m,"Noddy",(PyObject*)&NoddyType);returnm;}
This version of the module has a number of changes.
We’ve added an extra include:
#include <structmember.h>
This include provides declarations that we use to handle attributes, as
described a bit later.
The name of the Noddy object structure has been shortened to
Noddy. The type object name has been shortened to NoddyType.
The Noddy type now has three data attributes, first, last, and
number. The first and last variables are Python strings containing first
and last names. The number attribute is an integer.
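The first of the new methods is the deallocator, repeated here from the listing above:

static void
Noddy_dealloc(Noddy* self)
{
    Py_XDECREF(self->first);
    Py_XDECREF(self->last);
    Py_TYPE(self)->tp_free((PyObject*)self);
}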
This method decrements the reference counts of the two Python attributes. We use
Py_XDECREF() here because the first and last members
could be NULL. It then calls the tp_free member of the object’s type
to free the object’s memory. Note that the object’s type might not be
NoddyType, because the object may be an instance of a subclass.
We want to make sure that the first and last names are initialized to empty
strings, so we provide a new method:
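Again from the listing above:

static PyObject *
Noddy_new(PyTypeObject *type, PyObject *args, PyObject *kwds)
{
    Noddy *self;

    self = (Noddy *)type->tp_alloc(type, 0);
    if (self != NULL) {
        self->first = PyUnicode_FromString("");
        if (self->first == NULL) {
            Py_DECREF(self);
            return NULL;
        }

        self->last = PyUnicode_FromString("");
        if (self->last == NULL) {
            Py_DECREF(self);
            return NULL;
        }

        self->number = 0;
    }

    return (PyObject *)self;
}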
The new member is responsible for creating (as opposed to initializing) objects
of the type. It is exposed in Python as the __new__() method. See the
paper titled “Unifying types and classes in Python” for a detailed discussion of
the __new__() method. One reason to implement a new method is to assure
the initial values of instance variables. In this case, we use the new method
to make sure that the initial values of the members first and
last are not NULL. If we didn’t care whether the initial values were
NULL, we could have used PyType_GenericNew() as our new method, as we
did before. PyType_GenericNew() initializes all of the instance variable
members to NULL.
The new method is a static method that is passed the type being instantiated and
any arguments passed when the type was called, and that returns the new object
created. New methods always accept positional and keyword arguments, but they
often ignore the arguments, leaving the argument handling to initializer
methods. Note that if the type supports subclassing, the type passed may not be
the type being defined. The new method calls the tp_alloc slot to allocate
memory. We don’t fill the tp_alloc slot ourselves. Rather
PyType_Ready() fills it for us by inheriting it from our base class,
which is object by default. Most types use the default allocation.
Note
If you are creating a co-operative tp_new (one that calls a base type’s
tp_new or __new__()), you must not try to determine what method
to call using method resolution order at runtime. Always statically determine
what type you are going to call, and call its tp_new directly, or via
type->tp_base->tp_new. If you do not do this, Python subclasses of your
type that also inherit from other Python-defined classes may not work correctly.
(Specifically, you may not be able to create instances of such subclasses
without getting a TypeError.)
The tp_init slot is exposed in Python as the __init__() method. It
is used to initialize an object after it’s created. Unlike the new method, we
can’t guarantee that the initializer is called. The initializer isn’t called
when unpickling objects and it can be overridden. Our initializer accepts
arguments to provide initial values for our instance. Initializers always accept
positional and keyword arguments.
Initializers can be called multiple times. Anyone can call the __init__()
method on our objects. For this reason, we have to be extra careful when
assigning the new values. We might be tempted, for example to assign the
first member like this:
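That is, the assignment from the listing but without the temporary variable:

if (first) {
    Py_XDECREF(self->first);
    Py_INCREF(first);
    self->first = first;
}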
But this would be risky. Our type doesn’t restrict the type of the
first member, so it could be any kind of object. It could have a
destructor that causes code to be executed that tries to access the
first member. To be paranoid and protect ourselves against this
possibility, we almost always reassign members before decrementing their
reference counts. When don’t we have to do this?
1. when we absolutely know that the reference count is greater than 1;
2. when we know that deallocation of the object [1] will not cause any calls
   back into our type’s code;
3. when decrementing a reference count in a tp_dealloc handler when
   garbage collection is not supported. [2]
We want to expose our instance variables as attributes. There are a
number of ways to do that. The simplest way is to define member definitions:
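From the listing above:

static PyMemberDef Noddy_members[] = {
    {"first", T_OBJECT_EX, offsetof(Noddy, first), 0,
     "first name"},
    {"last", T_OBJECT_EX, offsetof(Noddy, last), 0,
     "last name"},
    {"number", T_INT, offsetof(Noddy, number), 0,
     "noddy number"},
    {NULL}  /* Sentinel */
};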
Each member definition has a member name, type, offset, access flags and
documentation string. See the Generic Attribute Management section below for
details.
A disadvantage of this approach is that it doesn’t provide a way to restrict the
types of objects that can be assigned to the Python attributes. We expect the
first and last names to be strings, but any Python objects can be assigned.
Further, the attributes can be deleted, setting the C pointers to NULL. Even
though we can make sure the members are initialized to non-NULL values, the
members can be set to NULL if the attributes are deleted.
We define a single method, name(), that outputs the object’s name as the
concatenation of the first and last names.
The method is implemented as a C function that takes a Noddy (or
Noddy subclass) instance as the first argument. Methods always take an
instance as the first argument. Methods often take positional and keyword
arguments as well, but in this case we don’t take any and don’t need to accept
a positional argument tuple or keyword argument dictionary. This method is
equivalent to the Python method:
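In Python terms, roughly:

def name(self):
    return "%s %s" % (self.first, self.last)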
Note that we have to check for the possibility that our first and
last members are NULL. This is because they can be deleted, in which
case they are set to NULL. It would be better to prevent deletion of these
attributes and to restrict the attribute values to be strings. We’ll see how to
do that in the next section.
Now that we’ve defined the method, we need to create an array of method
definitions:
static PyMethodDef Noddy_methods[] = {
    {"name", (PyCFunction)Noddy_name, METH_NOARGS,
     "Return the name, combining the first and last name"},
    {NULL}  /* Sentinel */
};
and assign them to the tp_methods slot:
Noddy_methods,             /* tp_methods */
Note that we used the METH_NOARGS flag to indicate that the method is
passed no arguments.
Finally, we’ll make our type usable as a base class. We’ve written our methods
carefully so far so that they don’t make any assumptions about the type of the
object being created or used, so all we need to do is to add the
Py_TPFLAGS_BASETYPE to our class flag definition:
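From the listing above:

Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE,  /* tp_flags */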
In this section, we’ll provide finer control over how the first and
last attributes are set in the Noddy example. In the previous
version of our module, the instance variables first and last
could be set to non-string values or even deleted. We want to make sure that
these attributes always contain strings.
#include <Python.h>#include "structmember.h"typedefstruct{PyObject_HEADPyObject*first;PyObject*last;intnumber;}Noddy;staticvoidNoddy_dealloc(Noddy*self){Py_XDECREF(self->first);Py_XDECREF(self->last);Py_TYPE(self)->tp_free((PyObject*)self);}staticPyObject*Noddy_new(PyTypeObject*type,PyObject*args,PyObject*kwds){Noddy*self;self=(Noddy*)type->tp_alloc(type,0);if(self!=NULL){self->first=PyUnicode_FromString("");if(self->first==NULL){Py_DECREF(self);returnNULL;}self->last=PyUnicode_FromString("");if(self->last==NULL){Py_DECREF(self);returnNULL;}self->number=0;}return(PyObject*)self;}staticintNoddy_init(Noddy*self,PyObject*args,PyObject*kwds){PyObject*first=NULL,*last=NULL,*tmp;staticchar*kwlist[]={"first","last","number",NULL};if(!PyArg_ParseTupleAndKeywords(args,kwds,"|SSi",kwlist,&first,&last,&self->number))return-1;if(first){tmp=self->first;Py_INCREF(first);self->first=first;Py_DECREF(tmp);}if(last){tmp=self->last;Py_INCREF(last);self->last=last;Py_DECREF(tmp);}return0;}staticPyMemberDefNoddy_members[]={{"number",T_INT,offsetof(Noddy,number),0,"noddy number"},{NULL}/* Sentinel */};staticPyObject*Noddy_getfirst(Noddy*self,void*closure){Py_INCREF(self->first);returnself->first;}staticintNoddy_setfirst(Noddy*self,PyObject*value,void*closure){if(value==NULL){PyErr_SetString(PyExc_TypeError,"Cannot delete the first attribute");return-1;}if(!PyUnicode_Check(value)){PyErr_SetString(PyExc_TypeError,"The first attribute value must be a string");return-1;}Py_DECREF(self->first);Py_INCREF(value);self->first=value;return0;}staticPyObject*Noddy_getlast(Noddy*self,void*closure){Py_INCREF(self->last);returnself->last;}staticintNoddy_setlast(Noddy*self,PyObject*value,void*closure){if(value==NULL){PyErr_SetString(PyExc_TypeError,"Cannot delete the last attribute");return-1;}if(!PyUnicode_Check(value)){PyErr_SetString(PyExc_TypeError,"The last attribute value must be a string");return-1;}Py_DECREF(self->last);Py_INCREF(value);self->last=value;return0;}staticPyGetSetDefNoddy_getseters[]={{"first",(getter)Noddy_getfirst,(setter)Noddy_setfirst,"first name",NULL},{"last",(getter)Noddy_getlast,(setter)Noddy_setlast,"last name",NULL},{NULL}/* Sentinel */};staticPyObject*Noddy_name(Noddy*self){staticPyObject*format=NULL;PyObject*args,*result;if(format==NULL){format=PyUnicode_FromString("%s %s");if(format==NULL)returnNULL;}args=Py_BuildValue("OO",self->first,self->last);if(args==NULL)returnNULL;result=PyUnicode_Format(format,args);Py_DECREF(args);returnresult;}staticPyMethodDefNoddy_methods[]={{"name",(PyCFunction)Noddy_name,METH_NOARGS,"Return the name, combining the first and last name"},{NULL}/* Sentinel */};staticPyTypeObjectNoddyType={PyVarObject_HEAD_INIT(NULL,0)"noddy.Noddy",/* tp_name */sizeof(Noddy),/* tp_basicsize */0,/* tp_itemsize */(destructor)Noddy_dealloc,/* tp_dealloc */0,/* tp_print */0,/* tp_getattr */0,/* tp_setattr */0,/* tp_reserved */0,/* tp_repr */0,/* tp_as_number */0,/* tp_as_sequence */0,/* tp_as_mapping */0,/* tp_hash */0,/* tp_call */0,/* tp_str */0,/* tp_getattro */0,/* tp_setattro */0,/* tp_as_buffer */Py_TPFLAGS_DEFAULT|Py_TPFLAGS_BASETYPE,/* tp_flags */"Noddy objects",/* tp_doc */0,/* tp_traverse */0,/* tp_clear */0,/* tp_richcompare */0,/* tp_weaklistoffset */0,/* tp_iter */0,/* tp_iternext */Noddy_methods,/* tp_methods */Noddy_members,/* tp_members */Noddy_getseters,/* tp_getset */0,/* tp_base */0,/* tp_dict */0,/* tp_descr_get */0,/* tp_descr_set */0,/* tp_dictoffset */(initproc)Noddy_init,/* tp_init */0,/* tp_alloc */Noddy_new,/* tp_new 
*/};staticPyModuleDefnoddy3module={PyModuleDef_HEAD_INIT,"noddy3","Example module that creates an extension type.",-1,NULL,NULL,NULL,NULL,NULL};PyMODINIT_FUNCPyInit_noddy3(void){PyObject*m;if(PyType_Ready(&NoddyType)<0)returnNULL;m=PyModule_Create(&noddy3module);if(m==NULL)returnNULL;Py_INCREF(&NoddyType);PyModule_AddObject(m,"Noddy",(PyObject*)&NoddyType);returnm;}
To provide greater control over the first and last attributes,
we’ll use custom getter and setter functions. Here are the functions for
getting and setting the first attribute:
static PyObject *
Noddy_getfirst(Noddy *self, void *closure)
{
    Py_INCREF(self->first);
    return self->first;
}

static int
Noddy_setfirst(Noddy *self, PyObject *value, void *closure)
{
    if (value == NULL) {
        PyErr_SetString(PyExc_TypeError, "Cannot delete the first attribute");
        return -1;
    }

    if (!PyUnicode_Check(value)) {
        PyErr_SetString(PyExc_TypeError,
                        "The first attribute value must be a string");
        return -1;
    }

    Py_DECREF(self->first);
    Py_INCREF(value);
    self->first = value;

    return 0;
}
The getter function is passed a Noddy object and a “closure”, which is a
void pointer. In this case, the closure is ignored. (The closure supports an
advanced usage in which definition data is passed to the getter and setter. This
could, for example, be used to allow a single set of getter and setter functions
that decide the attribute to get or set based on data in the closure.)
The setter function is passed the Noddy object, the new value, and the
closure. The new value may be NULL, in which case the attribute is being
deleted. In our setter, we raise an error if the attribute is deleted or if the
attribute value is not a string.
With these changes, we can assure that the first and last
members are never NULL so we can remove checks for NULL values in almost all
cases. This means that most of the Py_XDECREF() calls can be converted to
Py_DECREF() calls. The only place we can’t change these calls is in the
deallocator, where there is the possibility that the initialization of these
members failed in the constructor.
We also rename the module initialization function and module name in the
initialization function, as we did before, and we add an extra definition to the
setup.py file.
Python has a cyclic-garbage collector that can identify unneeded objects even
when their reference counts are not zero. This can happen when objects are
involved in cycles. For example, consider:
>>>l=[]>>>l.append(l)>>>dell
In this example, we create a list that contains itself. When we delete it, it
still has a reference from itself. Its reference count doesn’t drop to zero.
Fortunately, Python’s cyclic-garbage collector will eventually figure out that
the list is garbage and free it.
In the second version of the Noddy example, we allowed any kind of
object to be stored in the first or last attributes. [4] This
means that Noddy objects can participate in cycles:
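For example (a minimal sketch, assuming the second version of the module was
built under the name noddy2):

>>> import noddy2
>>> n = noddy2.Noddy()
>>> l = [n]
>>> n.first = l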
This is pretty silly, but it gives us an excuse to add support for the
cyclic-garbage collector to the Noddy example. To support cyclic
garbage collection, types need to fill two slots and set a class flag that
enables these slots:
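The two slots are tp_traverse and tp_clear, and the enabling flag is
Py_TPFLAGS_HAVE_GC, OR'ed into tp_flags. A hand-written traversal method for
Noddy might look like this sketch:

static int
Noddy_traverse(Noddy *self, visitproc visit, void *arg)
{
    int vret;

    if (self->first) {
        vret = visit(self->first, arg);
        if (vret != 0)
            return vret;
    }
    if (self->last) {
        vret = visit(self->last, arg);
        if (vret != 0)
            return vret;
    }
    return 0;
}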
For each subobject that can participate in cycles, we need to call the
visit() function, which is passed to the traversal method. The
visit() function takes as arguments the subobject and the extra argument
arg passed to the traversal method. It returns an integer value that must be
returned if it is non-zero.
Python provides a Py_VISIT() macro that automates calling visit
functions. With Py_VISIT(), Noddy_traverse() can be simplified:
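A sketch of the simplified version:

static int
Noddy_traverse(Noddy *self, visitproc visit, void *arg)
{
    Py_VISIT(self->first);
    Py_VISIT(self->last);
    return 0;
}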
Note that the tp_traverse implementation must name its arguments exactly
visit and arg in order to use Py_VISIT(). This is to encourage
uniformity across these boring implementations.
We also need to provide a method for clearing any subobjects that can
participate in cycles. We implement the method and reimplement the deallocator
to use it:
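A sketch of the clear method and the reworked deallocator:

static int
Noddy_clear(Noddy *self)
{
    PyObject *tmp;

    tmp = self->first;
    self->first = NULL;   /* clear the member before decref'ing it */
    Py_XDECREF(tmp);

    tmp = self->last;
    self->last = NULL;
    Py_XDECREF(tmp);

    return 0;
}

static void
Noddy_dealloc(Noddy *self)
{
    Noddy_clear(self);
    Py_TYPE(self)->tp_free((PyObject *)self);
}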
Notice the use of a temporary variable in Noddy_clear(). We use the
temporary variable so that we can set each member to NULL before decrementing
its reference count. We do this because, as was discussed earlier, if the
reference count drops to zero, we might cause code to run that calls back into
the object. In addition, because we now support garbage collection, we also
have to worry about code being run that triggers garbage collection. If garbage
collection is run, our tp_traverse handler could get called. We can’t
take a chance of having Noddy_traverse() called when a member’s reference
count has dropped to zero and its value hasn’t been set to NULL.
Python provides a Py_CLEAR() macro that automates the careful decrementing of
reference counts. With Py_CLEAR(), the Noddy_clear() function can
be simplified:
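A sketch of the Py_CLEAR() version:

static int
Noddy_clear(Noddy *self)
{
    Py_CLEAR(self->first);
    Py_CLEAR(self->last);
    return 0;
}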
That’s pretty much it. If we had written custom tp_alloc or
tp_free slots, we’d need to modify them for cyclic-garbage collection.
Most extensions will use the versions automatically provided.
It is possible to create new extension types that are derived from existing
types. It is easiest to inherit from the built-in types, since an extension can
easily use the PyTypeObject it needs. It can be difficult to share
these PyTypeObject structures between extension modules.
In this example we will create a Shoddy type that inherits from the
built-in list type. The new type will be completely compatible with
regular lists, but will have an additional increment() method that
increases an internal counter.
As you can see, the source code closely resembles the Noddy examples in
previous sections. We will break down the main differences between them.
typedef struct {
    PyListObject list;
    int state;
} Shoddy;
The primary difference for derived type objects is that the base type’s object
structure must be the first value. The base type will already include the
PyObject_HEAD() at the beginning of its structure.
When a Python object is a Shoddy instance, its PyObject* pointer can
be safely cast to both PyListObject* and Shoddy*.
In the __init__ method for our type, we can see how to call through to
the __init__ method of the base type.
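A minimal sketch, assuming the Shoddy structure shown above:

static int
Shoddy_init(Shoddy *self, PyObject *args, PyObject *kwds)
{
    /* Delegate list initialization to the base type first. */
    if (PyList_Type.tp_init((PyObject *)self, args, kwds) < 0)
        return -1;
    self->state = 0;
    return 0;
}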
This pattern is important when writing a type with custom new and
dealloc methods. The new method should not actually create the
memory for the object with tp_alloc, that will be handled by the base
class when calling its tp_new.
When filling out the PyTypeObject for the Shoddy type, you see
a slot for tp_base. Due to cross-platform compiler issues, you can't
fill that field directly with a reference to PyList_Type; it can be done later in
the module's init() function.
Before calling PyType_Ready(), the type structure must have the
tp_base slot filled in. When we are deriving a new type, it is not
necessary to fill out the tp_alloc slot with PyType_GenericNew()
– the allocate function from the base type will be inherited.
After that, calling PyType_Ready() and adding the type object to the
module is the same as with the basic Noddy examples.
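Putting these pieces together, the module initialization function might look
like this sketch (assuming the type object and module definition are named
ShoddyType and shoddymodule):

PyMODINIT_FUNC
PyInit_shoddy(void)
{
    PyObject *m;

    /* Fill tp_base at run time, then finalize the type. */
    ShoddyType.tp_base = &PyList_Type;
    if (PyType_Ready(&ShoddyType) < 0)
        return NULL;

    m = PyModule_Create(&shoddymodule);
    if (m == NULL)
        return NULL;

    Py_INCREF(&ShoddyType);
    PyModule_AddObject(m, "Shoddy", (PyObject *)&ShoddyType);
    return m;
}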
This section aims to give a quick fly-by on the various type methods you can
implement and what they do.
Here is the definition of PyTypeObject, with some fields only used in
debug builds omitted:
typedef struct _typeobject {
    PyObject_VAR_HEAD
    char *tp_name; /* For printing, in format "<module>.<name>" */
    int tp_basicsize, tp_itemsize; /* For allocation */

    /* Methods to implement standard operations */
    destructor tp_dealloc;
    printfunc tp_print;
    getattrfunc tp_getattr;
    setattrfunc tp_setattr;
    void *tp_reserved;
    reprfunc tp_repr;

    /* Method suites for standard classes */
    PyNumberMethods *tp_as_number;
    PySequenceMethods *tp_as_sequence;
    PyMappingMethods *tp_as_mapping;

    /* More standard operations (here for binary compatibility) */
    hashfunc tp_hash;
    ternaryfunc tp_call;
    reprfunc tp_str;
    getattrofunc tp_getattro;
    setattrofunc tp_setattro;

    /* Functions to access object as input/output buffer */
    PyBufferProcs *tp_as_buffer;

    /* Flags to define presence of optional/expanded features */
    long tp_flags;

    char *tp_doc; /* Documentation string */

    /* call function for all accessible objects */
    traverseproc tp_traverse;

    /* delete references to contained objects */
    inquiry tp_clear;

    /* rich comparisons */
    richcmpfunc tp_richcompare;

    /* weak reference enabler */
    long tp_weaklistoffset;

    /* Iterators */
    getiterfunc tp_iter;
    iternextfunc tp_iternext;

    /* Attribute descriptor and subclassing stuff */
    struct PyMethodDef *tp_methods;
    struct PyMemberDef *tp_members;
    struct PyGetSetDef *tp_getset;
    struct _typeobject *tp_base;
    PyObject *tp_dict;
    descrgetfunc tp_descr_get;
    descrsetfunc tp_descr_set;
    long tp_dictoffset;
    initproc tp_init;
    allocfunc tp_alloc;
    newfunc tp_new;
    freefunc tp_free; /* Low-level free-memory routine */
    inquiry tp_is_gc; /* For PyObject_IS_GC */
    PyObject *tp_bases;
    PyObject *tp_mro; /* method resolution order */
    PyObject *tp_cache;
    PyObject *tp_subclasses;
    PyObject *tp_weaklist;
} PyTypeObject;
Now that’s a lot of methods. Don’t worry too much though - if you have a type
you want to define, the chances are very good that you will only implement a
handful of these.
As you probably expect by now, we’re going to go over this and give more
information about the various handlers. We won’t go in the order they are
defined in the structure, because there is a lot of historical baggage that
impacts the ordering of the fields; be sure your type initialization keeps the
fields in the right order! It’s often easiest to find an example that includes
all the fields you need (even if they’re initialized to 0) and then change
the values to suit your new type.
char *tp_name; /* For printing */
The name of the type - as mentioned in the last section, this will appear in
various places, almost entirely for diagnostic purposes. Try to choose something
that will be helpful in such a situation!
int tp_basicsize, tp_itemsize; /* For allocation */
These fields tell the runtime how much memory to allocate when new objects of
this type are created. Python has some built-in support for variable length
structures (think: strings, lists) which is where the tp_itemsize field
comes in. This will be dealt with later.
char *tp_doc;
Here you can put a string (or its address) that you want returned when the
Python script references obj.__doc__ to retrieve the doc string.
Now we come to the basic type methods—the ones most extension types will
implement.
This function is called when the reference count of the instance of your type is
reduced to zero and the Python interpreter wants to reclaim it. If your type
has memory to free or other clean-up to perform, put it here. The object itself
needs to be freed here as well. Here is an example of this function:
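A sketch, in terms of the hypothetical newdatatypeobject used throughout this
section:

static void
newdatatype_dealloc(newdatatypeobject *obj)
{
    free(obj->obj_UnderlyingDatatypePtr);    /* release our own storage */
    Py_TYPE(obj)->tp_free((PyObject *)obj);  /* then free the object itself */
}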
One important requirement of the deallocator function is that it leaves any
pending exceptions alone. This is important since deallocators are frequently
called as the interpreter unwinds the Python stack; when the stack is unwound
due to an exception (rather than normal returns), nothing is done to protect the
deallocators from seeing that an exception has already been set. Any actions
which a deallocator performs which may cause additional Python code to be
executed may detect that an exception has been set. This can lead to misleading
errors from the interpreter. The proper way to protect against this is to save
a pending exception before performing the unsafe action, and restoring it when
done. This can be done using the PyErr_Fetch() and
PyErr_Restore() functions:
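A sketch of that pattern, assuming a hypothetical instance structure mydata_t
that holds a my_callback callable:

static void
my_dealloc(PyObject *obj)
{
    mydata_t *self = (mydata_t *)obj;   /* hypothetical instance structure */
    PyObject *cbresult;

    if (self->my_callback != NULL) {
        PyObject *err_type, *err_value, *err_traceback;

        /* Save the current exception state. */
        PyErr_Fetch(&err_type, &err_value, &err_traceback);

        cbresult = PyObject_CallObject(self->my_callback, NULL);
        if (cbresult == NULL)
            PyErr_WriteUnraisable(self->my_callback);
        else
            Py_DECREF(cbresult);

        /* Restore the saved exception state. */
        PyErr_Restore(err_type, err_value, err_traceback);

        Py_DECREF(self->my_callback);
    }
    Py_TYPE(obj)->tp_free(obj);
}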
In Python, there are two ways to generate a textual representation of an object:
the repr() function, and the str() function. (The print()
function just calls str().) These handlers are both optional.
reprfunc tp_repr;
reprfunc tp_str;
The tp_repr handler should return a string object containing a
representation of the instance for which it is called. Here is a simple
example:
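A sketch, again using the hypothetical newdatatype:

static PyObject *
newdatatype_repr(newdatatypeobject *obj)
{
    return PyUnicode_FromFormat("Repr-ified_newdatatype{{size:%d}}",
                                obj->obj_UnderlyingDatatypePtr->size);
}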
If no tp_repr handler is specified, the interpreter will supply a
representation that uses the type’s tp_name and a uniquely-identifying
value for the object.
The tp_str handler is to str() what the tp_repr handler
described above is to repr(); that is, it is called when Python code calls
str() on an instance of your object. Its implementation is very similar
to the tp_repr function, but the resulting string is intended for human
consumption. If tp_str is not specified, the tp_repr handler is
used instead.
For every object which can support attributes, the corresponding type must
provide the functions that control how the attributes are resolved. There needs
to be a function which can retrieve attributes (if any are defined), and another
to set attributes (if setting attributes is allowed). Removing an attribute is
a special case, for which the new value passed to the handler is NULL.
Python supports two pairs of attribute handlers; a type that supports attributes
only needs to implement the functions for one pair. The difference is that one
pair takes the name of the attribute as a char*, while the other
accepts a PyObject*. Each type can use whichever pair makes more
sense for the implementation’s convenience.
getattrfunc tp_getattr;    /* char * version */
setattrfunc tp_setattr;
/* ... */
getattrofunc tp_getattro;  /* PyObject * version */
setattrofunc tp_setattro;
If accessing attributes of an object is always a simple operation (this will be
explained shortly), there are generic implementations which can be used to
provide the PyObject* version of the attribute management functions.
The actual need for type-specific attribute handlers almost completely
disappeared starting with Python 2.2, though there are many examples which have
not been updated to use some of the new generic mechanism that is available.
Most extension types only use simple attributes. So, what makes the
attributes simple? There are only a couple of conditions that must be met:
The name of the attributes must be known when PyType_Ready() is
called.
No special processing is needed to record that an attribute was looked up or
set, nor do actions need to be taken based on the value.
Note that this list does not place any restrictions on the values of the
attributes, when the values are computed, or how relevant data is stored.
When PyType_Ready() is called, it uses three tables referenced by the
type object to create descriptors which are placed in the dictionary of the
type object. Each descriptor controls access to one attribute of the instance
object. Each of the tables is optional; if all three are NULL, instances of
the type will only have attributes that are inherited from their base type, and
should leave the tp_getattro and tp_setattro fields NULL as
well, allowing the base type to handle attributes.
The tables are declared as three fields of the type object:
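They are the same three slots that appear near the end of the PyTypeObject
listing above:

struct PyMethodDef *tp_methods;
struct PyMemberDef *tp_members;
struct PyGetSetDef *tp_getset;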
If tp_methods is not NULL, it must refer to an array of
PyMethodDef structures. Each entry in the table is an instance of this
structure:
typedef struct PyMethodDef {
    char        *ml_name;    /* method name */
    PyCFunction  ml_meth;    /* implementation function */
    int          ml_flags;   /* flags */
    char        *ml_doc;     /* docstring */
} PyMethodDef;
One entry should be defined for each method provided by the type; no entries are
needed for methods inherited from a base type. One additional entry is needed
at the end; it is a sentinel that marks the end of the array. The
ml_name field of the sentinel must be NULL.
The second table is used to define attributes which map directly to data stored
in the instance. A variety of primitive C types are supported, and access may
be read-only or read-write. The structures in the table are defined as:
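The layout, as declared in structmember.h (a sketch; the exact integer type of
the offset field may vary between Python versions):

typedef struct PyMemberDef {
    char       *name;    /* attribute name */
    int         type;    /* type code, e.g. T_INT */
    Py_ssize_t  offset;  /* offset of the C member in the instance struct */
    int         flags;   /* access flags, e.g. READONLY */
    char       *doc;     /* docstring */
} PyMemberDef;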
For each entry in the table, a descriptor will be constructed and added to the
type which will be able to extract a value from the instance structure. The
type field should contain one of the type codes defined in the
structmember.h header; the value will be used to determine how to
convert Python values to and from C values. The flags field is used to
store flags which control how the attribute can be accessed.
The following flag constants are defined in structmember.h; they may be
combined using bitwise-OR.
Constant            Meaning
READONLY            Never writable.
READ_RESTRICTED     Not readable in restricted mode.
WRITE_RESTRICTED    Not writable in restricted mode.
RESTRICTED          Not readable or writable in restricted mode.
An interesting advantage of using the tp_members table to build
descriptors that are used at runtime is that any attribute defined this way can
have an associated doc string simply by providing the text in the table. An
application can use the introspection API to retrieve the descriptor from the
class object, and get the doc string using its __doc__ attribute.
As with the tp_methods table, a sentinel entry with a name value
of NULL is required.
For simplicity, only the char* version will be demonstrated here; the
type of the name parameter is the only difference between the char*
and PyObject* flavors of the interface. This example effectively does
the same thing as the generic example above, but does not use the generic
support added in Python 2.2. It explains how the handler functions are
called, so that if you do need to extend their functionality, you’ll understand
what needs to be done.
The tp_getattr handler is called when the object requires an attribute
look-up. It is called in the same situations where the __getattr__()
method of a class would be called.
Here is an example:
static PyObject *
newdatatype_getattr(newdatatypeobject *obj, char *name)
{
    if (strcmp(name, "data") == 0) {
        return PyLong_FromLong(obj->data);
    }

    PyErr_Format(PyExc_AttributeError,
                 "'%.50s' object has no attribute '%.400s'",
                 Py_TYPE(obj)->tp_name, name);
    return NULL;
}
The tp_setattr handler is called when the __setattr__() or
__delattr__() method of a class instance would be called. When an
attribute should be deleted, the third parameter will be NULL. Here is an
example that simply raises an exception; if this were really all you wanted, the
tp_setattr handler should be set to NULL.
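A sketch of such a rejecting handler:

static int
newdatatype_setattr(newdatatypeobject *obj, char *name, PyObject *v)
{
    PyErr_Format(PyExc_RuntimeError, "Read-only attribute: %s", name);
    return -1;
}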
This function is called with two Python objects and the operator as arguments,
where the operator is one of Py_EQ, Py_NE, Py_LE, Py_GE,
Py_LT or Py_GT. It should compare the two objects with respect to the
specified operator and return Py_True or Py_False if the comparison is
successful, Py_NotImplemented to indicate that comparison is not
implemented and the other object’s comparison method should be tried, or NULL
if an exception was set.
Here is a sample implementation, for a datatype whose instances are compared by
the size of an internal data structure:
static PyObject *
newdatatype_richcmp(PyObject *obj1, PyObject *obj2, int op)
{
    PyObject *result;
    int c, size1, size2;

    /* code to make sure that both arguments are of type
       newdatatype omitted */

    size1 = ((newdatatypeobject *)obj1)->obj_UnderlyingDatatypePtr->size;
    size2 = ((newdatatypeobject *)obj2)->obj_UnderlyingDatatypePtr->size;

    switch (op) {
    case Py_LT: c = size1 <  size2; break;
    case Py_LE: c = size1 <= size2; break;
    case Py_EQ: c = size1 == size2; break;
    case Py_NE: c = size1 != size2; break;
    case Py_GT: c = size1 >  size2; break;
    case Py_GE: c = size1 >= size2; break;
    }
    result = c ? Py_True : Py_False;
    Py_INCREF(result);
    return result;
}
Python supports a variety of abstract ‘protocols’; the specific interfaces
for using them are documented in the Abstract Objects Layer.
A number of these abstract interfaces were defined early in the development of
the Python implementation. In particular, the number, mapping, and sequence
protocols have been part of Python since the beginning. Other protocols have
been added over time. For protocols which depend on several handler routines
from the type implementation, the older protocols have been defined as optional
blocks of handlers referenced by the type object. For newer protocols there are
additional slots in the main type object, with a flag bit being set to indicate
that the slots are present and should be checked by the interpreter. (The flag
bit does not indicate that the slot values are non-NULL. The flag may be set
to indicate the presence of a slot, but a slot may still be unfilled.)
If you wish your object to be able to act like a number, a sequence, or a
mapping object, then you place the address of a structure that implements the C
type PyNumberMethods, PySequenceMethods, or
PyMappingMethods, respectively. It is up to you to fill in this
structure with appropriate values. You can find examples of the use of each of
these in the Objects directory of the Python source distribution.
hashfunc tp_hash;
This function, if you choose to provide it, should return a hash number for an
instance of your data type. Here is a moderately pointless example:
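A sketch for the hypothetical newdatatype (note that a hash function must never
return -1, which signals an error; recent Python versions spell the return type
Py_hash_t rather than long):

static long
newdatatype_hash(newdatatypeobject *obj)
{
    long result;

    result = obj->obj_UnderlyingDatatypePtr->size * 3;
    if (result == -1)
        result = -2;    /* -1 is reserved for signalling errors */
    return result;
}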
This function is called when an instance of your data type is “called”, for
example, if obj1 is an instance of your data type and the Python script
contains obj1('hello'), the tp_call handler is invoked.
This function takes three arguments:
arg1 is the instance of the data type which is the subject of the call. If
the call is obj1('hello'), then arg1 is obj1.
arg2 is a tuple containing the arguments to the call. You can use
PyArg_ParseTuple() to extract the arguments.
arg3 is a dictionary of keyword arguments that were passed. If this is
non-NULL and you support keyword arguments, use
PyArg_ParseTupleAndKeywords() to extract the arguments. If you do not
want to support keyword arguments and this is non-NULL, raise a
TypeError with a message saying that keyword arguments are not supported.
Here is a desultory example of the implementation of the call function.
/* Implement the call function.
 *    obj is the instance receiving the call.
 *    args is a tuple containing the arguments to the call, in this
 *    case 3 strings.
 */
static PyObject *
newdatatype_call(newdatatypeobject *obj, PyObject *args, PyObject *kwds)
{
    PyObject *result;
    char *arg1;
    char *arg2;
    char *arg3;

    if (!PyArg_ParseTuple(args, "sss:call", &arg1, &arg2, &arg3)) {
        return NULL;
    }
    result = PyUnicode_FromFormat(
        "Returning -- value: [%d] arg1: [%s] arg2: [%s] arg3: [%s]\n",
        obj->obj_UnderlyingDatatypePtr->size, arg1, arg2, arg3);
    return result;
}
These functions provide support for the iterator protocol. Any object which
wishes to support iteration over its contents (which may be generated during
iteration) must implement the tp_iter handler. Objects which are returned
by a tp_iter handler must implement both the tp_iter and tp_iternext
handlers. Both handlers take exactly one parameter, the instance for which they
are being called, and return a new reference. In the case of an error, they
should set an exception and return NULL.
For an object which represents an iterable collection, the tp_iter handler
must return an iterator object. The iterator object is responsible for
maintaining the state of the iteration. For collections which can support
multiple iterators which do not interfere with each other (as lists and tuples
do), a new iterator should be created and returned. Objects which can only be
iterated over once (usually due to side effects of iteration) should implement
this handler by returning a new reference to themselves, and should also
implement the tp_iternext handler. File objects are an example of such an
iterator.
Iterator objects should implement both handlers. The tp_iter handler should
return a new reference to the iterator (this is the same as the tp_iter
handler for objects which can only be iterated over destructively). The
tp_iternext handler should return a new reference to the next object in the
iteration if there is one. If the iteration has reached the end, it may return
NULL without setting an exception or it may set StopIteration; avoiding
the exception can yield slightly better performance. If an actual error occurs,
it should set an exception and return NULL.
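As a concrete sketch, here is a tp_iternext handler for a hypothetical counter
type (not part of the examples above) that yields the integers from current up
to limit:

typedef struct {
    PyObject_HEAD
    long current;
    long limit;
} counterobject;

static PyObject *
counter_iternext(counterobject *self)
{
    if (self->current >= self->limit)
        return NULL;                          /* end of iteration, no exception */
    return PyLong_FromLong(self->current++);  /* new reference to the next value */
}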
One of the goals of Python’s weak-reference implementation is to allow any type
to participate in the weak reference mechanism without incurring the overhead on
those objects which do not benefit by weak referencing (such as numbers).
For an object to be weakly referencable, the extension must include a
PyObject* field in the instance structure for the use of the weak
reference mechanism; it must be initialized to NULL by the object’s
constructor. It must also set the tp_weaklistoffset field of the
corresponding type object to the offset of the field. For example, the instance
type is defined with the following structure:
typedef struct {
    PyObject_HEAD
    PyClassObject *in_class;        /* The class object */
    PyObject      *in_dict;         /* A dictionary */
    PyObject      *in_weakreflist;  /* List of weak references */
} PyInstanceObject;
The statically-declared type object for instances is defined this way:
PyTypeObject PyInstance_Type = {
    PyVarObject_HEAD_INIT(&PyType_Type, 0)
    "module.instance",

    /* Lots of stuff omitted for brevity... */

    Py_TPFLAGS_DEFAULT,                         /* tp_flags */
    0,                                          /* tp_doc */
    0,                                          /* tp_traverse */
    0,                                          /* tp_clear */
    0,                                          /* tp_richcompare */
    offsetof(PyInstanceObject, in_weakreflist), /* tp_weaklistoffset */
};
The type constructor is responsible for initializing the weak reference list to
NULL:
static PyObject *
instance_new() {
    /* Other initialization stuff omitted for brevity */

    self->in_weakreflist = NULL;

    return (PyObject *) self;
}
The only further addition is that the destructor needs to call the weak
reference manager to clear any weak references. This should be done before any
other parts of the destruction have occurred, but is only required if the weak
reference list is non-NULL:
static void
instance_dealloc(PyInstanceObject *inst)
{
    /* Allocate temporaries if needed, but do not begin
       destruction just yet. */

    if (inst->in_weakreflist != NULL)
        PyObject_ClearWeakRefs((PyObject *) inst);

    /* Proceed with object destruction normally. */
}
Remember that you can omit most of these functions, in which case you provide
0 as a value. There are type definitions for each of the functions you must
provide. They are in object.h in the Python include directory that
comes with the source distribution of Python.
In order to learn how to implement any specific method for your new data type,
do the following: Download and unpack the Python source distribution. Go to
the Objects directory, then search the C source files for tp_ plus
the function you want (for example, tp_richcompare). You will find examples
of the function you want to implement.
When you need to verify that an object is an instance of the type you are
implementing, use the PyObject_TypeCheck() function. A sample of its use
might be something like the following:
if (!PyObject_TypeCheck(some_object, &MyType)) {
    PyErr_SetString(PyExc_TypeError, "arg #1 not a mything");
    return NULL;
}
We relied on this in the tp_dealloc handler in this example, because our
type doesn’t support garbage collection. Even if a type supports garbage
collection, there are calls that can be made to “untrack” the object from
garbage collection; however, these calls are advanced and not covered here.
We now know that the first and last members are strings, so perhaps we could be
less careful about decrementing their reference counts; however, we accept
instances of string subclasses. Even though deallocating normal strings won’t
call back into our objects, we can’t guarantee that deallocating an instance of
a string subclass won’t call back into our objects.
Even in the third version, we aren’t guaranteed to avoid cycles. Instances of
string subclasses are allowed and string subclasses could allow cycles even if
normal strings don’t.
Starting in Python 1.4, Python provided, on Unix, a special make file for
building make files used to build dynamically-linked extensions and custom
interpreters. Starting with Python 2.0, this mechanism (based on the
Makefile.pre.in and Setup files) is no longer supported. Building custom
interpreters was rarely used, and extension modules can be built using
distutils.
Building an extension module using distutils requires that distutils be
installed on the build machine; it is included with Python 2.x and available
separately for Python 1.5. Since distutils also supports creation of binary
packages, users don’t necessarily need a compiler and distutils to install the
extension.
A distutils package contains a driver script, setup.py. This is a plain
Python file, which, in the simplest case, could look like this:
from distutils.core import setup, Extension

module1 = Extension('demo',
                    sources = ['demo.c'])

setup (name = 'PackageName',
       version = '1.0',
       description = 'This is a demo package',
       ext_modules = [module1])
With this setup.py, and a file demo.c, running
python setup.py build
will compile demo.c, and produce an extension module named demo in
the build directory. Depending on the system, the module file will end
up in a subdirectory build/lib.system, and may have a name like
demo.so or demo.pyd.
In the setup.py, all execution is performed by calling the setup
function. This takes a variable number of keyword arguments, of which the
example above uses only a subset. Specifically, the example specifies
meta-information to build packages, and it specifies the contents of the
package. Normally, a package will contain additional modules, such as Python
source modules, documentation, subpackages, etc. Please refer to the distutils
documentation in Distributing Python Modules to learn more about the features of
distutils; this section explains building extension modules only.
It is common to pre-compute arguments to setup(), to better structure the
driver script. In the example above, the ext_modules argument to
setup() is a list of extension modules, each of which is an instance of
the Extension class. In the example, the instance defines an extension named
demo which is built by compiling a single source file, demo.c.
In many cases, building an extension is more complex, since additional
preprocessor defines and libraries may be needed. This is demonstrated in the
example below.
from distutils.core import setup, Extension

module1 = Extension('demo',
                    define_macros = [('MAJOR_VERSION', '1'),
                                     ('MINOR_VERSION', '0')],
                    include_dirs = ['/usr/local/include'],
                    libraries = ['tcl83'],
                    library_dirs = ['/usr/local/lib'],
                    sources = ['demo.c'])

setup (name = 'PackageName',
       version = '1.0',
       description = 'This is a demo package',
       author = 'Martin v. Loewis',
       author_email = 'martin@v.loewis.de',
       url = 'http://docs.python.org/extending/building',
       long_description = '''
This is really just a demo package.
''',
       ext_modules = [module1])
In this example, setup() is called with additional meta-information, which
is recommended when distribution packages have to be built. For the extension
itself, it specifies preprocessor defines, include directories, library
directories, and libraries. Depending on the compiler, distutils passes this
information in different ways to the compiler. For example, on Unix, this may
result in the compilation commands
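For example, on a Linux system the two compiler invocations might look roughly
like this (paths and version numbers are illustrative and will differ on your
machine):

gcc -DNDEBUG -O2 -Wall -fPIC -DMAJOR_VERSION=1 -DMINOR_VERSION=0 -I/usr/local/include -c demo.c -o build/temp.linux-i686-3.2/demo.o
gcc -shared build/temp.linux-i686-3.2/demo.o -L/usr/local/lib -ltcl83 -o build/lib.linux-i686-3.2/demo.so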
When an extension has been successfully built, there are three ways to use it.
End-users will typically want to install the module; they do so by running
python setup.py install
Module maintainers should produce source packages; to do so, they run
python setup.py sdist
In some cases, additional files need to be included in a source distribution;
this is done through a MANIFEST.in file; see the distutils documentation
for details.
If the source distribution has been built successfully, maintainers can also
create binary distributions. Depending on the platform, one of the following
commands can be used to do so.
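For example (the exact set of bdist commands available depends on the
platform):

python setup.py bdist_wininst
python setup.py bdist_rpm
python setup.py bdist_dumb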
This chapter briefly explains how to create a Windows extension module for
Python using Microsoft Visual C++, and follows with more detailed background
information on how it works. The explanatory material is useful for both the
Windows programmer learning to build Python extensions and the Unix programmer
interested in producing software which can be successfully built on both Unix
and Windows.
Module authors are encouraged to use the distutils approach for building
extension modules, instead of the one described in this section. You will still
need the C compiler that was used to build Python; typically Microsoft Visual
C++.
Note
This chapter mentions a number of filenames that include an encoded Python
version number. These filenames are represented with the version number shown
as XY; in practice, 'X' will be the major version number and 'Y'
will be the minor version number of the Python release you’re working with. For
example, if you are using Python 2.2.1, XY will actually be 22.
There are two approaches to building extension modules on Windows, just as there
are on Unix: use the distutils package to control the build process, or
do things manually. The distutils approach works well for most extensions;
documentation on using distutils to build and package extension modules
is available in Distributing Python Modules. This section describes the manual
approach to building Python extensions written in C or C++.
To build extensions using these instructions, you need to have a copy of the
Python sources of the same version as your installed Python. You will need
Microsoft Visual C++ “Developer Studio”; project files are supplied for VC++
version 7.1, but you can use older versions of VC++. Notice that you should use
the same version of VC++ that was used to build Python itself. The example files
described here are distributed with the Python sources in the
PC\example_nt\ directory.
Copy the example files — The example_nt directory is a
subdirectory of the PC directory, in order to keep all the PC-specific
files under the same directory in the source distribution. However, the
example_nt directory can’t actually be used from this location. You
first need to copy or move it up one level, so that example_nt is a
sibling of the PC and Include directories. Do all your work
from within this new location.
Open the project — From VC++, use the File ‣ Open
Solution dialog (not File ‣ Open!). Navigate to and select
the file example.sln, in the copy of the example_nt directory
you made above. Click Open.
Build the example DLL — In order to check that everything is set up
right, try building:
Select a configuration. This step is optional. Choose
Build ‣ Configuration Manager ‣ Active Solution Configuration
and select either Release or Debug. If you skip this
step, VC++ will use the Debug configuration by default.
Build the DLL. Choose Build ‣ Build Solution. This
creates all intermediate and result files in a subdirectory called either
Debug or Release, depending on which configuration you selected
in the preceding step.
Testing the debug-mode DLL — Once the Debug build has succeeded, bring
up a DOS box, and change to the example_nt\Debug directory. You should
now be able to repeat the following session (C> is the DOS prompt, >>>
is the Python prompt; note that build information and various debug output from
Python may not match this screen dump exactly):
C>..\..\PCbuild\python_d
Adding parser accelerators ...
Done.
Python 2.2 (#28, Dec 19 2001, 23:26:37) [MSC 32 bit (Intel)] on win32
Type "copyright", "credits" or "license" for more information.
>>> import example
[4897 refs]
>>> example.foo()
Hello, world
[4903 refs]
>>>
Congratulations! You’ve successfully built your first Python extension module.
Creating your own project — Choose a name and create a directory for
it. Copy your C sources into it. Note that the module source file name does
not necessarily have to match the module name, but the name of the
initialization function should match the module name — you can only import a
module spam if its initialization function is called initspam(),
and it should call Py_InitModule() with the string "spam" as its
first argument (use the minimal example.c in this directory as a guide).
By convention, it lives in a file called spam.c or spammodule.c.
The output file should be called spam.pyd (in Release mode) or
spam_d.pyd (in Debug mode). The extension .pyd was chosen
to avoid confusion with a system library spam.dll to which your module
could be a Python interface.
Now your options are:
Copy example.sln and example.vcproj, rename them to
spam.*, and edit them by hand, or
Create a brand new project; instructions are below.
In either case, copy example_nt\example.def to spam\spam.def,
and edit the new spam.def so its second line contains the string
“initspam”. If you created a new project yourself, add the file
spam.def to the project now. (This is an annoying little file with only
two lines. An alternative approach is to forget about the .def file,
and add the option /export:initspam somewhere to the Link settings, by
manually editing the setting in Project Properties dialog).
Creating a brand new project — Use the File ‣ New
‣ Project dialog to create a new Project Workspace. Select Visual
C++ Projects/Win32/ Win32 Project, enter the name (spam), and make sure the
Location is set to parent of the spam directory you have created (which
should be a direct subdirectory of the Python build tree, a sibling of
Include and PC). Select Win32 as the platform (in my version,
this is the only choice). Make sure the Create new workspace radio button is
selected. Click OK.
You should now create the file spam.def as instructed in the previous
section. Add the source files to the project, using Project ‣
Add Existing Item. Set the pattern to *.* and select both spam.c
and spam.def and click OK. (Inserting them one by one is fine too.)
Now open the Project ‣ spam properties dialog. You only need
to change a few settings. Make sure All Configurations is selected
from the Settings for: dropdown list. Select the C/C++ tab. Choose
the General category in the popup menu at the top. Type the following text in
the entry box labeled Additional Include Directories:
..\Include,..\PC
Then, choose the General category in the Linker tab, and enter
..\PCbuild
in the text box labelled Additional library Directories.
Now you need to add some mode-specific settings:
Select Release in the Configuration dropdown list.
Choose the Link tab, choose the Input category, and
append pythonXY.lib to the list in the Additional Dependencies
box.
Select Debug in the Configuration dropdown list, and
append pythonXY_d.lib to the list in the Additional Dependencies
box. Then click the C/C++ tab, select Code Generation, and select
Multi-threaded Debug DLL from the Runtime library
dropdown list.
Select Release again from the Configuration dropdown
list. Select Multi-threaded DLL from the Runtime
library dropdown list.
If your module creates a new type, you may have trouble with this line:
PyVarObject_HEAD_INIT(&PyType_Type, 0)
Static type object initializers in extension modules may cause
compiles to fail with an error message like “initializer not a
constant”. This shows up when building a DLL under MSVC. Change it to:
PyVarObject_HEAD_INIT(NULL, 0)
and add the following to the module initialization function:
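Assuming your type object is named MyObject_Type (a placeholder name), the
addition is simply:

if (PyType_Ready(&MyObject_Type) < 0)
    return NULL;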
Unix and Windows use completely different paradigms for run-time loading of
code. Before you try to build a module that can be dynamically loaded, be aware
of how your system works.
In Unix, a shared object (.so) file contains code to be used by the
program, and also the names of functions and data that it expects to find in the
program. When the file is joined to the program, all references to those
functions and data in the file’s code are changed to point to the actual
locations in the program where the functions and data are placed in memory.
This is basically a link operation.
In Windows, a dynamic-link library (.dll) file has no dangling
references. Instead, an access to functions or data goes through a lookup
table. So the DLL code does not have to be fixed up at runtime to refer to the
program’s memory; instead, the code already uses the DLL’s lookup table, and the
lookup table is modified at runtime to point to the functions and data.
In Unix, there is only one type of library file (.a) which contains code
from several object files (.o). During the link step to create a shared
object file (.so), the linker may find that it doesn’t know where an
identifier is defined. The linker will look for it in the object files in the
libraries; if it finds it, it will include all the code from that object file.
In Windows, there are two types of library, a static library and an import
library (both called .lib). A static library is like a Unix .a
file; it contains code to be included as necessary. An import library is
basically used only to reassure the linker that a certain identifier is legal,
and will be present in the program when the DLL is loaded. So the linker uses
the information from the import library to build the lookup table for using
identifiers that are not included in the DLL. When an application or a DLL is
linked, an import library may be generated, which will need to be used for all
future DLLs that depend on the symbols in the application or DLL.
Suppose you are building two dynamic-load modules, B and C, which should share
another block of code A. On Unix, you would not pass A.a to the
linker for B.so and C.so; that would cause it to be included
twice, so that B and C would each have their own copy. In Windows, building
A.dll will also build A.lib. You do pass A.lib to the
linker for B and C. A.lib does not contain code; it just contains
information which will be used at runtime to access A’s code.
In Windows, using an import library is sort of like using import spam; it
gives you access to spam’s names, but does not create a separate copy. On Unix,
linking with a library is more like from spam import *; it does create a
separate copy.
Windows Python is built in Microsoft Visual C++; using other compilers may or
may not work (though Borland seems to). The rest of this section is MSVC++
specific.
When creating DLLs in Windows, you must pass pythonXY.lib to the linker.
To build two DLLs, spam and ni (which uses C functions found in spam), you could
use these commands:
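The commands might look something like this (the include and library paths are
illustrative):

cl /LD /I/python/include spam.c ../libs/pythonXY.lib
cl /LD /I/python/include ni.c spam.lib ../libs/pythonXY.lib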
The first command created three files: spam.obj, spam.dll and
spam.lib. Spam.dll does not contain any Python functions (such
as PyArg_ParseTuple()), but it does know how to find the Python code
thanks to pythonXY.lib.
The second command created ni.dll (and .obj and .lib),
which knows how to find the necessary functions from spam, and also from the
Python executable.
Not every identifier is exported to the lookup table. If you want any other
modules (including Python) to be able to see your identifiers, you have to say
_declspec(dllexport), as in void _declspec(dllexport) initspam(void) or
PyObject _declspec(dllexport) *NiGetSpamData(void).
Developer Studio will throw in a lot of import libraries that you do not really
need, adding about 100K to your executable. To get rid of them, use the Project
Settings dialog, Link tab, to specify ignore default libraries. Add the
correct msvcrtxx.lib to the list of libraries.
The previous chapters discussed how to extend Python, that is, how to extend the
functionality of Python by attaching a library of C functions to it. It is also
possible to do it the other way around: enrich your C/C++ application by
embedding Python in it. Embedding provides your application with the ability to
implement some of the functionality of your application in Python rather than C
or C++. This can be used for many purposes; one example would be to allow users
to tailor the application to their needs by writing some scripts in Python. You
can also use it yourself if some of the functionality can be written in Python
more easily.
Embedding Python is similar to extending it, but not quite. The difference is
that when you extend Python, the main program of the application is still the
Python interpreter, while if you embed Python, the main program may have nothing
to do with Python — instead, some parts of the application occasionally call
the Python interpreter to run some Python code.
So if you are embedding Python, you are providing your own main program. One of
the things this main program has to do is initialize the Python interpreter. At
the very least, you have to call the function Py_Initialize(). There are
optional calls to pass command line arguments to Python. Then later you can
call the interpreter from any part of the application.
There are several different ways to call the interpreter: you can pass a string
containing Python statements to PyRun_SimpleString(), or you can pass a
stdio file pointer and a file name (for identification in error messages only)
to PyRun_SimpleFile(). You can also call the lower-level operations
described in the previous chapters to construct and use Python objects.
The simplest form of embedding Python is the use of the very high level
interface. This interface is intended to execute a Python script without needing
to interact with the application directly. This can for example be used to
perform some operation on a file.
#include <Python.h>

int
main(int argc, char *argv[])
{
    Py_Initialize();
    PyRun_SimpleString("from time import time,ctime\n"
                       "print('Today is', ctime(time()))\n");
    Py_Finalize();
    return 0;
}
The above code first initializes the Python interpreter with
Py_Initialize(), followed by the execution of a hard-coded Python script
that prints the date and time. Afterwards, the Py_Finalize() call shuts
the interpreter down, followed by the end of the program. In a real program,
you may want to get the Python script from another source, perhaps a text-editor
routine, a file, or a database. Getting the Python code from a file can better
be done by using the PyRun_SimpleFile() function, which saves you the
trouble of allocating memory space and loading the file contents.
The high level interface gives you the ability to execute arbitrary pieces of
Python code from your application, but exchanging data values is quite
cumbersome to say the least. If you want that, you should use lower level calls.
At the cost of having to write more C code, you can achieve almost anything.
It should be noted that extending Python and embedding Python amount to much the
same activity, despite the different intent. Most topics discussed in the previous
chapters are still valid. To show this, consider what the extension code from
Python to C really does:
Convert data values from Python to C,
Perform a function call to a C routine using the converted values, and
Convert the data values from the call from C to Python.
When embedding Python, the interface code does:
Convert data values from C to Python,
Perform a function call to a Python interface routine using the converted
values, and
Convert the data values from the call from Python to C.
As you can see, the data conversion steps are simply swapped to accommodate the
different direction of the cross-language transfer. The only difference is the
routine that you call between both data conversions. When extending, you call a
C routine, when embedding, you call a Python routine.
This chapter will not discuss how to convert data from Python to C and vice
versa. Also, proper use of references and dealing with errors is assumed to be
understood. Since these aspects do not differ from extending the interpreter,
you can refer to earlier chapters for the required information.
The first program aims to execute a function in a Python script. Like in the
section about the very high level interface, the Python interpreter does not
directly interact with the application (but that will change in the next
section).
The code to run a function defined in a Python script is:
#include <Python.h>

int
main(int argc, char *argv[])
{
    PyObject *pName, *pModule, *pDict, *pFunc;
    PyObject *pArgs, *pValue;
    int i;

    if (argc < 3) {
        fprintf(stderr, "Usage: call pythonfile funcname [args]\n");
        return 1;
    }

    Py_Initialize();
    pName = PyUnicode_FromString(argv[1]);
    /* Error checking of pName left out */

    pModule = PyImport_Import(pName);
    Py_DECREF(pName);

    if (pModule != NULL) {
        pFunc = PyObject_GetAttrString(pModule, argv[2]);
        /* pFunc is a new reference */

        if (pFunc && PyCallable_Check(pFunc)) {
            pArgs = PyTuple_New(argc - 3);
            for (i = 0; i < argc - 3; ++i) {
                pValue = PyLong_FromLong(atoi(argv[i + 3]));
                if (!pValue) {
                    Py_DECREF(pArgs);
                    Py_DECREF(pModule);
                    fprintf(stderr, "Cannot convert argument\n");
                    return 1;
                }
                /* pValue reference stolen here: */
                PyTuple_SetItem(pArgs, i, pValue);
            }
            pValue = PyObject_CallObject(pFunc, pArgs);
            Py_DECREF(pArgs);
            if (pValue != NULL) {
                printf("Result of call: %ld\n", PyLong_AsLong(pValue));
                Py_DECREF(pValue);
            }
            else {
                Py_DECREF(pFunc);
                Py_DECREF(pModule);
                PyErr_Print();
                fprintf(stderr, "Call failed\n");
                return 1;
            }
        }
        else {
            if (PyErr_Occurred())
                PyErr_Print();
            fprintf(stderr, "Cannot find function \"%s\"\n", argv[2]);
        }
        Py_XDECREF(pFunc);
        Py_DECREF(pModule);
    }
    else {
        PyErr_Print();
        fprintf(stderr, "Failed to load \"%s\"\n", argv[1]);
        return 1;
    }
    Py_Finalize();
    return 0;
}
This code loads a Python script using argv[1], and calls the function named
in argv[2]. Its integer arguments are the other values of the argv
array. If you compile and link this program (let’s call the finished executable
call), and use it to execute a Python script, such as:
$ call multiply multiply 3 2
Will compute 3 times 2
Result of call: 6
Although the program is quite large for its functionality, most of the code is
for data conversion between Python and C, and for error reporting. The
interesting part with respect to embedding Python starts with
Py_Initialize();
pName = PyUnicode_FromString(argv[1]);
/* Error checking of pName left out */
pModule = PyImport_Import(pName);
After initializing the interpreter, the script is loaded using
PyImport_Import(). This routine needs a Python string as its argument,
which is constructed using the PyUnicode_FromString() data conversion
routine.
pFunc = PyObject_GetAttrString(pModule, argv[2]);
/* pFunc is a new reference */

if (pFunc && PyCallable_Check(pFunc)) {
    ...
}
Py_XDECREF(pFunc);
Once the script is loaded, the name we’re looking for is retrieved using
PyObject_GetAttrString(). If the name exists, and the object returned is
callable, you can safely assume that it is a function. The program then
proceeds by constructing a tuple of arguments as normal. The call to the Python
function is then made with:
pValue = PyObject_CallObject(pFunc, pArgs);
Upon return of the function, pValue is either NULL or it contains a
reference to the return value of the function. Be sure to release the reference
after examining the value.
Until now, the embedded Python interpreter had no access to functionality from
the application itself. The Python API allows this by extending the embedded
interpreter. That is, the embedded interpreter gets extended with routines
provided by the application. While it sounds complex, it is not so bad. Simply
forget for a while that the application starts the Python interpreter. Instead,
consider the application to be a set of subroutines, and write some glue code
that gives Python access to those routines, just like you would write a normal
Python extension. For example:
static int numargs = 0;

/* Return the number of arguments of the application command line */
static PyObject *
emb_numargs(PyObject *self, PyObject *args)
{
    if (!PyArg_ParseTuple(args, ":numargs"))
        return NULL;
    return PyLong_FromLong(numargs);
}

static PyMethodDef EmbMethods[] = {
    {"numargs", emb_numargs, METH_VARARGS,
     "Return the number of arguments received by the process."},
    {NULL, NULL, 0, NULL}
};

static PyModuleDef EmbModule = {
    PyModuleDef_HEAD_INIT, "emb", NULL, -1, EmbMethods,
    NULL, NULL, NULL, NULL
};

static PyObject *
PyInit_emb(void)
{
    return PyModule_Create(&EmbModule);
}
Insert the above code just above the main() function. Also, insert the
following two statements before the call to Py_Initialize():
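numargs = argc;
PyImport_AppendInittab("emb", &PyInit_emb);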
These two lines initialize the numargs variable, and make the
emb.numargs() function accessible to the embedded Python interpreter.
With these extensions, the Python script can do things like
import emb
print("Number of arguments", emb.numargs())
In a real application, the methods will expose an API of the application to
Python.
It is also possible to embed Python in a C++ program; precisely how this is done
will depend on the details of the C++ system used; in general you will need to
write the main program in C++, and use the C++ compiler to compile and link your
program. There is no need to recompile Python itself using C++.
While the configure script shipped with the Python sources will
correctly build Python to export the symbols needed by dynamically linked
extensions, this is not automatically inherited by applications which embed the
Python library statically, at least on Unix. This is an issue when the
application is linked to the static runtime library (libpython.a) and
needs to load dynamic extensions (implemented as .so files).
The problem is that some entry points are defined by the Python runtime solely
for extension modules to use. If the embedding application does not use any of
these entry points, some linkers will not include those entries in the symbol
table of the finished executable. Some additional options are needed to inform
the linker not to remove these symbols.
Determining the right options to use for any given platform can be quite
difficult, but fortunately the Python configuration already has those values.
To retrieve them from an installed Python interpreter, start an interactive
interpreter and have a short session like this:
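A typical session (shown here for one Linux build; the exact output varies by
platform):

>>> import distutils.sysconfig
>>> distutils.sysconfig.get_config_var('LINKFORSHARED')
'-Xlinker -export-dynamic'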
The contents of the string presented will be the options that should be used.
If the string is empty, there’s no need to add any additional options. The
LINKFORSHARED definition corresponds to the variable of the same name
in Python’s top-level Makefile.
The Application Programmer’s Interface to Python gives C and C++ programmers
access to the Python interpreter at a variety of levels. The API is equally
usable from C++, but for brevity it is generally referred to as the Python/C
API. There are two fundamentally different reasons for using the Python/C API.
The first reason is to write extension modules for specific purposes; these
are C modules that extend the Python interpreter. This is probably the most
common use. The second reason is to use Python as a component in a larger
application; this technique is generally referred to as embedding Python
in an application.
Writing an extension module is a relatively well-understood process, where a
“cookbook” approach works well. There are several tools that automate the
process to some extent. While people have embedded Python in other
applications since its early existence, the process of embedding Python is less
straightforward than writing an extension.
Many API functions are useful independent of whether you’re embedding or
extending Python; moreover, most applications that embed Python will need to
provide a custom extension as well, so it’s probably a good idea to become
familiar with writing an extension before attempting to embed Python in a real
application.
All function, type and macro definitions needed to use the Python/C API are
included in your code by the following line:
#include "Python.h"
This implies inclusion of the following standard headers: <stdio.h>,
<string.h>, <errno.h>, <limits.h>, <assert.h> and <stdlib.h>
(if available).
Note
Since Python may define some pre-processor definitions which affect the standard
headers on some systems, you must include Python.h before any standard
headers are included.
All user-visible names defined by Python.h (except those defined by the included
standard headers) have one of the prefixes Py or _Py. Names beginning
with _Py are for internal use by the Python implementation and should not be
used by extension writers. Structure member names do not have a reserved prefix.
Important: user code should never define names that begin with Py or
_Py. This confuses the reader, and jeopardizes the portability of the user
code to future Python versions, which may define additional names beginning with
one of these prefixes.
The header files are typically installed with Python. On Unix, these are
located in the directories prefix/include/pythonversion/ and
exec_prefix/include/pythonversion/, where prefix and
exec_prefix are defined by the corresponding parameters to Python’s
configure script and version is sys.version[:3]. On Windows,
the headers are installed in prefix/include, where prefix is
the installation directory specified to the installer.
To include the headers, place both directories (if different) on your compiler’s
search path for includes. Do not place the parent directories on the search
path and then use #include <pythonX.Y/Python.h>; this will break on
multi-platform builds since the platform independent headers under
prefix include the platform specific headers from
exec_prefix.
C++ users should note that though the API is defined entirely using C, the
header files do properly declare the entry points to be extern "C", so there
is no need to do anything special to use the API from C++.
Most Python/C API functions have one or more arguments as well as a return value
of type PyObject*. This type is a pointer to an opaque data type
representing an arbitrary Python object. Since all Python object types are
treated the same way by the Python language in most situations (e.g.,
assignments, scope rules, and argument passing), it is only fitting that they
should be represented by a single C type. Almost all Python objects live on the
heap: you never declare an automatic or static variable of type
PyObject, only pointer variables of type PyObject* can be
declared. The sole exception are the type objects; since these must never be
deallocated, they are typically static PyTypeObject objects.
All Python objects (even Python integers) have a type and a
reference count. An object’s type determines what kind of object it is
(e.g., an integer, a list, or a user-defined function; there are many more as
explained in The standard type hierarchy). For each of the well-known types there is a macro
to check whether an object is of that type; for instance, PyList_Check(a) is
true if (and only if) the object pointed to by a is a Python list.
The reference count is important because today’s computers have a finite (and
often severely limited) memory size; it counts how many different places there
are that have a reference to an object. Such a place could be another object,
or a global (or static) C variable, or a local variable in some C function.
When an object’s reference count becomes zero, the object is deallocated. If
it contains references to other objects, their reference count is decremented.
Those other objects may be deallocated in turn, if this decrement makes their
reference count become zero, and so on. (There’s an obvious problem with
objects that reference each other here; for now, the solution is “don’t do
that.”)
Reference counts are always manipulated explicitly. The normal way is to use
the macro Py_INCREF() to increment an object’s reference count by one,
and Py_DECREF() to decrement it by one. The Py_DECREF() macro
is considerably more complex than the incref one, since it must check whether
the reference count becomes zero and then cause the object’s deallocator to be
called. The deallocator is a function pointer contained in the object’s type
structure. The type-specific deallocator takes care of decrementing the
reference counts for other objects contained in the object if this is a compound
object type, such as a list, as well as performing any additional finalization
that’s needed. There’s no chance that the reference count can overflow; at
least as many bits are used to hold the reference count as there are distinct
memory locations in virtual memory (assuming sizeof(Py_ssize_t) >= sizeof(void*)).
Thus, the reference count increment is a simple operation.
It is not necessary to increment an object’s reference count for every local
variable that contains a pointer to an object. In theory, the object’s
reference count goes up by one when the variable is made to point to it and it
goes down by one when the variable goes out of scope. However, these two
cancel each other out, so at the end the reference count hasn’t changed. The
only real reason to use the reference count is to prevent the object from being
deallocated as long as our variable is pointing to it. If we know that there
is at least one other reference to the object that lives at least as long as
our variable, there is no need to increment the reference count temporarily.
An important situation where this arises is in objects that are passed as
arguments to C functions in an extension module that are called from Python;
the call mechanism guarantees to hold a reference to every argument for the
duration of the call.
However, a common pitfall is to extract an object from a list and hold on to it
for a while without incrementing its reference count. Some other operation might
conceivably remove the object from the list, decrementing its reference count
and possibly deallocating it. The real danger is that innocent-looking
operations may invoke arbitrary Python code which could do this; there is a code
path which allows control to flow back to the user from a Py_DECREF(), so
almost any operation is potentially dangerous.
A safe approach is to always use the generic operations (functions whose name
begins with PyObject_, PyNumber_, PySequence_ or PyMapping_).
These operations always increment the reference count of the object they return.
This leaves the caller with the responsibility to call Py_DECREF() when
they are done with the result; this soon becomes second nature.
The reference count behavior of functions in the Python/C API is best explained
in terms of ownership of references. Ownership pertains to references, never
to objects (objects are not owned: they are always shared). “Owning a
reference” means being responsible for calling Py_DECREF on it when the
reference is no longer needed. Ownership can also be transferred, meaning that
the code that receives ownership of the reference then becomes responsible for
eventually decref’ing it by calling Py_DECREF() or Py_XDECREF()
when it’s no longer needed—or passing on this responsibility (usually to its
caller). When a function passes ownership of a reference on to its caller, the
caller is said to receive a new reference. When no ownership is transferred,
the caller is said to borrow the reference. Nothing needs to be done for a
borrowed reference.
Conversely, when a calling function passes in a reference to an object, there
are two possibilities: the function steals a reference to the object, or it
does not. Stealing a reference means that when you pass a reference to a
function, that function assumes that it now owns that reference, and you are not
responsible for it any longer.
Few functions steal references; the two notable exceptions are
PyList_SetItem() and PyTuple_SetItem(), which steal a reference
to the item (but not to the tuple or list into which the item is put!). These
functions were designed to steal a reference because of a common idiom for
populating a tuple or list with newly created objects; for example, the code to
create the tuple (1,2,"three") could look like this (forgetting about
error handling for the moment; a better way to code this is shown below):
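PyObject *t;

t = PyTuple_New(3);
PyTuple_SetItem(t, 0, PyLong_FromLong(1L));
PyTuple_SetItem(t, 1, PyLong_FromLong(2L));
PyTuple_SetItem(t, 2, PyUnicode_FromString("three"));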
Here, PyLong_FromLong() returns a new reference which is immediately
stolen by PyTuple_SetItem(). When you want to keep using an object
although the reference to it will be stolen, use Py_INCREF() to grab
another reference before calling the reference-stealing function.
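A minimal sketch of that pattern (t is an existing tuple and item is an object
reference we want to keep; both names are placeholders):

Py_INCREF(item);               /* take an extra reference first */
PyTuple_SetItem(t, 0, item);   /* this call steals one reference to item */
/* ... item remains safely usable here ... */
Py_DECREF(item);               /* drop our extra reference when done */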
However, in practice, you will rarely use these ways of creating and populating
a tuple or list. There’s a generic function, Py_BuildValue(), that can
create most common objects from C values, directed by a format string.
For example, the tuple-building code above (and the equivalent code for
populating a list using PyList_New() and PyList_SetItem()) could be replaced
by the following (which also takes care of the error checking):
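PyObject *tuple, *list;

tuple = Py_BuildValue("(iis)", 1, 2, "three");
list = Py_BuildValue("[iis]", 1, 2, "three");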
It is much more common to use PyObject_SetItem() and friends with items
whose references you are only borrowing, like arguments that were passed in to
the function you are writing. In that case, their behaviour regarding reference
counts is much saner, since you don’t have to increment a reference count so you
can give a reference away (“have it be stolen”). For example, this function
sets all items of a list (actually, any mutable sequence) to a given item:
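int
set_all(PyObject *target, PyObject *item)
{
    Py_ssize_t i, n;

    n = PyObject_Length(target);
    if (n < 0)
        return -1;
    for (i = 0; i < n; i++) {
        PyObject *index = PyLong_FromSsize_t(i);
        if (!index)
            return -1;
        if (PyObject_SetItem(target, index, item) < 0) {
            Py_DECREF(index);
            return -1;
        }
        Py_DECREF(index);
    }
    return 0;
}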
The situation is slightly different for function return values. While passing
a reference to most functions does not change your ownership responsibilities
for that reference, many functions that return a reference to an object give
you ownership of the reference. The reason is simple: in many cases, the
returned object is created on the fly, and the reference you get is the only
reference to the object. Therefore, the generic functions that return object
references, like PyObject_GetItem() and PySequence_GetItem(),
always return a new reference (the caller becomes the owner of the reference).
It is important to realize that whether you own a reference returned by a
function depends on which function you call only — the plumage (the type of
the object passed as an argument to the function) doesn’t enter into it!
Thus, if you extract an item from a list using PyList_GetItem(), you
don’t own the reference — but if you obtain the same item from the same list
using PySequence_GetItem() (which happens to take exactly the same
arguments), you do own a reference to the returned object.
Here is an example of how you could write a function that computes the sum of
the items in a list of integers; once using PyList_GetItem(), and once
using PySequence_GetItem().
long
sum_list(PyObject *list)
{
    int i, n;
    long total = 0;
    PyObject *item;

    n = PyList_Size(list);
    if (n < 0)
        return -1; /* Not a list */
    for (i = 0; i < n; i++) {
        item = PyList_GetItem(list, i); /* Can't fail */
        if (!PyLong_Check(item)) continue; /* Skip non-integers */
        total += PyLong_AsLong(item);
    }
    return total;
}
long
sum_sequence(PyObject *sequence)
{
    int i, n;
    long total = 0;
    PyObject *item;

    n = PySequence_Length(sequence);
    if (n < 0)
        return -1; /* Has no length */
    for (i = 0; i < n; i++) {
        item = PySequence_GetItem(sequence, i);
        if (item == NULL)
            return -1; /* Not a sequence, or other failure */
        if (PyLong_Check(item))
            total += PyLong_AsLong(item);
        Py_DECREF(item); /* Discard reference ownership */
    }
    return total;
}
There are few other data types that play a significant role in the Python/C
API; most are simple C types such as int, long,
double and char*. A few structure types are used to
describe static tables used to list the functions exported by a module or the
data attributes of a new object type, and another is used to describe the value
of a complex number. These will be discussed together with the functions that
use them.
The Python programmer only needs to deal with exceptions if specific error
handling is required; unhandled exceptions are automatically propagated to the
caller, then to the caller’s caller, and so on, until they reach the top-level
interpreter, where they are reported to the user accompanied by a stack
traceback.
For C programmers, however, error checking always has to be explicit. All
functions in the Python/C API can raise exceptions, unless an explicit claim is
made otherwise in a function’s documentation. In general, when a function
encounters an error, it sets an exception, discards any object references that
it owns, and returns an error indicator. If not documented otherwise, this
indicator is either NULL or -1, depending on the function’s return type.
A few functions return a Boolean true/false result, with false indicating an
error. Very few functions return no explicit error indicator or have an
ambiguous return value, and require explicit testing for errors with
PyErr_Occurred(). These exceptions are always explicitly documented.
Exception state is maintained in per-thread storage (this is equivalent to
using global storage in an unthreaded application). A thread can be in one of
two states: an exception has occurred, or not. The function
PyErr_Occurred() can be used to check for this: it returns a borrowed
reference to the exception type object when an exception has occurred, and
NULL otherwise. There are a number of functions to set the exception state:
PyErr_SetString() is the most common (though not the most general)
function to set the exception state, and PyErr_Clear() clears the
exception state.
The full exception state consists of three objects (all of which can be
NULL): the exception type, the corresponding exception value, and the
traceback. These have the same meanings as the Python result of
sys.exc_info(); however, they are not the same: the Python objects represent
the last exception being handled by a Python try ...
except statement, while the C level exception state only exists while
an exception is being passed on between C functions until it reaches the Python
bytecode interpreter’s main loop, which takes care of transferring it to
sys.exc_info() and friends.
Note that starting with Python 1.5, the preferred, thread-safe way to access the
exception state from Python code is to call the function sys.exc_info(),
which returns the per-thread exception state for Python code. Also, the
semantics of both ways to access the exception state have changed so that a
function which catches an exception will save and restore its thread’s exception
state so as to preserve the exception state of its caller. This prevents common
bugs in exception handling code caused by an innocent-looking function
overwriting the exception being handled; it also reduces the often unwanted
lifetime extension for objects that are referenced by the stack frames in the
traceback.
As a general principle, a function that calls another function to perform some
task should check whether the called function raised an exception, and if so,
pass the exception state on to its caller. It should discard any object
references that it owns, and return an error indicator, but it should not set
another exception — that would overwrite the exception that was just raised,
and lose important information about the exact cause of the error.
A simple example of detecting exceptions and passing them on is shown in the
sum_sequence() example above. It so happens that that example doesn’t
need to clean up any owned references when it detects an error. The following
example function shows some error cleanup. First, to remind you why you like
Python, we show the equivalent Python code:
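def incr_item(dict, key):
    try:
        item = dict[key]
    except KeyError:
        item = 0
    dict[key] = item + 1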
Here is the corresponding C code, in all its glory:
int
incr_item(PyObject *dict, PyObject *key)
{
    /* Objects all initialized to NULL for Py_XDECREF */
    PyObject *item = NULL, *const_one = NULL, *incremented_item = NULL;
    int rv = -1; /* Return value initialized to -1 (failure) */

    item = PyObject_GetItem(dict, key);
    if (item == NULL) {
        /* Handle KeyError only: */
        if (!PyErr_ExceptionMatches(PyExc_KeyError))
            goto error;
        /* Clear the error and use zero: */
        PyErr_Clear();
        item = PyLong_FromLong(0L);
        if (item == NULL)
            goto error;
    }
    const_one = PyLong_FromLong(1L);
    if (const_one == NULL)
        goto error;

    incremented_item = PyNumber_Add(item, const_one);
    if (incremented_item == NULL)
        goto error;

    if (PyObject_SetItem(dict, key, incremented_item) < 0)
        goto error;
    rv = 0; /* Success */
    /* Continue with cleanup code */

 error:
    /* Cleanup code, shared by success and failure path */

    /* Use Py_XDECREF() to ignore NULL references */
    Py_XDECREF(item);
    Py_XDECREF(const_one);
    Py_XDECREF(incremented_item);

    return rv; /* -1 for error, 0 for success */
}
This example represents an endorsed use of the goto statement in C!
It illustrates the use of PyErr_ExceptionMatches() and
PyErr_Clear() to handle specific exceptions, and the use of
Py_XDECREF() to dispose of owned references that may be NULL (note the
'X' in the name; Py_DECREF() would crash when confronted with a
NULL reference). It is important that the variables used to hold owned
references are initialized to NULL for this to work; likewise, the proposed
return value is initialized to -1 (failure) and only set to success after
the final call made is successful.
The one important task that only embedders (as opposed to extension writers) of
the Python interpreter have to worry about is the initialization, and possibly
the finalization, of the Python interpreter. Most functionality of the
interpreter can only be used after the interpreter has been initialized.
The basic initialization function is Py_Initialize(). This initializes
the table of loaded modules, and creates the fundamental modules
builtins, __main__, and sys. It also
initializes the module search path (sys.path).
Py_Initialize() does not set the “script argument list” (sys.argv).
If this variable is needed by Python code that will be executed later, it must
be set explicitly with a call to PySys_SetArgvEx(argc, argv, updatepath)
after the call to Py_Initialize().
On most systems (in particular, on Unix and Windows, although the details are
slightly different), Py_Initialize() calculates the module search path
based upon its best guess for the location of the standard Python interpreter
executable, assuming that the Python library is found in a fixed location
relative to the Python interpreter executable. In particular, it looks for a
directory named lib/pythonX.Y relative to the parent directory
where the executable named python is found on the shell command search
path (the environment variable PATH).
For instance, if the Python executable is found in
/usr/local/bin/python, it will assume that the libraries are in
/usr/local/lib/pythonX.Y. (In fact, this particular path is also
the “fallback” location, used when no executable file named python is
found along PATH.) The user can override this behavior by setting the
environment variable PYTHONHOME, or insert additional directories in
front of the standard path by setting PYTHONPATH.
The embedding application can steer the search by calling
Py_SetProgramName(file) before calling Py_Initialize(). Note that
PYTHONHOME still overrides this and PYTHONPATH is still
inserted in front of the standard path. An application that requires total
control has to provide its own implementation of Py_GetPath(),
Py_GetPrefix(), Py_GetExecPrefix(), and
Py_GetProgramFullPath() (all defined in Modules/getpath.c).
Sometimes, it is desirable to “uninitialize” Python. For instance, the
application may want to start over (make another call to
Py_Initialize()) or the application is simply done with its use of
Python and wants to free memory allocated by Python. This can be accomplished
by calling Py_Finalize(). The function Py_IsInitialized() returns
true if Python is currently in the initialized state. More information about
these functions is given in a later chapter. Notice that Py_Finalize()
does not free all memory allocated by the Python interpreter, e.g. memory
allocated by extension modules currently cannot be released.
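To make this concrete, here is a minimal sketch of an embedding program built
from the calls discussed above (it runs a short script and then shuts the
interpreter down):

#include <Python.h>

int
main(void)
{
    Py_Initialize();                      /* creates builtins, __main__, sys */
    PyRun_SimpleString("from time import time, ctime\n"
                       "print('Today is', ctime(time()))\n");
    Py_Finalize();                        /* releases (most) interpreter memory */
    return 0;
}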
Python can be built with several macros to enable extra checks of the
interpreter and extension modules. These checks tend to add a large amount of
overhead to the runtime so they are not enabled by default.
A full list of the various types of debugging builds is in the file
Misc/SpecialBuilds.txt in the Python source distribution. Builds are
available that support tracing of reference counts, debugging the memory
allocator, or low-level profiling of the main interpreter loop. Only the most
frequently-used builds will be described in the remainder of this section.
Compiling the interpreter with the Py_DEBUG macro defined produces
what is generally meant by “a debug build” of Python. Py_DEBUG is
enabled in the Unix build by adding --with-pydebug to the
./configure command. It is also implied by the presence of the
not-Python-specific _DEBUG macro. When Py_DEBUG is enabled
in the Unix build, compiler optimization is disabled.
In addition to the reference count debugging described below, the following
extra checks are performed:
Extra checks are added to the object allocator.
Extra checks are added to the parser and compiler.
Downcasts from wide types to narrow types are checked for loss of information.
A number of assertions are added to the dictionary and set implementations.
In addition, the set object acquires a test_c_api() method.
Sanity checks of the input arguments are added to frame creation.
The storage for ints is initialized with a known invalid pattern to catch
reference to uninitialized digits.
Low-level tracing and extra exception checking are added to the runtime
virtual machine.
Extra checks are added to the memory arena implementation.
Extra debugging is added to the thread module.
There may be additional checks not mentioned here.
Defining Py_TRACE_REFS enables reference tracing. When defined, a
circular doubly linked list of active objects is maintained by adding two extra
fields to every PyObject. Total allocations are tracked as well. Upon
exit, all existing references are printed. (In interactive mode this happens
after every statement run by the interpreter.) Implied by Py_DEBUG.
Please refer to Misc/SpecialBuilds.txt in the Python source distribution
for more detailed information.
The functions in this chapter will let you execute Python source code given in a
file or a buffer, but they will not let you interact in a more detailed way with
the interpreter.
Several of these functions accept a start symbol from the grammar as a
parameter. The available start symbols are Py_eval_input,
Py_file_input, and Py_single_input. These are described
following the functions which accept them as parameters.
Note also that several of these functions take FILE* parameters. One
particular issue which needs to be handled carefully is that the FILE
structure for different C libraries can be different and incompatible. Under
Windows (at least), it is possible for dynamically linked extensions to actually
use different libraries, so care should be taken that FILE* parameters
are only passed to these functions if it is certain that they were created by
the same library that the Python runtime is using.
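int Py_Main(int argc, wchar_t **argv)¶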
The main program for the standard interpreter. This is made available for
programs which embed Python. The argc and argv parameters should be
prepared exactly as those which are passed to a C program’s main()
function (converted to wchar_t according to the user’s locale). It is
important to note that the argument list may be modified (but the contents of
the strings pointed to by the argument list are not). The return value will
be 0 if the interpreter exits normally (i.e., without an exception),
1 if the interpreter exits due to an exception, or 2 if the parameter
list does not represent a valid Python command line.
Note that if an otherwise unhandled SystemExit is raised, this
function will not return 1, but exit the process, as long as
Py_InspectFlag is not set.
int PyRun_AnyFile(FILE *fp, const char *filename)¶
This is a simplified interface to PyRun_AnyFileExFlags() below, leaving
closeit set to 0 and flags set to NULL.
int PyRun_AnyFileFlags(FILE *fp, const char *filename, PyCompilerFlags *flags)¶
This is a simplified interface to PyRun_AnyFileExFlags() below, leaving
the closeit argument set to 0.
int PyRun_AnyFileEx(FILE *fp, const char *filename, int closeit)¶
This is a simplified interface to PyRun_AnyFileExFlags() below, leaving
the flags argument set to NULL.
int PyRun_AnyFileExFlags(FILE *fp, const char *filename, int closeit, PyCompilerFlags *flags)¶
If fp refers to a file associated with an interactive device (console or
terminal input or Unix pseudo-terminal), return the value of
PyRun_InteractiveLoop(), otherwise return the result of
PyRun_SimpleFile(). filename is decoded from the filesystem
encoding (sys.getfilesystemencoding()). If filename is NULL, this
function uses "???" as the filename.
This is a simplified interface to PyRun_SimpleStringFlags() below,
leaving the PyCompilerFlags* argument set to NULL.
int PyRun_SimpleStringFlags(const char *command, PyCompilerFlags *flags)¶
Executes the Python source code from command in the __main__ module
according to the flags argument. If __main__ does not already exist, it
is created. Returns 0 on success or -1 if an exception was raised. If
there was an error, there is no way to get the exception information. For the
meaning of flags, see below.
Note that if an otherwise unhandled SystemExit is raised, this
function will not return -1, but exit the process, as long as
Py_InspectFlag is not set.
int PyRun_SimpleFile(FILE *fp, const char *filename)¶
This is a simplified interface to PyRun_SimpleFileExFlags() below,
leaving closeit set to 0 and flags set to NULL.
int PyRun_SimpleFileFlags(FILE *fp, const char *filename, PyCompilerFlags *flags)¶
int PyRun_SimpleFileExFlags(FILE *fp, const char *filename, int closeit, PyCompilerFlags *flags)¶
Similar to PyRun_SimpleStringFlags(), but the Python source code is read
from fp instead of an in-memory string. filename should be the name of
the file, it is decoded from the filesystem encoding
(sys.getfilesystemencoding()). If closeit is true, the file is
closed before PyRun_SimpleFileExFlags returns.
int PyRun_InteractiveOne(FILE *fp, const char *filename)¶
int PyRun_InteractiveOneFlags(FILE *fp, const char *filename, PyCompilerFlags *flags)¶
Read and execute a single statement from a file associated with an
interactive device according to the flags argument. The user will be
prompted using sys.ps1 and sys.ps2. filename is decoded from the
filesystem encoding (sys.getfilesystemencoding()).
Returns 0 when the input was
executed successfully, -1 if there was an exception, or an error code
from the errcode.h include file distributed as part of Python if
there was a parse error. (Note that errcode.h is not included by
Python.h, so it must be included specifically if needed.)
int PyRun_InteractiveLoop(FILE *fp, const char *filename)¶
int PyRun_InteractiveLoopFlags(FILE *fp, const char *filename, PyCompilerFlags *flags)¶
Read and execute statements from a file associated with an interactive device
until EOF is reached. The user will be prompted using sys.ps1 and
sys.ps2. filename is decoded from the filesystem encoding
(sys.getfilesystemencoding()). Returns 0 at EOF.
struct _node* PyParser_SimpleParseString(const char *str, int start)¶
struct _node* PyParser_SimpleParseStringFlagsFilename(const char *str, const char *filename, int start, int flags)¶
Parse Python source code from str using the start token start according to
the flags argument. The result can be used to create a code object which can
be evaluated efficiently. This is useful if a code fragment must be evaluated
many times. filename is decoded from the filesystem encoding
(sys.getfilesystemencoding()).
struct _node* PyParser_SimpleParseFile(FILE *fp, const char *filename, int start)¶
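Similar to PyParser_SimpleParseStringFlagsFilename(), but the Python source
code is read from fp instead of an in-memory string.
PyObject* PyRun_StringFlags(const char *str, int start, PyObject *globals, PyObject *locals, PyCompilerFlags *flags)¶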
Execute Python source code from str in the context specified by the
dictionaries globals and locals with the compiler flags specified by
flags. The parameter start specifies the start token that should be used to
parse the source code.
Returns the result of executing the code as a Python object, or NULL if an
exception was raised.
Similar to PyRun_StringFlags(), but the Python source code is read from
fp instead of an in-memory string. filename should be the name of the file,
it is decoded from the filesystem encoding (sys.getfilesystemencoding()).
If closeit is true, the file is closed before PyRun_FileExFlags()
returns.
PyObject* Py_CompileString(const char *str, const char *filename, int start)¶
PyObject* Py_CompileStringExFlags(const char *str, const char *filename, int start, PyCompilerFlags *flags, int optimize)¶
Parse and compile the Python source code in str, returning the resulting code
object. The start token is given by start; this can be used to constrain the
code which can be compiled and should be Py_eval_input,
Py_file_input, or Py_single_input. The filename specified by
filename is used to construct the code object and may appear in tracebacks or
SyntaxError exception messages; it is decoded from the filesystem
encoding (sys.getfilesystemencoding()). This returns NULL if the
code cannot be parsed or compiled.
The integer optimize specifies the optimization level of the compiler; a
value of -1 selects the optimization level of the interpreter as given by
-O options. Explicit levels are 0 (no optimization;
__debug__ is true), 1 (asserts are removed, __debug__ is false)
or 2 (docstrings are removed too).
This is a simplified interface to PyEval_EvalCodeEx(), with just
the code object, and the dictionaries of global and local variables.
The other arguments are set to NULL.
Evaluate a precompiled code object, given a particular environment for its
evaluation. This environment consists of dictionaries of global and local
variables, arrays of arguments, keywords and defaults, and a closure tuple of
cells.
Evaluate an execution frame. This is a simplified interface to
PyEval_EvalFrameEx, for backward compatibility.
PyObject* PyEval_EvalFrameEx(PyFrameObject *f, int throwflag)¶
This is the main, unvarnished function of Python interpretation. It is
literally 2000 lines long. The code object associated with the execution
frame f is executed, interpreting bytecode and executing calls as needed.
The additional throwflag parameter can mostly be ignored - if true, then
it causes an exception to immediately be thrown; this is used for the
throw() methods of generator objects.
The start symbol from the Python grammar for sequences of statements as read
from a file or other source; for use with Py_CompileString(). This is
the symbol to use when compiling arbitrarily long Python source code.
The start symbol from the Python grammar for a single statement; for use with
Py_CompileString(). This is the symbol used for the interactive
interpreter loop.
This is the structure used to hold compiler flags. In cases where code is only
being compiled, it is passed as int flags, and in cases where code is being
executed, it is passed as PyCompilerFlags *flags. In this case, from __future__ import can modify flags.
Whenever PyCompilerFlags *flags is NULL, cf_flags is treated as
equal to 0, and any modification due to from __future__ import is
discarded.
Decrement the reference count for object o. The object must not be NULL; if
you aren’t sure that it isn’t NULL, use Py_XDECREF(). If the reference
count reaches zero, the object’s type’s deallocation function (which must not be
NULL) is invoked.
Warning
The deallocation function can cause arbitrary Python code to be invoked (e.g.
when a class instance with a __del__() method is deallocated). While
exceptions in such code are not propagated, the executed code has free access to
all Python global variables. This means that any object that is reachable from
a global variable should be in a consistent state before Py_DECREF() is
invoked. For example, code to delete an object from a list should copy a
reference to the deleted object in a temporary variable, update the list data
structure, and then call Py_DECREF() for the temporary variable.
Decrement the reference count for object o. The object may be NULL, in
which case the macro has no effect; otherwise the effect is the same as for
Py_DECREF(), and the same warning applies.
Decrement the reference count for object o. The object may be NULL, in
which case the macro has no effect; otherwise the effect is the same as for
Py_DECREF(), except that the argument is also set to NULL. The warning
for Py_DECREF() does not apply with respect to the object passed because
the macro carefully uses a temporary variable and sets the argument to NULL
before decrementing its reference count.
It is a good idea to use this macro whenever decrementing the value of a
variable that might be traversed during garbage collection.
The following functions are for runtime dynamic embedding of Python:
Py_IncRef(PyObject *o), Py_DecRef(PyObject *o). They are
simply exported function versions of Py_XINCREF() and
Py_XDECREF(), respectively.
The following functions or macros are only for use within the interpreter core:
_Py_Dealloc(), _Py_ForgetReference(), _Py_NewReference(),
as well as the global variable _Py_RefTotal.
The functions described in this chapter will let you handle and raise Python
exceptions. It is important to understand some of the basics of Python
exception handling. It works somewhat like the Unix errno variable:
there is a global indicator (per thread) of the last error that occurred. Most
functions don’t clear this on success, but will set it to indicate the cause of
the error on failure. Most functions also return an error indicator, usually
NULL if they are supposed to return a pointer, or -1 if they return an
integer (exception: the PyArg_*() functions return 1 for success and
0 for failure).
When a function must fail because some function it called failed, it generally
doesn’t set the error indicator; the function it called already set it. It is
responsible for either handling the error and clearing the exception or
returning after cleaning up any resources it holds (such as object references or
memory allocations); it should not continue normally if it is not prepared to
handle the error. If returning due to an error, it is important to indicate to
the caller that an error has been set. If the error is not handled or carefully
propagated, additional calls into the Python/C API may not behave as intended
and may fail in mysterious ways.
The error indicator consists of three Python objects corresponding to the result
of sys.exc_info(). API functions exist to interact with the error indicator
in various ways. There is a separate error indicator for each thread.
Print a standard traceback to sys.stderr and clear the error indicator.
Call this function only when the error indicator is set. (Otherwise it will
cause a fatal error!)
If set_sys_last_vars is nonzero, the variables sys.last_type,
sys.last_value and sys.last_traceback will be set to the
type, value and traceback of the printed exception, respectively.
Test whether the error indicator is set. If set, return the exception type
(the first argument to the last call to one of the PyErr_Set*()
functions or to PyErr_Restore()). If not set, return NULL. You do not
own a reference to the return value, so you do not need to Py_DECREF()
it.
Note
Do not compare the return value to a specific exception; use
PyErr_ExceptionMatches() instead, shown below. (The comparison could
easily fail since the exception may be an instance instead of a class, in the
case of a class exception, or it may be a subclass of the expected exception.)
Equivalent to PyErr_GivenExceptionMatches(PyErr_Occurred(),exc). This
should only be called when an exception is actually set; a memory access
violation will occur if no exception has been raised.
Return true if the given exception matches the exception in exc. If
exc is a class object, this also returns true when given is an instance
of a subclass. If exc is a tuple, all exceptions in the tuple (and
recursively in subtuples) are searched for a match.
Under certain circumstances, the values returned by PyErr_Fetch() below
can be “unnormalized”, meaning that *exc is a class object but *val is
not an instance of the same class. This function can be used to instantiate
the class in that case. If the values are already normalized, nothing happens.
The delayed normalization is implemented to improve performance.
Retrieve the error indicator into three variables whose addresses are passed.
If the error indicator is not set, set all three variables to NULL. If it is
set, it will be cleared and you own a reference to each object retrieved. The
value and traceback object may be NULL even when the type object is not.
Note
This function is normally only used by code that needs to handle exceptions or
by code that needs to save and restore the error indicator temporarily.
Set the error indicator from the three objects. If the error indicator is
already set, it is cleared first. If the objects are NULL, the error
indicator is cleared. Do not pass a NULL type and non-NULL value or
traceback. The exception type should be a class. Do not pass an invalid
exception type or value. (Violating these rules will cause subtle problems
later.) This call takes away a reference to each object: you must own a
reference to each object before the call and after the call you no longer own
these references. (If you don’t understand this, don’t use this function. I
warned you.)
Note
This function is normally only used by code that needs to save and restore the
error indicator temporarily; use PyErr_Fetch() to save the current
exception state.
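A sketch of the save-and-restore pattern (do_something_else() is a
hypothetical call that must run with a clean error indicator):

PyObject *type, *value, *traceback;

PyErr_Fetch(&type, &value, &traceback);  /* clears the indicator; we own the refs */
do_something_else();                     /* hypothetical; may set and clear errors */
PyErr_Restore(type, value, traceback);   /* gives our references back */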
This is the most common way to set the error indicator. The first argument
specifies the exception type; it is normally one of the standard exceptions,
e.g. PyExc_RuntimeError. You need not increment its reference count.
The second argument is an error message; it is decoded from 'utf-8'.
This function sets the error indicator and returns NULL. exception
should be a Python exception class. The format and subsequent
parameters help format the error message; they have the same meaning and
values as in PyUnicode_FromFormat(). format is an ASCII-encoded
string.
This is a shorthand for PyErr_SetString(PyExc_TypeError,message), where
message indicates that a built-in operation was invoked with an illegal
argument. It is mostly for internal use.
This is a shorthand for PyErr_SetNone(PyExc_MemoryError); it returns NULL
so an object allocation function can write return PyErr_NoMemory(); when it
runs out of memory.
This is a convenience function to raise an exception when a C library function
has returned an error and set the C variable errno. It constructs a
tuple object whose first item is the integer errno value and whose
second item is the corresponding error message (gotten from strerror()),
and then calls PyErr_SetObject(type, object). On Unix, when the
errno value is EINTR, indicating an interrupted system call,
this calls PyErr_CheckSignals(), and if that set the error indicator,
leaves it set to that. The function always returns NULL, so a wrapper
function around a system call can write return PyErr_SetFromErrno(type);
when the system call returns an error.
Similar to PyErr_SetFromErrno(), with the additional behavior that if
filename is not NULL, it is passed to the constructor of type as a third
parameter. In the case of exceptions such as IOError and OSError,
this is used to define the filename attribute of the exception instance.
filename is decoded from the filesystem encoding
(sys.getfilesystemencoding()).
This is a convenience function to raise WindowsError. If called with
ierr of 0, the error code returned by a call to GetLastError()
is used instead. It calls the Win32 function FormatMessage() to retrieve
the Windows description of error code given by ierr or GetLastError(),
then it constructs a tuple object whose first item is the ierr value and whose
second item is the corresponding error message (gotten from
FormatMessage()), and then calls PyErr_SetObject(PyExc_WindowsError, object). This function always returns NULL. Availability: Windows.
Similar to PyErr_SetFromWindowsErr(), with the additional behavior that
if filename is not NULL, it is passed to the constructor of
WindowsError as a third parameter. filename is decoded from the
filesystem encoding (sys.getfilesystemencoding()). Availability:
Windows.
PyObject* PyErr_SetExcFromWindowsErrWithFilename(PyObject *type, int ierr, char *filename)¶
void PyErr_SyntaxLocationEx(char *filename, int lineno, int col_offset)¶
Set file, line, and offset information for the current exception. If the
current exception is not a SyntaxError, then it sets additional
attributes, which make the exception printing subsystem think the exception
is a SyntaxError. filename is decoded from the filesystem encoding
(sys.getfilesystemencoding()).
New in version 3.2.
void PyErr_SyntaxLocation(char *filename, int lineno)¶
Like PyErr_SyntaxLocationEx(), but the col_offset parameter is
omitted.
This is a shorthand for PyErr_SetString(PyExc_SystemError,message),
where message indicates that an internal operation (e.g. a Python/C API
function) was invoked with an illegal argument. It is mostly for internal
use.
int PyErr_WarnEx(PyObject *category, char *message, int stack_level)¶
Issue a warning message. The category argument is a warning category (see
below) or NULL; the message argument is a UTF-8 encoded string. stack_level is a
positive number giving a number of stack frames; the warning will be issued from
the currently executing line of code in that stack frame. A stack_level of 1
is the function calling PyErr_WarnEx(), 2 is the function above that,
and so forth.
This function normally prints a warning message to sys.stderr; however, it is
also possible that the user has specified that warnings are to be turned into
errors, and in that case this will raise an exception. It is also possible that
the function raises an exception because of a problem with the warning machinery
(the implementation imports the warnings module to do the heavy lifting).
The return value is 0 if no exception is raised, or -1 if an exception
is raised. (It is not possible to determine whether a warning message is
actually printed, nor what the reason is for the exception; this is
intentional.) If an exception is raised, the caller should do its normal
exception handling (for example, Py_DECREF() owned references and return
an error value).
Warning categories must be subclasses of Warning; the default warning
category is RuntimeWarning. The standard Python warning categories are
available as global variables whose names are PyExc_ followed by the Python
exception name. These have the type PyObject*; they are all class
objects. Their names are PyExc_Warning, PyExc_UserWarning,
PyExc_UnicodeWarning, PyExc_DeprecationWarning,
PyExc_SyntaxWarning, PyExc_RuntimeWarning, and
PyExc_FutureWarning. PyExc_Warning is a subclass of
PyExc_Exception; the other warning categories are subclasses of
PyExc_Warning.
For information about warning control, see the documentation for the
warnings module and the -W option in the command line
documentation. There is no C API for warning control.
int PyErr_WarnExplicit(PyObject *category, const char *message, const char *filename, int lineno, const char *module, PyObject *registry)¶
Issue a warning message with explicit control over all warning attributes. This
is a straightforward wrapper around the Python function
warnings.warn_explicit(), see there for more information. The module
and registry arguments may be set to NULL to get the default effect
described there. message and module are UTF-8 encoded strings,
filename is decoded from the filesystem encoding
(sys.getfilesystemencoding()).
int PyErr_WarnFormat(PyObject *category, Py_ssize_t stack_level, const char *format, ...)¶
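Function similar to PyErr_WarnEx(), but use PyUnicode_FromFormat() to format
the warning message. format is an ASCII-encoded string.
New in version 3.2.
int PyErr_CheckSignals()¶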
This function interacts with Python’s signal handling. It checks whether a
signal has been sent to the processes and if so, invokes the corresponding
signal handler. If the signal module is supported, this can invoke a
signal handler written in Python. In all cases, the default effect for
SIGINT is to raise the KeyboardInterrupt exception. If an
exception is raised the error indicator is set and the function returns -1;
otherwise the function returns 0. The error indicator may or may not be
cleared if it was previously set.
This function simulates the effect of a SIGINT signal arriving — the
next time PyErr_CheckSignals() is called, KeyboardInterrupt will
be raised. It may be called without holding the interpreter lock.
This utility function specifies a file descriptor to which a '\0' byte will
be written whenever a signal is received. It returns the previous such file
descriptor. The value -1 disables the feature; this is the initial state.
This is equivalent to signal.set_wakeup_fd() in Python, but without any
error checking. fd should be a valid file descriptor. The function should
only be called from the main thread.
This utility function creates and returns a new exception class. The name
argument must be the name of the new exception, a C string of the form
module.classname. The base and dict arguments are normally NULL.
This creates a class object derived from Exception (accessible in C as
PyExc_Exception).
The __module__ attribute of the new class is set to the first part (up
to the last dot) of the name argument, and the class name is set to the last
part (after the last dot). The base argument can be used to specify alternate
base classes; it can either be only one class or a tuple of classes. The dict
argument can be used to specify a dictionary of class variables and methods.
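As an illustration, a module initialization function might create and expose
its own exception class roughly like this (SpamError and the module object m
are placeholders):

static PyObject *SpamError;

/* ... in the module initialization function, with module object m ... */
SpamError = PyErr_NewException("spam.error", NULL, NULL);
Py_INCREF(SpamError);                       /* keep a reference of our own */
PyModule_AddObject(m, "error", SpamError);  /* steals a reference to SpamError */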
Same as PyErr_NewException(), except that the new exception class can
easily be given a docstring: If doc is non-NULL, it will be used as the
docstring for the exception class.
This utility function prints a warning message to sys.stderr when an
exception has been set but it is impossible for the interpreter to actually
raise the exception. It is used, for example, when an exception occurs in an
__del__() method.
The function is called with a single argument obj that identifies the context
in which the unraisable exception occurred. The repr of obj will be printed in
the warning message.
Return the traceback associated with the exception as a new reference, as
accessible from Python through __traceback__. If there is no
traceback associated, this returns NULL.
Return the context (another exception instance during whose handling ex was
raised) associated with the exception as a new reference, as accessible from
Python through __context__. If there is no context associated, this
returns NULL.
Set the context associated with the exception to ctx. Use NULL to clear
it. There is no type check to make sure that ctx is an exception instance.
This steals a reference to ctx.
Return the cause (another exception instance set by raise...from...)
associated with the exception as a new reference, as accessible from Python
through __cause__. If there is no cause associated, this returns
NULL.
Set the cause associated with the exception to cause. Use NULL to clear
it. There is no type check to make sure that cause is an exception instance.
This steals a reference to cause.
These two functions provide a way to perform safe recursive calls at the C
level, both in the core and in extension modules. They are needed if the
recursive code does not necessarily invoke Python code (which tracks its
recursion depth automatically).
Marks a point where a recursive C-level call is about to be performed.
If USE_STACKCHECK is defined, this function checks whether the OS
stack overflowed using PyOS_CheckStack(). If this is the case, it
sets a MemoryError and returns a nonzero value.
The function then checks if the recursion limit is reached. If this is the
case, a RuntimeError is set and a nonzero value is returned.
Otherwise, zero is returned.
where should be a string such as "ininstancecheck" to be
concatenated to the RuntimeError message caused by the recursion depth
limit.
Properly implementing tp_repr for container types requires
special recursion handling. In addition to protecting the stack,
tp_repr also needs to track objects to prevent cycles. The
following two functions facilitate this functionality. Effectively,
these are the C equivalent to reprlib.recursive_repr().
Called at the beginning of the tp_repr implementation to
detect cycles.
If the object has already been processed, the function returns a
positive integer. In that case the tp_repr implementation
should return a string object indicating a cycle. As examples,
dict objects return {...} and list objects
return [...].
The function will return a negative integer if the recursion limit
is reached. In that case the tp_repr implementation should
typically return NULL.
Otherwise, the function returns zero and the tp_repr
implementation can continue normally.
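A sketch of how a container type's tp_repr might use this (container_repr and
the ITEM() accessor are hypothetical):

static PyObject *
container_repr(PyObject *self)
{
    PyObject *result;
    int status = Py_ReprEnter(self);

    if (status > 0)              /* self is already being printed: a cycle */
        return PyUnicode_FromString("container(...)");
    if (status < 0)              /* recursion limit reached */
        return NULL;

    result = PyUnicode_FromFormat("container(%R)", ITEM(self));
    Py_ReprLeave(self);          /* balance the successful Py_ReprEnter() */
    return result;
}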
All standard Python exceptions are available as global variables whose names are
PyExc_ followed by the Python exception name. These have the type
PyObject*; they are all class objects. For completeness, here are all
the variables:
The functions in this chapter perform various utility tasks, ranging from
helping C code be more portable across platforms to using Python modules from
C and parsing function arguments and constructing Python values from C values.
int Py_FdIsInteractive(FILE *fp, const char *filename)¶
Return true (nonzero) if the standard I/O file fp with name filename is
deemed interactive. This is the case for files for which isatty(fileno(fp))
is true. If the global flag Py_InteractiveFlag is true, this function
also returns true if the filename pointer is NULL or if the name is equal to
one of the strings '<stdin>' or '???'.
Function to update some internal state after a process fork; this should be
called in the new process if the Python interpreter will continue to be used.
If a new executable is loaded into the new process, this function does not need
to be called.
Return true when the interpreter runs out of stack space. This is a reliable
check, but is only available when USE_STACKCHECK is defined (currently
on Windows using the Microsoft Visual C++ compiler). USE_STACKCHECK
will be defined automatically; you should never change the definition in your
own code.
Return the current signal handler for signal i. This is a thin wrapper around
either sigaction() or signal(). Do not call those functions
directly! PyOS_sighandler_t is a typedef alias for void (*)(int).
PyOS_sighandler_t PyOS_setsig(int i, PyOS_sighandler_t h)¶
Set the signal handler for signal i to be h; return the old signal handler.
This is a thin wrapper around either sigaction() or signal(). Do
not call those functions directly! PyOS_sighandler_t is a typedef
alias for void (*)(int).
These are utility functions that make functionality from the sys module
accessible to C code. They all work with the current interpreter thread’s
sys module’s dict, which is contained in the internal thread state structure.
Set sys.path to a list object of paths found in path which should
be a list of paths separated with the platform’s search path delimiter
(: on Unix, ; on Windows).
Write the output string described by format to sys.stdout. No
exceptions are raised, even if truncation occurs (see below).
format should limit the total size of the formatted output string to
1000 bytes or less – after 1000 bytes, the output string is truncated.
In particular, this means that no unrestricted “%s” formats should occur;
these should be limited using “%.<N>s” where <N> is a decimal number
calculated so that <N> plus the maximum size of other formatted text does not
exceed 1000 bytes. Also watch out for “%f”, which can print hundreds of
digits for very large numbers.
If a problem occurs, or sys.stdout is unset, the formatted message
is written to the real (C level) stdout.
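For example, a bounded conversion keeps the output safely under the limit
(message is a hypothetical C string of unknown length):

PySys_WriteStdout("log: %.500s\n", message);  /* prints at most 500 bytes of message */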
Print a fatal error message and kill the process. No cleanup is performed.
This function should only be invoked when a condition is detected that would
make it dangerous to continue using the Python interpreter; e.g., when the
object administration appears to be corrupted. On Unix, the standard C library
function abort() is called which will attempt to produce a core
file.
Register a cleanup function to be called by Py_Finalize(). The cleanup
function will be called with no arguments and should return no value. At most
32 cleanup functions can be registered. When the registration is successful,
Py_AtExit() returns 0; on failure, it returns -1. The cleanup
function registered last is called first. Each cleanup function will be called
at most once. Since Python’s internal finalization will have completed before
the cleanup function, no Python APIs should be called by func.
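A sketch of registering such a hook (cleanup is a placeholder that
deliberately avoids all Python APIs):

#include <stdio.h>

static void
cleanup(void)
{
    /* Runs after Python's internal finalization: no Python APIs allowed */
    fputs("interpreter shut down\n", stderr);
}

/* ... before calling Py_Finalize() ... */
if (Py_AtExit(cleanup) < 0) {
    /* registration failed: all 32 slots are in use */
}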
This is a simplified interface to PyImport_ImportModuleEx() below,
leaving the globals and locals arguments set to NULL and level set
to 0. When the name
argument contains a dot (when it specifies a submodule of a package), the
fromlist argument is set to the list ['*'] so that the return value is the
named module rather than the top-level package containing it as would otherwise
be the case. (Unfortunately, this has an additional side effect when name in
fact specifies a subpackage instead of a submodule: the submodules specified in
the package’s __all__ variable are loaded.) Return a new reference to the
imported module, or NULL with an exception set on failure. A failing
import of a module doesn’t leave the module in sys.modules.
This version of PyImport_ImportModule() does not block. It’s intended
to be used in C functions that import other modules to execute a function.
The import may block if another thread holds the import lock. The function
PyImport_ImportModuleNoBlock() never blocks. It first tries to fetch
the module from sys.modules and falls back to PyImport_ImportModule()
unless the lock is held, in which case the function will raise an
ImportError.
Import a module. This is best described by referring to the built-in Python
function __import__(), as the standard __import__() function calls
this function directly.
The return value is a new reference to the imported module or top-level
package, or NULL with an exception set on failure. Like for
__import__(), the return value when a submodule of a package was
requested is normally the top-level package, unless a non-empty fromlist
was given.
Import a module. This is best described by referring to the built-in Python
function __import__(), as the standard __import__() function calls
this function directly.
The return value is a new reference to the imported module or top-level package,
or NULL with an exception set on failure. Like for __import__(),
the return value when a submodule of a package was requested is normally the
top-level package, unless a non-empty fromlist was given.
This is a higher-level interface that calls the current “import hook
function” (with an explicit level of 0, meaning absolute import). It
invokes the __import__() function from the __builtins__ of the
current globals. This means that the import is done using whatever import
hooks are installed in the current environment.
Return the module object corresponding to a module name. The name argument
may be of the form package.module. First check the modules dictionary if
there’s one there, and if not, create a new one and insert it in the modules
dictionary. Return NULL with an exception set on failure.
Note
This function does not load or import the module; if the module wasn’t already
loaded, you will get an empty module object. Use PyImport_ImportModule()
or one of its variants to import a module. Package structures implied by a
dotted name for name are not created if not already present.
Given a module name (possibly of the form package.module) and a code object
read from a Python bytecode file or obtained from the built-in function
compile(), load the module. Return a new reference to the module object,
or NULL with an exception set if an error occurred. name
is removed from sys.modules in error cases, even if name was already
in sys.modules on entry to PyImport_ExecCodeModule(). Leaving
incompletely initialized modules in sys.modules is dangerous, as imports of
such modules have no way to know that the module object is in an unknown (and
probably damaged with respect to the module author's intents) state.
The module’s __file__ attribute will be set to the code object’s
co_filename.
This function will reload the module if it was already imported. See
PyImport_ReloadModule() for the intended way to reload a module.
If name points to a dotted name of the form package.module, any package
structures not already created will still not be created.
Like PyImport_ExecCodeModuleEx(), but the __cached__
attribute of the module object is set to cpathname if it is
non-NULL. Of the three functions, this is the preferred one to use.
Return the magic number for Python bytecode files (a.k.a. .pyc and
.pyo files). The magic number should be present in the first four bytes
of the bytecode file, in little-endian byte order.
Return an importer object for a sys.path/pkg.__path__ item
path, possibly by fetching it from the sys.path_importer_cache
dict. If it wasn’t yet cached, traverse sys.path_hooks until a hook
is found that can handle the path item. Return None if no hook could;
this tells our caller it should fall back to the built-in import mechanism.
Cache the result in sys.path_importer_cache. Return a new reference
to the importer object.
Load a frozen module named name. Return 1 for success, 0 if the
module is not found, and -1 with an exception set if the initialization
failed. To access the imported module on a successful load, use
PyImport_ImportModule(). (Note the misnomer — this function would
reload the module if it was already imported.)
This is the structure type definition for frozen module descriptors, as
generated by the freeze utility (see Tools/freeze/ in the
Python source distribution). Its definition, found in Include/import.h,
is:
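struct _frozen {
    char *name;
    unsigned char *code;
    int size;
};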
This pointer is initialized to point to an array of struct _frozen
records, terminated by one whose members are all NULL or zero. When a frozen
module is imported, it is searched in this table. Third-party code could play
tricks with this to provide a dynamically created collection of frozen modules.
int PyImport_AppendInittab(const char *name, PyObject* (*initfunc)(void))¶
Add a single module to the existing table of built-in modules. This is a
convenience wrapper around PyImport_ExtendInittab(), returning -1 if
the table could not be extended. The new module can be imported by the name
name, and uses the function initfunc as the initialization function called
on the first attempted import. This should be called before
Py_Initialize().
Structure describing a single entry in the list of built-in modules. Each of
these structures gives the name and initialization function for a module built
into the interpreter. Programs which embed Python may use an array of these
structures in conjunction with PyImport_ExtendInittab() to provide
additional built-in modules. The structure is defined in
Include/import.h as:
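struct _inittab {
    char *name;
    PyObject* (*initfunc)(void);
};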
int PyImport_ExtendInittab(struct _inittab *newtab)¶
Add a collection of modules to the table of built-in modules. The newtab
array must end with a sentinel entry which contains NULL for the name
field; failure to provide the sentinel value can result in a memory fault.
Returns 0 on success or -1 if insufficient memory could be allocated to
extend the internal table. In the event of failure, no modules are added to the
internal table. This should be called before Py_Initialize().
These routines allow C code to work with serialized objects using the same
data format as the marshal module. There are functions to write data
into the serialization format, and additional functions that can be used to
read the data back. Files used to store marshalled data must be opened in
binary mode.
Numeric values are stored with the least significant byte first.
The module supports several versions of the data format: version 0 is the
historical version; version 1 shares interned strings in the file and upon
unmarshalling; version 2 uses a binary format for floating point numbers.
Py_MARSHAL_VERSION indicates the current file format (currently 2).
void PyMarshal_WriteLongToFile(long value, FILE *file, int version)¶
Marshal a long integer, value, to file. This will only write
the least-significant 32 bits of value; regardless of the size of the
native long type. version indicates the file format.
void PyMarshal_WriteObjectToFile(PyObject *value, FILE *file, int version)¶
Marshal a Python object, value, to file.
version indicates the file format.
PyObject* PyMarshal_WriteObjectToString(PyObject *value, int version)¶
Return a string object containing the marshalled representation of value.
version indicates the file format.
The following functions allow marshalled values to be read back in.
A note on error detection: reading past the end of the file appears always to
result in a negative numeric value (where that is relevant), but it is not
clear that a legitimately negative value can be distinguished from an error.
A conservative approach is to write only non-negative values using these
routines.
long PyMarshal_ReadLongFromFile(FILE *file)¶
Return a C long from the data stream in a FILE* opened
for reading. Only a 32-bit value can be read in using this function,
regardless of the native size of long.
int PyMarshal_ReadShortFromFile(FILE *file)¶
Return a C short from the data stream in a FILE* opened
for reading. Only a 16-bit value can be read in using this function,
regardless of the native size of short.
PyObject* PyMarshal_ReadObjectFromFile(FILE *file)¶
Return a Python object from the data stream in a FILE* opened for
reading. On error, sets the appropriate exception (EOFError or
TypeError) and returns NULL.
PyObject* PyMarshal_ReadLastObjectFromFile(FILE *file)¶
Return a Python object from the data stream in a FILE* opened for
reading. Unlike PyMarshal_ReadObjectFromFile(), this function
assumes that no further objects will be read from the file, allowing it to
aggressively load file data into memory so that the de-serialization can
operate from data in memory rather than reading a byte at a time from the
file. Only use this variant if you are certain that you won’t be reading
anything else from the file. On error, sets the appropriate exception
(EOFError or TypeError) and returns NULL.
PyObject* PyMarshal_ReadObjectFromString(char *string, Py_ssize_t len)¶
Return a Python object from the data stream in a character buffer
containing len bytes pointed to by string. On error, sets the
appropriate exception (EOFError or TypeError) and returns
NULL.
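A minimal sketch of a round trip through the in-memory interface (the helper
name marshal_roundtrip is illustrative; error handling is abbreviated):
#include <Python.h>
#include <marshal.h>

/* Serialize obj to bytes and deserialize it again. */
static PyObject *
marshal_roundtrip(PyObject *obj)
{
    PyObject *data, *copy = NULL;
    char *buf;
    Py_ssize_t len;

    data = PyMarshal_WriteObjectToString(obj, Py_MARSHAL_VERSION);
    if (data == NULL)
        return NULL;
    if (PyBytes_AsStringAndSize(data, &buf, &len) == 0)
        copy = PyMarshal_ReadObjectFromString(buf, len);
    Py_DECREF(data);
    return copy;        /* new reference, or NULL with an exception set */
}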
These functions are useful when creating your own extension functions and
methods. Additional information and examples are available in
Extending and Embedding the Python Interpreter.
The first three of these functions described, PyArg_ParseTuple(),
PyArg_ParseTupleAndKeywords(), and PyArg_Parse(), all use format
strings which are used to tell the function about the expected arguments. The
format strings use the same syntax for each of these functions.
A format string consists of zero or more “format units.” A format unit
describes one Python object; it is usually a single character or a parenthesized
sequence of format units. With a few exceptions, a format unit that is not a
parenthesized sequence normally corresponds to a single address argument to
these functions. In the following description, the quoted form is the format
unit; the entry in (round) parentheses is the Python object type that matches
the format unit; and the entry in [square] brackets is the type of the C
variable(s) whose address should be passed.
These formats allow accessing an object as a contiguous chunk of memory.
You don’t have to provide raw storage for the returned unicode or bytes
area. Also, you won’t have to release any memory yourself, except with the
es, es#, et and et# formats.
However, when a Py_buffer structure gets filled, the underlying
buffer is locked so that the caller can subsequently use the buffer even
inside a Py_BEGIN_ALLOW_THREADS block without the risk of mutable data
being resized or destroyed. As a result, you have to call
PyBuffer_Release() after you have finished processing the data (or
in any early abort case).
Unless otherwise stated, buffers are not NUL-terminated.
Note
For all # variants of formats (s#, y#, etc.), the type of
the length argument (int or Py_ssize_t) is controlled by
defining the macro PY_SSIZE_T_CLEAN before including
Python.h. If the macro was defined, length is a
Py_ssize_t rather than an int. This behavior will change
in a future Python version to only support Py_ssize_t and
drop int support. It is best to always define PY_SSIZE_T_CLEAN.
s (str) [const char *]
Convert a Unicode object to a C pointer to a character string.
A pointer to an existing string is stored in the character pointer
variable whose address you pass. The C string is NUL-terminated.
The Python string must not contain embedded NUL bytes; if it does,
a TypeError exception is raised. Unicode objects are converted
to C strings using 'utf-8' encoding. If this conversion fails, a
UnicodeError is raised.
Note
This format does not accept bytes-like objects. If you want to accept
filesystem paths and convert them to C character strings, it is
preferable to use the O& format with PyUnicode_FSConverter()
as converter.
s* (str, bytes, bytearray or buffer compatible object) [Py_buffer]
This format accepts Unicode objects as well as objects supporting the
buffer protocol.
It fills a Py_buffer structure provided by the caller.
In this case the resulting C string may contain embedded NUL bytes.
Unicode objects are converted to C strings using 'utf-8' encoding.
s# (str, bytes or read-only buffer compatible object) [const char *, int or Py_ssize_t]
Like s*, except that it doesn’t accept mutable buffer-like objects
such as bytearray. The result is stored into two C variables,
the first one a pointer to a C string, the second one its length.
The string may contain embedded null bytes. Unicode objects are converted
to C strings using 'utf-8' encoding.
y (bytes) [const char *]
This format converts a bytes-like object to a C pointer to a character
string; it does not accept Unicode objects. The bytes buffer must not
contain embedded NUL bytes; if it does, a TypeError
exception is raised.
y* (bytes, bytearray or buffer compatible object) [Py_buffer]
This variant on s* doesn’t accept Unicode objects, only objects
supporting the buffer protocol. This is the recommended way to accept
binary data.
S (bytes) [PyBytesObject *]
Requires that the Python object is a bytes object, without
attempting any conversion. Raises TypeError if the object is not
a bytes object. The C variable may also be declared as PyObject*.
Y (bytearray) [PyByteArrayObject *]
Requires that the Python object is a bytearray object, without
attempting any conversion. Raises TypeError if the object is not
a bytearray object. The C variable may also be declared as PyObject*.
u (str) [Py_UNICODE *]
Convert a Python Unicode object to a C pointer to a NUL-terminated buffer of
Unicode characters. You must pass the address of a Py_UNICODE
pointer variable, which will be filled with the pointer to an existing
Unicode buffer. Please note that the width of a Py_UNICODE
character depends on compilation options (it is either 16 or 32 bits).
The Python string must not contain embedded NUL characters; if it does,
a TypeError exception is raised.
Note
Since u doesn’t give you back the length of the string, and it
may contain embedded NUL characters, it is recommended to use u#
or U instead.
U (str) [PyObject *]
Requires that the Python object is a Unicode object, without attempting
any conversion. Raises TypeError if the object is not a Unicode
object. The C variable may also be declared as PyObject*.
w* (bytearray or read-write byte-oriented buffer) [Py_buffer]
This format accepts any object which implements the read-write buffer
interface. It fills a Py_buffer structure provided by the caller.
The buffer may contain embedded null bytes. The caller has to call
PyBuffer_Release() when it is done with the buffer.
es (str) [const char *encoding, char **buffer]
This variant on s is used for encoding Unicode into a character buffer.
It only works for encoded data without embedded NUL bytes.
This format requires two arguments. The first is only used as input, and
must be a const char* which points to the name of an encoding as a
NUL-terminated string, or NULL, in which case 'utf-8' encoding is used.
An exception is raised if the named encoding is not known to Python. The
second argument must be a char**; the value of the pointer it
references will be set to a buffer with the contents of the argument text.
The text will be encoded in the encoding specified by the first argument.
PyArg_ParseTuple() will allocate a buffer of the needed size, copy the
encoded data into this buffer and adjust *buffer to reference the newly
allocated storage. The caller is responsible for calling PyMem_Free() to
free the allocated buffer after use.
et (str, bytes or bytearray) [const char *encoding, char **buffer]
Same as es except that byte string objects are passed through without
recoding them. Instead, the implementation assumes that the byte string object uses
the encoding passed in as parameter.
es# (str) [const char *encoding, char **buffer, int *buffer_length]
This variant on s# is used for encoding Unicode into a character buffer.
Unlike the es format, this variant allows input data which contains NUL
characters.
It requires three arguments. The first is only used as input, and must be a
const char* which points to the name of an encoding as a
NUL-terminated string, or NULL, in which case 'utf-8' encoding is used.
An exception is raised if the named encoding is not known to Python. The
second argument must be a char**; the value of the pointer it
references will be set to a buffer with the contents of the argument text.
The text will be encoded in the encoding specified by the first argument.
The third argument must be a pointer to an integer; the referenced integer
will be set to the number of bytes in the output buffer.
There are two modes of operation:
If *buffer points to a NULL pointer, the function will allocate a buffer of
the needed size, copy the encoded data into this buffer and set *buffer to
reference the newly allocated storage. The caller is responsible for calling
PyMem_Free() to free the allocated buffer after usage.
If *buffer points to a non-NULL pointer (an already allocated buffer),
PyArg_ParseTuple() will use this location as the buffer and interpret the
initial value of *buffer_length as the buffer size. It will then copy the
encoded data into the buffer and NUL-terminate it. If the buffer is not large
enough, a ValueError will be set.
In both cases, *buffer_length is set to the length of the encoded data
without the trailing NUL byte.
et# (str, bytes or bytearray) [const char *encoding, char **buffer, int *buffer_length]
Same as es# except that byte string objects are passed through without recoding
them. Instead, the implementation assumes that the byte string object uses the
encoding passed in as parameter.
K (int) [unsigned PY_LONG_LONG]
Convert a Python integer to a C unsigned long long
without overflow checking. This format is only available on platforms that
support unsigned long long (or unsigned __int64 on Windows).
O (object) [PyObject *]
Store a Python object (without any conversion) in a C object pointer. The C
program thus receives the actual object that was passed. The object’s reference
count is not increased. The pointer stored is not NULL.
O! (object) [typeobject, PyObject *]
Store a Python object in a C object pointer. This is similar to O, but
takes two C arguments: the first is the address of a Python type object, the
second is the address of the C variable (of type PyObject*) into which
the object pointer is stored. If the Python object does not have the required
type, TypeError is raised.
O& (object) [converter, anything]
Convert a Python object to a C variable through a converter function. This
takes two arguments: the first is a function, the second is the address of a C
variable (of arbitrary type), converted to void*. The converter
function in turn is called as follows:
status = converter(object, address);
where object is the Python object to be converted and address is the
void* argument that was passed to the PyArg_Parse*() function.
The returned status should be 1 for a successful conversion and 0 if
the conversion has failed. When the conversion fails, the converter function
should raise an exception and leave the content of address unmodified.
If the converter returns Py_CLEANUP_SUPPORTED, it may get called a
second time if the argument parsing eventually fails, giving the converter a
chance to release any memory that it had already allocated. In this second
call, the object parameter will be NULL; address will have the same value
as in the original call.
Changed in version 3.1: Py_CLEANUP_SUPPORTED was added.
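As a sketch, a converter that accepts only positive numbers could look like
this (the function name and the range check are illustrative):
/* Converter for the O& format: store a positive C double at address. */
static int
positive_double(PyObject *object, void *address)
{
    double value = PyFloat_AsDouble(object);
    if (value == -1.0 && PyErr_Occurred())
        return 0;                   /* failure; exception already set */
    if (value <= 0.0) {
        PyErr_SetString(PyExc_ValueError, "expected a positive number");
        return 0;
    }
    *(double *)address = value;
    return 1;                       /* success */
}

/* Typical use:
       double d;
       if (!PyArg_ParseTuple(args, "O&", positive_double, &d))
           return NULL;
*/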
(items) (tuple) [matching-items]
The object must be a Python sequence whose length is the number of format units
in items. The C arguments must correspond to the individual format units in
items. Format units for sequences may be nested.
It is possible to pass “long” integers (integers whose value exceeds the
platform’s LONG_MAX) however no proper range checking is done — the
most significant bits are silently truncated when the receiving field is too
small to receive the value (actually, the semantics are inherited from downcasts
in C — your mileage may vary).
A few other characters have a meaning in a format string. These may not occur
inside nested parentheses. They are:
|
Indicates that the remaining arguments in the Python argument list are optional.
The C variables corresponding to optional arguments should be initialized to
their default value — when an optional argument is not specified,
PyArg_ParseTuple() does not touch the contents of the corresponding C
variable(s).
:
The list of format units ends here; the string after the colon is used as the
function name in error messages (the “associated value” of the exception that
PyArg_ParseTuple() raises).
;
The list of format units ends here; the string after the semicolon is used as
the error message instead of the default error message. : and ;
mutually exclude each other.
Note that any Python object references which are provided to the caller are
borrowed references; do not decrement their reference count!
Additional arguments passed to these functions must be addresses of variables
whose type is determined by the format string; these are used to store values
from the input tuple. There are a few cases, as described in the list of format
units above, where these parameters are used as input values; they should match
what is specified for the corresponding format unit in that case.
For the conversion to succeed, the arg object must match the format
and the format must be exhausted. On success, the
PyArg_Parse*() functions return true, otherwise they return
false and raise an appropriate exception. When the
PyArg_Parse*() functions fail due to conversion failure in one
of the format units, the variables at the addresses corresponding to that
and the following format units are left untouched.
int PyArg_ParseTuple(PyObject *args, const char *format, ...)¶
Parse the parameters of a function that takes only positional parameters into
local variables. Returns true on success; on failure, it returns false and
raises the appropriate exception.
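For example, a hypothetical module function taking a required string and an
optional int might parse its arguments like this (the names repeat, text and
count are illustrative):
static PyObject *
example_repeat(PyObject *self, PyObject *args)
{
    const char *text;
    int count = 1;      /* default, used when the optional argument is absent */

    /* "s" -> const char *; "|" starts the optional arguments;
       "i" -> int; ":repeat" names the function in error messages. */
    if (!PyArg_ParseTuple(args, "s|i:repeat", &text, &count))
        return NULL;
    return PyUnicode_FromFormat("%s repeated %d times", text, count);
}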
int PyArg_VaParse(PyObject *args, const char *format, va_list vargs)¶
Identical to PyArg_ParseTuple(), except that it accepts a va_list rather
than a variable number of arguments.
int PyArg_ParseTupleAndKeywords(PyObject *args, PyObject *kw, const char *format, char *keywords[], ...)¶
Parse the parameters of a function that takes both positional and keyword
parameters into local variables. Returns true on success; on failure, it
returns false and raises the appropriate exception.
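A sketch of keyword parsing (all names are illustrative; such a function would
be registered with METH_VARARGS | METH_KEYWORDS):
static PyObject *
example_connect(PyObject *self, PyObject *args, PyObject *kwargs)
{
    static char *kwlist[] = {"host", "port", "timeout", NULL};
    const char *host;
    int port = 80;              /* defaults for the optional arguments */
    double timeout = 30.0;

    if (!PyArg_ParseTupleAndKeywords(args, kwargs, "s|id:connect",
                                     kwlist, &host, &port, &timeout))
        return NULL;
    /* ... open the connection here ... */
    Py_RETURN_NONE;
}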
Ensure that the keys in the keywords argument dictionary are strings. This
is only needed if PyArg_ParseTupleAndKeywords() is not used, since the
latter already does this check.
New in version 3.2.
int PyArg_Parse(PyObject *args, const char *format, ...)¶
Function used to deconstruct the argument lists of “old-style” functions —
these are functions which use the METH_OLDARGS parameter parsing
method. This is not recommended for use in parameter parsing in new code, and
most code in the standard interpreter has been modified to no longer use this
for that purpose. It does remain a convenient way to decompose other tuples,
however, and may continue to be used for that purpose.
int PyArg_UnpackTuple(PyObject *args, const char *name, Py_ssize_t min, Py_ssize_t max, ...)¶
A simpler form of parameter retrieval which does not use a format string to
specify the types of the arguments. Functions which use this method to retrieve
their parameters should be declared as METH_VARARGS in function or
method tables. The tuple containing the actual parameters should be passed as
args; it must actually be a tuple. The length of the tuple must be at least
min and no more than max; min and max may be equal. Additional
arguments must be passed to the function, each of which should be a pointer to a
PyObject* variable; these will be filled in with the values from
args; they will contain borrowed references. The variables which correspond
to optional parameters not given by args will not be filled in; these should
be initialized by the caller. This function returns true on success and false if
args is not a tuple or contains the wrong number of elements; an exception
will be set if there was a failure.
This is an example of the use of this function, taken from the sources for the
_weakref helper module for weak references:
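static PyObject *
weakref_ref(PyObject *self, PyObject *args)
{
    PyObject *object;
    PyObject *callback = NULL;
    PyObject *result = NULL;

    if (PyArg_UnpackTuple(args, "ref", 1, 2, &object, &callback)) {
        result = PyWeakref_NewRef(object, callback);
    }
    return result;
}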
PyObject* Py_BuildValue(const char *format, ...)¶
Create a new value based on a format string similar to those accepted by the
PyArg_Parse*() family of functions and a sequence of values. Returns
the value or NULL in the case of an error; an exception will be raised if
NULL is returned.
Py_BuildValue() does not always build a tuple. It builds a tuple only if
its format string contains two or more format units. If the format string is
empty, it returns None; if it contains exactly one format unit, it returns
whatever object is described by that format unit. To force it to return a tuple
of size 0 or one, parenthesize the format string.
When memory buffers are passed as parameters to supply data to build objects, as
for the s and s# formats, the required data is copied. Buffers provided
by the caller are never referenced by the objects created by
Py_BuildValue(). In other words, if your code invokes malloc()
and passes the allocated memory to Py_BuildValue(), your code is
responsible for calling free() for that memory once
Py_BuildValue() returns.
In the following description, the quoted form is the format unit; the entry in
(round) parentheses is the Python object type that the format unit will return;
and the entry in [square] brackets is the type of the C value(s) to be passed.
The characters space, tab, colon and comma are ignored in format strings (but
not within format units such as s#). This can be used to make long format
strings a tad more readable.
Convert a C string and its length to a Python str object using 'utf-8'
encoding. If the C string pointer is NULL, the length is ignored and
None is returned.
Convert a Unicode (UCS-2 or UCS-4) data buffer and its length to a Python
Unicode object. If the Unicode buffer pointer is NULL, the length is ignored
and None is returned.
Convert a C Py_complex structure to a Python complex number.
O (object) [PyObject *]
Pass a Python object untouched (except for its reference count, which is
incremented by one). If the object passed in is a NULL pointer, it is assumed
that this was caused because the call producing the argument found an error and
set an exception. Therefore, Py_BuildValue() will return NULL but won’t
raise an exception. If no exception has been raised yet, SystemError is
set.
S (object) [PyObject *]
Same as O.
N (object) [PyObject *]
Same as O, except it doesn’t increment the reference count on the object.
Useful when the object is created by a call to an object constructor in the
argument list.
O& (object) [converter, anything]
Convert anything to a Python object through a converter function. The
function is called with anything (which should be compatible with
void*) as its argument and should return a “new” Python object, or
NULL if an error occurred.
Convert a sequence of C values to a Python dictionary. Each pair of consecutive
C values adds one item to the dictionary, serving as key and value,
respectively.
If there is an error in the format string, the SystemError exception is
set and NULL returned.
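A few illustrative calls, each returning a new reference (the call on the
left, the resulting Python value on the right):
Py_BuildValue("")                        None
Py_BuildValue("i", 123)                  123
Py_BuildValue("iii", 123, 456, 789)      (123, 456, 789)
Py_BuildValue("s", "hello")              'hello'
Py_BuildValue("ss", "hello", "world")    ('hello', 'world')
Py_BuildValue("()")                      ()
Py_BuildValue("(i)", 123)                (123,)
Py_BuildValue("[i,i]", 123, 456)         [123, 456]
Py_BuildValue("{s:i,s:i}", "abc", 123, "def", 456)
                                         {'abc': 123, 'def': 456}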
Functions for number conversion and formatted string output.
int PyOS_snprintf(char *str, size_t size, const char *format, ...)¶
Output not more than size bytes to str according to the format string
format and the extra arguments. See the Unix man page snprintf(2).
int PyOS_vsnprintf(char *str, size_t size, const char *format, va_list va)¶
Output not more than size bytes to str according to the format string
format and the variable argument list va. Unix man page
vsnprintf(2).
PyOS_snprintf() and PyOS_vsnprintf() wrap the Standard C library
functions snprintf() and vsnprintf(). Their purpose is to
guarantee consistent behavior in corner cases, which the Standard C functions do
not.
The wrappers ensure that str[size-1] is always '\0' upon return. They
never write more than size bytes (including the trailing '\0') into str.
Both functions require that str != NULL, size > 0 and format != NULL.
If the platform doesn’t have vsnprintf() and the buffer size needed to
avoid truncation exceeds size by more than 512 bytes, Python aborts with a
Py_FatalError.
The return value (rv) for these functions should be interpreted as follows:
When 0 <= rv < size, the output conversion was successful and rv
characters were written to str (excluding the trailing '\0' byte at
str[rv]).
When rv >= size, the output conversion was truncated and a buffer with
rv + 1 bytes would have been needed to succeed. str[size-1] is '\0'
in this case.
When rv < 0, “something bad happened.” str[size-1] is '\0' in
this case too, but the rest of str is undefined. The exact cause of the error
depends on the underlying platform.
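A minimal sketch of checking the three documented outcomes:
char buf[16];
int rv = PyOS_snprintf(buf, sizeof(buf), "x = %d", 42);
if (rv < 0) {
    /* hard error: buf[sizeof(buf)-1] is '\0', the rest is undefined */
}
else if ((size_t)rv >= sizeof(buf)) {
    /* truncated: rv + 1 bytes would have been needed */
}
else {
    /* success: buf holds rv characters plus the trailing '\0' */
}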
The following functions provide locale-independent string to number conversions.
double PyOS_string_to_double(const char *s, char **endptr, PyObject *overflow_exception)¶
Convert a string s to a double, raising a Python
exception on failure. The set of accepted strings corresponds to
the set of strings accepted by Python’s float() constructor,
except that s must not have leading or trailing whitespace.
The conversion is independent of the current locale.
If endptr is NULL, convert the whole string. Raise
ValueError and return -1.0 if the string is not a valid
representation of a floating-point number.
If endptr is not NULL, convert as much of the string as
possible and set *endptr to point to the first unconverted
character. If no initial segment of the string is the valid
representation of a floating-point number, set *endptr to point
to the beginning of the string, raise ValueError, and return
-1.0.
If s represents a value that is too large to store in a float
(for example, "1e500" is such a string on many platforms) then
if overflow_exception is NULL return Py_HUGE_VAL (with
an appropriate sign) and don’t set any exception. Otherwise,
overflow_exception must point to a Python exception object;
raise that exception and return -1.0. In both cases, set
*endptr to point to the first character after the converted value.
If any other error occurs during the conversion (for example an
out-of-memory error), set the appropriate Python exception and
return -1.0.
New in version 3.1.
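A minimal sketch of a strict whole-string conversion:
/* Convert "3.14", requiring that the entire string be consumed. */
double v = PyOS_string_to_double("3.14", NULL, PyExc_OverflowError);
if (v == -1.0 && PyErr_Occurred()) {
    /* ValueError, OverflowError or another exception is set */
}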
char* PyOS_double_to_string(double val, char format_code, int precision, int flags, int *ptype)¶
Convert a double val to a string using the supplied
format_code, precision, and flags.
format_code must be one of 'e', 'E', 'f', 'F',
'g', 'G' or 'r'. For 'r', the supplied precision
must be 0 and is ignored. The 'r' format code specifies the
standard repr() format.
flags can be zero or more of the values Py_DTSF_SIGN,
Py_DTSF_ADD_DOT_0, or Py_DTSF_ALT, or-ed together:
Py_DTSF_SIGN means to always precede the returned string with a sign
character, even if val is non-negative.
Py_DTSF_ADD_DOT_0 means to ensure that the returned string will not look
like an integer.
Py_DTSF_ALT means to apply “alternate” formatting rules. See the
documentation for the PyOS_snprintf() '#' specifier for
details.
If ptype is non-NULL, then the value it points to will be set to one of
Py_DTST_FINITE, Py_DTST_INFINITE, or Py_DTST_NAN, signifying that
val is a finite number, an infinite number, or not a number, respectively.
The return value is a pointer to a buffer with the converted string or
NULL if the conversion failed. The caller is responsible for freeing the
returned string by calling PyMem_Free().
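A short sketch (the format code and values are illustrative):
int kind;
char *s = PyOS_double_to_string(1234.5678, 'g', 6, Py_DTSF_SIGN, &kind);
if (s != NULL) {
    /* kind is Py_DTST_FINITE here; s might be "+1234.57" */
    ...
    PyMem_Free(s);      /* the caller owns the returned string */
}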
Return a description string, depending on the type of func.
Return values include “()” for functions and methods, ” constructor”,
” instance”, and ” object”. Concatenated with the result of
PyEval_GetFuncName(), the result will be a description of
func.
PyObject* PyCodec_Encode(PyObject *object, const char *encoding, const char *errors)¶
object is passed through the encoder function found for the given
encoding using the error handling method defined by errors. errors may
be NULL to use the default method defined for the codec. Raises a
LookupError if no encoder can be found.
PyObject* PyCodec_Decode(PyObject *object, const char *encoding, const char *errors)¶
object is passed through the decoder function found for the given
encoding using the error handling method defined by errors. errors may
be NULL to use the default method defined for the codec. Raises a
LookupError if no decoder can be found.
In the following functions, the encoding string is first converted to all
lower-case characters before being looked up, which makes encodings looked up
through this mechanism effectively case-insensitive. If no codec is found, a
KeyError is set and NULL returned.
int PyCodec_RegisterError(const char *name, PyObject *error)¶
Register the error handling callback function error under the given name.
This callback function will be called by a codec when it encounters
unencodable characters/undecodable bytes and name is specified as the error
parameter in the call to the encode/decode function.
The callback gets a single argument, an instance of
UnicodeEncodeError, UnicodeDecodeError or
UnicodeTranslateError that holds information about the problematic
sequence of characters or bytes and their offset in the original string (see
Unicode Exception Objects for functions to extract this information). The
callback must either raise the given exception, or return a two-item tuple
containing the replacement for the problematic sequence, and an integer
giving the offset in the original string at which encoding/decoding should be
resumed.
PyObject* PyCodec_LookupError(const char *name)¶
Look up the error handling callback function registered under name. As a
special case NULL can be passed, in which case the error handling callback
for “strict” will be returned.
The functions in this chapter interact with Python objects regardless of their
type, or with wide classes of object types (e.g. all numerical types, or all
sequence types). When used on object types for which they do not apply, they
will raise a Python exception.
It is not possible to use these functions on objects that are not properly
initialized, such as a list object that has been created by PyList_New(),
but whose items have not been set to some non-NULL value yet.
int PyObject_Print(PyObject *o, FILE *fp, int flags)¶
Print an object o, on file fp. Returns -1 on error. The flags argument
is used to enable certain printing options. The only option currently supported
is Py_PRINT_RAW; if given, the str() of the object is written
instead of the repr().
int PyObject_HasAttr(PyObject *o, PyObject *attr_name)¶
Returns 1 if o has the attribute attr_name, and 0 otherwise. This
is equivalent to the Python expression hasattr(o, attr_name). This function
always succeeds.
int PyObject_HasAttrString(PyObject *o, const char *attr_name)¶
Returns 1 if o has the attribute attr_name, and 0 otherwise. This
is equivalent to the Python expression hasattr(o, attr_name). This function
always succeeds.
PyObject* PyObject_GetAttr(PyObject *o, PyObject *attr_name)¶
Retrieve an attribute named attr_name from object o. Returns the attribute
value on success, or NULL on failure. This is the equivalent of the Python
expression o.attr_name.
PyObject* PyObject_GetAttrString(PyObject *o, const char *attr_name)¶
Retrieve an attribute named attr_name from object o. Returns the attribute
value on success, or NULL on failure. This is the equivalent of the Python
expression o.attr_name.
Generic attribute getter function that is meant to be put into a type
object’s tp_getattro slot. It looks for a descriptor in the dictionary
of classes in the object’s MRO as well as an attribute in the object’s
__dict__ (if present). As outlined in 实现描述符, data
descriptors take preference over instance attributes, while non-data
descriptors don’t. Otherwise, an AttributeError is raised.
int PyObject_SetAttr(PyObject *o, PyObject *attr_name, PyObject *v)¶
Set the value of the attribute named attr_name, for object o, to the value
v. Returns -1 on failure. This is the equivalent of the Python statement
o.attr_name = v.
int PyObject_SetAttrString(PyObject *o, const char *attr_name, PyObject *v)¶
Set the value of the attribute named attr_name, for object o, to the value
v. Returns -1 on failure. This is the equivalent of the Python statement
o.attr_name = v.
Generic attribute setter function that is meant to be put into a type
object’s tp_setattro slot. It looks for a data descriptor in the
dictionary of classes in the object’s MRO, and if found it takes preference
over setting the attribute in the instance dictionary. Otherwise, the
attribute is set in the object’s __dict__ (if present); failing that,
an AttributeError is raised and -1 is returned.
PyObject* PyObject_RichCompare(PyObject *o1, PyObject *o2, int opid)¶
Compare the values of o1 and o2 using the operation specified by opid,
which must be one of Py_LT, Py_LE, Py_EQ,
Py_NE, Py_GT, or Py_GE, corresponding to <,
<=, ==, !=, >, or >= respectively. This is the equivalent of
the Python expression o1 op o2, where op is the operator corresponding
to opid. Returns the value of the comparison on success, or NULL on failure.
int PyObject_RichCompareBool(PyObject *o1, PyObject *o2, int opid)¶
Compare the values of o1 and o2 using the operation specified by opid,
which must be one of Py_LT, Py_LE, Py_EQ,
Py_NE, Py_GT, or Py_GE, corresponding to <,
<=, ==, !=, >, or >= respectively. Returns -1 on error,
0 if the result is false, 1 otherwise. This is the equivalent of the
Python expression o1 op o2, where op is the operator corresponding to
opid.
Note
If o1 and o2 are the same object, PyObject_RichCompareBool()
will always return 1 for Py_EQ and 0 for Py_NE.
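A minimal sketch of an equality test, assuming item and target are objects
owned elsewhere:
int eq = PyObject_RichCompareBool(item, target, Py_EQ);
if (eq < 0) {
    /* error; exception set */
}
else if (eq) {
    /* item == target */
}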
Compute a string representation of object o. Returns the string
representation on success, NULL on failure. This is the equivalent of the
Python expression repr(o). Called by the repr() built-in function.
As PyObject_Repr(), compute a string representation of object o, but
escape the non-ASCII characters in the string returned by
PyObject_Repr() with \x, \u or \U escapes. This generates
a string similar to that returned by PyObject_Repr() in Python 2.
Called by the ascii() built-in function.
Compute a string representation of object o. Returns the string
representation on success, NULL on failure. This is the equivalent of the
Python expression str(o). Called by the str() built-in function
and, therefore, by the print() function.
Compute a bytes representation of object o. NULL is returned on
failure and a bytes object on success. This is equivalent to the Python
expression bytes(o), when o is not an integer. Unlike bytes(o),
a TypeError is raised when o is an integer instead of a zero-initialized
bytes object.
Returns 1 if inst is an instance of the class cls or a subclass of
cls, or 0 if not. On error, returns -1 and sets an exception. If
cls is a type object rather than a class object, PyObject_IsInstance()
returns 1 if inst is of type cls. If cls is a tuple, the check will
be done against every entry in cls. The result will be 1 when at least one
of the checks returns 1, otherwise it will be 0. If inst is not a
class instance and cls is neither a type object, nor a class object, nor a
tuple, inst must have a __class__ attribute — the class relationship
of the value of that attribute with cls will be used to determine the result
of this function.
Subclass determination is done in a fairly straightforward way, but includes a
wrinkle that implementors of extensions to the class system may want to be aware
of. If A and B are class objects, B is a subclass of
A if it inherits from A either directly or indirectly. If
either is not a class object, a more general mechanism is used to determine the
class relationship of the two objects. When testing if B is a subclass of
A, if A is B, PyObject_IsSubclass() returns true. If A and B
are different objects, B’s __bases__ attribute is searched in a
depth-first fashion for A — the presence of the __bases__ attribute
is considered sufficient for this determination.
Returns 1 if the class derived is identical to or derived from the class
cls, otherwise returns 0. In case of an error, returns -1. If cls
is a tuple, the check will be done against every entry in cls. The result will
be 1 when at least one of the checks returns 1, otherwise it will be
0. If either derived or cls is not an actual class object (or tuple),
this function uses the generic algorithm described above.
PyObject* PyObject_Call(PyObject *callable_object, PyObject *args, PyObject *kw)¶
Call a callable Python object callable_object, with arguments given by the
tuple args, and named arguments given by the dictionary kw. If no named
arguments are needed, kw may be NULL. args must not be NULL, use an
empty tuple if no arguments are needed. Returns the result of the call on
success, or NULL on failure. This is the equivalent of the Python expression
callable_object(*args,**kw).
PyObject* PyObject_CallObject(PyObject *callable_object, PyObject *args)¶
Call a callable Python object callable_object, with arguments given by the
tuple args. If no arguments are needed, then args may be NULL. Returns
the result of the call on success, or NULL on failure. This is the equivalent
of the Python expression callable_object(*args).
PyObject* PyObject_CallFunction(PyObject *callable, char *format, ...)¶
Call a callable Python object callable, with a variable number of C arguments.
The C arguments are described using a Py_BuildValue() style format
string. The format may be NULL, indicating that no arguments are provided.
Returns the result of the call on success, or NULL on failure. This is the
equivalent of the Python expression callable(*args). Note that if you only
pass PyObject* args, PyObject_CallFunctionObjArgs() is a
faster alternative.
PyObject* PyObject_CallMethod(PyObject *o, char *method, char *format, ...)¶
Call the method named method of object o with a variable number of C
arguments. The C arguments are described by a Py_BuildValue() format
string that should produce a tuple. The format may be NULL, indicating that
no arguments are provided. Returns the result of the call on success, or NULL
on failure. This is the equivalent of the Python expression o.method(args).
Note that if you only pass PyObject* args,
PyObject_CallMethodObjArgs() is a faster alternative.
PyObject* PyObject_CallFunctionObjArgs(PyObject *callable, ..., NULL)¶
Call a callable Python object callable, with a variable number of
PyObject* arguments. The arguments are provided as a variable number
of parameters followed by NULL. Returns the result of the call on success, or
NULL on failure.
PyObject* PyObject_CallMethodObjArgs(PyObject *o, PyObject *name, ..., NULL)¶
Calls a method of the object o, where the name of the method is given as a
Python string object in name. It is called with a variable number of
PyObject* arguments. The arguments are provided as a variable number
of parameters followed by NULL. Returns the result of the call on success, or
NULL on failure.
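For instance, the C equivalent of f.seek(0, 0) might look like this sketch,
where f is assumed to be a file-like Python object obtained elsewhere:
PyObject *result = PyObject_CallMethod(f, "seek", "(ii)", 0, 0);
if (result == NULL) {
    /* exception set by the call */
}
else {
    Py_DECREF(result);      /* discard the returned None */
}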
Set a TypeError indicating that type(o) is not hashable and return -1.
This function receives special treatment when stored in a tp_hash slot,
allowing a type to explicitly indicate to the interpreter that it is not
hashable.
When o is non-NULL, returns a type object corresponding to the object type
of object o. On failure, raises SystemError and returns NULL. This
is equivalent to the Python expression type(o). This function increments the
reference count of the return value. There’s really no reason to use this
function instead of the common expression o->ob_type, which returns a
pointer of type PyTypeObject*, except when the incremented reference
count is needed.
Return the length of object o. If the object o provides either the sequence
and mapping protocols, the sequence length is returned. On error, -1 is
returned. This is the equivalent to the Python expression len(o).
This is equivalent to the Python expression dir(o), returning a (possibly
empty) list of strings appropriate for the object argument, or NULL if there
was an error. If the argument is NULL, this is like the Python dir(),
returning the names of the current locals; in this case, if no execution frame
is active then NULL is returned but PyErr_Occurred() will return false.
This is equivalent to the Python expression iter(o). It returns a new
iterator for the object argument, or the object itself if the object is already
an iterator. Raises TypeError and returns NULL if the object cannot be
iterated.
Return a reasonable approximation for the mathematical value of o1 divided by
o2, or NULL on failure. The return value is “approximate” because binary
floating point numbers are approximate; it is not possible to represent all real
numbers in base two. This function can return a floating point value when
passed two integers.
See the built-in function pow(). Returns NULL on failure. This is the
equivalent of the Python expression pow(o1, o2, o3), where o3 is optional.
If o3 is to be ignored, pass Py_None in its place (passing NULL for
o3 would cause an illegal memory access).
Returns the result of adding o1 and o2, or NULL on failure. The operation
is done in-place when o1 supports it. This is the equivalent of the Python
statement o1 += o2.
Returns the result of subtracting o2 from o1, or NULL on failure. The
operation is done in-place when o1 supports it. This is the equivalent of
the Python statement o1 -= o2.
Returns the result of multiplying o1 and o2, or NULL on failure. The
operation is done in-place when o1 supports it. This is the equivalent of
the Python statement o1 *= o2.
Returns the mathematical floor of dividing o1 by o2, or NULL on failure.
The operation is done in-place when o1 supports it. This is the equivalent
of the Python statement o1 //= o2.
Return a reasonable approximation for the mathematical value of o1 divided by
o2, or NULL on failure. The return value is “approximate” because binary
floating point numbers are approximate; it is not possible to represent all real
numbers in base two. This function can return a floating point value when
passed two integers. The operation is done in-place when o1 supports it.
Returns the remainder of dividing o1 by o2, or NULL on failure. The
operation is done in-place when o1 supports it. This is the equivalent of
the Python statement o1 %= o2.
See the built-in function pow(). Returns NULL on failure. The operation
is done in-place when o1 supports it. This is the equivalent of the Python
statement o1 **= o2 when o3 is Py_None, or an in-place variant of
pow(o1, o2, o3) otherwise. If o3 is to be ignored, pass Py_None
in its place (passing NULL for o3 would cause an illegal memory access).
Returns the result of left shifting o1 by o2 on success, or NULL on
failure. The operation is done in-place when o1 supports it. This is the
equivalent of the Python statement o1 <<= o2.
Returns the result of right shifting o1 by o2 on success, or NULL on
failure. The operation is done in-place when o1 supports it. This is the
equivalent of the Python statement o1 >>= o2.
Returns the “bitwise and” of o1 and o2 on success and NULL on failure. The
operation is done in-place when o1 supports it. This is the equivalent of
the Python statement o1 &= o2.
Returns the “bitwise exclusive or” of o1 by o2 on success, or NULL on
failure. The operation is done in-place when o1 supports it. This is the
equivalent of the Python statement o1 ^= o2.
Returns the “bitwise or” of o1 and o2 on success, or NULL on failure. The
operation is done in-place when o1 supports it. This is the equivalent of
the Python statement o1 |= o2.
Returns the integer n converted to base base as a string with a base
marker of '0b', '0o', or '0x' if applicable. When
base is not 2, 8, 10, or 16, the format is 'x#num' where x is the
base. If n is not an int object, it is converted with
PyNumber_Index() first.
Returns o converted to a Py_ssize_t value if o can be interpreted as an
integer. If the call fails, an exception is raised and -1 is returned.
If o can be converted to a Python int but the attempt to
convert to a Py_ssize_t value would raise an OverflowError, then the
exc argument is the type of exception that will be raised (usually
IndexError or OverflowError). If exc is NULL, then the
exception is cleared and the value is clipped to PY_SSIZE_T_MIN for a negative
integer or PY_SSIZE_T_MAX for a positive integer.
Returns the number of objects in sequence o on success, and -1 on failure.
For objects that do not provide sequence protocol, this is equivalent to the
Python expression len(o).
Return the concatenation of o1 and o2 on success, and NULL on failure.
The operation is done in-place when o1 supports it. This is the equivalent
of the Python expression o1 += o2.
Return the result of repeating sequence object o count times, or NULL on
failure. The operation is done in-place when o supports it. This is the
equivalent of the Python expression o *= count.
int PySequence_SetItem(PyObject *o, Py_ssize_t i, PyObject *v)¶
Assign object v to the ith element of o. Returns -1 on failure. This
is the equivalent of the Python statement o[i] = v. This function does
not steal a reference to v.
int PySequence_DelItem(PyObject *o, Py_ssize_t i)¶
Delete the ith element of object o. Returns -1 on failure. This is the
equivalent of the Python statement del o[i].
int PySequence_SetSlice(PyObject *o, Py_ssize_t i1, Py_ssize_t i2, PyObject *v)¶
Assign the sequence object v to the slice in sequence object o from i1 to
i2. This is the equivalent of the Python statement o[i1:i2] = v.
int PySequence_DelSlice(PyObject *o, Py_ssize_t i1, Py_ssize_t i2)¶
Delete the slice in sequence object o from i1 to i2. Returns -1 on
failure. This is the equivalent of the Python statement del o[i1:i2].
Return the number of occurrences of value in o, that is, return the number
of keys for which o[key] == value. On failure, return -1. This is
equivalent to the Python expression o.count(value).
Determine if o contains value. If an item in o is equal to value,
return 1, otherwise return 0. On error, return -1. This is
equivalent to the Python expression value in o.
Return a tuple object with the same contents as the arbitrary sequence o or
NULL on failure. If o is a tuple, a new reference will be returned,
otherwise a tuple will be constructed with the appropriate contents. This is
equivalent to the Python expression tuple(o).
Returns the sequence o as a tuple, unless it is already a tuple or list, in
which case o is returned. Use PySequence_Fast_GET_ITEM() to access the
members of the result. Returns NULL on failure. If the object is not a
sequence, raises TypeError with m as the message text.
Return the underlying array of PyObject pointers. Assumes that o was returned
by PySequence_Fast() and o is not NULL.
Note, if a list gets resized, the reallocation may relocate the items array.
So, only use the underlying array pointer in contexts where the sequence
cannot change.
Return the ith element of o or NULL on failure. Macro form of
PySequence_GetItem() but without checking that
PySequence_Check() on o is true and without adjustment for negative
indices.
Returns the length of o, assuming that o was returned by
PySequence_Fast() and that o is not NULL. The size can also be
gotten by calling PySequence_Size() on o, but
PySequence_Fast_GET_SIZE() is faster because it can assume o is a list
or tuple.
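A sketch of the typical pattern, assuming seq is an arbitrary sequence
obtained elsewhere:
PyObject *fast;
Py_ssize_t i, n;

fast = PySequence_Fast(seq, "expected a sequence");
if (fast == NULL)
    return NULL;
n = PySequence_Fast_GET_SIZE(fast);
for (i = 0; i < n; i++) {
    PyObject *item = PySequence_Fast_GET_ITEM(fast, i);     /* borrowed */
    /* ... use item; do not Py_DECREF it ... */
}
Py_DECREF(fast);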
Returns the number of keys in object o on success, and -1 on failure. For
objects that do not provide mapping protocol, this is equivalent to the Python
expression len(o).
int PyMapping_DelItemString(PyObject *o, char *key)¶
Remove the mapping for object key from the object o. Return -1 on
failure. This is equivalent to the Python statement del o[key].
int PyMapping_DelItem(PyObject *o, PyObject *key)¶
Remove the mapping for object key from the object o. Return -1 on
failure. This is equivalent to the Python statement del o[key].
int PyMapping_HasKeyString(PyObject *o, char *key)¶
On success, return 1 if the mapping object has the key key and 0
otherwise. This is equivalent to the Python expression key in o.
This function always succeeds.
On success, return a list of the items in object o, where each item is a tuple
containing a key-value pair. On failure, return NULL. This is equivalent to
the Python expression list(o.items()).
PyObject* PyIter_Next(PyObject *o)¶
Return the next value from the iteration o. If the object is an iterator,
this retrieves the next value from the iteration, and returns NULL with no
exception set if there are no remaining items. If the object is not an
iterator, TypeError is raised, or if there is an error in retrieving the
item, returns NULL and passes along the exception.
To write a loop which iterates over an iterator, the C code should look
something like this:
PyObject *iterator = PyObject_GetIter(obj);
PyObject *item;

if (iterator == NULL) {
    /* propagate error */
}

while (item = PyIter_Next(iterator)) {
    /* do something with item */
    ...
    /* release reference when done */
    Py_DECREF(item);
}

Py_DECREF(iterator);

if (PyErr_Occurred()) {
    /* propagate error */
}
else {
    /* continue doing useful work */
}
Certain objects available in Python wrap access to an underlying memory
array or buffer. Such objects include the built-in bytes and
bytearray, and some extension types like array.array.
Third-party libraries may define their own types for special purposes, such
as image processing or numeric analysis.
While each of these types has its own semantics, they share the common
characteristic of being backed by a possibly large memory buffer. It is
then desirable, in some situations, to access that buffer directly and
without intermediate copying.
Python provides such a facility at the C level in the form of the buffer
protocol. This protocol has two sides:
on the producer side, a type can export a “buffer interface” which allows
objects of that type to expose information about their underlying buffer.
This interface is described in the section Buffer Object Structures;
on the consumer side, several means are available to obtain a pointer to
the raw underlying data of an object (for example a method parameter).
Simple objects such as bytes and bytearray expose their
underlying buffer in byte-oriented form. Other forms are possible; for example,
the elements exposed by an array.array can be multi-byte values.
An example consumer of the buffer interface is the write()
method of file objects: any object that can export a series of bytes through
the buffer interface can be written to a file. While write() only
needs read-only access to the internal contents of the object passed to it,
other methods such as readinto() need write access
to the contents of their argument. The buffer interface allows objects to
selectively allow or reject exporting of read-write and read-only buffers.
There are two ways for a consumer of the buffer interface to acquire a buffer
over a target object:
call PyObject_GetBuffer() with the right parameters;
call PyArg_ParseTuple() (or one of its siblings) with one of the
y*, w* or s* format codes.
In both cases, PyBuffer_Release() must be called when the buffer
isn’t needed anymore. Failure to do so could lead to various issues such as
resource leaks.
Buffer structures (or simply “buffers”) are useful as a way to expose the
binary data from another object to the Python programmer. They can also be
used as a zero-copy slicing mechanism. Using their ability to reference a
block of memory, it is possible to expose any data to the Python programmer
quite easily. The memory could be a large, constant array in a C extension,
it could be a raw block of memory for manipulation before passing to an
operating system library, or it could be used to pass around structured data
in its native, in-memory format.
Contrary to most data types exposed by the Python interpreter, buffers
are not PyObject pointers but rather simple C structures. This
allows them to be created and copied very simply. When a generic wrapper
around a buffer is needed, a memoryview object
can be created.
A NULL terminated string in struct module style syntax giving
the contents of the elements available through the buffer. If this is
NULL, "B" (unsigned bytes) is assumed.
An array of Py_ssize_ts the length of ndim giving the
shape of the memory as a multi-dimensional array. Note that
shape[0] * ... * shape[ndim-1] * itemsize should be equal to
len.
An array of Py_ssize_ts the length of ndim. If these
suboffset numbers are greater than or equal to 0, then the value stored
along the indicated dimension is a pointer and the suboffset value
dictates how many bytes to add to the pointer after de-referencing. A
suboffset value that is negative indicates that no de-referencing should
occur (striding in a contiguous memory block).
Here is a function that returns a pointer to the element in an N-D array
pointed to by an N-dimensional index when there are both non-NULL strides
and suboffsets:
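void *get_item_pointer(int ndim, void *buf, Py_ssize_t *strides,
                       Py_ssize_t *suboffsets, Py_ssize_t *indices) {
    char *pointer = (char*)buf;
    int i;
    for (i = 0; i < ndim; i++) {
        pointer += strides[i] * indices[i];
        if (suboffsets[i] >= 0) {
            pointer = *((char**)pointer) + suboffsets[i];
        }
    }
    return (void*)pointer;
}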
This is a storage for the itemsize (in bytes) of each element of the
shared memory. It is technically unnecessary as it can be obtained
using PyBuffer_SizeFromFormat(), however an exporter may know
this information without parsing the format string and it is necessary
to know the itemsize for proper interpretation of striding. Therefore,
storing it is more convenient and faster.
This is for use internally by the exporting object. For example, this
might be re-cast as an integer by the exporter and used to store flags
about whether or not the shape, strides, and suboffsets arrays must be
freed when the buffer is released. The consumer should never alter this
value.
int PyObject_GetBuffer(PyObject *obj, Py_buffer *view, int flags)¶
Export a view over some internal data from the target object obj.
obj must not be NULL, and view must point to an existing
Py_buffer structure allocated by the caller (most uses of
this function will simply declare a local variable of type
Py_buffer). The flags argument is a bit field indicating
what kind of buffer is requested. The buffer interface allows
for complicated memory layout possibilities; however, some callers
won’t want to handle all the complexity and instead request a simple
view of the target object (using PyBUF_SIMPLE for a read-only
view and PyBUF_WRITABLE for a read-write view).
Some exporters may not be able to share memory in every possible way and
may need to raise errors to signal to some consumers that something is
just not possible. These errors should be a BufferError unless
there is another error that is actually causing the problem. The
exporter can use flags information to simplify how much of the
Py_buffer structure is filled in with non-default values and/or
raise an error if the object can’t support a simpler view of its memory.
On success, 0 is returned and the view structure is filled with useful
values. On error, -1 is returned and an exception is raised; the view
is left in an undefined state.
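A minimal consumer sketch; obj is assumed to support the buffer interface,
and consume_bytes is a hypothetical helper:
Py_buffer view;

if (PyObject_GetBuffer(obj, &view, PyBUF_SIMPLE) < 0)
    return NULL;                    /* exception (often BufferError) set */
/* view.buf points to view.len contiguous bytes */
consume_bytes((const char *)view.buf, view.len);
PyBuffer_Release(&view);            /* always release the view when done */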
The following are the possible values for the flags argument.
PyBUF_SIMPLE
This is the default flag. The returned buffer exposes a read-only
memory area. The format of data is assumed to be raw unsigned bytes,
without any particular structure. This is a “stand-alone” flag
constant. It never needs to be ‘|’d to the others. The exporter will
raise an error if it cannot provide such a contiguous buffer of bytes.
PyBUF_STRIDES
This implies PyBUF_ND. The returned buffer must provide
strides information (i.e. the strides cannot be NULL). This would be
used when the consumer can handle strided, discontiguous arrays.
Handling strides automatically assumes you can handle shape. The
exporter can raise an error if a strided representation of the data is
not possible (i.e. without the suboffsets).
PyBUF_ND
The returned buffer must provide shape information. The memory will be
assumed C-style contiguous (last dimension varies the fastest). The
exporter may raise an error if it cannot provide this kind of
contiguous buffer. If this is not given then shape will be NULL.
PyBUF_C_CONTIGUOUS, PyBUF_F_CONTIGUOUS, PyBUF_ANY_CONTIGUOUS
These flags indicate that the returned buffer must be, respectively,
C-contiguous (last dimension varies the fastest), Fortran
contiguous (first dimension varies the fastest) or either one. All of
these flags imply PyBUF_STRIDES and guarantee that the
strides buffer info structure will be filled in correctly.
PyBUF_INDIRECT
This flag indicates the returned buffer must have suboffsets
information (which can be NULL if no suboffsets are needed). This can
be used when the consumer can handle indirect array referencing implied
by these suboffsets. This implies PyBUF_STRIDES.
PyBUF_FORMAT
The returned buffer must have true format information if this flag is
provided. This would be used when the consumer is going to be checking
for what ‘kind’ of data is actually stored. An exporter should always
be able to provide this information if requested. If format is not
explicitly requested then the format must be returned as NULL (which
means 'B', or unsigned bytes).
Py_ssize_t PyBuffer_SizeFromFormat(const char *)¶
Return the implied itemsize from the struct-style
format.
int PyBuffer_IsContiguous(Py_buffer *view, char fortran)¶
Return 1 if the memory defined by the view is C-style (fortran is
'C') or Fortran-style (fortran is 'F') contiguous or either one
(fortran is 'A'). Return 0 otherwise.
Fill the strides array with byte-strides of a contiguous (C-style if
fortran is 'C' or Fortran-style if fortran is 'F') array of the
given shape with the given number of bytes per element.
int PyBuffer_FillInfo(Py_buffer *view, PyObject *obj, void *buf, Py_ssize_t len, int readonly, int infoflags)¶
Fill in a buffer-info structure, view, correctly for an exporter that can
only share a contiguous chunk of memory of “unsigned bytes” of the given
length. Return 0 on success and -1 (raising an exception) on error.
These functions were part of the “old buffer protocol” API in Python 2.
In Python 3, this protocol doesn’t exist anymore but the functions are still
exposed to ease porting 2.x code. They act as a compatibility wrapper
around the new buffer protocol, but they don’t give
you control over the lifetime of the resources acquired when a buffer is
exported.
int PyObject_AsCharBuffer(PyObject *obj, const char **buffer, Py_ssize_t *buffer_len)¶
Returns a pointer to a read-only memory location usable as character-based
input. The obj argument must support the single-segment character buffer
interface. On success, returns 0, sets buffer to the memory location
and buffer_len to the buffer length. Returns -1 and sets a
TypeError on error.
int PyObject_AsReadBuffer(PyObject *obj, const void **buffer, Py_ssize_t *buffer_len)¶
Returns a pointer to a read-only memory location containing arbitrary data.
The obj argument must support the single-segment readable buffer
interface. On success, returns 0, sets buffer to the memory location
and buffer_len to the buffer length. Returns -1 and sets a
TypeError on error.
Returns 1 if o supports the single-segment readable buffer interface.
Otherwise returns 0.
int PyObject_AsWriteBuffer(PyObject *obj, void **buffer, Py_ssize_t *buffer_len)¶
Returns a pointer to a writable memory location. The obj argument must
support the single-segment, character buffer interface. On success,
returns 0, sets buffer to the memory location and buffer_len to the
buffer length. Returns -1 and sets a TypeError on error.
The functions in this chapter are specific to certain Python object types.
Passing them an object of the wrong type is not a good idea; if you receive an
object from a Python program and you are not sure that it has the right type,
you must perform a type check first; for example, to check that an object is a
dictionary, use PyDict_Check(). The chapter is structured like the
“family tree” of Python object types.
Warning
While the functions described in this chapter carefully check the type of the
objects which are passed in, many of them do not check for NULL being passed
instead of a valid object. Allowing NULL to be passed in can cause memory
access violations and immediate termination of the interpreter.
long PyType_GetFlags(PyTypeObject *type)¶
Return the tp_flags member of type. This function is primarily
meant for use with Py_LIMITED_API; the individual flag bits are
guaranteed to be stable across Python releases, but access to
tp_flags itself is not part of the limited API.
void PyType_Modified(PyTypeObject *type)¶
Invalidate the internal lookup cache for the type and all of its
subtypes. This function must be called after any manual
modification of the attributes or base classes of the type.
int PyType_Ready(PyTypeObject *type)¶
Finalize a type object. This should be called on all type objects to finish
their initialization. This function is responsible for adding inherited slots
from a type’s base class. Returns 0 on success, or returns -1 and sets an
exception on error.
Note that the PyTypeObject for None is not directly exposed in the
Python/C API. Since None is a singleton, testing for object identity (using
== in C) is sufficient. There is no PyNone_Check() function for the
same reason.
The Python None object, denoting lack of value. This object has no methods.
It needs to be treated just like any other object with respect to reference
counts.
PyObject* PyLong_FromLong(long v)¶
Return a new PyLongObject object from v, or NULL on failure.
The current implementation keeps an array of integer objects for all integers
between -5 and 256; when you create an int in that range you actually
just get back a reference to the existing object. So it should be possible to
change the value of 1. I suspect the behaviour of Python in this case is
undefined. :-)
PyObject* PyLong_FromUnsignedLong(unsigned long v)¶
Return a new PyLongObject object from a C unsigned long, or
NULL on failure.
PyObject* PyLong_FromDouble(double v)¶
Return a new PyLongObject object from the integer part of v, or
NULL on failure.
PyObject* PyLong_FromString(char *str, char **pend, int base)¶
Return a new PyLongObject based on the string value in str, which
is interpreted according to the radix in base. If pend is non-NULL,
*pend will point to the first character in str which follows the
representation of the number. If base is 0, the radix will be
determined based on the leading characters of str: if str starts with
'0x' or '0X', radix 16 will be used; if str starts with '0o' or
'0O', radix 8 will be used; if str starts with '0b' or '0B',
radix 2 will be used; otherwise radix 10 will be used. If base is not
0, it must be between 2 and 36, inclusive. Leading spaces are
ignored. If there are no digits, ValueError will be raised.
Convert a sequence of Unicode digits to a Python integer value. The Unicode
string is first encoded to a byte string using PyUnicode_EncodeDecimal()
and then converted using PyLong_FromString().
long PyLong_AsLong(PyObject *pylong)¶
Return a C long representation of the contents of pylong. If
pylong is greater than LONG_MAX, raise an OverflowError,
and return -1. Convert non-long objects automatically to long first,
and return -1 if that raises exceptions.
long PyLong_AsLongAndOverflow(PyObject *pylong, int *overflow)¶
Return a C long representation of the contents of
pylong. If pylong is greater than LONG_MAX or less
than LONG_MIN, set *overflow to 1 or -1,
respectively, and return -1; otherwise, set *overflow to
0. If any other exception occurs (for example a TypeError or
MemoryError), then -1 will be returned and *overflow will
be 0.
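A sketch of telling the three outcomes apart, assuming pylong was obtained
elsewhere:
int overflow;
long value = PyLong_AsLongAndOverflow(pylong, &overflow);

if (value == -1 && PyErr_Occurred()) {
    /* some other error (e.g. TypeError); overflow is 0 */
}
else if (overflow) {
    /* out of range: 1 means above LONG_MAX, -1 means below LONG_MIN */
}
else {
    /* value holds the converted result */
}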
PY_LONG_LONG PyLong_AsLongLongAndOverflow(PyObject *pylong, int *overflow)¶
Return a C long long representation of the contents of
pylong. If pylong is greater than PY_LLONG_MAX or less
than PY_LLONG_MIN, set *overflow to 1 or -1,
respectively, and return -1; otherwise, set *overflow to
0. If any other exception occurs (for example a TypeError or
MemoryError), then -1 will be returned and *overflow will
be 0.
Py_ssize_t PyLong_AsSsize_t(PyObject *pylong)¶
Return a C Py_ssize_t representation of the contents of pylong.
If pylong is greater than PY_SSIZE_T_MAX, an OverflowError
is raised and -1 will be returned.
unsigned long PyLong_AsUnsignedLong(PyObject *pylong)
Return a C unsigned long representation of the contents of pylong.
If pylong is greater than ULONG_MAX, an OverflowError is
raised.
Return a C unsigned long long from a Python integer. If
pylong cannot be represented as an unsigned long long,
an OverflowError is raised and (unsigned long long)-1 is
returned.
Return a C double representation of the contents of pylong. If
pylong cannot be approximately represented as a double, an
OverflowError exception is raised and -1.0 will be returned.
Convert a Python integer pylong to a C void pointer.
If pylong cannot be converted, an OverflowError will be raised. This
is only assured to produce a usable void pointer for values created
with PyLong_FromVoidPtr().
Booleans in Python are implemented as a subclass of integers. There are only
two booleans, Py_False and Py_True. As such, the normal
creation and deletion functions don’t apply to booleans. The following macros
are available, however.
Return a C double representation of the contents of pyfloat. If
pyfloat is not a Python floating point object but has a __float__()
method, this method will first be called to convert pyfloat into a float.
Return a structseq instance which contains information about the
precision, minimum and maximum values of a float. It’s a thin wrapper
around the header file float.h.
Python’s complex number objects are implemented as two distinct types when
viewed from the C API: one is the Python object exposed to Python programs, and
the other is a C structure which represents the actual complex number value.
The API provides functions for working with both.
Note that the functions which accept these structures as parameters and return
them as results do so by value rather than dereferencing them through
pointers. This is consistent throughout the API.
The C structure which corresponds to the value portion of a Python complex
number object. Most of the functions for dealing with complex number objects
use structures of this type as input or output values, as appropriate. It is
defined as:
Return the Py_complex value of the complex number op.
If op is not a Python complex number object but has a __complex__()
method, this method will first be called to convert op to a Python complex
number object.
Generic operations on sequence objects were discussed in the previous chapter;
this section deals with the specific kinds of sequence objects that are
intrinsic to the Python language.
Return a new bytes object with a copy of the string v as value on success,
and NULL on failure. The parameter v must not be NULL; it will not be
checked.
Return a new bytes object with a copy of the string v as value and length
len on success, and NULL on failure. If v is NULL, the contents of
the bytes object are uninitialized.
Take a C printf()-style format string and a variable number of
arguments, calculate the size of the resulting Python bytes object and return
a bytes object with the values formatted into it. The variable arguments
must be C types and must correspond exactly to the format characters in the
format string. The following format characters are allowed:
Format Characters  Type           Comment
%%                 n/a            The literal % character.
%c                 int            A single character, represented as a C int.
%d                 int            Exactly equivalent to printf("%d").
%u                 unsigned int   Exactly equivalent to printf("%u").
%ld                long           Exactly equivalent to printf("%ld").
%lu                unsigned long  Exactly equivalent to printf("%lu").
%zd                Py_ssize_t     Exactly equivalent to printf("%zd").
%zu                size_t         Exactly equivalent to printf("%zu").
%i                 int            Exactly equivalent to printf("%i").
%x                 int            Exactly equivalent to printf("%x").
%s                 char*          A null-terminated C character array.
%p                 void*          The hex representation of a C pointer. Mostly
                                  equivalent to printf("%p") except that it is
                                  guaranteed to start with the literal 0x
                                  regardless of what the platform’s printf yields.
An unrecognized format character causes all the rest of the format string to be
copied as-is to the result string, and any extra arguments discarded.
Return a NUL-terminated representation of the contents of o. The pointer
refers to the internal buffer of o, not a copy. The data must not be
modified in any way, unless the string was just created using
PyBytes_FromStringAndSize(NULL, size). It must not be deallocated. If
o is not a bytes object at all, PyBytes_AsString() returns NULL
and raises TypeError.
int PyBytes_AsStringAndSize(PyObject *obj, char **buffer, Py_ssize_t *length)
Return a NUL-terminated representation of the contents of the object obj
through the output variables buffer and length.
If length is NULL, the resulting buffer may not contain NUL characters;
if it does, the function returns -1 and a TypeError is raised.
The buffer refers to an internal string buffer of obj, not a copy. The data
must not be modified in any way, unless the string was just created using
PyBytes_FromStringAndSize(NULL, size). It must not be deallocated. If
obj is not a bytes object at all, PyBytes_AsStringAndSize()
returns -1 and raises TypeError.
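For example (a sketch; obj is assumed to be a bytes object produced elsewhere):

char *buffer;
Py_ssize_t length;
if (PyBytes_AsStringAndSize(obj, &buffer, &length) < 0)
    return NULL;        /* TypeError already set */
/* buffer points at obj's internal data: do not modify or free it */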
Create a new bytes object in *bytes containing the contents of newpart
appended to bytes; the caller will own the new reference. The reference to
the old value of bytes will be stolen. If the new string cannot be
created, the old reference to bytes will still be discarded and the value
of *bytes will be set to NULL; the appropriate exception will be set.
Create a new string object in *bytes containing the contents of newpart
appended to bytes. This version decrements the reference count of
newpart.
int _PyBytes_Resize(PyObject **bytes, Py_ssize_t newsize)
A way to resize a bytes object even though it is “immutable”. Only use this
to build up a brand new bytes object; don’t use this if the bytes may already
be known in other parts of the code. It is an error to call this function if
the refcount on the input bytes object is not one. Pass the address of an
existing bytes object as an lvalue (it may be written into), and the new size
desired. On success, *bytes holds the resized bytes object and 0 is
returned; the address in *bytes may differ from its input value. If the
reallocation fails, the original bytes object at *bytes is deallocated,
*bytes is set to NULL, a memory exception is set, and -1 is
returned.
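A common pattern is to over-allocate, fill, and then shrink to fit (a sketch; used is a hypothetical count of bytes actually written):

PyObject *b = PyBytes_FromStringAndSize(NULL, 1024);   /* uninitialized buffer */
if (b == NULL)
    return NULL;
/* ... write `used` bytes into PyBytes_AS_STRING(b) ... */
if (_PyBytes_Resize(&b, used) < 0)
    return NULL;        /* b was deallocated and set to NULL */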
This type represents the storage type which is used by Python internally as
basis for holding Unicode ordinals. Python’s default builds use a 16-bit type
for Py_UNICODE and store Unicode values internally as UCS2. It is also
possible to build a UCS4 version of Python (most recent Linux distributions come
with UCS4 builds of Python). These builds then use a 32-bit type for
Py_UNICODE and store Unicode data internally as UCS4. On platforms
where wchar_t is available and compatible with the chosen Python
Unicode build variant, Py_UNICODE is a typedef alias for
wchar_t to enhance native platform compatibility. On all other
platforms, Py_UNICODE is a typedef alias for either unsigned short (UCS2) or unsigned long (UCS4).
Note that UCS2 and UCS4 Python builds are not binary compatible. Please keep
this in mind when writing extensions or interfaces.
Unicode provides many different character properties. The most often needed ones
are available through these macros which are mapped to C functions depending on
the Python configuration.
Return 1 or 0 depending on whether ch is a printable character.
Nonprintable characters are those characters defined in the Unicode character
database as “Other” or “Separator”, excepting the ASCII space (0x20) which is
considered printable. (Note that printable characters in this context are
those which should not be escaped when repr() is invoked on a string.
It has no bearing on the handling of strings written to sys.stdout or
sys.stderr.)
These APIs can be used for fast direct character conversions:
Create a Unicode object from the Py_UNICODE buffer u of the given size. u
may be NULL which causes the contents to be undefined. It is the user’s
responsibility to fill in the needed data. The buffer is copied into the new
object. If the buffer is not NULL, the return value might be a shared object.
Therefore, modification of the resulting Unicode object is only allowed when u
is NULL.
Create a Unicode object from the char buffer u. The bytes will be interpreted
as being UTF-8 encoded. u may also be NULL which
causes the contents to be undefined. It is the user’s responsibility to fill in
the needed data. The buffer is copied into the new object. If the buffer is not
NULL, the return value might be a shared object. Therefore, modification of
the resulting Unicode object is only allowed when u is NULL.
Take a C printf()-style format string and a variable number of
arguments, calculate the size of the resulting Python unicode string and return
a string with the values formatted into it. The variable arguments must be C
types and must correspond exactly to the format characters in the format
ASCII-encoded string. The following format characters are allowed:
Format Characters  Type                Comment
%%                 n/a                 The literal % character.
%c                 int                 A single character, represented as a C int.
%d                 int                 Exactly equivalent to printf("%d").
%u                 unsigned int        Exactly equivalent to printf("%u").
%ld                long                Exactly equivalent to printf("%ld").
%lu                unsigned long       Exactly equivalent to printf("%lu").
%lld               long long           Exactly equivalent to printf("%lld").
%llu               unsigned long long  Exactly equivalent to printf("%llu").
%zd                Py_ssize_t          Exactly equivalent to printf("%zd").
%zu                size_t              Exactly equivalent to printf("%zu").
%i                 int                 Exactly equivalent to printf("%i").
%x                 int                 Exactly equivalent to printf("%x").
%s                 char*               A null-terminated C character array.
%p                 void*               The hex representation of a C pointer. Mostly
                                       equivalent to printf("%p") except that it is
                                       guaranteed to start with the literal 0x
                                       regardless of what the platform’s printf yields.
%V                 PyObject*, char*    A unicode object (which may be NULL) and a
                                       null-terminated C character array as a second
                                       parameter (which will be used, if the first
                                       parameter is NULL).
Create a Unicode object by replacing all decimal digits in a
Py_UNICODE buffer of the given size with ASCII digits 0–9
according to their decimal value. Return NULL if an exception
occurs.
Create a copy of a Unicode string ending with a nul character. Return NULL
and raise a MemoryError exception on memory allocation failure,
otherwise return a newly allocated buffer (use PyMem_Free() to free the
buffer).
Coerce an encoded object obj to a Unicode object and return a reference with
incremented refcount.
bytes, bytearray and other char buffer compatible objects
are decoded according to the given encoding and using the error handling
defined by errors. Both can be NULL to have the interface use the default
values (see the next section for details).
All other objects, including Unicode objects, cause a TypeError to be
set.
The API returns NULL if there was an error. The caller is responsible for
decref’ing the returned objects.
Shortcut for PyUnicode_FromEncodedObject(obj, NULL, "strict") which is used
throughout the interpreter whenever coercion to Unicode is needed.
If the platform supports wchar_t and provides a header file wchar.h,
Python can interface directly to this type using the following functions.
Support is optimized if Python’s own Py_UNICODE type is identical to
the system’s wchar_t.
To encode and decode file names and other environment strings,
Py_FileSystemDefaultEncoding should be used as the encoding, and
"surrogateescape" should be used as the error handler (PEP 383). To
encode file names during argument parsing, the "O&" converter should be
used, passing PyUnicode_FSConverter() as the conversion function:
int PyUnicode_FSConverter(PyObject* obj, void* result)
ParseTuple converter: encode str objects to bytes using
PyUnicode_EncodeFSDefault(); bytes objects are output as-is.
result must be a PyBytesObject* which must be released when it is
no longer used.
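For example, a sketch of a function that accepts one file-system path argument:

PyObject *path = NULL;                  /* receives a bytes object */
if (!PyArg_ParseTuple(args, "O&", PyUnicode_FSConverter, &path))
    return NULL;
/* ... use PyBytes_AS_STRING(path) ... */
Py_DECREF(path);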
Create a Unicode object from the wchar_t buffer w of the given size.
Passing -1 as the size indicates that the function must itself compute the length,
using wcslen.
Return NULL on failure.
Copy the Unicode object contents into the wchar_t buffer w. At most
size wchar_t characters are copied (excluding a possibly trailing
0-termination character). Return the number of wchar_t characters
copied or -1 in case of an error. Note that the resulting wchar_t
string may or may not be 0-terminated. It is the responsibility of the caller
to make sure that the wchar_t string is 0-terminated in case this is
required by the application.
Convert the Unicode object to a wide character string. The output string
always ends with a nul character. If size is not NULL, write the number
of wide characters (excluding the trailing 0-termination character) into
*size.
Returns a buffer allocated by PyMem_Malloc() (use PyMem_Free()
to free it) on success. On error, returns NULL, *size is undefined and
raises a MemoryError.
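For example (a minimal sketch, assuming this describes PyUnicode_AsWideCharString()):

wchar_t *w = PyUnicode_AsWideCharString(unicode, NULL);
if (w == NULL)
    return NULL;        /* MemoryError already set */
/* ... use the nul-terminated string w ... */
PyMem_Free(w);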
Python provides a set of built-in codecs which are written in C for speed. All of
these codecs are directly usable via the following functions.
Many of the following APIs take two arguments encoding and errors, and they
have the same semantics as the ones of the built-in str() string object
constructor.
Setting encoding to NULL causes the default encoding to be used
which is ASCII. The file system calls should use
PyUnicode_FSConverter() for encoding file names. This uses the
variable Py_FileSystemDefaultEncoding internally. This
variable should be treated as read-only: on some systems, it will be a
pointer to a static string, on others, it will change at run-time
(such as when the application invokes setlocale).
Error handling is set by errors which may also be set to NULL meaning to use
the default handling defined for the codec. Default error handling for all
built-in codecs is “strict” (ValueError is raised).
The codecs all use a similar interface. Only deviations from the following
generic ones are documented for simplicity.
Create a Unicode object by decoding size bytes of the encoded string s.
encoding and errors have the same meaning as the parameters of the same name
in the str() built-in function. The codec to be used is looked up
using the Python codec registry. Return NULL if an exception was raised by
the codec.
Encode the Py_UNICODE buffer s of the given size and return a Python
bytes object. encoding and errors have the same meaning as the
parameters of the same name in the Unicode encode() method. The codec
to be used is looked up using the Python codec registry. Return NULL if an
exception was raised by the codec.
Encode a Unicode object and return the result as Python bytes object.
encoding and errors have the same meaning as the parameters of the same
name in the Unicode encode() method. The codec to be used is looked up
using the Python codec registry. Return NULL if an exception was raised by
the codec.
If consumed is NULL, behave like PyUnicode_DecodeUTF8(). If
consumed is not NULL, trailing incomplete UTF-8 byte sequences will not be
treated as an error. Those bytes will not be decoded and the number of bytes
that have been decoded will be stored in consumed.
Encode a Unicode object using UTF-8 and return the result as Python bytes
object. Error handling is “strict”. Return NULL if an exception was
raised by the codec.
Decode size bytes from a UTF-32 encoded buffer string and return the
corresponding Unicode object. errors (if non-NULL) defines the error
handling. It defaults to “strict”.
If byteorder is non-NULL, the decoder starts decoding using the given byte
order:
If *byteorder is zero, and the first four bytes of the input data are a
byte order mark (BOM), the decoder switches to this byte order and the BOM is
not copied into the resulting Unicode string. If *byteorder is -1 or
1, any byte order mark is copied to the output.
After completion, *byteorder is set to the current byte order at the end
of input data.
In a narrow build, code points outside the BMP will be decoded as surrogate pairs.
If byteorder is NULL, the codec starts in native order mode.
Return NULL if an exception was raised by the codec.
If consumed is NULL, behave like PyUnicode_DecodeUTF32(). If
consumed is not NULL, PyUnicode_DecodeUTF32Stateful() will not treat
trailing incomplete UTF-32 byte sequences (such as a number of bytes not divisible
by four) as an error. Those bytes will not be decoded and the number of bytes
that have been decoded will be stored in consumed.
Return a Python byte string using the UTF-32 encoding in native byte
order. The string always starts with a BOM mark. Error handling is “strict”.
Return NULL if an exception was raised by the codec.
Decode size bytes from a UTF-16 encoded buffer string and return the
corresponding Unicode object. errors (if non-NULL) defines the error
handling. It defaults to “strict”.
If byteorder is non-NULL, the decoder starts decoding using the given byte
order:
If *byteorder is zero, and the first two bytes of the input data are a
byte order mark (BOM), the decoder switches to this byte order and the BOM is
not copied into the resulting Unicode string. If *byteorder is -1 or
1, any byte order mark is copied to the output (where it will result in
either a \ufeff or a \ufffe character).
After completion, *byteorder is set to the current byte order at the end
of input data.
If byteorder is NULL, the codec starts in native order mode.
Return NULL if an exception was raised by the codec.
If consumed is NULL, behave like PyUnicode_DecodeUTF16(). If
consumed is not NULL, PyUnicode_DecodeUTF16Stateful() will not treat
trailing incomplete UTF-16 byte sequences (such as an odd number of bytes or a
split surrogate pair) as an error. Those bytes will not be decoded and the
number of bytes that have been decoded will be stored in consumed.
If byteorder is 0, the output string will always start with the Unicode BOM
mark (U+FEFF). In the other two modes, no BOM mark is prepended.
If Py_UNICODE_WIDE is defined, a single Py_UNICODE value may get
represented as a surrogate pair. If it is not defined, each Py_UNICODE
value is interpreted as a UCS-2 character.
Return NULL if an exception was raised by the codec.
Return a Python byte string using the UTF-16 encoding in native byte
order. The string always starts with a BOM mark. Error handling is “strict”.
Return NULL if an exception was raised by the codec.
If consumed is NULL, behave like PyUnicode_DecodeUTF7(). If
consumed is not NULL, trailing incomplete UTF-7 base-64 sections will not
be treated as an error. Those bytes will not be decoded and the number of
bytes that have been decoded will be stored in consumed.
PyObject* PyUnicode_EncodeUTF7(const Py_UNICODE *s, Py_ssize_t size, int base64SetO, int base64WhiteSpace, const char *errors)
Encode the Py_UNICODE buffer of the given size using UTF-7 and
return a Python bytes object. Return NULL if an exception was raised by
the codec.
If base64SetO is nonzero, “Set O” (punctuation that has no otherwise
special meaning) will be encoded in base-64. If base64WhiteSpace is
nonzero, whitespace will be encoded in base-64. Both are set to zero for the
Python “utf-7” codec.
Encode the Py_UNICODE buffer of the given size using Unicode-Escape and
return a Python string object. Return NULL if an exception was raised by the
codec.
Encode a Unicode object using Unicode-Escape and return the result as Python
string object. Error handling is “strict”. Return NULL if an exception was
raised by the codec.
Encode the Py_UNICODE buffer of the given size using Raw-Unicode-Escape
and return a Python string object. Return NULL if an exception was raised by
the codec.
Encode a Unicode object using Raw-Unicode-Escape and return the result as
Python string object. Error handling is “strict”. Return NULL if an exception
was raised by the codec.
Encode a Unicode object using Latin-1 and return the result as Python bytes
object. Error handling is “strict”. Return NULL if an exception was
raised by the codec.
Encode a Unicode object using ASCII and return the result as Python bytes
object. Error handling is “strict”. Return NULL if an exception was
raised by the codec.
This codec is special in that it can be used to implement many different codecs
(and this is in fact what was done to obtain most of the standard codecs
included in the encodings package). The codec uses mapping to encode and
decode characters.
Decoding mappings must map single string characters to single Unicode
characters, integers (which are then interpreted as Unicode ordinals) or None
(meaning “undefined mapping” and causing an error).
Encoding mappings must map single Unicode characters to single string
characters, integers (which are then interpreted as Latin-1 ordinals) or None
(meaning “undefined mapping” and causing an error).
The mapping objects provided must only support the __getitem__ mapping
interface.
If a character lookup fails with a LookupError, the character is copied as-is,
meaning that its ordinal value will be interpreted as a Unicode or Latin-1
ordinal, respectively. Because of this, mappings only need to contain those
mappings which map characters to different code points.
Create a Unicode object by decoding size bytes of the encoded string s using
the given mapping object. Return NULL if an exception was raised by the
codec. If mapping is NULL, Latin-1 decoding will be done. Otherwise, mapping
can be a dictionary (mapping bytes to Unicode strings) or a unicode string,
which is treated as a lookup table. Byte values greater than the length of the
string and U+FFFE “characters” are treated as “undefined mapping”.
Encode the Py_UNICODE buffer of the given size using the given
mapping object and return a Python string object. Return NULL if an
exception was raised by the codec.
Encode a Unicode object using the given mapping object and return the result
as Python string object. Error handling is “strict”. Return NULL if an
exception was raised by the codec.
The following codec API is special in that it maps Unicode to Unicode.
Translate a Py_UNICODE buffer of the given size by applying a
character mapping table to it and return the resulting Unicode object. Return
NULL when an exception was raised by the codec.
The mapping table must map Unicode ordinal integers to Unicode ordinal
integers or None (causing deletion of the character).
Mapping tables need only provide the __getitem__() interface; dictionaries
and sequences work well. Unmapped character ordinals (ones which cause a
LookupError) are left untouched and are copied as-is.
These are the MBCS codec APIs. They are currently only available on Windows and
use the Win32 MBCS converters to implement the conversions. Note that MBCS (or
DBCS) is a class of encodings, not just one. The target encoding is defined by
the user settings on the machine running the codec.
Create a Unicode object by decoding size bytes of the MBCS encoded string s.
Return NULL if an exception was raised by the codec.
PyObject* PyUnicode_DecodeMBCSStateful(const char *s, int size, const char *errors, int *consumed)
If consumed is NULL, behave like PyUnicode_DecodeMBCS(). If
consumed is not NULL, PyUnicode_DecodeMBCSStateful() will not decode
a trailing lead byte, and the number of bytes that have been decoded will be
stored in consumed.
Encode a Unicode object using MBCS and return the result as Python bytes
object. Error handling is “strict”. Return NULL if an exception was
raised by the codec.
The following APIs are capable of handling Unicode objects and strings on input
(we refer to them as strings in the descriptions) and return Unicode objects or
integers as appropriate.
They all return NULL or -1 if an exception occurs.
Split a string giving a list of Unicode strings. If sep is NULL, splitting
will be done at all whitespace substrings. Otherwise, splits occur at the given
separator. At most maxsplit splits will be done. If negative, no limit is
set. Separators are not included in the resulting list.
Split a Unicode string at line breaks, returning a list of Unicode strings.
CRLF is considered to be one line break. If keepend is 0, the line break
characters are not included in the resulting strings.
Translate a string by applying a character mapping table to it and return the
resulting Unicode object.
The mapping table must map Unicode ordinal integers to Unicode ordinal integers
or None (causing deletion of the character).
Mapping tables need only provide the __getitem__() interface; dictionaries
and sequences work well. Unmapped character ordinals (ones which cause a
LookupError) are left untouched and are copied as-is.
errors has the usual meaning for codecs. It may be NULL which indicates to
use the default error handling.
Join a sequence of strings using the given separator and return the resulting
Unicode string.
int PyUnicode_Tailmatch(PyObject *str, PyObject *substr, Py_ssize_t start, Py_ssize_t end, int direction)
Return 1 if substr matches str[start:end] at the given tail end
(direction == -1 means to do a prefix match, direction == 1 a suffix match),
0 otherwise. Return -1 if an error occurred.
Return the first position of substr in str[start:end] using the given
direction (direction == 1 means to do a forward search, direction == -1 a
backward search). The return value is the index of the first match; a value of
-1 indicates that no match was found, and -2 indicates that an error
occurred and an exception has been set.
Replace at most maxcount occurrences of substr in str with replstr and
return the resulting Unicode object. maxcount == -1 means replace all
occurrences.
Compare two strings and return -1, 0, 1 for less than, equal, and greater than,
respectively.
int PyUnicode_CompareWithASCIIString(PyObject *uni, char *string)
Compare a unicode object, uni, with string and return -1, 0, 1 for less
than, equal, and greater than, respectively. It is best to pass only
ASCII-encoded strings, but the function interprets the input string as
ISO-8859-1 if it contains non-ASCII characters.
Intern the argument *string in place. The argument must be the address of a
pointer variable pointing to a Python unicode string object. If there is an
existing interned string that is the same as *string, it sets *string to
it (decrementing the reference count of the old string object and incrementing
the reference count of the interned string object), otherwise it leaves
*string alone and interns it (incrementing its reference count).
(Clarification: even though there is a lot of talk about reference counts, think
of this function as reference-count-neutral; you own the object after the call
if and only if you owned it before the call.)
A combination of PyUnicode_FromString() and
PyUnicode_InternInPlace(), returning either a new unicode string object
that has been interned, or a new (“owned”) reference to an earlier interned
string object with the same value.
Return a new tuple object of size n, or NULL on failure. The tuple values
are initialized to the subsequent n C arguments pointing to Python objects.
PyTuple_Pack(2, a, b) is equivalent to Py_BuildValue("(OO)", a, b).
Like PyTuple_SetItem(), but does no error checking, and should only be
used to fill in brand new tuples.
Note
This function “steals” a reference to o.
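For example, filling a brand new tuple (a sketch, assuming the macro described is PyTuple_SET_ITEM()):

PyObject *t = PyTuple_New(2);
PyObject *first = PyLong_FromLong(1);
PyObject *second = PyUnicode_FromString("two");
if (t == NULL || first == NULL || second == NULL) {
    Py_XDECREF(t);
    Py_XDECREF(first);
    Py_XDECREF(second);
    return NULL;
}
PyTuple_SET_ITEM(t, 0, first);      /* steals the reference to first */
PyTuple_SET_ITEM(t, 1, second);     /* steals the reference to second */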
int _PyTuple_Resize(PyObject **p, Py_ssize_t newsize)
Can be used to resize a tuple. newsize will be the new length of the tuple.
Because tuples are supposed to be immutable, this should only be used if there
is only one reference to the object. Do not use this if the tuple may already
be known to some other part of the code. The tuple will always grow or shrink
at the end. Think of this as destroying the old tuple and creating a new one,
only more efficiently. Returns 0 on success. Client code should never
assume that the resulting value of *p will be the same as before calling
this function. If the object referenced by *p is replaced, the original
*p is destroyed. On failure, returns -1 and sets *p to NULL, and
raises MemoryError or SystemError.
Return a new list of length len on success, or NULL on failure.
Note
If len is greater than zero, the returned list object’s items are
set to NULL. Thus you cannot use abstract API functions such as
PySequence_SetItem() or expose the object to Python code before
setting all items to a real object with PyList_SetItem().
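For example, a sketch that fills every slot before the list can escape to Python code:

PyObject *list = PyList_New(n);     /* n items, all initially NULL */
Py_ssize_t i;
if (list == NULL)
    return NULL;
for (i = 0; i < n; i++) {
    PyObject *item = PyLong_FromSsize_t(i);
    if (item == NULL) {
        Py_DECREF(list);
        return NULL;
    }
    PyList_SET_ITEM(list, i, item);     /* steals the reference to item */
}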
Return the object at position index in the list pointed to by list. The
position must be non-negative; indexing from the end of the list is not
supported. If index is out of bounds, return NULL and set an
IndexError exception.
Macro form of PyList_SetItem() without error checking. This is
normally only used to fill in new lists where there is no previous content.
Note
This macro “steals” a reference to item, and, unlike
PyList_SetItem(), does not discard a reference to any item that
is being replaced; any reference in list at position i will be
leaked.
int PyList_Insert(PyObject *list, Py_ssize_t index, PyObject *item)
Insert the item item into list list in front of index index. Return
0 if successful; return -1 and set an exception if unsuccessful.
Analogous to list.insert(index, item).
Append the object item at the end of list list. Return 0 if
successful; return -1 and set an exception if unsuccessful. Analogous
to list.append(item).
Return a list of the objects in list containing the objects between low
and high. Return NULL and set an exception if unsuccessful. Analogous
to list[low:high]. Negative indices, as when slicing from Python, are not
supported.
int PyList_SetSlice(PyObject *list, Py_ssize_t low, Py_ssize_t high, PyObject *itemlist)
Set the slice of list between low and high to the contents of
itemlist. Analogous to list[low:high] = itemlist. The itemlist may
be NULL, indicating the assignment of an empty list (slice deletion).
Return 0 on success, -1 on failure. Negative indices, as when
slicing from Python, are not supported.
Return a proxy object for a mapping which enforces read-only behavior.
This is normally used to create a proxy to prevent modification of the
dictionary for non-dynamic class types.
Determine if dictionary p contains key. If an item in p matches
key, return 1, otherwise return 0. On error, return -1.
This is equivalent to the Python expression key in p.
Insert value into the dictionary p with a key of key. key must be
hashable; if it isn’t, TypeError will be raised. Return
0 on success or -1 on failure.
int PyDict_SetItemString(PyObject *p, const char *key, PyObject *val)
Insert value into the dictionary p using key as a key. key should
be a char*. The key object is created using
PyUnicode_FromString(key). Return 0 on success or -1 on
failure.
Variant of PyDict_GetItem() that does not suppress
exceptions. Return NULL with an exception set if an exception
occurred. Return NULL without an exception set if the key
wasn’t present.
Iterate over all key-value pairs in the dictionary p. The
Py_ssize_t referred to by ppos must be initialized to 0
prior to the first call to this function to start the iteration; the
function returns true for each pair in the dictionary, and false once all
pairs have been reported. The parameters pkey and pvalue should either
point to PyObject* variables that will be filled in with each key
and value, respectively, or may be NULL. Any references returned through
them are borrowed. ppos should not be altered during iteration. Its
value represents offsets within the internal dictionary structure, and
since the structure is sparse, the offsets are not consecutive.
For example:
PyObject *key, *value;
Py_ssize_t pos = 0;

while (PyDict_Next(self->dict, &pos, &key, &value)) {
    /* do something interesting with the values... */
    ...
}
The dictionary p should not be mutated during iteration. It is safe to
modify the values of the keys as you iterate over the dictionary, but only
so long as the set of keys does not change. For example:
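(A sketch that increments every value in self->dict, assuming the values are integers:)

PyObject *key, *value;
Py_ssize_t pos = 0;

while (PyDict_Next(self->dict, &pos, &key, &value)) {
    long i = PyLong_AsLong(value);
    if (i == -1 && PyErr_Occurred())
        return -1;
    PyObject *o = PyLong_FromLong(i + 1);
    if (o == NULL)
        return -1;
    if (PyDict_SetItem(self->dict, key, o) < 0) {
        Py_DECREF(o);
        return -1;
    }
    Py_DECREF(o);
}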
Iterate over mapping object b adding key-value pairs to dictionary a.
b may be a dictionary, or any object supporting PyMapping_Keys()
and PyObject_GetItem(). If override is true, existing pairs in a
will be replaced if a matching key is found in b, otherwise pairs will
only be added if there is not a matching key in a. Return 0 on
success or -1 if an exception was raised.
This is the same as PyDict_Merge(a, b, 1) in C, or a.update(b) in
Python. Return 0 on success or -1 if an exception was raised.
int PyDict_MergeFromSeq2(PyObject *a, PyObject *seq2, int override)
Update or merge into dictionary a, from the key-value pairs in seq2.
seq2 must be an iterable object producing iterable objects of length 2,
viewed as key-value pairs. In case of duplicate keys, the last wins if
override is true, else the first wins. Return 0 on success or -1
if an exception was raised. Equivalent Python (except for the return
value):

def PyDict_MergeFromSeq2(a, seq2, override):
    for key, value in seq2:
        if override or key not in a:
            a[key] = value
This subtype of PyObject is used to hold the internal data for both
set and frozenset objects. It is like a PyDictObject
in that it is a fixed size for small sets (much like tuple storage) and will
point to a separate, variable sized block of memory for medium and large sized
sets (much like list storage). None of the fields of this structure should be
considered public and are subject to change. All access should be done through
the documented API rather than by manipulating the values in the structure.
Return a new set containing objects returned by the iterable. The
iterable may be NULL to create a new empty set. Return the new set on
success or NULL on failure. Raise TypeError if iterable is not
actually iterable. The constructor is also useful for copying a set
(c=set(s)).
Return a new frozenset containing objects returned by the iterable.
The iterable may be NULL to create a new empty frozenset. Return the new
set on success or NULL on failure. Raise TypeError if iterable is
not actually iterable.
The following functions and macros are available for instances of set
or frozenset or instances of their subtypes.
Return the length of a set or frozenset object. Equivalent to
len(anyset). Raises a PyExc_SystemError if anyset is not a
set, frozenset, or an instance of a subtype.
Return 1 if found, 0 if not found, and -1 if an error is encountered. Unlike
the Python __contains__() method, this function does not automatically
convert unhashable sets into temporary frozensets. Raise a TypeError if
the key is unhashable. Raise PyExc_SystemError if anyset is not a
set, frozenset, or an instance of a subtype.
Add key to a set instance. Also works with frozenset
instances (like PyTuple_SetItem() it can be used to fill-in the values
of brand new frozensets before they are exposed to other code). Return 0 on
success or -1 on failure. Raise a TypeError if the key is
unhashable. Raise a MemoryError if there is no room to grow. Raise a
SystemError if set is not an instance of set or its
subtype.
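For example, a sketch that builds a frozenset before exposing it (key is an assumed hashable object):

PyObject *fs = PyFrozenSet_New(NULL);   /* new empty frozenset */
if (fs == NULL)
    return NULL;
if (PySet_Add(fs, key) < 0) {           /* legal while fs is still private */
    Py_DECREF(fs);
    return NULL;
}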
The following functions are available for instances of set or its
subtypes but not for instances of frozenset or its subtypes.
Return 1 if found and removed, 0 if not found (no action taken), and -1 if an
error is encountered. Does not raise KeyError for missing keys. Raise a
TypeError if the key is unhashable. Unlike the Python discard()
method, this function does not automatically convert unhashable sets into
temporary frozensets. Raise PyExc_SystemError if set is not an
instance of set or its subtype.
Return a new reference to an arbitrary object in the set, and remove the
object from the set. Return NULL on failure. Raise KeyError if the
set is empty. Raise a SystemError if set is not an instance of
set or its subtype.
Return the __module__ attribute of the function object op. This is normally
a string containing the module name, but can be set to any other object by
Python code.
An instance method is a wrapper for a PyCFunction and the new way
to bind a PyCFunction to a class object. It replaces the former call
PyMethod_New(func, NULL, class).
Return a new instance method object, with func being any callable object.
func is the function that will be called when the instance method is
called.
Methods are bound function objects. Methods are always bound to an instance of
a user-defined class. Unbound methods (methods bound to a class object) are
no longer available.
Return a new method object, with func being any callable object and self
the instance the method should be bound to. func is the function that will
be called when the method is called. self must not be NULL.
These APIs are a minimal emulation of the Python 2 C API for built-in file
objects, which used to rely on the buffered I/O (FILE*) support
from the C standard library. In Python 3, files and streams use the new
io module, which defines several layers over the low-level unbuffered
I/O of the operating system. The functions described below are
convenience C wrappers over these new APIs, and meant mostly for internal
error reporting in the interpreter; third-party code is advised to access
the io APIs instead.
PyObject* PyFile_FromFd(int fd, char *name, char *mode, int buffering, char *encoding, char *errors, char *newline, int closefd)
Create a Python file object from the file descriptor of an already
opened file fd. The arguments name, encoding, errors and newline
can be NULL to use the defaults; buffering can be -1 to use the
default. name is ignored and kept for backward compatibility. Return
NULL on failure. For a more comprehensive description of the arguments,
please refer to the io.open() function documentation.
Warning
Since Python streams have their own buffering layer, mixing them with
OS-level file descriptors can produce various issues (such as unexpected
ordering of data).
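For example, wrapping an already-open descriptor as a writable text stream (a sketch):

PyObject *f = PyFile_FromFd(fd, NULL, "w", -1, NULL, NULL, NULL, 1);
if (f == NULL)
    return NULL;    /* with closefd == 1, fd is closed when f is */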
Return the file descriptor associated with p as an int. If the
object is an integer, its value is returned. If not, the
object’s fileno() method is called if it exists; the method must return
an integer, which is returned as the file descriptor value. Sets an
exception and returns -1 on failure.
Equivalent to p.readline([n]), this function reads one line from the
object p. p may be a file object or any object with a readline()
method. If n is 0, exactly one line is read, regardless of the length of
the line. If n is greater than 0, no more than n bytes will be read
from the file; a partial line can be returned. In both cases, an empty string
is returned if the end of the file is reached immediately. If n is less than
0, however, one line is read regardless of length, but EOFError is
raised if the end of the file is reached immediately.
Write object obj to file object p. The only supported flag for flags is
Py_PRINT_RAW; if given, the str() of the object is written
instead of the repr(). Return 0 on success or -1 on failure; the
appropriate exception will be set.
int PyFile_WriteString(const char *s, PyObject *p)
Write string s to file object p. Return 0 on success or -1 on
failure; the appropriate exception will be set.
Return a new module object with the __name__ attribute set to name.
Only the module’s __doc__ and __name__ attributes are filled in;
the caller is responsible for providing a __file__ attribute.
Return the dictionary object that implements module’s namespace; this object
is the same as the __dict__ attribute of the module object. This
function never fails. It is recommended extensions use other
PyModule_*() and PyObject_*() functions rather than directly
manipulate a module’s __dict__.
Return the name of the file from which module was loaded using module’s
__file__ attribute. If this is not defined, or if it is not a
unicode string, raise SystemError and return NULL; otherwise return
a reference to a PyUnicodeObject.
Create a new module object, given the definition in module, assuming the
API version module_api_version. If that version does not match the version
of the running interpreter, a RuntimeWarning is emitted.
Note
Most uses of this function should be using PyModule_Create()
instead; only use this if you are sure you need it.
This struct holds all information that is needed to create a module object.
There is usually only one static variable of that type for each module, which
is statically initialized and then passed to PyModule_Create() in the
module initialization function.
If the module object needs additional memory, this should be set to the
number of bytes to allocate; a pointer to the block of memory can be
retrieved with PyModule_GetState(). If no memory is needed, set
this to -1.
This memory should be used, rather than static globals, to hold per-module
state, since it is then safe for use in multiple sub-interpreters. It is
freed when the module object is deallocated, after the m_free
function has been called, if present.
A function to call during deallocation of the module object, or NULL if
not needed.
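Putting the pieces together, a typical definition and initialization function look roughly like this (a sketch for a hypothetical module named example):

static struct PyModuleDef examplemodule = {
    PyModuleDef_HEAD_INIT,
    "example",              /* m_name */
    "An example module.",   /* m_doc */
    -1,                     /* m_size: no per-module state */
    NULL,                   /* m_methods */
    NULL,                   /* m_reload */
    NULL,                   /* m_traverse */
    NULL,                   /* m_clear */
    NULL                    /* m_free */
};

PyMODINIT_FUNC
PyInit_example(void)
{
    return PyModule_Create(&examplemodule);
}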
int PyModule_AddObject(PyObject *module, const char *name, PyObject *value)
Add an object to module as name. This is a convenience function which can
be used from the module’s initialization function. This steals a reference to
value. Return -1 on error, 0 on success.
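Because the reference is stolen, a conservative error-handling pattern looks like this (a sketch; on failure the reference has not been consumed and must be released by the caller):

PyObject *answer = PyLong_FromLong(42);
if (answer == NULL || PyModule_AddObject(module, "answer", answer) < 0) {
    Py_XDECREF(answer);     /* still ours if the call failed */
    return NULL;
}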
int PyModule_AddIntConstant(PyObject *module, const char *name, long value)
Add an integer constant to module as name. This convenience function can be
used from the module’s initialization function. Return -1 on error, 0 on
success.
int PyModule_AddStringConstant(PyObject *module, const char *name, const char *value)
Add a string constant to module as name. This convenience function can be
used from the module’s initialization function. The string value must be
null-terminated. Return -1 on error, 0 on success.
int PyModule_AddIntMacro(PyObject *module, macro)
Add an int constant to module. The name and the value are taken from
macro. For example PyModule_AddIntMacro(module, AF_INET) adds the int
constant AF_INET with the value of AF_INET to module.
Return -1 on error, 0 on success.
int PyModule_AddStringMacro(PyObject *module, macro)
Add a string constant to module; the name and the value are taken from
macro. Return -1 on error, 0 on success.
Python provides two general-purpose iterator objects. The first, a sequence
iterator, works with an arbitrary sequence supporting the __getitem__()
method. The second works with a callable object and a sentinel value, calling
the callable for each item in the sequence, and ending the iteration when the
sentinel value is returned.
Return an iterator that works with a general sequence object, seq. The
iteration ends when the sequence raises IndexError for the subscripting
operation.
Return a new iterator. The first parameter, callable, can be any Python
callable object that can be called with no parameters; each call to it should
return the next item in the iteration. When callable returns a value equal to
sentinel, the iteration will be terminated.
Return true if the descriptor object descr describes a data attribute, or
false if it describes a method. descr must be a descriptor object; there is
no error checking.
Return a new slice object with the given values. The start, stop, and
step parameters are used as the values of the slice object attributes of
the same names. Any of the values may be NULL, in which case
None will be used for the corresponding attribute. Return NULL if
the new object could not be allocated.
Retrieve the start, stop and step indices from the slice object slice,
assuming a sequence of length length. Treats indices greater than
length as errors.
Returns 0 on success and -1 on error with no exception set (unless one of
the indices was not None and failed to be converted to an integer,
in which case -1 is returned with an exception set).
You probably do not want to use this function.
Changed in version 3.2: The parameter type for the slice parameter was
PySliceObject* before.
Usable replacement for PySlice_GetIndices(). Retrieve the start,
stop, and step indices from the slice object slice assuming a sequence of
length length, and store the length of the slice in slicelength. Out
of bounds indices are clipped in a manner consistent with the handling of
normal slices.
Returns 0 on success and -1 on error with exception set.
Changed in version 3.2: The parameter type for the slice parameter was
PySliceObject* before.
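For example, a sketch of walking a list according to a slice object (assuming this describes PySlice_GetIndicesEx()):

Py_ssize_t start, stop, step, slicelength, i;
if (PySlice_GetIndicesEx(slice, PyList_GET_SIZE(list),
                         &start, &stop, &step, &slicelength) < 0)
    return NULL;
for (i = 0; i < slicelength; i++) {
    PyObject *item = PyList_GET_ITEM(list, start + i*step);    /* borrowed */
    /* ... process item ... */
}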
Create a memoryview object from an object that provides the buffer interface.
If obj supports writable buffer exports, the memoryview object will be
readable and writable, otherwise it will be read-only.
Create a memoryview object wrapping the given buffer structure view.
The memoryview object then owns the buffer represented by view, which
means you shouldn’t try to call PyBuffer_Release() yourself: it
will be done on deallocation of the memoryview object.
PyObject *PyMemoryView_GetContiguous(PyObject *obj, int buffertype, char order)
Create a memoryview object to a contiguous chunk of memory (in either
‘C’ or ‘F’ortran order) from an object that defines the buffer
interface. If memory is contiguous, the memoryview object points to the
original memory. Otherwise a copy is made and the memoryview points to a
new bytes object.
Return a pointer to the buffer structure wrapped by the given
memoryview object. The object must be a memoryview instance;
this macro doesn’t check its type; you must do it yourself or you
will risk crashes.
Python supports weak references as first-class objects. There are two
specific object types which directly implement weak references. The first is a
simple reference object, and the second acts as a proxy for the original object
as much as it can.
Return a weak reference object for the object ob. This will always return
a new reference, but is not guaranteed to create a new object; an existing
reference object may be returned. The second parameter, callback, can be a
callable object that receives notification when ob is garbage collected; it
should accept a single parameter, which will be the weak reference object
itself. callback may also be None or NULL. If ob is not a
weakly-referencable object, or if callback is not callable, None, or
NULL, this will return NULL and raise TypeError.
Return a weak reference proxy object for the object ob. This will always
return a new reference, but is not guaranteed to create a new object; an
existing proxy object may be returned. The second parameter, callback, can
be a callable object that receives notification when ob is garbage
collected; it should accept a single parameter, which will be the weak
reference object itself. callback may also be None or NULL. If ob
is not a weakly-referencable object, or if callback is not callable,
None, or NULL, this will return NULL and raise TypeError.
Return the referenced object from a weak reference, ref. If the referent is
no longer live, returns Py_None.
Note
This function returns a borrowed reference to the referenced object.
This means that you should always call Py_INCREF() on the object
except if you know that it cannot be destroyed while you are still
using it.
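For example (a minimal sketch; the getter is assumed to be PyWeakref_GetObject()):

PyObject *ref = PyWeakref_NewRef(ob, NULL);     /* no callback */
if (ref == NULL)
    return NULL;        /* e.g. ob is not weakly referencable */
/* ... later ... */
PyObject *target = PyWeakref_GetObject(ref);    /* borrowed reference */
if (target != Py_None) {
    Py_INCREF(target);  /* keep it alive while in use */
    /* ... use target ... */
    Py_DECREF(target);
}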
This subtype of PyObject represents an opaque value, useful for C
extension modules that need to pass an opaque value (as a void*
pointer) through Python code to other C code. It is often used to make a C
function pointer defined in one module available to other modules, so the
regular import mechanism can be used to access C APIs defined in dynamically
loaded modules.
Create a PyCapsule encapsulating the pointer. The pointer
argument may not be NULL.
On failure, set an exception and return NULL.
The name string may either be NULL or a pointer to a valid C string. If
non-NULL, this string must outlive the capsule. (Though it is permitted to
free it inside the destructor.)
If the destructor argument is not NULL, it will be called with the
capsule as its argument when it is destroyed.
If this capsule will be stored as an attribute of a module, the name should
be specified as modulename.attributename. This will enable other modules
to import the capsule using PyCapsule_Import().
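For example, a hypothetical module spam could export a table of C function pointers (a sketch):

/* in spam's module initialization */
static void *spam_api[1];   /* filled with pointers to spam's C functions */
PyObject *capsule = PyCapsule_New(spam_api, "spam._C_API", NULL);
if (capsule == NULL || PyModule_AddObject(module, "_C_API", capsule) < 0)
    return NULL;

/* in a client module */
void **api = (void **)PyCapsule_Import("spam._C_API", 0);
if (api == NULL)
    return NULL;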
Retrieve the pointer stored in the capsule. On failure, set an exception
and return NULL.
The name parameter must compare exactly to the name stored in the capsule.
If the name stored in the capsule is NULL, the name passed in must also
be NULL. Python uses the C function strcmp() to compare capsule
names.
Return the current destructor stored in the capsule. On failure, set an
exception and return NULL.
It is legal for a capsule to have a NULL destructor. This makes a NULL
return code somewhat ambiguous; use PyCapsule_IsValid() or
PyErr_Occurred() to disambiguate.
Return the current context stored in the capsule. On failure, set an
exception and return NULL.
It is legal for a capsule to have a NULL context. This makes a NULL
return code somewhat ambiguous; use PyCapsule_IsValid() or
PyErr_Occurred() to disambiguate.
Return the current name stored in the capsule. On failure, set an exception
and return NULL.
It is legal for a capsule to have a NULL name. This makes a NULL return
code somewhat ambiguous; use PyCapsule_IsValid() or
PyErr_Occurred() to disambiguate.
void* PyCapsule_Import(const char *name, int no_block)
Import a pointer to a C object from a capsule attribute in a module. The
name parameter should specify the full name to the attribute, as in
module.attribute. The name stored in the capsule must match this
string exactly. If no_block is true, import the module without blocking
(using PyImport_ImportModuleNoBlock()). If no_block is false,
import the module conventionally (using PyImport_ImportModule()).
Return the capsule’s internal pointer on success. On failure, set an
exception and return NULL. However, if PyCapsule_Import() failed to
import the module, and no_block was true, no exception is set.
int PyCapsule_IsValid(PyObject *capsule, const char *name)
Determines whether or not capsule is a valid capsule. A valid capsule is
non-NULL, passes PyCapsule_CheckExact(), has a non-NULL pointer
stored in it, and its internal name matches the name parameter. (See
PyCapsule_GetPointer() for information on how capsule names are
compared.)
In other words, if PyCapsule_IsValid() returns a true value, calls to
any of the accessors (any function starting with PyCapsule_Get()) are
guaranteed to succeed.
Return a nonzero value if the object is valid and matches the name passed in.
Return 0 otherwise. This function will not fail.
int PyCapsule_SetContext(PyObject *capsule, void *context)
Set the context pointer inside capsule to context.
Return 0 on success. Return nonzero and set an exception on failure.
int PyCapsule_SetDestructor(PyObject *capsule, PyCapsule_Destructor destructor)
Set the destructor inside capsule to destructor.
Return 0 on success. Return nonzero and set an exception on failure.
int PyCapsule_SetName(PyObject *capsule, const char *name)
Set the name inside capsule to name. If non-NULL, the name must
outlive the capsule. If the previous name stored in the capsule was not
NULL, no attempt is made to free it.
Return 0 on success. Return nonzero and set an exception on failure.
int PyCapsule_SetPointer(PyObject *capsule, void *pointer)
Set the void pointer inside capsule to pointer. The pointer may not be
NULL.
Return 0 on success. Return nonzero and set an exception on failure.
“Cell” objects are used to implement variables referenced by multiple scopes.
For each such variable, a cell object is created to store the value; the local
variables of each stack frame that references the value contain a reference to
the cells from outer scopes which also use that variable. When the value is
accessed, the value contained in the cell is used instead of the cell object
itself. This de-referencing of the cell object requires support from the
generated byte-code; these are not automatically de-referenced when accessed.
Cell objects are not likely to be useful elsewhere.
Set the contents of the cell object cell to value. This releases the
reference to any current content of the cell. value may be NULL. cell
must be non-NULL; if it is not a cell object, -1 will be returned. On
success, 0 will be returned.
Sets the value of the cell object cell to value. No reference counts are
adjusted, and no checks are made for safety; cell must be non-NULL and must
be a cell object.
Generator objects are what Python uses to implement generator iterators. They
are normally created by iterating over a function that yields values, rather
than explicitly calling PyGen_New().
Various date and time objects are supplied by the datetime module.
Before using any of these functions, the header file datetime.h must be
included in your source (note that this is not included by Python.h),
and the macro PyDateTime_IMPORT must be invoked, usually as part of
the module initialisation function. The macro puts a pointer to a C structure
into a static variable, PyDateTimeAPI, that is used by the following
macros.
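For example, the top of a module that uses these macros looks roughly like this (a sketch; the surrounding initialization function is hypothetical):

#include <Python.h>
#include "datetime.h"           /* not pulled in by Python.h */

/* inside the module initialization function: */
PyDateTime_IMPORT;              /* sets the static PyDateTimeAPI pointer */
PyObject *d = PyDate_FromDate(2011, 2, 20);
if (d == NULL)
    return NULL;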
Return true if ob is of type PyDateTime_TZInfoType. ob must not be
NULL.
Macros to create objects:
PyObject* PyDate_FromDate(int year, int month, int day)
Return a datetime.date object with the specified year, month and day.
PyObject* PyDateTime_FromDateAndTime(int year, int month, int day, int hour, int minute, int second, int usecond)
Return a datetime.datetime object with the specified year, month, day, hour,
minute, second and microsecond.
PyObject* PyTime_FromTime(int hour, int minute, int second, int usecond)
Return a datetime.time object with the specified hour, minute, second and
microsecond.
PyObject* PyDelta_FromDSU(int days, int seconds, int useconds)
Return a datetime.timedelta object representing the given number of days,
seconds and microseconds. Normalization is performed so that the resulting
number of microseconds and seconds lie in the ranges documented for
datetime.timedelta objects.
Macros to extract fields from date objects. The argument must be an instance of
PyDateTime_Date, including subclasses (such as
PyDateTime_DateTime). The argument must not be NULL, and the type is
not checked:
Macros to extract fields from datetime objects. The argument must be an
instance of PyDateTime_DateTime, including subclasses. The argument
must not be NULL, and the type is not checked:
int PyDateTime_DATE_GET_HOUR(PyDateTime_DateTime *o)
Return the hour, as an int from 0 through 23.
int PyDateTime_DATE_GET_MINUTE(PyDateTime_DateTime *o)
Return the minute, as an int from 0 through 59.
int PyDateTime_DATE_GET_SECOND(PyDateTime_DateTime *o)
Return the second, as an int from 0 through 59.
int PyDateTime_DATE_GET_MICROSECOND(PyDateTime_DateTime *o)
Return the microsecond, as an int from 0 through 999999.
Macros to extract fields from time objects. The argument must be an instance of
PyDateTime_Time, including subclasses. The argument must not be NULL,
and the type is not checked:
Code objects are a low-level detail of the CPython implementation.
Each one represents a chunk of executable code that hasn’t yet been
bound into a function.
Return a new code object. If you need a dummy code object to
create a frame, use PyCode_NewEmpty() instead. Calling
PyCode_New() directly can bind you to a precise Python
version since the definition of the bytecode changes often.
PyCodeObject* PyCode_NewEmpty(const char *filename, const char *funcname, int firstlineno)
Return a new empty code object with the specified filename,
function name, and first line number. It is illegal to
exec() or eval() the resulting code object.
Initialize the Python interpreter. In an application embedding Python, this
should be called before using any other Python/C API functions; with the
exception of Py_SetProgramName() and Py_SetPath(). This initializes
the table of loaded modules (sys.modules), and creates the fundamental
modules builtins, __main__ and sys. It also initializes
the module search path (sys.path). It does not set sys.argv; use
PySys_SetArgvEx() for that. This is a no-op when called for a second time
(without calling Py_Finalize() first). There is no return value; it is a
fatal error if the initialization fails.
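For example, the smallest embedding program (a minimal sketch):

#include <Python.h>

int
main(int argc, char *argv[])
{
    Py_Initialize();
    PyRun_SimpleString("print('embedded interpreter running')");
    Py_Finalize();
    return 0;
}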
This function works like Py_Initialize() if initsigs is 1. If
initsigs is 0, it skips registration of signal handlers, which
might be useful when Python is embedded.
Return true (nonzero) when the Python interpreter has been initialized, false
(zero) if not. After Py_Finalize() is called, this returns false until
Py_Initialize() is called again.
Undo all initializations made by Py_Initialize() and subsequent use of
Python/C API functions, and destroy all sub-interpreters (see
Py_NewInterpreter() below) that were created and not yet destroyed since
the last call to Py_Initialize(). Ideally, this frees all memory
allocated by the Python interpreter. This is a no-op when called for a second
time (without calling Py_Initialize() again first). There is no return
value; errors during finalization are ignored.
This function is provided for a number of reasons. An embedding application
might want to restart Python without having to restart the application itself.
An application that has loaded the Python interpreter from a dynamically
loadable library (or DLL) might want to free all memory allocated by Python
before unloading the DLL. During a hunt for memory leaks in an application a
developer might want to free all memory allocated by Python before exiting from
the application.
Bugs and caveats: The destruction of modules and objects in modules is done
in random order; this may cause destructors (__del__() methods) to fail
when they depend on other objects (even functions) or modules. Dynamically
loaded extension modules loaded by Python are not unloaded. Small amounts of
memory allocated by the Python interpreter may not be freed (if you find a leak,
please report it). Memory tied up in circular references between objects is not
freed. Some memory allocated by extension modules may not be freed. Some
extensions may not work properly if their initialization routine is called more
than once; this can happen if an application calls Py_Initialize() and
Py_Finalize() more than once.
This function should be called before Py_Initialize() is called for
the first time, if it is called at all. It tells the interpreter the value
of the argv[0] argument to the main() function of the program
(converted to wide characters).
This is used by Py_GetPath() and some other functions below to find
the Python run-time libraries relative to the interpreter executable. The
default value is 'python'. The argument should point to a
zero-terminated wide character string in static storage whose contents will not
change for the duration of the program’s execution. No code in the Python
interpreter will change the contents of this storage.
Return the program name set with Py_SetProgramName(), or the default.
The returned string points into static storage; the caller should not modify its
value.
Return the prefix for installed platform-independent files. This is derived
through a number of complicated rules from the program name set with
Py_SetProgramName() and some environment variables; for example, if the
program name is '/usr/local/bin/python', the prefix is '/usr/local'. The
returned string points into static storage; the caller should not modify its
value. This corresponds to the prefix variable in the top-level
Makefile and the --prefix argument to the configure
script at build time. The value is available to Python code as sys.prefix.
It is only useful on Unix. See also the next function.
Return the exec-prefix for installed platform-dependent files. This is
derived through a number of complicated rules from the program name set with
Py_SetProgramName() and some environment variables; for example, if the
program name is '/usr/local/bin/python', the exec-prefix is
'/usr/local'. The returned string points into static storage; the caller
should not modify its value. This corresponds to the exec_prefix
variable in the top-level Makefile and the --exec-prefix
argument to the configure script at build time. The value is
available to Python code as sys.exec_prefix. It is only useful on Unix.
Background: The exec-prefix differs from the prefix when platform dependent
files (such as executables and shared libraries) are installed in a different
directory tree. In a typical installation, platform dependent files may be
installed in the /usr/local/plat subtree while platform independent ones may
be installed in /usr/local.
Generally speaking, a platform is a combination of hardware and software
families, e.g. Sparc machines running the Solaris 2.x operating system are
considered the same platform, but Intel machines running Solaris 2.x are another
platform, and Intel machines running Linux are yet another platform. Different
major revisions of the same operating system generally also form different
platforms. Non-Unix operating systems are a different story; the installation
strategies on those systems are so different that the prefix and exec-prefix are
meaningless, and set to the empty string. Note that compiled Python bytecode
files are platform independent (but not independent from the Python version by
which they were compiled!).
System administrators will know how to configure the mount or
automount programs to share /usr/local between platforms
while having /usr/local/plat be a different filesystem for each
platform.
Return the full program name of the Python executable; this is computed as a
side-effect of deriving the default module search path from the program name
(set by Py_SetProgramName() above). The returned string points into
static storage; the caller should not modify its value. The value is available
to Python code as sys.executable.
Return the default module search path; this is computed from the program name
(set by Py_SetProgramName() above) and some environment variables.
The returned string consists of a series of directory names separated by a
platform dependent delimiter character. The delimiter character is ':'
on Unix and Mac OS X, ';' on Windows. The returned string points into
static storage; the caller should not modify its value. The list
sys.path is initialized with this value on interpreter startup; it
can be (and usually is) modified later to change the search path for loading
modules.
Set the default module search path. If this function is called before
Py_Initialize(), then Py_GetPath() won’t attempt to compute a
default search path but uses the one provided instead. This is useful if
Python is embedded by an application that has full knowledge of the location
of all modules. The path components should be separated by semicolons.
Return the version of this Python interpreter. This is a string that looks
something like
"3.0a5+ (py3k:63103M, May 12 2008, 00:53:55) \n[GCC 4.2.3]"
The first word (up to the first space character) is the current Python version;
the first three characters are the major and minor version separated by a
period. The returned string points into static storage; the caller should not
modify its value. The value is available to Python code as sys.version.
Return the platform identifier for the current platform. On Unix, this is
formed from the “official” name of the operating system, converted to lower
case, followed by the major revision number; e.g., for Solaris 2.x, which is
also known as SunOS 5.x, the value is 'sunos5'. On Mac OS X, it is
'darwin'. On Windows, it is 'win'. The returned string points into
static storage; the caller should not modify its value. The value is available
to Python code as sys.platform.
Return an indication of the compiler used to build the current Python version,
in square brackets, for example:
"[GCC 2.7.2.2]"
The returned string points into static storage; the caller should not modify its
value. The value is available to Python code as part of the variable
sys.version.
Return information about the sequence number and build date and time of the
current Python interpreter instance, for example
"#67, Aug 1 1997, 22:34:28"
The returned string points into static storage; the caller should not modify its
value. The value is available to Python code as part of the variable
sys.version.
void PySys_SetArgvEx(int argc, wchar_t **argv, int updatepath)
Set sys.argv based on argc and argv. These parameters are
similar to those passed to the program’s main() function with the
difference that the first entry should refer to the script file to be
executed rather than the executable hosting the Python interpreter. If there
isn’t a script that will be run, the first entry in argv can be an empty
string. If this function fails to initialize sys.argv, a fatal
condition is signalled using Py_FatalError().
If updatepath is zero, this is all the function does. If updatepath
is non-zero, the function also modifies sys.path according to the
following algorithm:
If the name of an existing script is passed in argv[0], the absolute
path of the directory where the script is located is prepended to
sys.path.
Otherwise (that is, if argc is 0 or argv[0] doesn’t point
to an existing file name), an empty string is prepended to
sys.path, which is the same as prepending the current working
directory (".").
Note
It is recommended that applications embedding the Python interpreter
for purposes other than executing a single script pass 0 as updatepath,
and update sys.path themselves if desired.
See CVE-2008-5983.
On versions before 3.1.3, you can achieve the same effect by manually
popping the first sys.path element after having called
PySys_SetArgv(), for example using:
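PyRun_SimpleString("import sys; sys.path.pop(0)\n");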
Set the default “home” directory, that is, the location of the standard
Python libraries. See PYTHONHOME for the meaning of the
argument string.
The argument should point to a zero-terminated character string in static
storage whose contents will not change for the duration of the program’s
execution. No code in the Python interpreter will change the contents of
this storage.
Return the default “home”, that is, the value set by a previous call to
Py_SetPythonHome(), or the value of the PYTHONHOME
environment variable if it is set.
The Python interpreter is not fully thread-safe. In order to support
multi-threaded Python programs, there’s a global lock, called the global
interpreter lock or GIL, that must be held by the current thread before
it can safely access Python objects. Without the lock, even the simplest
operations could cause problems in a multi-threaded program: for example, when
two threads simultaneously increment the reference count of the same object, the
reference count could end up being incremented only once instead of twice.
Therefore, the rule exists that only the thread that has acquired the
GIL may operate on Python objects or call Python/C API functions.
In order to emulate concurrency of execution, the interpreter regularly
tries to switch threads (see sys.setswitchinterval()). The lock is also
released around potentially blocking I/O operations like reading or writing
a file, so that other Python threads can run in the meantime.
The Python interpreter keeps some thread-specific bookkeeping information
inside a data structure called PyThreadState. There’s also one
global variable pointing to the current PyThreadState: it can
be retrieved using PyThreadState_Get().
The Py_BEGIN_ALLOW_THREADS macro opens a new block and declares a
hidden local variable; the Py_END_ALLOW_THREADS macro closes the
block. These two macros are still available when Python is compiled without
thread support (they simply have an empty expansion).
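A typical use of the pair, releasing the lock around a blocking operation, looks like this:
Py_BEGIN_ALLOW_THREADS
... Do some blocking I/O operation ...
Py_END_ALLOW_THREADS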
When thread support is enabled, the block above expands to the following code:
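{
    PyThreadState *_save;

    _save = PyEval_SaveThread();
    ... Do some blocking I/O operation ...
    PyEval_RestoreThread(_save);
}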
Here is how these functions work: the global interpreter lock is used to protect the pointer to the
current thread state. When releasing the lock and saving the thread state,
the current thread state pointer must be retrieved before the lock is released
(since another thread could immediately acquire the lock and store its own thread
state in the global variable). Conversely, when acquiring the lock and restoring
the thread state, the lock must be acquired before storing the thread state
pointer.
Note
Calling system I/O functions is the most common use case for releasing
the GIL, but it can also be useful before calling long-running computations
which don’t need access to Python objects, such as compression or
cryptographic functions operating over memory buffers. For example, the
standard zlib and hashlib modules release the GIL when
compressing or hashing data.
When threads are created using the dedicated Python APIs (such as the
threading module), a thread state is automatically associated with them
and the code shown above is therefore correct. However, when threads are
created from C (for example by a third-party library with its own thread
management), they don’t hold the GIL, nor is there a thread state structure
for them.
If you need to call Python code from these threads (often this will be part
of a callback API provided by the aforementioned third-party library),
you must first register these threads with the interpreter by
creating a thread state data structure, then acquiring the GIL, and finally
storing their thread state pointer, before you can start using the Python/C
API. When you are done, you should reset the thread state pointer, release
the GIL, and finally free the thread state data structure.
PyGILState_STATE gstate;
gstate = PyGILState_Ensure();

/* Perform Python actions here. */
result = CallSomeFunction();
/* evaluate result or handle exception */

/* Release the thread. No Python API allowed beyond this point. */
PyGILState_Release(gstate);
Note that the PyGILState_*() functions assume there is only one global
interpreter (created automatically by Py_Initialize()). Python
supports the creation of additional interpreters (using
Py_NewInterpreter()), but mixing multiple interpreters and the
PyGILState_*() API is unsupported.
Another important thing to note about threads is their behaviour in the face
of the C fork() call. On most systems with fork(), after a
process forks only the thread that issued the fork will exist. That also
means any locks held by other threads will never be released. Python solves
this for os.fork() by acquiring the locks it uses internally before
the fork, and releasing them afterwards. In addition, it resets any
Lock Objects in the child. When extending or embedding Python, there
is no way to inform Python of additional (non-Python) locks that need to be
acquired before or reset after a fork. OS facilities such as
pthread_atfork() would need to be used to accomplish the same thing.
Additionally, when extending or embedding Python, calling fork()
directly rather than through os.fork() (and returning to or calling
into Python) may result in a deadlock by one of Python’s internal locks
being held by a thread that is defunct after the fork.
PyOS_AfterFork() tries to reset the necessary locks, but is not
always able to.
This data structure represents the state shared by a number of cooperating
threads. Threads belonging to the same interpreter share their module
administration and a few other internal items. There are no public members in
this structure.
Threads belonging to different interpreters initially share nothing, except
process state like available memory, open file descriptors and such. The global
interpreter lock is also shared by all threads, regardless of to which
interpreter they belong.
This data structure represents the state of a single thread. The only public
data member is PyInterpreterState *interp, which points to
this thread’s interpreter state.
Initialize and acquire the global interpreter lock. It should be called in the
main thread before creating a second thread or engaging in any other thread
operations such as PyEval_ReleaseThread(tstate). It is not needed before
calling PyEval_SaveThread() or PyEval_RestoreThread().
This is a no-op when called for a second time.
Changed in version 3.2: This function cannot be called before Py_Initialize() anymore.
Note
When only the main thread exists, no GIL operations are needed. This is a
common situation (most Python programs do not use threads), and the lock
operations slow the interpreter down a bit. Therefore, the lock is not
created initially. This situation is equivalent to having acquired the lock:
when there is only a single thread, all object accesses are safe. Therefore,
when this function initializes the global interpreter lock, it also acquires
it. Before the Python _thread module creates a new thread, knowing
that either it has the lock or the lock hasn’t been created yet, it calls
PyEval_InitThreads(). When this call returns, it is guaranteed that
the lock has been created and that the calling thread has acquired it.
It is not safe to call this function when it is unknown which thread (if
any) currently has the global interpreter lock.
This function is not available when thread support is disabled at compile time.
Returns a non-zero value if PyEval_InitThreads() has been called. This
function can be called without holding the GIL, and therefore can be used to
avoid calls to the locking API when running single-threaded. This function is
not available when thread support is disabled at compile time.
Release the global interpreter lock (if it has been created and thread
support is enabled) and reset the thread state to NULL, returning the
previous thread state (which is not NULL). If the lock has been created,
the current thread must have acquired it. (This function is available even
when thread support is disabled at compile time.)
Acquire the global interpreter lock (if it has been created and thread
support is enabled) and set the thread state to tstate, which must not be
NULL. If the lock has been created, the current thread must not have
acquired it, otherwise deadlock ensues. (This function is available even
when thread support is disabled at compile time.)
Return the current thread state. The global interpreter lock must be held.
When the current thread state is NULL, this issues a fatal error (so that
the caller needn’t check for NULL).
Swap the current thread state with the thread state given by the argument
tstate, which may be NULL. The global interpreter lock must be held
and is not released.
This function is called from PyOS_AfterFork() to ensure that newly
created child processes don’t hold locks referring to threads which
are not running in the child process.
The following functions use thread-local storage, and are not compatible
with sub-interpreters:
Ensure that the current thread is ready to call the Python C API regardless
of the current state of Python, or of the global interpreter lock. This may
be called as many times as desired by a thread as long as each call is
matched with a call to PyGILState_Release(). In general, other
thread-related APIs may be used between PyGILState_Ensure() and
PyGILState_Release() calls as long as the thread state is restored to
its previous state before the Release(). For example, normal usage of the
Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS macros is
acceptable.
The return value is an opaque “handle” to the thread state when
PyGILState_Ensure() was called, and must be passed to
PyGILState_Release() to ensure Python is left in the same state. Even
though recursive calls are allowed, these handles cannot be shared - each
unique call to PyGILState_Ensure() must save the handle for its call
to PyGILState_Release().
When the function returns, the current thread will hold the GIL and be able
to call arbitrary Python code. Failure is a fatal error.
Release any resources previously acquired. After this call, Python’s state will
be the same as it was prior to the corresponding PyGILState_Ensure() call
(but generally this state will be unknown to the caller, hence the use of the
GILState API).
Get the current thread state for this thread. May return NULL if no
GILState API has been used on the current thread. Note that the main thread
always has such a thread-state, even if no auto-thread-state call has been
made on the main thread. This is mainly a helper/diagnostic function.
The following macros are normally used without a trailing semicolon; look for
example usage in the Python source distribution.
This macro expands to { PyThreadState *_save; _save = PyEval_SaveThread();.
Note that it contains an opening brace; it must be matched with a following
Py_END_ALLOW_THREADS macro. See above for further discussion of this
macro. It is a no-op when thread support is disabled at compile time.
This macro expands to PyEval_RestoreThread(_save); }. Note that it contains
a closing brace; it must be matched with an earlier
Py_BEGIN_ALLOW_THREADS macro. See above for further discussion of
this macro. It is a no-op when thread support is disabled at compile time.
This macro expands to PyEval_RestoreThread(_save);: it is equivalent to
Py_END_ALLOW_THREADS without the closing brace. It is a no-op when
thread support is disabled at compile time.
This macro expands to _save = PyEval_SaveThread();: it is equivalent to
Py_BEGIN_ALLOW_THREADS without the opening brace and variable
declaration. It is a no-op when thread support is disabled at compile time.
All of the following functions are only available when thread support is enabled
at compile time, and must be called only when the global interpreter lock has
been created.
Create a new interpreter state object. The global interpreter lock need not
be held, but may be held if it is necessary to serialize calls to this
function.
Destroy an interpreter state object. The global interpreter lock need not be
held. The interpreter state must have been reset with a previous call to
PyInterpreterState_Clear().
Create a new thread state object belonging to the given interpreter object.
The global interpreter lock need not be held, but may be held if it is
necessary to serialize calls to this function.
Destroy a thread state object. The global interpreter lock need not be held.
The thread state must have been reset with a previous call to
PyThreadState_Clear().
Return a dictionary in which extensions can store thread-specific state
information. Each extension should use a unique key to store state in
the dictionary. It is okay to call this function when no current thread state
is available. If this function returns NULL, no exception has been raised and
the caller should assume no current thread state is available.
int PyThreadState_SetAsyncExc(long id, PyObject *exc)
Asynchronously raise an exception in a thread. The id argument is the thread
id of the target thread; exc is the exception object to be raised. This
function does not steal any references to exc. To prevent naive misuse, you
must write your own C extension to call this. Must be called with the GIL held.
Returns the number of thread states modified; this is normally one, but will be
zero if the thread id isn’t found. If exc is NULL, the pending
exception (if any) for the thread is cleared. This raises no exceptions.
Acquire the global interpreter lock and set the current thread state to
tstate, which should not be NULL. The lock must have been created earlier.
If this thread already has the lock, deadlock ensues.
PyEval_RestoreThread() is a higher-level function which is always
available (even when thread support isn’t enabled or when threads have
not been initialized).
Reset the current thread state to NULL and release the global interpreter
lock. The lock must have been created earlier and must be held by the current
thread. The tstate argument, which must not be NULL, is only used to check
that it represents the current thread state — if it isn’t, a fatal error is
reported.
PyEval_SaveThread() is a higher-level function which is always
available (even when thread support isn’t enabled or when threads have
not been initialized).
While in most uses, you will only embed a single Python interpreter, there
are cases where you need to create several independent interpreters in the
same process and perhaps even in the same thread. Sub-interpreters allow
you to do that. You can switch between sub-interpreters using the
PyThreadState_Swap() function. You can create and destroy them
using the following functions:
Create a new sub-interpreter. This is an (almost) totally separate environment
for the execution of Python code. In particular, the new interpreter has
separate, independent versions of all imported modules, including the
fundamental modules builtins, __main__ and sys. The
table of loaded modules (sys.modules) and the module search path
(sys.path) are also separate. The new environment has no sys.argv
variable. It has new standard I/O stream file objects sys.stdin,
sys.stdout and sys.stderr (however these refer to the same underlying
file descriptors).
The return value points to the first thread state created in the new
sub-interpreter. This thread state is made the current thread state.
Note that no actual thread is created; see the discussion of thread states
below. If creation of the new interpreter is unsuccessful, NULL is
returned; no exception is set since the exception state is stored in the
current thread state and there may not be a current thread state. (Like all
other Python/C API functions, the global interpreter lock must be held before
calling this function and is still held when it returns; however, unlike most
other Python/C API functions, there needn’t be a current thread state on
entry.)
Extension modules are shared between (sub-)interpreters as follows: the first
time a particular extension is imported, it is initialized normally, and a
(shallow) copy of its module’s dictionary is squirreled away. When the same
extension is imported by another (sub-)interpreter, a new module is initialized
and filled with the contents of this copy; the extension’s init function is
not called. Note that this is different from what happens when an extension is
imported after the interpreter has been completely re-initialized by calling
Py_Finalize() and Py_Initialize(); in that case, the extension’s
initmodule function is called again.
Destroy the (sub-)interpreter represented by the given thread state. The given
thread state must be the current thread state. See the discussion of thread
states below. When the call returns, the current thread state is NULL. All
thread states associated with this interpreter are destroyed. (The global
interpreter lock must be held before calling this function and is still held
when it returns.) Py_Finalize() will destroy all sub-interpreters that
haven’t been explicitly destroyed at that point.
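As a sketch, creating and tearing down a sub-interpreter from the main thread (which already holds the GIL after Py_Initialize()) might look like this:
PyThreadState *main_tstate = PyThreadState_Get();  /* save the main thread state */

PyThreadState *sub = Py_NewInterpreter();          /* sub is now the current state */
if (sub == NULL) {
    /* creation failed; note that no exception is set */
}
else {
    PyRun_SimpleString("import sys; print(sys.prefix)");
    Py_EndInterpreter(sub);            /* current thread state is NULL afterwards */
}
PyThreadState_Swap(main_tstate);       /* switch back to the main interpreter */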
Because sub-interpreters (and the main interpreter) are part of the same
process, the insulation between them isn’t perfect — for example, using
low-level file operations like os.close(), they can
(accidentally or maliciously) affect each other’s open files. Because of the
way extensions are shared between (sub-)interpreters, some extensions may not
work properly; this is especially likely when the extension makes use of
(static) global variables, or when the extension manipulates its module’s
dictionary after its initialization. It is possible to insert objects created
in one sub-interpreter into a namespace of another sub-interpreter; this should
be done with great care to avoid sharing user-defined functions, methods,
instances or classes between sub-interpreters, since import operations executed
by such objects may affect the wrong (sub-)interpreter’s dictionary of loaded
modules.
Also note that combining this functionality with PyGILState_*() APIs
is delicate, because these APIs assume a bijection between Python thread states
and OS-level threads, an assumption broken by the presence of sub-interpreters.
It is highly recommended that you don’t switch sub-interpreters between a pair
of matching PyGILState_Ensure() and PyGILState_Release() calls.
Furthermore, extensions (such as ctypes) using these APIs to allow calling
of Python code from non-Python created threads will probably be broken when using
sub-interpreters.
A mechanism is provided to make asynchronous notifications to the main
interpreter thread. These notifications take the form of a function
pointer and a void argument.
Every check interval, when the global interpreter lock is released and
reacquired, Python will also call any such provided functions. This can be used
for example by asynchronous IO handlers. The notification can be scheduled from
a worker thread and the actual call then made at the earliest convenience by the
main thread where it has possession of the global interpreter lock and can
perform any Python API calls.
int Py_AddPendingCall(int (*func)(void *), void *arg)
Post a notification to the Python main thread. If successful, func will be
called with the argument arg at the earliest convenience. func will be
called having the global interpreter lock held and can thus use the full
Python API and can take any action such as setting object attributes to
signal IO completion. It must return 0 on success, or -1 signalling an
exception. The notification function won’t be interrupted to perform another
asynchronous notification recursively, but it can still be interrupted to
switch threads if the global interpreter lock is released, for example, if it
calls back into Python code.
This function returns 0 on success in which case the notification has been
scheduled. Otherwise, for example if the notification buffer is full, it
returns -1 without setting any exception.
This function can be called on any thread, be it a Python thread or some
other system thread. If it is a Python thread, it doesn’t matter if it holds
the global interpreter lock or not.
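A sketch of posting a notification (notify_done and its string argument are illustrative, not part of the API):
/* runs later in the main thread, with the GIL held */
static int
notify_done(void *arg)
{
    PySys_WriteStdout("I/O finished: %s\n", (const char *)arg);
    return 0;                          /* 0 on success, -1 with an exception set */
}

/* callable from any thread, Python-created or not: */
if (Py_AddPendingCall(notify_done, (void *)"request-1") < 0) {
    /* notification buffer full; no exception has been set */
}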
The Python interpreter provides some low-level support for attaching profiling
and execution tracing facilities. These are used for profiling, debugging, and
coverage analysis tools.
This C interface allows the profiling or tracing code to avoid the overhead of
calling through Python-level callable objects, making a direct C function call
instead. The essential attributes of the facility have not changed; the
interface allows trace functions to be installed per-thread, and the basic
events reported to the trace function are the same as had been reported to the
Python-level trace functions in previous versions.
int (*Py_tracefunc)(PyObject *obj, PyFrameObject *frame, int what, PyObject *arg)
The type of the trace function registered using PyEval_SetProfile() and
PyEval_SetTrace(). The first parameter is the object passed to the
registration function as obj, frame is the frame object to which the event
pertains, what is one of the constants PyTrace_CALL,
PyTrace_EXCEPTION, PyTrace_LINE, PyTrace_RETURN,
PyTrace_C_CALL, PyTrace_C_EXCEPTION, or
PyTrace_C_RETURN, and arg depends on the value of what:
The value of the what parameter to a Py_tracefunc function when a new
call to a function or method is being reported, or a new entry into a generator.
Note that the creation of the iterator for a generator function is not reported
as there is no control transfer to the Python bytecode in the corresponding
frame.
The value of the what parameter to a Py_tracefunc function when an
exception has been raised. The callback function is called with this value
for what after any bytecode is processed that causes the exception to become
set within the frame being executed. The effect of this is that as exception
propagation causes the Python stack to unwind, the callback is called upon
return to each frame as the exception propagates. Only trace functions receive
these events; they are not needed by the profiler.
Set the profiler function to func. The obj parameter is passed to the
function as its first parameter, and may be any Python object, or NULL. If
the profile function needs to maintain state, using a different value for obj
for each thread provides a convenient and thread-safe place to store it. The
profile function is called for all monitored events except the line-number
events.
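A minimal sketch of installing a profile function (my_profiler and call_count are illustrative names):
static Py_ssize_t call_count = 0;      /* illustrative: count function calls */

static int
my_profiler(PyObject *obj, PyFrameObject *frame, int what, PyObject *arg)
{
    if (what == PyTrace_CALL)
        call_count++;
    return 0;
}

/* install for the current thread; obj is NULL because no per-thread
   state object is needed here */
PyEval_SetProfile(my_profiler, NULL);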
Memory management in Python involves a private heap containing all Python
objects and data structures. The management of this private heap is ensured
internally by the Python memory manager. The Python memory manager has
different components which deal with various dynamic storage management aspects,
like sharing, segmentation, preallocation or caching.
At the lowest level, a raw memory allocator ensures that there is enough room in
the private heap for storing all Python-related data by interacting with the
memory manager of the operating system. On top of the raw memory allocator,
several object-specific allocators operate on the same heap and implement
distinct memory management policies adapted to the peculiarities of every object
type. For example, integer objects are managed differently within the heap than
strings, tuples or dictionaries because integers imply different storage
requirements and speed/space tradeoffs. The Python memory manager thus delegates
some of the work to the object-specific allocators, but ensures that the latter
operate within the bounds of the private heap.
It is important to understand that the management of the Python heap is
performed by the interpreter itself and that the user has no control over it,
even if she regularly manipulates object pointers to memory blocks inside that
heap. The allocation of heap space for Python objects and other internal
buffers is performed on demand by the Python memory manager through the Python/C
API functions listed in this document.
To avoid memory corruption, extension writers should never try to operate on
Python objects with the functions exported by the C library: malloc(),
calloc(), realloc() and free(). This will result in mixed
calls between the C allocator and the Python memory manager with fatal
consequences, because they implement different algorithms and operate on
different heaps. However, one may safely allocate and release memory blocks
with the C library allocator for individual purposes, as shown in the following
example:
PyObject *res;
char *buf = (char *) malloc(BUFSIZ); /* for I/O */

if (buf == NULL)
    return PyErr_NoMemory();
...Do some I/O operation involving buf...
res = PyBytes_FromString(buf);
free(buf); /* malloc'ed */
return res;
In this example, the memory request for the I/O buffer is handled by the C
library allocator. The Python memory manager is involved only in the allocation
of the string object returned as a result.
In most situations, however, it is recommended to allocate memory from the
Python heap specifically because the latter is under control of the Python
memory manager. For example, this is required when the interpreter is extended
with new object types written in C. Another reason for using the Python heap is
the desire to inform the Python memory manager about the memory needs of the
extension module. Even when the requested memory is used exclusively for
internal, highly-specific purposes, delegating all memory requests to the Python
memory manager causes the interpreter to have a more accurate image of its
memory footprint as a whole. Consequently, under certain circumstances, the
Python memory manager may or may not trigger appropriate actions, like garbage
collection, memory compaction or other preventive procedures. Note that by using
the C library allocator as shown in the previous example, the allocated memory
for the I/O buffer escapes completely the Python memory manager.
The following function sets, modeled after the ANSI C standard, but specifying
behavior when requesting zero bytes, are available for allocating and releasing
memory from the Python heap:
Allocates n bytes and returns a pointer of type void* to the
allocated memory, or NULL if the request fails. Requesting zero bytes returns
a distinct non-NULL pointer if possible, as if PyMem_Malloc(1) had
been called instead. The memory will not have been initialized in any way.
Resizes the memory block pointed to by p to n bytes. The contents will be
unchanged to the minimum of the old and the new sizes. If p is NULL, the
call is equivalent to PyMem_Malloc(n); else if n is equal to zero,
the memory block is resized but is not freed, and the returned pointer is
non-NULL. Unless p is NULL, it must have been returned by a previous call
to PyMem_Malloc() or PyMem_Realloc(). If the request fails,
PyMem_Realloc() returns NULL and p remains a valid pointer to the
previous memory area.
Frees the memory block pointed to by p, which must have been returned by a
previous call to PyMem_Malloc() or PyMem_Realloc(). Otherwise, or
if PyMem_Free(p) has been called before, undefined behavior occurs. If
p is NULL, no operation is performed.
The following type-oriented macros are provided for convenience. Note that
TYPE refers to any C type.
Same as PyMem_Malloc(), but allocates (n*sizeof(TYPE)) bytes of
memory. Returns a pointer cast to TYPE*. The memory will not have
been initialized in any way.
Same as PyMem_Realloc(), but the memory block is resized to (n*sizeof(TYPE)) bytes. Returns a pointer cast to TYPE*. On return,
p will be a pointer to the new memory area, or NULL in the event of
failure. This is a C preprocessor macro; p is always reassigned. Save
the original value of p to avoid losing memory when handling errors.
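For instance, a careful resize that preserves the block on failure might look like this (a sketch; buf and orig are illustrative):
char *buf = PyMem_New(char, BUFSIZ);
if (buf == NULL)
    return PyErr_NoMemory();

char *orig = buf;                      /* save p: the macro reassigns it */
PyMem_Resize(buf, char, 2 * BUFSIZ);
if (buf == NULL) {
    PyMem_Del(orig);                   /* the original block is still allocated */
    return PyErr_NoMemory();
}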
In addition, the following macro sets are provided for calling the Python memory
allocator directly, without involving the C API functions listed above. However,
note that their use does not preserve binary compatibility across Python
versions and is therefore deprecated in extension modules.
Here is the example from section Overview, rewritten so that the
I/O buffer is allocated from the Python heap by using the first function set:
PyObject *res;
char *buf = (char *) PyMem_Malloc(BUFSIZ); /* for I/O */

if (buf == NULL)
    return PyErr_NoMemory();
/* ...Do some I/O operation involving buf... */
res = PyBytes_FromString(buf);
PyMem_Free(buf); /* allocated with PyMem_Malloc */
return res;
The same code using the type-oriented function set:
PyObject *res;
char *buf = PyMem_New(char, BUFSIZ); /* for I/O */

if (buf == NULL)
    return PyErr_NoMemory();
/* ...Do some I/O operation involving buf... */
res = PyBytes_FromString(buf);
PyMem_Del(buf); /* allocated with PyMem_New */
return res;
Note that in the two examples above, the buffer is always manipulated via
functions belonging to the same set. Indeed, it is required to use the same
memory API family for a given memory block, so that the risk of mixing different
allocators is reduced to a minimum. The following code sequence contains two
errors, one of which is labeled as fatal because it mixes two different
allocators operating on different heaps.
char *buf1 = PyMem_New(char, BUFSIZ);
char *buf2 = (char *) malloc(BUFSIZ);
char *buf3 = (char *) PyMem_Malloc(BUFSIZ);
...
PyMem_Del(buf3);  /* Wrong -- should be PyMem_Free() */
free(buf2);       /* Right -- allocated via malloc() */
free(buf1);       /* Fatal -- should be PyMem_Del()  */
In addition to the functions aimed at handling raw memory blocks from the Python
heap, objects in Python are allocated and released with PyObject_New(),
PyObject_NewVar() and PyObject_Del().
These will be explained in the next chapter on defining and implementing new
object types in C.
Initialize a newly-allocated object op with its type and initial
reference. Returns the initialized object. If type indicates that the
object participates in the cyclic garbage detector, it is added to the
detector’s set of observed objects. Other fields of the object are not
affected.
Allocate a new Python object using the C structure type TYPE and the
Python type object type. Fields not defined by the Python object header
are not initialized; the object’s reference count will be one. The size of
the memory allocation is determined from the tp_basicsize field of
the type object.
Allocate a new Python object using the C structure type TYPE and the
Python type object type. Fields not defined by the Python object header
are not initialized. The allocated memory allows for the TYPE structure
plus size fields of the size given by the tp_itemsize field of
type. This is useful for implementing objects like tuples, which are
able to determine their size at construction time. Embedding the array of
fields into the same allocation decreases the number of allocations,
improving the memory management efficiency.
Releases memory allocated to an object using PyObject_New() or
PyObject_NewVar(). This is normally called from the
tp_dealloc handler specified in the object’s type. The fields of
the object should not be accessed after this call as the memory is no
longer a valid Python object.
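A sketch of these object-level functions in use (FooObject and Foo_Type are hypothetical; Foo_Type is assumed to be a fully initialized PyTypeObject):
typedef struct {
    PyObject_HEAD
    int payload;                       /* illustrative instance data */
} FooObject;

FooObject *op = PyObject_New(FooObject, &Foo_Type);
if (op == NULL)
    return PyErr_NoMemory();
op->payload = 42;
/* ... later, normally from the type's tp_dealloc handler: */
PyObject_Del(op);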
There are a large number of structures which are used in the definition of
object types for Python. This section describes these structures and how they
are used.
All Python objects ultimately share a small number of fields at the beginning
of the object’s representation in memory. These are represented by the
PyObject and PyVarObject types, which are defined, in turn,
by the expansions of some macros also used, whether directly or indirectly, in
the definition of all other Python objects.
All object types are extensions of this type. This is a type which
contains the information Python needs to treat a pointer to an object as an
object. In a normal “release” build, it contains only the object’s
reference count and a pointer to the corresponding type object. It
corresponds to the fields defined by the expansion of the PyObject_HEAD
macro.
This is an extension of PyObject that adds the ob_size
field. This is only used for objects that have some notion of length.
This type does not often appear in the Python/C API. It corresponds to the
fields defined by the expansion of the PyObject_VAR_HEAD macro.
This is a macro which expands to the declarations of the fields of the
PyObject type; it is used when declaring new types which represent
objects without a varying length. The specific fields it expands to depend
on the definition of Py_TRACE_REFS. By default, that macro is
not defined, and PyObject_HEAD expands to:
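Py_ssize_t ob_refcnt;
PyTypeObject *ob_type;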
This is a macro which expands to the declarations of the fields of the
PyVarObject type; it is used when declaring new types which
represent objects with a length that varies from instance to instance.
This macro always expands to:
PyObject_HEAD
Py_ssize_t ob_size;
Note that PyObject_HEAD is part of the expansion, and that its own
expansion varies depending on the definition of Py_TRACE_REFS.
Type of the functions used to implement most Python callables in C.
Functions of this type take two PyObject* parameters and return
one such value. If the return value is NULL, an exception shall have
been set. If not NULL, the return value is interpreted as the return
value of the function as exposed in Python. The function must return a new
reference.
Type of the functions used to implement Python callables in C that take
keyword arguments: they take three PyObject* parameters and return
one such value. See PyCFunction above for the meaning of the return
value.
Structure used to describe a method of an extension type. This structure has
four fields:
Field      C Type        Meaning
ml_name    char *        name of the method
ml_meth    PyCFunction   pointer to the C implementation
ml_flags   int           flag bits indicating how the call should be constructed
ml_doc     char *        points to the contents of the docstring
The ml_meth is a C function pointer. The functions may be of different
types, but they always return PyObject*. If the function is not of
type PyCFunction, the compiler will require a cast in the method table.
Even though PyCFunction defines the first parameter as
PyObject*, it is common that the method implementation uses the
specific C type of the self object.
The ml_flags field is a bitfield which can include the following flags.
The individual flags indicate either a calling convention or a binding
convention. Of the calling convention flags, only METH_VARARGS and
METH_KEYWORDS can be combined (but note that METH_KEYWORDS
alone is equivalent to METH_VARARGS|METH_KEYWORDS). Any of the calling
convention flags can be combined with a binding flag.
This is the typical calling convention, where the methods have the type
PyCFunction. The function expects two PyObject* values.
The first one is the self object for methods; for module functions, it is
the module object. The second parameter (often called args) is a tuple
object representing all arguments. This parameter is typically processed
using PyArg_ParseTuple() or PyArg_UnpackTuple().
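For example, a METH_VARARGS function and its method table entry might look like this (example_add and example_methods are illustrative names):
static PyObject *
example_add(PyObject *self, PyObject *args)
{
    long a, b;
    if (!PyArg_ParseTuple(args, "ll", &a, &b))
        return NULL;
    return PyLong_FromLong(a + b);
}

static PyMethodDef example_methods[] = {
    {"add", example_add, METH_VARARGS, "Add two integers."},
    {NULL, NULL, 0, NULL}              /* sentinel */
};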
Methods with these flags must be of type PyCFunctionWithKeywords.
The function expects three parameters: self, args, and a dictionary of
all the keyword arguments. The flag is typically combined with
METH_VARARGS, and the parameters are typically processed using
PyArg_ParseTupleAndKeywords().
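A sketch of the keyword form (example_greet is an illustrative name; note the (PyCFunction) cast needed in the method table for the three-argument type):
static PyObject *
example_greet(PyObject *self, PyObject *args, PyObject *kwargs)
{
    static char *kwlist[] = {"name", "punct", NULL};
    const char *name;
    const char *punct = "!";
    if (!PyArg_ParseTupleAndKeywords(args, kwargs, "s|s", kwlist,
                                     &name, &punct))
        return NULL;
    return PyUnicode_FromFormat("Hello, %s%s", name, punct);
}

/* method table entry:
   {"greet", (PyCFunction)example_greet, METH_VARARGS | METH_KEYWORDS, "..."} */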
Methods without parameters don’t need to check whether arguments are given if
they are listed with the METH_NOARGS flag. They need to be of type
PyCFunction. The first parameter is typically named self and will
hold a reference to the module or object instance. In all cases the second
parameter will be NULL.
Methods with a single object argument can be listed with the METH_O
flag, instead of invoking PyArg_ParseTuple() with a "O" argument.
They have the type PyCFunction, with the self parameter, and a
PyObject* parameter representing the single argument.
These two constants are not used to indicate the calling convention but the
binding when used with methods of classes. They may not be used for functions
defined for modules. At most one of these flags may be set for any given
method.
The method will be passed the type object as the first parameter rather
than an instance of the type. This is used to create class methods,
similar to what is created when using the classmethod() built-in
function.
The method will be passed NULL as the first parameter rather than an
instance of the type. This is used to create static methods, similar to
what is created when using the staticmethod() built-in function.
One other constant controls whether a method is loaded in place of another
definition with the same method name.
The method will be loaded in place of existing definitions. Without
METH_COEXIST, the default is to skip repeated definitions. Since slot
wrappers are loaded before the method table, the existence of a
sq_contains slot, for example, would generate a wrapped method named
__contains__() and preclude the loading of a corresponding
PyCFunction with the same name. With the flag defined, the PyCFunction
will be loaded in place of the wrapper object and will co-exist with the
slot. This is helpful because calls to PyCFunctions are optimized more
than wrapper object calls.
the offset in bytes that the member is located on the type’s object struct
flags    int       flag bits indicating if the field should be read-only or writable
doc      char *    points to the contents of the docstring
type can be one of many T_ macros corresponding to various C
types. When the member is accessed in Python, it will be converted to the
equivalent Python type.
Macro name    C type
T_SHORT       short
T_INT         int
T_LONG        long
T_FLOAT       float
T_DOUBLE      double
T_STRING      char *
T_OBJECT      PyObject *
T_OBJECT_EX   PyObject *
T_CHAR        char
T_BYTE        char
T_UBYTE       unsigned char
T_UINT        unsigned int
T_USHORT      unsigned short
T_ULONG       unsigned long
T_BOOL        char
T_LONGLONG    long long
T_ULONGLONG   unsigned long long
T_PYSSIZET    Py_ssize_t
T_OBJECT and T_OBJECT_EX differ in that
T_OBJECT returns None if the member is NULL and
T_OBJECT_EX raises an AttributeError. Try to use
T_OBJECT_EX over T_OBJECT because T_OBJECT_EX
handles use of the del statement on that attribute more correctly
than T_OBJECT.
flags can be 0 for write and read access or READONLY for
read-only access. Using T_STRING for type implies
READONLY. Only T_OBJECT and T_OBJECT_EX
members can be deleted. (They are set to NULL).
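As a sketch (FooObject and its fields are hypothetical; structmember.h supplies the T_ macros and READONLY):
#include <stddef.h>        /* offsetof */
#include <structmember.h>

typedef struct {
    PyObject_HEAD
    int number;            /* exposed for reading and writing */
    PyObject *first;       /* read-only; T_OBJECT_EX raises AttributeError when NULL */
} FooObject;

static PyMemberDef Foo_members[] = {
    {"number", T_INT,       offsetof(FooObject, number), 0,        "a number"},
    {"first",  T_OBJECT_EX, offsetof(FooObject, first),  READONLY, "first name"},
    {NULL}                 /* sentinel */
};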
Perhaps one of the most important structures of the Python object system is the
structure that defines a new type: the PyTypeObject structure. Type
objects can be handled using any of the PyObject_*() or
PyType_*() functions, but do not offer much that’s interesting to most
Python applications. These objects are fundamental to how objects behave, so
they are very important to the interpreter itself and to any extension module
that implements new types.
Type objects are fairly large compared to most of the standard types. The reason
for the size is that each type object stores a large number of values, mostly C
function pointers, each of which implements a small part of the type’s
functionality. The fields of the type object are examined in detail in this
section. The fields will be described in the order in which they occur in the
structure.
The structure definition for PyTypeObject can be found in
Include/object.h. For convenience of reference, this repeats the
definition found there:
typedef struct _typeobject {
    PyObject_VAR_HEAD
    char *tp_name; /* For printing, in format "<module>.<name>" */
    int tp_basicsize, tp_itemsize; /* For allocation */

    /* Methods to implement standard operations */
    destructor tp_dealloc;
    printfunc tp_print;
    getattrfunc tp_getattr;
    setattrfunc tp_setattr;
    void *tp_reserved;
    reprfunc tp_repr;

    /* Method suites for standard classes */
    PyNumberMethods *tp_as_number;
    PySequenceMethods *tp_as_sequence;
    PyMappingMethods *tp_as_mapping;

    /* More standard operations (here for binary compatibility) */
    hashfunc tp_hash;
    ternaryfunc tp_call;
    reprfunc tp_str;
    getattrofunc tp_getattro;
    setattrofunc tp_setattro;

    /* Functions to access object as input/output buffer */
    PyBufferProcs *tp_as_buffer;

    /* Flags to define presence of optional/expanded features */
    long tp_flags;

    char *tp_doc; /* Documentation string */

    /* call function for all accessible objects */
    traverseproc tp_traverse;

    /* delete references to contained objects */
    inquiry tp_clear;

    /* rich comparisons */
    richcmpfunc tp_richcompare;

    /* weak reference enabler */
    long tp_weaklistoffset;

    /* Iterators */
    getiterfunc tp_iter;
    iternextfunc tp_iternext;

    /* Attribute descriptor and subclassing stuff */
    struct PyMethodDef *tp_methods;
    struct PyMemberDef *tp_members;
    struct PyGetSetDef *tp_getset;
    struct _typeobject *tp_base;
    PyObject *tp_dict;
    descrgetfunc tp_descr_get;
    descrsetfunc tp_descr_set;
    long tp_dictoffset;
    initproc tp_init;
    allocfunc tp_alloc;
    newfunc tp_new;
    freefunc tp_free; /* Low-level free-memory routine */
    inquiry tp_is_gc; /* For PyObject_IS_GC */
    PyObject *tp_bases;
    PyObject *tp_mro; /* method resolution order */
    PyObject *tp_cache;
    PyObject *tp_subclasses;
    PyObject *tp_weaklist;
} PyTypeObject;
The type object structure extends the PyVarObject structure. The
ob_size field is used for dynamic types (created by type_new(),
usually called from a class statement). Note that PyType_Type (the
metatype) initializes tp_itemsize, which means that its instances (i.e.
type objects) must have the ob_size field.
These fields are only present when the macro Py_TRACE_REFS is defined.
Their initialization to NULL is taken care of by the PyObject_HEAD_INIT
macro. For statically allocated objects, these fields always remain NULL.
For dynamically allocated objects, these two fields are used to link the object
into a doubly-linked list of all live objects on the heap. This could be used
for various debugging purposes; currently the only use is to print the objects
that are still alive at the end of a run when the environment variable
PYTHONDUMPREFS is set.
This is the type object’s reference count, initialized to 1 by the
PyObject_HEAD_INIT macro. Note that for statically allocated type objects,
the type’s instances (objects whose ob_type points back to the type) do
not count as references. But for dynamically allocated type objects, the
instances do count as references.
This is the type’s type, in other words its metatype. It is initialized by the
argument to the PyObject_HEAD_INIT macro, and its value should normally be
&PyType_Type. However, for dynamically loadable extension modules that must
be usable on Windows (at least), the compiler complains that this is not a valid
initializer. Therefore, the convention is to pass NULL to the
PyObject_HEAD_INIT macro and to initialize this field explicitly at the
start of the module’s initialization function, before doing anything else. This
is typically done like this:
Foo_Type.ob_type = &PyType_Type;
This should be done before any instances of the type are created.
PyType_Ready() checks if ob_type is NULL, and if so,
initializes it to the ob_type field of the base class.
PyType_Ready() will not change this field if it is non-zero.
For statically allocated type objects, this should be initialized to zero. For
dynamically allocated type objects, this field has a special internal meaning.
Pointer to a NUL-terminated string containing the name of the type. For types
that are accessible as module globals, the string should be the full module
name, followed by a dot, followed by the type name; for built-in types, it
should be just the type name. If the module is a submodule of a package, the
full package name is part of the full module name. For example, a type named
T defined in module M in subpackage Q in package P
should have the tp_name initializer "P.Q.M.T".
For dynamically allocated type objects, this should just be the type name, and
the module name explicitly stored in the type dict as the value for key
'__module__'.
For statically allocated type objects, the tp_name field should contain a dot.
Everything before the last dot is made accessible as the __module__
attribute, and everything after the last dot is made accessible as the
__name__ attribute.
If no dot is present, the entire tp_name field is made accessible as the
__name__ attribute, and the __module__ attribute is undefined
(unless explicitly set in the dictionary, as explained above). This means your
type will be impossible to pickle.
These fields allow calculating the size in bytes of instances of the type.
There are two kinds of types: types with fixed-length instances have a zero
tp_itemsize field, types with variable-length instances have a non-zero
tp_itemsize field. For a type with fixed-length instances, all
instances have the same size, given in tp_basicsize.
For a type with variable-length instances, the instances must have an
ob_size field, and the instance size is tp_basicsize plus N
times tp_itemsize, where N is the “length” of the object. The value of
N is typically stored in the instance’s ob_size field. There are
exceptions: for example, ints use a negative ob_size to indicate a
negative number, and N is abs(ob_size) there. Also, the presence of an
ob_size field in the instance layout doesn’t mean that the instance
structure is variable-length (for example, the structure for the list type has
fixed-length instances, yet those instances have a meaningful ob_size
field).
The basic size includes the fields in the instance declared by the macro
PyObject_HEAD or PyObject_VAR_HEAD (whichever is used to
declare the instance struct) and this in turn includes the _ob_prev and
_ob_next fields if they are present. This means that the only correct
way to get an initializer for the tp_basicsize is to use the
sizeof operator on the struct used to declare the instance layout.
The basic size does not include the GC header size.
These fields are inherited separately by subtypes. If the base type has a
non-zero tp_itemsize, it is generally not safe to set
tp_itemsize to a different non-zero value in a subtype (though this
depends on the implementation of the base type).
A note about alignment: if the variable items require a particular alignment,
this should be taken care of by the value of tp_basicsize. Example:
suppose a type implements an array of double. tp_itemsize is
sizeof(double). It is the programmer’s responsibility that
tp_basicsize is a multiple of sizeof(double) (assuming this is the
alignment requirement for double).
A pointer to the instance destructor function. This function must be defined
unless the type guarantees that its instances will never be deallocated (as is
the case for the singletons None and Ellipsis).
The destructor function is called by the Py_DECREF() and
Py_XDECREF() macros when the new reference count is zero. At this point,
the instance is still in existence, but there are no references to it. The
destructor function should free all references which the instance owns, free all
memory buffers owned by the instance (using the freeing function corresponding
to the allocation function used to allocate the buffer), and finally (as its
last action) call the type’s tp_free function. If the type is not
subtypable (doesn’t have the Py_TPFLAGS_BASETYPE flag bit set), it is
permissible to call the object deallocator directly instead of via
tp_free. The object deallocator should be the one used to allocate the
instance; this is normally PyObject_Del() if the instance was allocated
using PyObject_New() or PyObject_NewVar(), or
PyObject_GC_Del() if the instance was allocated using
PyObject_GC_New() or PyObject_GC_NewVar().
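A sketch of a typical destructor for the hypothetical FooObject above (the one owning a reference in its first field):
static void
Foo_dealloc(FooObject *self)
{
    Py_XDECREF(self->first);                   /* free references the instance owns */
    Py_TYPE(self)->tp_free((PyObject *)self);  /* finally, free via the type's tp_free */
}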
An optional pointer to the instance print function.
The print function is only called when the instance is printed to a real file;
when it is printed to a pseudo-file (like a StringIO instance), the
instance’s tp_repr or tp_str function is called to convert it to
a string. These are also called when the type’s tp_print field is
NULL. A type should never implement tp_print in a way that produces
different output than tp_repr or tp_str would.
The print function is called with the same signature as PyObject_Print():
int tp_print(PyObject *self, FILE *file, int flags). The self argument is
the instance to be printed. The file argument is the stdio file to which it
is to be printed. The flags argument is composed of flag bits. The only flag
bit currently defined is Py_PRINT_RAW. When the Py_PRINT_RAW
flag bit is set, the instance should be printed the same way as tp_str
would format it; when the Py_PRINT_RAW flag bit is clear, the instance
should be printed the same way as tp_repr would format it. It should
return -1 and set an exception condition when an error occurs during
printing.
It is possible that the tp_print field will be deprecated. In any case,
it is recommended not to define tp_print, but instead to rely on
tp_repr and tp_str for printing.
An optional pointer to the get-attribute-string function.
This field is deprecated. When it is defined, it should point to a function
that acts the same as the tp_getattro function, but taking a C string
instead of a Python string object to give the attribute name. The signature is
the same as for PyObject_GetAttrString().
This field is inherited by subtypes together with tp_getattro: a subtype
inherits both tp_getattr and tp_getattro from its base type when
the subtype’s tp_getattr and tp_getattro are both NULL.
An optional pointer to the set-attribute-string function.
This field is deprecated. When it is defined, it should point to a function
that acts the same as the tp_setattro function, but taking a C string
instead of a Python string object to give the attribute name. The signature is
the same as for PyObject_SetAttrString().
This field is inherited by subtypes together with tp_setattro: a subtype
inherits both tp_setattr and tp_setattro from its base type when
the subtype’s tp_setattr and tp_setattro are both NULL.
An optional pointer to a function that implements the built-in function
repr().
The signature is the same as for PyObject_Repr(); it must return a string
or a Unicode object. Ideally, this function should return a string that, when
passed to eval(), given a suitable environment, returns an object with the
same value. If this is not feasible, it should return a string starting with
'<' and ending with '>' from which both the type and the value of the
object can be deduced.
When this field is not set, a string of the form <%s object at %p> is
returned, where %s is replaced by the type name, and %p by the object’s
memory address.
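For example, a tp_repr handler that falls back to the angle-bracket form
might look like this (a sketch reusing the hypothetical MyObject layout
shown earlier):

static PyObject *
mytype_repr(MyObject *self)
{
    /* An eval()-able representation is not feasible here, so return
       the conventional '<... at ...>' form instead. */
    return PyUnicode_FromFormat("<MyObject object at %p>", (void *)self);
}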
Pointer to an additional structure that contains fields relevant only to
objects which implement the number protocol. These fields are documented in
Number Object Structures.
The tp_as_number field is not inherited, but the contained fields are
inherited individually.
Pointer to an additional structure that contains fields relevant only to
objects which implement the sequence protocol. These fields are documented
in Sequence Object Structures.
The tp_as_sequence field is not inherited, but the contained fields
are inherited individually.
Pointer to an additional structure that contains fields relevant only to
objects which implement the mapping protocol. These fields are documented in
Mapping Object Structures.
The tp_as_mapping field is not inherited, but the contained fields
are inherited individually.
An optional pointer to a function that implements the built-in function
hash().
The signature is the same as for PyObject_Hash(); it must return a
value of the type Py_hash_t. The value -1 should not be returned as a
normal return value; when an error occurs during the computation of the hash
value, the function should set an exception and return -1.
This field can be set explicitly to PyObject_HashNotImplemented() to
block inheritance of the hash method from a parent type. This is interpreted
as the equivalent of __hash__ = None at the Python level, causing
isinstance(o, collections.Hashable) to correctly return False. Note
that the converse is also true: setting __hash__ = None on a class at
the Python level will result in the tp_hash slot being set to
PyObject_HashNotImplemented().
When this field is not set, an attempt to take the hash of the
object raises TypeError.
This field is inherited by subtypes together with
tp_richcompare: a subtype inherits both of
tp_richcompare and tp_hash, when the subtype’s
tp_richcompare and tp_hash are both NULL.
An optional pointer to a function that implements calling the object. This
should be NULL if the object is not callable. The signature is the same as
for PyObject_Call().
An optional pointer to a function that implements the built-in operation
str(). (Note that str is a type now, and str() calls the
constructor for that type. This constructor calls PyObject_Str() to do
the actual work, and PyObject_Str() will call this handler.)
The signature is the same as for PyObject_Str(); it must return a string
or a Unicode object. This function should return a “friendly” string
representation of the object, as this is the representation that will be used,
among other things, by the print() function.
When this field is not set, PyObject_Repr() is called to return a string
representation.
An optional pointer to the get-attribute function.
The signature is the same as for PyObject_GetAttr(). It is usually
convenient to set this field to PyObject_GenericGetAttr(), which
implements the normal way of looking for object attributes.
This field is inherited by subtypes together with tp_getattr: a subtype
inherits both tp_getattr and tp_getattro from its base type when
the subtype’s tp_getattr and tp_getattro are both NULL.
An optional pointer to the set-attribute function.
The signature is the same as for PyObject_SetAttr(). It is usually
convenient to set this field to PyObject_GenericSetAttr(), which
implements the normal way of setting object attributes.
This field is inherited by subtypes together with tp_setattr: a subtype
inherits both tp_setattr and tp_setattro from its base type when
the subtype’s tp_setattr and tp_setattro are both NULL.
Pointer to an additional structure that contains fields relevant only to objects
which implement the buffer interface. These fields are documented in
Buffer Object Structures.
The tp_as_buffer field is not inherited, but the contained fields are
inherited individually.
This field is a bit mask of various flags. Some flags indicate variant
semantics for certain situations; others are used to indicate that certain
fields in the type object (or in the extension structures referenced via
tp_as_number, tp_as_sequence, tp_as_mapping, and
tp_as_buffer) that were historically not always present are valid; if
such a flag bit is clear, the type fields it guards must not be accessed and
must be considered to have a zero or NULL value instead.
Inheritance of this field is complicated. Most flag bits are inherited
individually, i.e. if the base type has a flag bit set, the subtype inherits
this flag bit. The flag bits that pertain to extension structures are strictly
inherited if the extension structure is inherited, i.e. the base type’s value of
the flag bit is copied into the subtype together with a pointer to the extension
structure. The Py_TPFLAGS_HAVE_GC flag bit is inherited together with
the tp_traverse and tp_clear fields: the subtype inherits all three from
its base type if, in the subtype, the Py_TPFLAGS_HAVE_GC flag bit is
clear and the tp_traverse and tp_clear fields are NULL.
The following bit masks are currently defined; these can be ORed together using
the | operator to form the value of the tp_flags field. The macro
PyType_HasFeature() takes a type and a flags value, tp and f, and
checks whether tp->tp_flags & f is non-zero.
This bit is set when the type object itself is allocated on the heap. In this
case, the ob_type field of its instances is considered a reference to
the type, and the type object is INCREF’ed when a new instance is created, and
DECREF’ed when an instance is destroyed (this does not apply to instances of
subtypes; only the type referenced by the instance’s ob_type gets INCREF’ed or
DECREF’ed).
This bit is set when the type can be used as the base type of another type. If
this bit is clear, the type cannot be subtyped (similar to a “final” class in
Java).
This bit is set when the object supports garbage collection. If this bit
is set, instances must be created using PyObject_GC_New() and
destroyed using PyObject_GC_Del(). More information in section
Supporting Cyclic Garbage Collection. This bit also implies that the
GC-related fields tp_traverse and tp_clear are present in
the type object.
This is a bitmask of all the bits that pertain to the existence of certain
fields in the type object and its extension structures. Currently, it includes
the following bits: Py_TPFLAGS_HAVE_STACKLESS_EXTENSION,
Py_TPFLAGS_HAVE_VERSION_TAG.
An optional pointer to a NUL-terminated C string giving the docstring for this
type object. This is exposed as the __doc__ attribute on the type and
instances of the type.
An optional pointer to a traversal function for the garbage collector. This is
only used if the Py_TPFLAGS_HAVE_GC flag bit is set. More information
about Python’s garbage collection scheme can be found in section
Supporting Cyclic Garbage Collection.
The tp_traverse pointer is used by the garbage collector to detect
reference cycles. A typical implementation of a tp_traverse function
simply calls Py_VISIT() on each of the instance’s members that are Python
objects. For example, this is function local_traverse() from the
_thread extension module:
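static int
local_traverse(localobject *self, visitproc visit, void *arg)
{
    Py_VISIT(self->args);
    Py_VISIT(self->kw);
    Py_VISIT(self->dict);
    return 0;
}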
Note that Py_VISIT() is called only on those members that can participate
in reference cycles. Although there is also a self->key member, it can only
be NULL or a Python string and therefore cannot be part of a reference cycle.
On the other hand, even if you know a member can never be part of a cycle, as a
debugging aid you may want to visit it anyway just so the gc module’s
get_referents() function will include it.
Note that Py_VISIT() requires the visit and arg parameters to
local_traverse() to have these specific names; don’t name them just
anything.
This field is inherited by subtypes together with tp_clear and the
Py_TPFLAGS_HAVE_GC flag bit: the flag bit, tp_traverse, and
tp_clear are all inherited from the base type if they are all zero in
the subtype.
An optional pointer to a clear function for the garbage collector. This is only
used if the Py_TPFLAGS_HAVE_GC flag bit is set.
The tp_clear member function is used to break reference cycles in cyclic
garbage detected by the garbage collector. Taken together, all tp_clear
functions in the system must combine to break all reference cycles. This is
subtle, and if in any doubt supply a tp_clear function. For example,
the tuple type does not implement a tp_clear function, because it’s
possible to prove that no reference cycle can be composed entirely of tuples.
Therefore the tp_clear functions of other types must be sufficient to
break any cycle containing a tuple. This isn’t immediately obvious, and there’s
rarely a good reason to avoid implementing tp_clear.
Implementations of tp_clear should drop the instance’s references to
those of its members that may be Python objects, and set its pointers to those
members to NULL, as in the following example:
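static int
local_clear(localobject *self)
{
    Py_CLEAR(self->key);
    Py_CLEAR(self->args);
    Py_CLEAR(self->kw);
    Py_CLEAR(self->dict);
    return 0;
}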
The Py_CLEAR() macro should be used, because clearing references is
delicate: the reference to the contained object must not be decremented until
after the pointer to the contained object is set to NULL. This is because
decrementing the reference count may cause the contained object to become trash,
triggering a chain of reclamation activity that may include invoking arbitrary
Python code (due to finalizers, or weakref callbacks, associated with the
contained object). If it’s possible for such code to reference self again,
it’s important that the pointer to the contained object be NULL at that time,
so that self knows the contained object can no longer be used. The
Py_CLEAR() macro performs the operations in a safe order.
Because the goal of tp_clear functions is to break reference cycles,
it’s not necessary to clear contained objects like Python strings or Python
integers, which can’t participate in reference cycles. On the other hand, it may
be convenient to clear all contained Python objects, and write the type’s
tp_dealloc function to invoke tp_clear.
This field is inherited by subtypes together with tp_traverse and the
Py_TPFLAGS_HAVE_GC flag bit: the flag bit, tp_traverse, and
tp_clear are all inherited from the base type if they are all zero in
the subtype.
An optional pointer to the rich comparison function, whose signature is
PyObject *tp_richcompare(PyObject *a, PyObject *b, int op).
The function should return the result of the comparison (usually Py_True
or Py_False). If the comparison is undefined, it must return
Py_NotImplemented; if another error occurred, it must return NULL and
set an exception condition.
Note
If you want to implement a type for which only a limited set of
comparisons makes sense (e.g. == and !=, but not < and
friends), directly raise TypeError in the rich comparison function.
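A sketch of such a handler, assuming a hypothetical NumObject struct with
a long value field and its NumType type object (all names invented):

typedef struct {
    PyObject_HEAD
    long value;
} NumObject;

extern PyTypeObject NumType;   /* defined elsewhere in the module */

static PyObject *
num_richcompare(PyObject *a, PyObject *b, int op)
{
    int eq;
    if (!PyObject_TypeCheck(a, &NumType) || !PyObject_TypeCheck(b, &NumType)) {
        Py_INCREF(Py_NotImplemented);   /* let the other type have a try */
        return Py_NotImplemented;
    }
    if (op != Py_EQ && op != Py_NE) {
        /* Only == and != make sense for this type. */
        PyErr_SetString(PyExc_TypeError, "ordering is not defined for Num");
        return NULL;
    }
    eq = ((NumObject *)a)->value == ((NumObject *)b)->value;
    if (op == Py_NE)
        eq = !eq;
    if (eq)
        Py_RETURN_TRUE;
    Py_RETURN_FALSE;
}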
This field is inherited by subtypes together with tp_hash:
a subtype inherits tp_richcompare and tp_hash when
the subtype’s tp_richcompare and tp_hash are both
NULL.
The following constants are defined to be used as the third argument for
tp_richcompare and for PyObject_RichCompare():
Py_LT (<), Py_LE (<=), Py_EQ (==), Py_NE (!=), Py_GT (>), Py_GE (>=)
If the instances of this type are weakly referenceable, this field is greater
than zero and contains the offset in the instance structure of the weak
reference list head (ignoring the GC header, if present); this offset is used by
PyObject_ClearWeakRefs() and the PyWeakref_*() functions. The
instance structure needs to include a field of type PyObject* which is
initialized to NULL.
Do not confuse this field with tp_weaklist; that is the list head for
weak references to the type object itself.
This field is inherited by subtypes, but see the rules listed below. A subtype
may override this offset; this means that the subtype uses a different weak
reference list head than the base type. Since the list head is always found via
tp_weaklistoffset, this should not be a problem.
When a type defined by a class statement has no __slots__ declaration,
and none of its base types are weakly referenceable, the type is made weakly
referenceable by adding a weak reference list head slot to the instance layout
and setting tp_weaklistoffset to that slot’s offset.
When a type’s __slots__ declaration contains a slot named
__weakref__, that slot becomes the weak reference list head for
instances of the type, and the slot’s offset is stored in the type’s
tp_weaklistoffset.
When a type’s __slots__ declaration does not contain a slot named
__weakref__, the type inherits its tp_weaklistoffset from its
base type.
An optional pointer to a function that returns an iterator for the object. Its
presence normally signals that the instances of this type are iterable (although
sequences may be iterable without this function).
An optional pointer to a function that returns the next item in an iterator.
When the iterator is exhausted, it must return NULL; a StopIteration
exception may or may not be set. When another error occurs, it must return
NULL too. Its presence signals that the instances of this type are
iterators.
Iterator types should also define the tp_iter function, and that
function should return the iterator instance itself (not a new iterator
instance).
This function has the same signature as PyIter_Next().
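As a sketch (all names invented), an iterator counting up to a stored limit
could implement these two slots as follows:

typedef struct {
    PyObject_HEAD
    Py_ssize_t index;   /* next value to produce */
    Py_ssize_t len;     /* one past the last value */
} CountIterObject;

static PyObject *
countiter_next(CountIterObject *self)          /* tp_iternext */
{
    if (self->index >= self->len)
        return NULL;   /* exhausted; setting StopIteration is optional */
    return PyLong_FromSsize_t(self->index++);
}

static PyObject *
countiter_iter(PyObject *self)                 /* tp_iter */
{
    Py_INCREF(self);   /* an iterator's tp_iter returns itself */
    return self;
}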
An optional pointer to a static NULL-terminated array of PyMemberDef
structures, declaring regular data members (fields or slots) of instances of
this type.
For each entry in the array, an entry is added to the type’s dictionary (see
tp_dict below) containing a member descriptor.
This field is not inherited by subtypes (members are inherited through a
different mechanism).
An optional pointer to a static NULL-terminated array of PyGetSetDef
structures, declaring computed attributes of instances of this type.
For each entry in the array, an entry is added to the type’s dictionary (see
tp_dict below) containing a getset descriptor.
This field is not inherited by subtypes (computed attributes are inherited
through a different mechanism).
Docs for PyGetSetDef:

typedef PyObject *(*getter)(PyObject *, void *);
typedef int (*setter)(PyObject *, PyObject *, void *);

typedef struct PyGetSetDef {
    char *name;      /* attribute name */
    getter get;      /* C function to get the attribute */
    setter set;      /* C function to set the attribute */
    char *doc;       /* optional doc string */
    void *closure;   /* optional additional data for getter and setter */
} PyGetSetDef;
An optional pointer to a base type from which type properties are inherited. At
this level, only single inheritance is supported; multiple inheritance requires
dynamically creating a type object by calling the metatype.
This field is not inherited by subtypes (obviously), but it defaults to
&PyBaseObject_Type (which to Python programmers is known as the type
object).
This field should normally be initialized to NULL before PyType_Ready() is
called; it may also be initialized to a dictionary containing initial attributes
for the type. Once PyType_Ready() has initialized the type, extra
attributes for the type may be added to this dictionary only if they don’t
correspond to overloaded operations (like __add__()).
This field is not inherited by subtypes (though the attributes defined in here
are inherited through a different mechanism).
Warning
It is not safe to use PyDict_SetItem() on or otherwise modify
tp_dict with the dictionary C-API.
If the instances of this type have a dictionary containing instance variables,
this field is non-zero and contains the offset in the instances of the type of
the instance variable dictionary; this offset is used by
PyObject_GenericGetAttr().
Do not confuse this field with tp_dict; that is the dictionary for
attributes of the type object itself.
If the value of this field is greater than zero, it specifies the offset from
the start of the instance structure. If the value is less than zero, it
specifies the offset from the end of the instance structure. A negative
offset is more expensive to use, and should only be used when the instance
structure contains a variable-length part. This is used for example to add an
instance variable dictionary to subtypes of str or tuple. Note
that the tp_basicsize field should account for the dictionary added to
the end in that case, even though the dictionary is not included in the basic
object layout. On a system with a pointer size of 4 bytes,
tp_dictoffset should be set to -4 to indicate that the dictionary is
at the very end of the structure.
The real dictionary offset in an instance can be computed from a negative
tp_dictoffset as follows:
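dictoffset = tp_basicsize + abs(ob_size)*tp_itemsize + tp_dictoffset
if dictoffset is not aligned on sizeof(void*):
    round up to sizeof(void*)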
where tp_basicsize, tp_itemsize and tp_dictoffset are
taken from the type object, and ob_size is taken from the instance. The
absolute value is taken because ints use the sign of ob_size to
store the sign of the number. (There’s never a need to do this calculation
yourself; it is done for you by _PyObject_GetDictPtr().)
This field is inherited by subtypes, but see the rules listed below. A subtype
may override this offset; this means that the subtype instances store the
dictionary at a different offset than the base type. Since the dictionary is
always found via tp_dictoffset, this should not be a problem.
When a type defined by a class statement has no __slots__ declaration,
and none of its base types has an instance variable dictionary, a dictionary
slot is added to the instance layout and the tp_dictoffset is set to
that slot’s offset.
When a type defined by a class statement has a __slots__ declaration,
the type inherits its tp_dictoffset from its base type.
(Adding a slot named __dict__ to the __slots__ declaration does
not have the expected effect, it just causes confusion. Maybe this should be
added as a feature just like __weakref__ though.)
An optional pointer to an instance initialization function.
This function corresponds to the __init__() method of classes. Like
__init__(), it is possible to create an instance without calling
__init__(), and it is possible to reinitialize an instance by calling its
__init__() method again.
The self argument is the instance to be initialized; the args and kwds
arguments represent positional and keyword arguments of the call to
__init__().
The tp_init function, if not NULL, is called when an instance is
created normally by calling its type, after the type’s tp_new function
has returned an instance of the type. If the tp_new function returns an
instance of some other type that is not a subtype of the original type, no
tp_init function is called; if tp_new returns an instance of a
subtype of the original type, the subtype’s tp_init is called.
An optional pointer to an instance allocation function.
The purpose of this function is to separate memory allocation from memory
initialization. It should return a pointer to a block of memory of adequate
length for the instance, suitably aligned, and initialized to zeros, but with
ob_refcnt set to 1 and ob_type set to the type argument. If
the type’s tp_itemsize is non-zero, the object’s ob_size field
should be initialized to nitems and the length of the allocated memory block
should be tp_basicsize + nitems*tp_itemsize, rounded up to a multiple of
sizeof(void*); otherwise, nitems is not used and the length of the block
should be tp_basicsize.
Do not use this function to do any other instance initialization, not even to
allocate additional memory; that should be done by tp_new.
This field is inherited by static subtypes, but not by dynamic subtypes
(subtypes created by a class statement); in the latter, this field is always set
to PyType_GenericAlloc(), to force a standard heap allocation strategy.
That is also the recommended value for statically defined types.
An optional pointer to an instance creation function.
If this function is NULL for a particular type, that type cannot be called to
create new instances; presumably there is some other way to create instances,
like a factory function.
The subtype argument is the type of the object being created; the args and
kwds arguments represent positional and keyword arguments of the call to the
type. Note that subtype doesn’t have to equal the type whose tp_new
function is called; it may be a subtype of that type (but not an unrelated
type).
The tp_new function should call subtype->tp_alloc(subtype, nitems)
to allocate space for the object, and then do only as much further
initialization as is absolutely necessary. Initialization that can safely be
ignored or repeated should be placed in the tp_init handler. A good
rule of thumb is that for immutable types, all initialization should take place
in tp_new, while for mutable types, most initialization should be
deferred to tp_init.
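For instance, a minimal tp_new sketch for the hypothetical MyObject layout
shown earlier could be:

static PyObject *
mytype_new(PyTypeObject *subtype, PyObject *args, PyObject *kwds)
{
    MyObject *self = (MyObject *)subtype->tp_alloc(subtype, 0);
    if (self != NULL)
        self->attr = NULL;   /* defer real initialization to tp_init */
    return (PyObject *)self;
}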
This field is inherited by subtypes, except it is not inherited by static types
whose tp_base is NULL or &PyBaseObject_Type.
An optional pointer to an instance deallocation function. Its signature is
freefunc:
void tp_free(void *)
An initializer that is compatible with this signature is PyObject_Free().
This field is inherited by static subtypes, but not by dynamic subtypes
(subtypes created by a class statement); in the latter, this field is set to a
deallocator suitable to match PyType_GenericAlloc() and the value of the
Py_TPFLAGS_HAVE_GC flag bit.
An optional pointer to a function called by the garbage collector.
The garbage collector needs to know whether a particular object is collectible
or not. Normally, it is sufficient to look at the object’s type’s
tp_flags field, and check the Py_TPFLAGS_HAVE_GC flag bit. But
some types have a mixture of statically and dynamically allocated instances, and
the statically allocated instances are not collectible. Such types should
define this function; it should return 1 for a collectible instance, and
0 for a non-collectible instance. The signature is
int tp_is_gc(PyObject *self)
(The only example of this are types themselves. The metatype,
PyType_Type, defines this function to distinguish between statically
and dynamically allocated types.)
Weak reference list head, for weak references to this type object. Not
inherited. Internal use only.
The remaining fields are only defined if the feature test macro
COUNT_ALLOCS is defined, and are for internal use only. They are
documented here for completeness. None of these fields are inherited by
subtypes.
Pointer to the next type object with a non-zero tp_allocs field.
Also, note that, in a garbage collected Python, tp_dealloc may be called from
any Python thread, not just the thread which created the object (if the object
becomes part of a refcount cycle, that cycle might be collected by a garbage
collection on any thread). This is not a problem for Python API calls, since
the thread on which tp_dealloc is called will own the Global Interpreter Lock
(GIL). However, if the object being destroyed in turn destroys objects from some
other C or C++ library, care should be taken to ensure that destroying those
objects on the thread which called tp_dealloc will not violate any assumptions
of the library.
This structure holds pointers to the functions which an object uses to
implement the number protocol. Each function is used by the function of
similar name documented in the Number Protocol section.
Binary and ternary functions must check the type of all their operands,
and implement the necessary conversions (at least one of the operands is
an instance of the defined type). If the operation is not defined for the
given operands, binary and ternary functions must return
Py_NotImplemented; if another error occurred, they must return NULL
and set an exception.
Note
The nb_reserved field should always be NULL. It
was previously called nb_long, and was renamed in
Python 3.0.1.
This function is used by PyMapping_Length() and
PyObject_Size(), and has the same signature. This slot may be set to
NULL if the object has no defined length.
This function is used by PyObject_GetItem() and has the same
signature. This slot must be filled for the PyMapping_Check()
function to return 1, it can be NULL otherwise.
This function is used by PySequence_Concat() and has the same
signature. It is also used by the + operator, after trying the numeric
addition via the tp_as_number.nb_add slot.
This function is used by PySequence_Repeat() and has the same
signature. It is also used by the * operator, after trying numeric
multiplication via the tp_as_number.nb_multiply slot.
This function is used by PySequence_GetItem() and has the same
signature. This slot must be filled for the PySequence_Check()
function to return 1, it can be NULL otherwise.
Negative indexes are handled as follows: if the sq_length slot is
filled, it is called and the sequence length is used to compute a positive
index which is passed to sq_item. If sq_length is NULL,
the index is passed as is to the function.
This function is used by PySequence_SetItem() and has the same
signature. This slot may be left to NULL if the object does not support
item assignment.
This function may be used by PySequence_Contains() and has the same
signature. This slot may be left to NULL, in this case
PySequence_Contains() simply traverses the sequence until it finds a
match.
The buffer interface exports a model where an object can expose its internal
data.
If an object does not export the buffer interface, then its tp_as_buffer
member in the PyTypeObject structure should be NULL. Otherwise, the
tp_as_buffer will point to a PyBufferProcs structure.
This should fill a Py_buffer with the necessary data for
exporting the type. The signature of getbufferproc is
int (PyObject *obj, Py_buffer *view, int flags). obj is the object to
export, view is the Py_buffer struct to fill, and flags gives
the conditions the caller wants the memory under. (See
PyObject_GetBuffer() for all flags.) bf_getbuffer is
responsible for filling view with the appropriate information.
(PyBuffer_FillInfo() can be used in simple cases.) See the
Py_buffer docs for what needs to be filled in.
This should release the resources of the buffer. The signature of
releasebufferproc is void (PyObject *obj, Py_buffer *view).
If the bf_releasebuffer function is not provided (i.e. it is
NULL), then it does not ever need to be called.
The exporter of the buffer interface must make sure that any memory
pointed to in the Py_buffer structure remains valid until
releasebuffer is called. Exporters will need to define a
bf_releasebuffer function if they can re-allocate their memory,
strides, shape, suboffsets, or format variables which they might share
through the struct bufferinfo.
Python’s support for detecting and collecting garbage which involves circular
references requires support from object types which are “containers” for other
objects which may also be containers. Types which do not store references to
other objects, or which only store references to atomic types (such as numbers
or strings), do not need to provide any explicit support for garbage
collection.
To create a container type, the tp_flags field of the type object must
include the Py_TPFLAGS_HAVE_GC flag bit and provide an implementation of the
tp_traverse handler. If instances of the type are mutable, a
tp_clear implementation must also be provided.
Py_TPFLAGS_HAVE_GC
Objects with a type with this flag set must conform with the rules
documented here. For convenience these objects will be referred to as
container objects.
Constructors for container types must conform to two rules:

1. The memory for the object must be allocated using PyObject_GC_New()
   or PyObject_GC_NewVar().

2. Once all the fields which may contain references to other containers
   are initialized, it must call PyObject_GC_Track().
Adds the object op to the set of container objects tracked by the
collector. The collector can run at unexpected times so objects must be
valid while being tracked. This should be called once all the fields
traversed by the tp_traverse handler become valid, usually near the
end of the constructor.
Remove the object op from the set of container objects tracked by the
collector. Note that PyObject_GC_Track() can be called again on
this object to add it back to the set of tracked objects. The deallocator
(tp_dealloc handler) should call this for the object before any of
the fields used by the tp_traverse handler become invalid.
Type of the visitor function passed to the tp_traverse handler; its
signature is int (*visitproc)(PyObject *object, void *arg).
The function should be called with an object to traverse as object and
the third parameter to the tp_traverse handler as arg. The
Python core uses several visitor functions to implement cyclic garbage
detection; it’s not expected that users will need to write their own
visitor functions.
The tp_traverse handler must have the following type:

int (*traverseproc)(PyObject *self, visitproc visit, void *arg)
Traversal function for a container object. Implementations must call the
visit function for each object directly contained by self, with the
parameters to visit being the contained object and the arg value passed
to the handler. The visit function must not be called with a NULL
object argument. If visit returns a non-zero value that value should be
returned immediately.
To simplify writing tp_traverse handlers, a Py_VISIT() macro is
provided. In order to use this macro, the tp_traverse implementation
must name its arguments exactly visit and arg:
Call the visit callback, with arguments o and arg. If visit returns
a non-zero value, then return it. Using this macro, tp_traverse
handlers look like:
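static int
my_traverse(MyObject *self, visitproc visit, void *arg)
{
    Py_VISIT(self->foo);   /* members that may be Python objects */
    Py_VISIT(self->bar);
    return 0;
}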
Drop references that may have created reference cycles. Immutable objects
do not have to define this method since they can never directly create
reference cycles. Note that the object must still be valid after calling
this method (don’t just call Py_DECREF() on a reference). The
collector will call this method if it detects that this object is involved
in a reference cycle.
This document describes the Python Distribution Utilities (“Distutils”) from
the module developer’s point of view, describing how to use the Distutils to
make Python modules and extensions easily available to a wider audience with
very little overhead for build/release/install mechanics.
This document covers using the Distutils to distribute your Python modules,
concentrating on the role of developer/distributor: if you’re looking for
information on installing Python modules, you should refer to the
Installing Python Modules chapter.
Using the Distutils is quite simple, both for module developers and for
users/administrators installing third-party modules. As a developer, your
responsibilities (apart from writing solid, well-documented and well-tested
code, of course!) are:
write a setup script (setup.py by convention)
(optional) write a setup configuration file
create a source distribution
(optional) create one or more built (binary) distributions
Each of these tasks is covered in this document.
Not all module developers have access to a multitude of platforms, so it’s not
always feasible to expect them to create a multitude of built distributions. It
is hoped that a class of intermediaries, called packagers, will arise to
address this need. Packagers will take source distributions released by module
developers, build them on one or more platforms, and release the resulting built
distributions. Thus, users on the most popular platforms will be able to
install most popular Python module distributions in the most natural way for
their platform, without having to run a single setup script or compile a line of
code.
The setup script is usually quite simple, although since it’s written in Python,
there are no arbitrary limits to what you can do with it, though you should be
careful about putting arbitrarily expensive operations in your setup script.
Unlike, say, Autoconf-style configure scripts, the setup script may be run
multiple times in the course of building and installing your module
distribution.
If all you want to do is distribute a module called foo, contained in a
file foo.py, then your setup script can be as simple as this:
from distutils.core import setup
setup(name='foo',
version='1.0',
py_modules=['foo'],
)
Some observations:
most information that you supply to the Distutils is supplied as keyword
arguments to the setup() function
those keyword arguments fall into two categories: package metadata (name,
version number) and information about what’s in the package (a list of pure
Python modules, in this case)
modules are specified by module name, not filename (the same will hold true
for packages and extensions)
it’s recommended that you supply a little more metadata, in particular your
name, email address and a URL for the project (see section Writing the Setup Script
for an example)
To create a source distribution for this module, you would create a setup
script, setup.py, containing the above code, and run this command from a
terminal:
python setup.py sdist
For Windows, open a command prompt windows (“DOS box”) and change the command
to:
setup.py sdist
sdist will create an archive file (e.g., tarball on Unix, ZIP file on Windows)
containing your setup script setup.py, and your module foo.py.
The archive file will be named foo-1.0.tar.gz (or .zip), and
will unpack into a directory foo-1.0.
If an end-user wishes to install your foo module, all she has to do is
download foo-1.0.tar.gz (or .zip), unpack it, and—from the
foo-1.0 directory—run
python setup.py install
which will ultimately copy foo.py to the appropriate directory for
third-party modules in their Python installation.
This simple example demonstrates some fundamental concepts of the Distutils.
First, both developers and installers have the same basic user interface, i.e.
the setup script. The difference is which Distutils commands they use: the
sdist command is almost exclusively for module developers, while
install is more often for installers (although most developers will
want to install their own code occasionally).
If you want to make things really easy for your users, you can create one or
more built distributions for them. For instance, if you are running on a
Windows machine, and want to make things easy for other Windows users, you can
create an executable installer (the most appropriate type of built distribution
for this platform) with the bdist_wininst command. For example:
python setup.py bdist_wininst
will create an executable installer, foo-1.0.win32.exe, in the current
directory.
Other useful built distribution formats are RPM, implemented by the
bdist_rpm command, Solaris pkgtool
(bdist_pkgtool), and HP-UX swinstall
(bdist_sdux). For example, the following command will create an RPM
file called foo-1.0.noarch.rpm:
python setup.py bdist_rpm
(The bdist_rpm command uses the rpm executable, therefore
this has to be run on an RPM-based system such as Red Hat Linux, SuSE Linux, or
Mandrake Linux.)
You can find out what distribution formats are available at any time by running

python setup.py bdist --help-formats
If you’re reading this document, you probably have a good idea of what modules,
extensions, and so forth are. Nevertheless, just to be sure that everyone is
operating from a common starting point, we offer the following glossary of
common Python terms:
module
the basic unit of code reusability in Python: a block of code imported by some
other code. Three types of modules concern us here: pure Python modules,
extension modules, and packages.
pure Python module
a module written in Python and contained in a single .py file (and
possibly associated .pyc and/or .pyo files). Sometimes referred
to as a “pure module.”
extension module
a module written in the low-level language of the Python implementation: C/C++
for Python, Java for Jython. Typically contained in a single dynamically
loadable pre-compiled file, e.g. a shared object (.so) file for Python
extensions on Unix, a DLL (given the .pyd extension) for Python
extensions on Windows, or a Java class file for Jython extensions. (Note that
currently, the Distutils only handles C/C++ extensions for Python.)
package
a module that contains other modules; typically contained in a directory in the
filesystem and distinguished from other directories by the presence of a file
__init__.py.
root package
the root of the hierarchy of packages. (This isn’t really a package, since it
doesn’t have an __init__.py file. But we have to call it something.)
The vast majority of the standard library is in the root package, as are many
small, standalone third-party modules that don’t belong to a larger module
collection. Unlike regular packages, modules in the root package can be found in
many directories: in fact, every directory listed in sys.path contributes
modules to the root package.
The following terms apply more specifically to the domain of distributing Python
modules using the Distutils:
module distribution
a collection of Python modules distributed together as a single downloadable
resource and meant to be installed en masse. Examples of some well-known
module distributions are NumPy, SciPy, PIL (the Python Imaging
Library), or mxBase. (This would be called a package, except that term is
already taken in the Python context: a single module distribution may contain
zero, one, or many Python packages.)
pure module distribution
a module distribution that contains only pure Python modules and packages.
Sometimes referred to as a “pure distribution.”
non-pure module distribution
a module distribution that contains at least one extension module. Sometimes
referred to as a “non-pure distribution.”
distribution root
the top-level directory of your source tree (or source distribution); the
directory where setup.py exists. Generally setup.py will be
run from this directory.
The setup script is the centre of all activity in building, distributing, and
installing modules using the Distutils. The main purpose of the setup script is
to describe your module distribution to the Distutils, so that the various
commands that operate on your modules do the right thing. As we saw in section
A Simple Example above, the setup script consists mainly of a call to
setup(), and most information supplied to the Distutils by the module
developer is supplied as keyword arguments to setup().
Here’s a slightly more involved example, which we’ll follow for the next couple
of sections: the Distutils’ own setup script. (Keep in mind that although the
Distutils are included with Python 1.6 and later, they also have an independent
existence so that Python 1.5.2 users can use them to install other module
distributions. The Distutils’ own setup script, shown here, is used to install
the package into Python 1.5.2.)
#!/usr/bin/env python
from distutils.core import setup
setup(name='Distutils',
version='1.0',
description='Python Distribution Utilities',
author='Greg Ward',
author_email='gward@python.net',
url='http://www.python.org/sigs/distutils-sig/',
packages=['distutils', 'distutils.command'],
)
There are only two differences between this and the trivial one-file
distribution presented in section A Simple Example: more metadata, and the
specification of pure Python modules by package, rather than by module. This is
important since the Distutils consist of a couple of dozen modules split into
(so far) two packages; an explicit list of every module would be tedious to
generate and difficult to maintain. For more information on the additional
meta-data, see section Additional meta-data.
Note that any pathnames (files or directories) supplied in the setup script
should be written using the Unix convention, i.e. slash-separated. The
Distutils will take care of converting this platform-neutral representation into
whatever is appropriate on your current platform before actually using the
pathname. This makes your setup script portable across operating systems, which
of course is one of the major goals of the Distutils. In this spirit, all
pathnames in this document are slash-separated.
This, of course, only applies to pathnames given to Distutils functions. If
you, for example, use standard Python functions such as glob.glob() or
os.listdir() to specify files, you should be careful to write portable
code instead of hardcoding path separators:
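For example (a sketch; the directory names and pattern are invented for
illustration):

import glob
import os

# Build the pattern with os.path.join() instead of hardcoding
# a path separator such as '/' or '\\'.
files = glob.glob(os.path.join('mydir', 'subdir', '*.html'))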
The packages option tells the Distutils to process (build, distribute,
install, etc.) all pure Python modules found in each package mentioned in the
packages list. In order to do this, of course, there has to be a
correspondence between package names and directories in the filesystem. The
default correspondence is the most obvious one, i.e. package distutils is
found in the directory distutils relative to the distribution root.
Thus, when you say packages=['foo'] in your setup script, you are
promising that the Distutils will find a file foo/__init__.py (which
might be spelled differently on your system, but you get the idea) relative to
the directory where your setup script lives. If you break this promise, the
Distutils will issue a warning but still process the broken package anyway.
If you use a different convention to lay out your source directory, that’s no
problem: you just have to supply the package_dir option to tell the
Distutils about your convention. For example, say you keep all Python source
under lib, so that modules in the “root package” (i.e., not in any
package at all) are in lib, modules in the foo package are in
lib/foo, and so forth. Then you would put
package_dir = {'': 'lib'}
in your setup script. The keys to this dictionary are package names, and an
empty package name stands for the root package. The values are directory names
relative to your distribution root. In this case, when you say packages=['foo'], you are promising that the file lib/foo/__init__.py exists.
Another possible convention is to put the foo package right in
lib, the foo.bar package in lib/bar, etc. This would be
written in the setup script as
package_dir = {'foo': 'lib'}
A package:dir entry in the package_dir dictionary implicitly
applies to all packages below package, so the foo.bar case is
automatically handled here. In this example, having packages=['foo', 'foo.bar'] tells the Distutils to look for lib/__init__.py and
lib/bar/__init__.py. (Keep in mind that although package_dir
applies recursively, you must explicitly list all packages in
packages: the Distutils will not recursively scan your source tree
looking for any directory with an __init__.py file.)
For a small module distribution, you might prefer to list all modules rather
than listing packages—especially the case of a single module that goes in the
“root package” (i.e., no package at all). This simplest case was shown in
section A Simple Example; here is a slightly more involved example:
py_modules = ['mod1', 'pkg.mod2']
This describes two modules, one of them in the “root” package, the other in the
pkg package. Again, the default package/directory layout implies that
these two modules can be found in mod1.py and pkg/mod2.py, and
that pkg/__init__.py exists as well. And again, you can override the
package/directory correspondence using the package_dir option.
Just as writing Python extension modules is a bit more complicated than writing
pure Python modules, describing them to the Distutils is a bit more complicated.
Unlike pure modules, it’s not enough just to list modules or packages and expect
the Distutils to go out and find the right files; you have to specify the
extension name, source file(s), and any compile/link requirements (include
directories, libraries to link with, etc.).
All of this is done through another keyword argument to setup(), the
ext_modules option. ext_modules is just a list of
Extension instances, each of which describes a single extension module.
Suppose your distribution includes a single extension, called foo and
implemented by foo.c. If no additional instructions to the
compiler/linker are needed, describing this extension is quite simple:
Extension('foo', ['foo.c'])
The Extension class can be imported from distutils.core along
with setup(). Thus, the setup script for a module distribution that
contains only this one extension and nothing else might be:
from distutils.core import setup, Extension
setup(name='foo',
version='1.0',
ext_modules=[Extension('foo', ['foo.c'])],
)
The Extension class (actually, the underlying extension-building
machinery implemented by the build_ext command) supports a great deal
of flexibility in describing Python extensions, which is explained in the
following sections.
If the extension lives inside a package, pass a dotted name to the
Extension constructor: Extension('foo', ['foo.c']) describes an extension
living in the root package, while Extension('pkg.foo', ['foo.c'])
describes the same extension in the pkg package. The source files and
resulting object code are identical in both cases; the only difference is where
in the filesystem (and therefore where in Python’s namespace hierarchy) the
resulting extension lives.
If you have a number of extensions all in the same package (or all under the
same base package), use the ext_package keyword argument to
setup(). For example (a sketch; the distribution name and version below are invented):
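from distutils.core import setup, Extension

setup(name='foobar',          # distribution name is illustrative
      version='1.0',
      ext_package='pkg',
      ext_modules=[Extension('foo', ['foo.c']),
                   Extension('subpkg.bar', ['bar.c'])],
      )

compiles foo.c to the extension pkg.foo, and bar.c to
pkg.subpkg.bar.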
The second argument to the Extension constructor is a list of source
files. Since the Distutils currently only support C, C++, and Objective-C
extensions, these are normally C/C++/Objective-C source files. (Be sure to use
appropriate extensions to distinguish C++ source files: .cc and
.cpp seem to be recognized by both Unix and Windows compilers.)
However, you can also include SWIG interface (.i) files in the list; the
build_ext command knows how to deal with SWIG extensions: it will run
SWIG on the interface file and compile the resulting C/C++ file into your
extension.
Options can currently be passed to SWIG through the setup script like
this (the distribution name and version below are illustrative):
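from distutils.core import setup, Extension

setup(name='foo',             # illustrative name and version
      version='1.0',
      ext_modules=[Extension('_foo', ['foo.i'],
                             swig_opts=['-modern', '-I../include'])],
      py_modules=['foo'],
      )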
On some platforms, you can include non-source files that are processed by the
compiler and included in your extension. Currently, this just means Windows
message text (.mc) files and resource definition (.rc) files for
Visual C++. These will be compiled to binary resource (.res) files and
linked into the executable.
Three optional arguments to Extension will help if you need to specify
include directories to search or preprocessor macros to define/undefine:
include_dirs, define_macros, and undef_macros.
For example, if your extension requires header files in the include
directory under your distribution root, use the include_dirs option:
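Extension('foo', ['foo.c'], include_dirs=['include'])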
You can specify absolute directories there; if you know that your extension will
only be built on Unix systems with X11R6 installed to /usr, you can get
away with
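Extension('foo', ['foo.c'], include_dirs=['/usr/include/X11'])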
You should avoid this sort of non-portable usage if you plan to distribute your
code: it’s probably better to write C code like
#include <X11/Xlib.h>
If you need to include header files from some other Python extension, you can
take advantage of the fact that header files are installed in a consistent way
by the Distutils install_header command. For example, the Numerical
Python header files are installed (on a standard Unix installation) to
/usr/local/include/python1.5/Numerical. (The exact location will differ
according to your platform and Python installation.) Since the Python include
directory—/usr/local/include/python1.5 in this case—is always
included in the search path when building Python extensions, the best approach
is to write C code like
#include <Numerical/arrayobject.h>
If you must put the Numerical include directory right into your header
search path, though, you can find that directory using the Distutils
distutils.sysconfig module:
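A sketch (the distribution name and version are illustrative):

import os
from distutils.core import setup, Extension
from distutils.sysconfig import get_python_inc

# get_python_inc(plat_specific=1) returns the platform-specific Python
# include directory; Numerical installs its headers beneath it.
incdir = os.path.join(get_python_inc(plat_specific=1), 'Numerical')

setup(name='foo',
      version='1.0',
      ext_modules=[Extension('foo', ['foo.c'], include_dirs=[incdir])],
      )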
Even though this is quite portable—it will work on any Python installation,
regardless of platform—it’s probably easier to just write your C code in the
sensible way.
You can define and undefine pre-processor macros with the define_macros and
undef_macros options. define_macros takes a list of (name,value)
tuples, where name is the name of the macro to define (a string) and
value is its value: either a string or None. (Defining a macro FOO
to None is the equivalent of a bare #define FOO in your C source: with
most compilers, this sets FOO to the string 1.) undef_macros is
just a list of macros to undefine.
You can also specify the libraries to link against when building your extension,
and the directories to search for those libraries. The libraries option is
a list of libraries to link against, library_dirs is a list of directories
to search for libraries at link-time, and runtime_library_dirs is a list of
directories to search for shared (dynamically loaded) libraries at run-time.
For example, if you need to link against libraries known to be in the standard
library search path on target systems:
Extension(...,
libraries=['gdbm', 'readline'])
If you need to link with libraries in a non-standard location, you’ll have to
include the location in library_dirs:
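For example (directory and library names as in the classic X11 case):

Extension('foo', ['foo.c'],
          library_dirs=['/usr/X11R6/lib'],
          libraries=['X11', 'Xt'])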
There are still some other options which can be used to handle special cases.
The optional option is a boolean; if it is true,
a build failure in the extension will not abort the build process, but
instead simply not install the failing extension.
The extra_objects option is a list of object files to be passed to the
linker. These files must not have extensions, as the default extension for the
compiler is used.
extra_compile_args and extra_link_args can be used to
specify additional command line options for the respective compiler and linker
command lines.
export_symbols is only useful on Windows. It can contain a list of
symbols (functions or variables) to be exported. This option is not needed when
building compiled extensions: Distutils will automatically add the module’s
initialization symbol to the list of exported symbols.
The depends option is a list of files that the extension depends on
(for example header files). The build command will call the compiler on the
sources to rebuild the extension if any of these files has been modified since
the previous build.
A distribution may relate to packages in three specific ways:
It can require packages or modules.
It can provide packages or modules.
It can obsolete packages or modules.
These relationships can be specified using keyword arguments to the
distutils.core.setup() function.
Dependencies on other Python modules and packages can be specified by supplying
the requires keyword argument to setup(). The value must be a list of
strings. Each string specifies a package that is required, and optionally what
versions are sufficient.
To specify that any version of a module or package is required, the string
should consist entirely of the module or package name. Examples include
'mymodule' and 'xml.parsers.expat'.
If specific versions are required, a sequence of qualifiers can be supplied in
parentheses. Each qualifier may consist of a comparison operator and a version
number. The accepted comparison operators are:
<    >    ==    <=    >=    !=
These can be combined by using multiple qualifiers separated by commas (and
optional whitespace). In this case, all of the qualifiers must be matched; a
logical AND is used to combine the evaluations.
Let’s look at a bunch of examples:
Requires Expression      Explanation
==1.0                    Only version 1.0 is compatible
>1.0, !=1.5.1, <2.0      Any version after 1.0 and before 2.0
                         is compatible, except 1.5.1
Now that we can specify dependencies, we also need to be able to specify what we
provide that other distributions can require. This is done using the provides
keyword argument to setup(). The value for this keyword is a list of
strings, each of which names a Python module or package, and optionally
identifies the version. If the version is not specified, it is assumed to match
that of the distribution.
Some examples:
Provides Expression      Explanation
mypkg                    Provide mypkg, using the distribution
                         version
mypkg (1.1)              Provide mypkg version 1.1, regardless of
                         the distribution version
A package can declare that it obsoletes other packages using the obsoletes
keyword argument. The value for this is similar to that of the requires
keyword: a list of strings giving module or package specifiers. Each specifier
consists of a module or package name optionally followed by one or more version
qualifiers. Version qualifiers are given in parentheses after the module or
package name.
The versions identified by the qualifiers are those that are obsoleted by the
distribution being described. If no qualifiers are given, all versions of the
named module or package are understood to be obsoleted.
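Putting the three keywords together, a sketch (all names and versions are
invented) might look like:

from distutils.core import setup

setup(name='mypkg',
      version='1.1',
      requires=['xml.parsers.expat', 'othermod (>1.0, !=1.5.1, <2.0)'],
      provides=['mypkg (1.1)'],
      obsoletes=['oldpkg (<=0.9)'],
      )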
So far we have been dealing with pure and non-pure Python modules, which are
usually not run by themselves but imported by scripts.
Scripts are files containing Python source code, intended to be started from the
command line. Scripts don’t require Distutils to do anything very complicated.
The only clever feature is that if the first line of the script starts with
#! and contains the word “python”, the Distutils will adjust the first line
to refer to the current interpreter location; the --executable (or
-e) option allows that interpreter path to be explicitly overridden.
The scripts option simply is a list of files to be handled in this
way. From the PyXML setup script:
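setup(...,
      scripts=['scripts/xmlproc_parse', 'scripts/xmlproc_val']
      )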
Often, additional files need to be installed into a package. These files are
often data that’s closely related to the package’s implementation, or text files
containing documentation that might be of interest to programmers using the
package. These files are called package data.
Package data can be added to packages using the package_data keyword
argument to the setup() function. The value must be a mapping from
package name to a list of relative path names that should be copied into the
package. The paths are interpreted as relative to the directory containing the
package (information from the package_dir mapping is used if appropriate);
that is, the files are expected to be part of the package in the source
directories. They may contain glob patterns as well.
The path names may contain directory portions; any necessary directories will be
created in the installation.
For example, if a package should contain a subdirectory with several data files,
the files can be arranged like this in the source tree:
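setup.py
src/
    mypkg/
        __init__.py
        module.py
        data/
            tables.dat
            spoons.dat
            forks.dat

The corresponding call to setup() might then be:

setup(...,
      packages=['mypkg'],
      package_dir={'mypkg': 'src/mypkg'},
      package_data={'mypkg': ['data/*.dat']},
      )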
Changed in version 3.1: All the files that match package_data will be added to the MANIFEST
file if no template is provided. See Specifying the files to distribute.
The data_files option can be used to specify additional files needed
by the module distribution: configuration files, message catalogs, data files,
anything which doesn’t fit in the previous categories.
data_files specifies a sequence of (directory, files) pairs in the
following way:
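setup(...,
      data_files=[('bitmaps', ['bm/b1.gif', 'bm/b2.gif']),
                  ('config', ['cfg/data.cfg']),
                  ('/etc/init.d', ['init-script'])],
      )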
Note that you can specify the directory names where the data files will be
installed, but you cannot rename the data files themselves.
Each (directory, files) pair in the sequence specifies the installation
directory and the files to install there. If directory is a relative path, it
is interpreted relative to the installation prefix (Python’s sys.prefix for
pure-Python packages, sys.exec_prefix for packages that contain extension
modules). Each file name in files is interpreted relative to the
setup.py script at the top of the package source distribution. No
directory information from files is used to determine the final location of
the installed file; only the name of the file is used.
You can specify the data_files options as a simple sequence of files
without specifying a target directory, but this is not recommended, and the
install command will print a warning in this case. To install data
files directly in the target directory, an empty string should be given as the
directory.
Changed in version 3.1: All the files that match data_files will be added to the MANIFEST
file if no template is provided. See Specifying the files to distribute.
The setup script may include additional meta-data beyond the name and version.
This information includes:
Meta-Data           Description                    Value             Notes
name                name of the package            short string      (1)
version             version of this release        short string      (1)(2)
author              package author’s name          short string      (3)
author_email        email address of the           email address     (3)
                    package author
maintainer          package maintainer’s name      short string      (3)
maintainer_email    email address of the           email address     (3)
                    package maintainer
url                 home page for the package      URL               (1)
description         short, summary description     short string
                    of the package
long_description    longer description of the      long string       (5)
                    package
download_url        location where the package     URL               (4)
                    may be downloaded
classifiers         a list of classifiers          list of strings   (4)
platforms           a list of platforms            list of strings
license             license for the package        short string      (6)
Notes:
(1) These fields are required.
(2) It is recommended that versions take the form major.minor[.patch[.sub]].
(3) Either the author or the maintainer must be identified.
(4) These fields should not be used if your package is to be compatible with
    Python versions prior to 2.2.3 or 2.3. The list is available from the PyPI
    website.
(5) The long_description field is used by PyPI when you are registering a
    package, to build its home page.
(6) The license field is a text indicating the license covering the
    package where the license is not a selection from the “License” Trove
    classifiers. See the Classifier field. Notice that
    there’s a licence distribution option which is deprecated but still
    acts as an alias for license.
‘short string’
A single line of text, not more than 200 characters.
Encoding the version information is an art in itself. Python packages generally
adhere to the version format major.minor[.patch][sub]. The major number is 0
for initial, experimental releases of software. It is incremented for releases
that represent major milestones in a package. The minor number is incremented
when important new features are added to the package. The patch number
increments when bug-fix releases are made. Additional trailing version
information is sometimes used to indicate sub-releases. These are
“a1,a2,...,aN” (for alpha releases, where functionality and API may change),
“b1,b2,...,bN” (for beta releases, which only fix bugs) and “pr1,pr2,...,prN”
(for final pre-release release testing). Some examples:
0.1.0
the first, experimental release of a package
1.0.1a2
the second alpha release of the first patch version of 1.0
The classifiers are specified in a Python list:
setup(...,
classifiers=[
'Development Status :: 4 - Beta',
'Environment :: Console',
'Environment :: Web Environment',
'Intended Audience :: End Users/Desktop',
'Intended Audience :: Developers',
'Intended Audience :: System Administrators',
'License :: OSI Approved :: Python Software Foundation License',
'Operating System :: MacOS :: MacOS X',
'Operating System :: Microsoft :: Windows',
'Operating System :: POSIX',
'Programming Language :: Python',
'Topic :: Communications :: Email',
'Topic :: Office/Business',
'Topic :: Software Development :: Bug Tracking',
],
)
If you wish to include classifiers in your setup.py file and also wish
to remain backwards-compatible with Python releases prior to 2.2.3, then you can
include the following code fragment in your setup.py before the
setup() call.
# patch distutils if it can't cope with the "classifiers" or
# "download_url" keywords
from sys import version
if version < '2.2.3':
    from distutils.dist import DistributionMetadata
    DistributionMetadata.classifiers = None
    DistributionMetadata.download_url = None
Sometimes things go wrong, and the setup script doesn’t do what the developer
wants.
Distutils catches any exceptions when running the setup script, and prints a
simple error message before the script is terminated. The motivation for this
behaviour is to not confuse administrators who don’t know much about Python and
are trying to install a package. If they get a big long traceback from deep
inside the guts of Distutils, they may think the package or the Python
installation is broken because they don’t read all the way down to the bottom
and see that it’s a permission problem.
On the other hand, this doesn’t help the developer find the cause of the
failure. For this purpose, the DISTUTILS_DEBUG environment variable can be set
to anything except an empty string, and distutils will then print detailed
information about what it is doing, and the full traceback in case an exception
occurs.
Often, it’s not possible to write down everything needed to build a distribution
a priori: you may need to get some information from the user, or from the
user’s system, in order to proceed. As long as that information is fairly
simple—a list of directories to search for C header files or libraries, for
example—then providing a configuration file, setup.cfg, for users to
edit is a cheap and easy way to solicit it. Configuration files also let you
provide default values for any command option, which the installer can then
override either on the command-line or by editing the config file.
The setup configuration file is a useful middle-ground between the setup script
—which, ideally, would be opaque to installers [1]—and the command-line to
the setup script, which is outside of your control and entirely up to the
installer. In fact, setup.cfg (and any other Distutils configuration
files present on the target system) are processed after the contents of the
setup script, but before the command-line. This has several useful
consequences:
- installers can override some of what you put in setup.py by editing
  setup.cfg
- you can provide non-standard defaults for options that are not easily set in
  setup.py
- installers can override anything in setup.cfg using the command-line
  options to setup.py
The basic syntax of the configuration file is simple:
[command]
option=value
...
where command is one of the Distutils commands (e.g. build_py,
install), and option is one of the options that command supports.
Any number of options can be supplied for each command, and any number of
command sections can be included in the file. Blank lines are ignored, as are
comments, which run from a '#' character until the end of the line. Long
option values can be split across multiple lines simply by indenting the
continuation lines.
You can find out the list of options supported by a particular command with the
universal --help option, e.g.
> python setup.py --help build_ext
[...]
Options for 'build_ext' command:
--build-lib (-b) directory for compiled extension modules
--build-temp (-t) directory for temporary files (build by-products)
--inplace (-i) ignore build-lib and put compiled extensions into the
source directory alongside your pure Python modules
--include-dirs (-I) list of directories to search for header files
--define (-D) C preprocessor macros to define
--undef (-U) C preprocessor macros to undefine
--swig-opts list of SWIG command line options
[...]
Note that an option spelled --foo-bar on the command-line is spelled
foo_bar in configuration files.
For example, say you want your extensions to be built “in-place”—that is, you
have an extension pkg.ext, and you want the compiled extension file
(ext.so on Unix, say) to be put in the same source directory as your
pure Python modules pkg.mod1 and pkg.mod2. You can always use the
--inplace option on the command-line to ensure this:
python setup.py build_ext --inplace
But this requires that you always specify the build_ext command
explicitly, and remember to provide --inplace. An easier way is to
“set and forget” this option, by encoding it in setup.cfg, the
configuration file for this distribution:
[build_ext]
inplace=1
This will affect all builds of this module distribution, whether or not you
explicitly specify build_ext. If you include setup.cfg in
your source distribution, it will also affect end-user builds—which is
probably a bad idea for this option, since always building extensions in-place
would break installation of the module distribution. In certain peculiar cases,
though, modules are built right in their installation directory, so this is
conceivably a useful ability. (Distributing extensions that expect to be built
in their installation directory is almost always a bad idea, though.)
Another example: certain commands take a lot of options that don’t change from
run to run; for example, bdist_rpm needs to know everything required
to generate a “spec” file for creating an RPM distribution. Some of this
information comes from the setup script, and some is automatically generated by
the Distutils (such as the list of files installed). But some of it has to be
supplied as options to bdist_rpm, which would be very tedious to do
on the command-line for every run. Hence, here is a snippet from the Distutils’
own setup.cfg:
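# reconstructed from the Distutils documentation; file names are illustrative
[bdist_rpm]
release = 1
packager = Greg Ward <gward@python.net>
doc_files = CHANGES.txt
            README.txt
            USAGE.txt
            doc/
            examples/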
As shown in section A Simple Example, you use the sdist command
to create a source distribution. In the simplest case,
python setup.py sdist
(assuming you haven’t specified any sdist options in the setup script
or config file), sdist creates the archive of the default format for
the current platform. The default format is a gzip’ed tar file
(.tar.gz) on Unix, and ZIP file on Windows.
You can specify as many formats as you like using the --formats
option, for example:
python setup.py sdist --formats=gztar,zip
to create a gzipped tarball and a zip file. The available formats are:
Format  Description                   Notes
zip     zip file (.zip)               (1),(3)
gztar   gzip’ed tar file (.tar.gz)    (2),(4)
bztar   bzip2’ed tar file (.tar.bz2)  (4)
ztar    compressed tar file (.tar.Z)  (4)
tar     tar file (.tar)               (4)
Notes:
(1) default on Windows
(2) default on Unix
(3) requires either external zip utility or zipfile module (part
    of the standard Python library since Python 1.6)
(4) requires external utilities: tar and possibly one of gzip,
    bzip2, or compress
If you don’t supply an explicit list of files (or instructions on how to
generate one), the sdist command puts a minimal default set into the
source distribution:
- all Python source files implied by the py_modules and
  packages options
- all C source files mentioned in the ext_modules or
  libraries options
- anything that looks like a test script: test/test*.py (currently, the
  Distutils don’t do anything with test scripts except include them in source
  distributions, but in the future there will be a standard for testing Python
  module distributions)
- README.txt (or README), setup.py (or whatever you
  called your setup script), and setup.cfg
Sometimes this is enough, but usually you will want to specify additional files
to distribute. The typical way to do this is to write a manifest template,
called MANIFEST.in by default. The manifest template is just a list of
instructions for how to generate your manifest file, MANIFEST, which is
the exact list of files to include in your source distribution. The
sdist command processes this template and generates a manifest based
on its instructions and what it finds in the filesystem.
If you prefer to roll your own manifest file, the format is simple: one filename
per line, regular files (or symlinks to them) only. If you do supply your own
MANIFEST, you must specify everything: the default set of files
described above does not apply in this case.
Changed in version 3.1: An existing generated MANIFEST will be regenerated without
sdist comparing its modification time to the one of
MANIFEST.in or setup.py.
Changed in version 3.1.3: MANIFEST files start with a comment indicating they are generated.
Files without this comment are not overwritten or removed.
Changed in version 3.2.2: sdist will read a MANIFEST file if no MANIFEST.in
exists, like it used to do.
The manifest template has one command per line, where each command specifies a
set of files to include or exclude from the source distribution. For an
example, again we turn to the Distutils’ own manifest template:
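include *.txt
recursive-include examples *.txt *.py
prune examples/sample?/build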
The meanings should be fairly clear: include all files in the distribution root
matching *.txt, all files anywhere under the examples directory
matching *.txt or *.py, and exclude all directories matching
examples/sample?/build. All of this is done after the standard
include set, so you can exclude files from the standard set with explicit
instructions in the manifest template. (Or, you can use the
--no-defaults option to disable the standard set entirely.) There are
several other commands available in the manifest template mini-language; see
section Creating a source distribution: the sdist command.
The order of commands in the manifest template matters: initially, we have the
list of default files as described above, and each command in the template adds
to or removes from that list of files. Once we have fully processed the
manifest template, we remove files that should not be included in the source
distribution:
- all files in the Distutils “build” tree (default build/)
- all files in directories named RCS, CVS, .svn,
  .hg, .git, .bzr or _darcs
Now we have our complete list of files, which is written to the manifest for
future reference, and then used to build the source distribution archive(s).
You can disable the default set of included files with the
--no-defaults option, and you can disable the standard exclude set
with --no-prune.
Following the Distutils’ own manifest template, let’s trace how the
sdist command builds the list of files to include in the Distutils
source distribution:
- include all Python source files in the distutils and
  distutils/command subdirectories (because packages corresponding to
  those two directories were mentioned in the packages option in the
  setup script—see section Writing the Setup Script)
- include README.txt, setup.py, and setup.cfg (standard
  files)
- include test/test*.py (standard files)
- include *.txt in the distribution root (this will find
  README.txt a second time, but such redundancies are weeded out later)
- include anything matching *.txt or *.py in the sub-tree
  under examples
- exclude all files in the sub-trees starting at directories matching
  examples/sample?/build—this may exclude files included by the
  previous two steps, so it’s important that the prune command in the manifest
  template comes after the recursive-include command
- exclude the entire build tree, and any RCS, CVS,
  .svn, .hg, .git, .bzr and _darcs
  directories
Just like in the setup script, file and directory names in the manifest template
should always be slash-separated; the Distutils will take care of converting
them to the standard representation on your platform. That way, the manifest
template is portable across operating systems.
The normal course of operations for the sdist command is as follows:
- if the manifest file (MANIFEST by default) exists and the first line
  does not have a comment indicating it is generated from MANIFEST.in,
  then it is used as is, unaltered
- if the manifest file doesn’t exist or has been previously automatically
  generated, read MANIFEST.in and create the manifest
- if neither MANIFEST nor MANIFEST.in exist, create a manifest
  with just the default file set
- use the list of files now in MANIFEST (either just generated or read
  in) to create the source distribution archive(s)
There are a couple of options that modify this behaviour. First, use the
--no-defaults and --no-prune options to disable the standard
“include” and “exclude” sets.
Second, you might just want to (re)generate the manifest, but not create a source
distribution:
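python setup.py sdist --manifest-only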
A “built distribution” is what you’re probably used to thinking of either as a
“binary package” or an “installer” (depending on your background). It’s not
necessarily binary, though, because it might contain only Python source code
and/or byte-code; and we don’t call it a package, because that word is already
spoken for in Python. (And “installer” is a term specific to the world of
mainstream desktop systems.)
A built distribution is how you make life as easy as possible for installers of
your module distribution: for users of RPM-based Linux systems, it’s a binary
RPM; for Windows users, it’s an executable installer; for Debian-based Linux
users, it’s a Debian package; and so forth. Obviously, no one person will be
able to create built distributions for every platform under the sun, so the
Distutils are designed to enable module developers to concentrate on their
specialty—writing code and creating source distributions—while an
intermediary species called packagers springs up to turn source distributions
into built distributions for as many platforms as there are packagers.
Of course, the module developer could be his own packager; or the packager could
be a volunteer “out there” somewhere who has access to a platform which the
original developer does not; or it could be software periodically grabbing new
source distributions and turning them into built distributions for as many
platforms as the software has access to. Regardless of who they are, a packager
uses the setup script and the bdist command family to generate built
distributions.
As a simple example, if I run the following command in the Distutils source
tree:
python setup.py bdist
then the Distutils builds my module distribution (the Distutils itself in this
case), does a “fake” installation (also in the build directory), and
creates the default type of built distribution for my platform. The default
format for built distributions is a “dumb” tar file on Unix, and a simple
executable installer on Windows. (That tar file is considered “dumb” because it
has to be unpacked in a specific location to work.)
Thus, the above command on a Unix system creates
Distutils-1.0.plat.tar.gz; unpacking this tarball from the right place
installs the Distutils just as though you had downloaded the source distribution
and run python setup.py install. (The “right place” is either the root of
the filesystem or Python’s prefix directory, depending on the options
given to the bdist_dumb command; the default is to make dumb
distributions relative to prefix.)
Obviously, for pure Python distributions, this isn’t any simpler than just
running python setup.py install—but for non-pure distributions, which
include extensions that would need to be compiled, it can mean the difference
between someone being able to use your extensions or not. And creating “smart”
built distributions, such as an RPM package or an executable installer for
Windows, is far more convenient for users even if your distribution doesn’t
include any extensions.
The bdist command has a --formats option, similar to the
sdist command, which you can use to select the types of built
distribution to generate: for example,
python setup.py bdist --format=zip
would, when run on a Unix system, create Distutils-1.0.plat.zip—again, this archive would be unpacked from the root directory to install the
Distutils.
The available formats for built distributions are:
Format   Description                   Notes
gztar    gzipped tar file (.tar.gz)    (1),(3)
ztar     compressed tar file (.tar.Z)  (3)
tar      tar file (.tar)               (3)
zip      zip file (.zip)               (2),(4)
rpm      RPM                           (5)
pkgtool  Solaris pkgtool
sdux     HP-UX swinstall
wininst  self-extracting ZIP file      (4)
         for Windows
msi      Microsoft Installer
Notes:
(1) default on Unix
(2) default on Windows
(3) requires external utilities: tar and possibly one of gzip,
    bzip2, or compress
(4) requires either external zip utility or zipfile module (part
    of the standard Python library since Python 1.6)
(5) requires external rpm utility, version 3.0.4 or better (use
    rpm --version to find out which version you have)
You don’t have to use the bdist command with the --formats
option; you can also use the command that directly implements the format you’re
interested in. Some of these bdist “sub-commands” actually generate
several similar formats; for instance, the bdist_dumb command
generates all the “dumb” archive formats (tar, ztar, gztar, and
zip), and bdist_rpm generates both binary and source RPMs. The
bdist sub-commands, and the formats generated by each, are:
Command        Formats
bdist_dumb     tar, ztar, gztar, zip
bdist_rpm      rpm, srpm
bdist_wininst  wininst
bdist_msi      msi
The following sections give details on the individual bdist_*
commands.
The RPM format is used by many popular Linux distributions, including Red Hat,
SuSE, and Mandrake. If one of these (or any of the other RPM-based Linux
distributions) is your usual environment, creating RPM packages for other users
of that same distribution is trivial. Depending on the complexity of your module
distribution and differences between Linux distributions, you may also be able
to create RPMs that work on different RPM-based distributions.
The usual way to create an RPM of your module distribution is to run the
bdist_rpm command:
python setup.py bdist_rpm
or the bdist command with the --format option:
python setup.py bdist --formats=rpm
The former allows you to specify RPM-specific options; the latter allows you to
easily specify multiple formats in one run. If you need to do both, you can
explicitly specify multiple bdist_* commands and their options:
python setup.py bdist_rpm --packager="John Doe <jdoe@example.org>" \
                bdist_wininst --target-version="2.0"
Creating RPM packages is driven by a .spec file, much as using the
Distutils is driven by the setup script. To make your life easier, the
bdist_rpm command normally creates a .spec file based on the
information you supply in the setup script, on the command line, and in any
Distutils configuration files. Various options and sections in the
.spec file are derived from options in the setup script as follows:
RPM .spec file option or section  Distutils setup script option
Name                              name
Summary (in preamble)             description
Version                           version
Vendor                            author and author_email, or
                                  maintainer and maintainer_email
Copyright                         license
Url                               url
%description (section)            long_description
Additionally, there are many options in .spec files that don’t have
corresponding options in the setup script. Most of these are handled through
options to the bdist_rpm command as follows:
RPM .spec file option or section  bdist_rpm option   default value
Release                           release            “1”
Group                             group              “Development/Libraries”
Vendor                            vendor             (see above)
Packager                          packager           (none)
Provides                          provides           (none)
Requires                          requires           (none)
Conflicts                         conflicts          (none)
Obsoletes                         obsoletes          (none)
Distribution                      distribution_name  (none)
BuildRequires                     build_requires     (none)
Icon                              icon               (none)
Obviously, supplying even a few of these options on the command-line would be
tedious and error-prone, so it’s usually best to put them in the setup
configuration file, setup.cfg—see section Writing the Setup Configuration File. If
you distribute or package many Python module distributions, you might want to
put options that apply to all of them in your personal Distutils configuration
file (~/.pydistutils.cfg).
There are three steps to building a binary RPM package, all of which are
handled automatically by the Distutils:
1. create a .spec file, which describes the package (analogous to the
   Distutils setup script; in fact, much of the information in the setup script
   winds up in the .spec file)
2. create the source RPM
3. create the “binary” RPM (which may or may not contain binary code, depending
   on whether your module distribution contains Python extensions)
Normally, RPM bundles the last two steps together; when you use the Distutils,
all three steps are typically bundled together.
If you wish, you can separate these three steps. You can use the
--spec-only option to make bdist_rpm just create the
.spec file and exit; in this case, the .spec file will be
written to the “distribution directory”—normally dist/, but
customizable with the --dist-dir option. (Normally, the .spec
file winds up deep in the “build tree,” in a temporary directory created by
bdist_rpm.)
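For example, to write just the .spec file into the default dist/ directory:
python setup.py bdist_rpm --spec-only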
Executable installers are the natural format for binary distributions on
Windows. They display a nice graphical user interface, display some information
about the module distribution to be installed taken from the metadata in the
setup script, let the user select a few options, and start or cancel the
installation.
Since the metadata is taken from the setup script, creating Windows installers
is usually as easy as running:
python setup.py bdist_wininst
or the bdist command with the --formats option:
python setup.py bdist --formats=wininst
If you have a pure module distribution (only containing pure Python modules and
packages), the resulting installer will be version independent and have a name
like foo-1.0.win32.exe. These installers can even be created on Unix
platforms or Mac OS X.
If you have a non-pure distribution, the extensions can only be created on a
Windows platform, and will be Python version dependent. The installer filename
will reflect this and now has the form foo-1.0.win32-py2.0.exe. You
have to create a separate installer for every Python version you want to
support.
The installer will try to compile pure modules into bytecode after installation
on the target system in normal and optimizing mode. If you don’t want this to
happen for some reason, you can run the bdist_wininst command with
the --no-target-compile and/or the --no-target-optimize
option.
By default the installer will display the cool “Python Powered” logo when it is
run, but you can also supply your own 152x261 bitmap which must be a Windows
.bmp file with the --bitmap option.
The installer will also display a large title on the desktop background window
when it is run, which is constructed from the name of your distribution and the
version number. This can be changed to another text by using the
--title option.
The installer file will be written to the “distribution directory” — normally
dist/, but customizable with the --dist-dir option.
Starting with Python 2.6, distutils is capable of cross-compiling between
Windows platforms. In practice, this means that with the correct tools
installed, you can use a 32bit version of Windows to create 64bit extensions
and vice-versa.
To build for an alternate platform, specify the --plat-name option
to the build command. Valid values are currently ‘win32’, ‘win-amd64’ and
‘win-ia64’. For example, on a 32bit version of Windows, you could execute:
python setup.py build --plat-name=win-amd64
to build a 64bit version of your extension. The Windows Installers also
support this option, so the command:
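python setup.py build --plat-name=win-amd64 bdist_wininst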
would create a 64bit installation executable on your 32bit version of Windows.
To cross-compile, you must download the Python source code and cross-compile
Python itself for the platform you are targeting - it is not possible from a
binary installation of Python (as the .lib and related files for other
platforms are not included). In practice, this means the user of a 32 bit operating
system will need to use Visual Studio 2008 to open the
PCBuild/PCbuild.sln solution in the Python source tree and build the
“x64” configuration of the ‘pythoncore’ project before cross-compiling
extensions is possible.
Note that by default, Visual Studio 2008 does not install 64bit compilers or
tools. You may need to reexecute the Visual Studio setup process and select
these tools (using Control Panel->[Add/Remove] Programs is a convenient way to
check or modify your existing install.)
Starting with Python 2.3, a postinstallation script can be specified with the
--install-script option. The basename of the script must be
specified, and the script filename must also be listed in the scripts argument
to the setup function.
This script will be run at installation time on the target system after all the
files have been copied, with argv[1] set to -install, and again at
uninstallation time before the files are removed with argv[1] set to
-remove.
The installation script runs embedded in the Windows installer; all output
(sys.stdout, sys.stderr) is redirected into a buffer and will be
displayed in the GUI after the script has finished.
Some functions especially useful in this context are available as additional
built-in functions in the installation script.
The functions directory_created(path) and file_created(path) should be called
when a directory or file is created by the postinstall script at installation
time. Each registers path with the uninstaller, so that it will be removed
when the distribution is uninstalled. To be safe, directories are only removed
if they are empty.
The function get_special_folder_path(csidl_string) can be used to retrieve
special folder locations on Windows like the Start Menu or the Desktop. It
returns the full path to the folder. csidl_string must be one of the
following strings:
If the folder cannot be retrieved, OSError is raised.
Which folders are available depends on the exact Windows version, and probably
also the configuration. For details refer to Microsoft’s documentation of the
SHGetSpecialFolderPath() function.
The function create_shortcut(target, description, filename[, arguments[,
workdir[, iconpath[, iconindex]]]]) creates a shortcut. target is the path
to the program to be started by the shortcut. description is the description
of the shortcut.
filename is the title of the shortcut that the user will see. arguments
specifies the command line arguments, if any. workdir is the working directory
for the program. iconpath is the file containing the icon for the shortcut,
and iconindex is the index of the icon in the file iconpath. Again, for
details consult the Microsoft documentation for the IShellLink
interface.
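A minimal sketch of such a postinstall script (the application name and
shortcut details are illustrative):
# postinstall script sketch -- the helper functions used below
# (get_special_folder_path, create_shortcut, file_created) are injected
# as built-ins by the bdist_wininst installer at run time.
import os
import sys

if sys.argv[1] == '-install':
    start_menu = get_special_folder_path('CSIDL_STARTMENU')
    shortcut = os.path.join(start_menu, 'Foo.lnk')   # hypothetical name
    create_shortcut(os.path.join(sys.prefix, 'python.exe'),
                    'Run the Foo application',
                    shortcut)
    file_created(shortcut)   # ensure the shortcut is removed on uninstall
elif sys.argv[1] == '-remove':
    pass                     # registered files are removed automatically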
Starting with Python 2.6, bdist_wininst supports a --user-access-control
option. The default is ‘none’ (meaning no UAC handling is done), and other
valid values are ‘auto’ (meaning prompt for UAC elevation if Python was
installed for all users) and ‘force’ (meaning always prompt for elevation).
The Python Package Index (PyPI) holds meta-data describing distributions
packaged with distutils. The distutils command register is used to
submit your distribution’s meta-data to the index. It is invoked as follows:
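python setup.py register
Distutils then presents a short menu asking whether to use an existing login
(option 1), register as a new user (option 2), have the server generate a new
password, or quit.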
Note: if your username and password are saved locally, you will not see this
menu.
If you have not registered with PyPI, then you will need to do so now. You
should choose option 2, and enter your details as required. Soon after
submitting your details, you will receive an email which will be used to confirm
your registration.
Once you are registered, you may choose option 1 from the menu. You will be
prompted for your PyPI username and password, and register will then
submit your meta-data to the index.
You may submit any number of versions of your distribution to the index. If you
alter the meta-data for a particular version, you may submit it again and the
index will be updated.
PyPI holds a record for each (name, version) combination submitted. The first
user to submit information for a given name is designated the Owner of that
name. They may submit changes through the register command or through
the web interface. They may also designate other users as Owners or Maintainers.
Maintainers may edit the package information, but not designate other Owners or
Maintainers.
By default PyPI will list all versions of a given package. To hide certain
versions, the Hidden property should be set to yes. This must be edited through
the web interface.
The Python Package Index (PyPI) not only stores the package info, but also the
package data if the author of the package wishes to. The distutils command
upload pushes the distribution files to PyPI.
The command is invoked immediately after building one or more distribution
files. For example, the command
python setup.py sdist bdist_wininst upload
will cause the source distribution and the Windows installer to be uploaded to
PyPI. Note that these will be uploaded even if they are built using an earlier
invocation of setup.py, but that only distributions named on the command
line for the invocation including the upload command are uploaded.
The upload command uses the username, password, and repository URL
from the $HOME/.pypirc file (see section The .pypirc file for more on this
file). If a register command was previously called in the same command,
and if the password was entered in the prompt, upload will reuse the
entered password. This is useful if you do not want to store a clear text
password in the $HOME/.pypirc file.
You can specify another PyPI server with the --repository=url option:
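python setup.py sdist bdist_wininst upload -r http://example.com/pypi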
See section The .pypirc file for more on defining several servers.
You can use the --sign option to tell upload to sign each
uploaded file using GPG (GNU Privacy Guard). The gpg program must
be available for execution on the system PATH. You can also specify
which key to use for signing using the --identity=name option.
Other upload options include --repository=url or
--repository=section, where url is the URL of the server and
section the name of the section in $HOME/.pypirc, and
--show-response (which displays the full response text from the PyPI
server for help in debugging upload problems).
The long_description field plays a special role at PyPI. It is used by
the server to display a home page for the registered package.
If you use the reStructuredText
syntax for this field, PyPI will parse it and display an HTML output for
the package home page.
The long_description field can be attached to a text file located
in the package:
from distutils.core import setup

with open('README.txt') as file:
    long_description = file.read()

setup(name='Distutils',
      long_description=long_description)
In that case, README.txt is a regular reStructuredText text file located
in the root of the package beside setup.py.
To prevent registering broken reStructuredText content, you can use the
rst2html program that is provided by the docutils package and
check the long_description from the command line:
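python setup.py --long-description | rst2html.py > output.html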
This chapter provides a number of basic examples to help get started with
distutils. Additional information about using distutils can be found in the
Distutils Cookbook.
If you’re just distributing a couple of modules, especially if they don’t live
in a particular package, you can specify them individually using the
py_modules option in the setup script.
In the simplest case, you’ll have two files to worry about: a setup script and
the single module you’re distributing, foo.py in this example:
<root>/
        setup.py
        foo.py
(In all diagrams in this section, <root> will refer to the distribution root
directory.) A minimal setup script to describe this situation would be:
from distutils.core import setup
setup(name='foo',
      version='1.0',
      py_modules=['foo'],
      )
Note that the name of the distribution is specified independently with the
name option, and there’s no rule that says it has to be the same as
the name of the sole module in the distribution (although that’s probably a good
convention to follow). However, the distribution name is used to generate
filenames, so you should stick to letters, digits, underscores, and hyphens.
Since py_modules is a list, you can of course specify multiple
modules, eg. if you’re distributing modules foo and bar, your
setup might look like this:
<root>/
        setup.py
        foo.py
        bar.py
and the setup script might be
from distutils.core import setup
setup(name='foobar',
      version='1.0',
      py_modules=['foo', 'bar'],
      )
You can put module source files into another directory, but if you have enough
modules to do that, it’s probably easier to specify modules by package rather
than listing them individually.
If you have more than a couple of modules to distribute, especially if they are
in multiple packages, it’s probably easier to specify whole packages rather than
individual modules. This works even if your modules are not in a package; you
can just tell the Distutils to process modules from the root package, and that
works the same as any other package (except that you don’t have to have an
__init__.py file).
The setup script from the last example could also be written as
from distutils.core import setup
setup(name='foobar',
      version='1.0',
      packages=[''],
      )
(The empty string stands for the root package.)
If those two files are moved into a subdirectory, but remain in the root
package, e.g.:
<root>/
        setup.py
        src/
            foo.py
            bar.py
then you would still specify the root package, but you have to tell the
Distutils where source files in the root package live:
from distutils.core import setup
setup(name='foobar',
      version='1.0',
      package_dir={'': 'src'},
      packages=[''],
      )
More typically, though, you will want to distribute multiple modules in the same
package (or in sub-packages). For example, if the foo and bar
modules belong in package foobar, one way to lay out your source tree is
<root>/
        setup.py
        foobar/
            __init__.py
            foo.py
            bar.py
This is in fact the default layout expected by the Distutils, and the one that
requires the least work to describe in your setup script:
from distutils.core import setup
setup(name='foobar',
      version='1.0',
      packages=['foobar'],
      )
If you want to put modules in directories not named for their package, then you
need to use the package_dir option again. For example, if the
src directory holds modules in the foobar package:
<root>/
        setup.py
        src/
            __init__.py
            foo.py
            bar.py
an appropriate setup script would be
from distutils.core import setup
setup(name='foobar',
      version='1.0',
      package_dir={'foobar': 'src'},
      packages=['foobar'],
      )
Or, you might put modules from your main package right in the distribution
root:
<root>/
        setup.py
        __init__.py
        foo.py
        bar.py
in which case your setup script would be
from distutils.core import setup
setup(name='foobar',
      version='1.0',
      package_dir={'foobar': ''},
      packages=['foobar'],
      )
(The empty string also stands for the current directory.)
If you have sub-packages, they must be explicitly listed in packages,
but any entries in package_dir automatically extend to sub-packages.
(In other words, the Distutils does not scan your source tree, trying to
figure out which directories correspond to Python packages by looking for
__init__.py files.) Thus, if the default layout grows a sub-package:
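the layout and setup script might look like this (subfoo and blah.py are
illustrative names):
<root>/
        setup.py
        foobar/
            __init__.py
            foo.py
            bar.py
            subfoo/
                __init__.py
                blah.py
and the setup script would list both packages:
from distutils.core import setup

setup(name='foobar',
      version='1.0',
      packages=['foobar', 'foobar.subfoo'],
      )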
Extension modules are specified using the ext_modules option.
package_dir has no effect on where extension source files are found;
it only affects the source for pure Python modules. The simplest case, a
single extension module in a single C source file, is:
<root>/
        setup.py
        foo.c
If the foo extension belongs in the root package, the setup script for
this could be
from distutils.core import setup
from distutils.extension import Extension
setup(name='foobar',
      version='1.0',
      ext_modules=[Extension('foo', ['foo.c'])],
      )
If the extension actually belongs in a package, say foopkg, then with
exactly the same source tree layout, it can be put in the foopkg
package simply by changing the name of the extension:
from distutils.core import setup
from distutils.extension import Extension
setup(name='foobar',
      version='1.0',
      ext_modules=[Extension('foopkg.foo', ['foo.c'])],
      )
The check command allows you to verify whether your package meta-data
meets the minimum requirements to build a distribution.
To run it, just call it using your setup.py script. If something is
missing, check will display a warning.
Let’s take an example with a simple script:
from distutils.core import setup
setup(name='foobar')
Running the check command will display some warnings:
$ python setup.py check
running check
warning: check: missing required meta-data: version, url
warning: check: missing meta-data: either (author and author_email) or
(maintainer and maintainer_email) must be supplied
If you use the reStructuredText syntax in the long_description field and
docutils is installed you can check if the syntax is fine with the
check command, using the restructuredtext option.
For example, if the setup.py script is changed like this:
from distutils.core import setup
desc = """\
My description
=============
This is the description of the ``foobar`` package.
"""
setup(name='foobar', version='1', author='tarek',
      author_email='tarek@ziade.org',
      url='http://example.com', long_description=desc)
Where the long description is broken, check will be able to detect it
by using the docutils parser:
$ python setup.py check --restructuredtext
running check
warning: check: Title underline too short. (line 2)
warning: check: Could not finish the parsing.
Distutils can be extended in various ways. Most extensions take the form of new
commands or replacements for existing commands. New commands may be written to
support new types of platform-specific packaging, for example, while
replacements for existing commands may be made to modify details of how the
command operates on a package.
Most extensions of the distutils are made within setup.py scripts that
want to modify existing commands; many simply add a few file extensions that
should be copied into packages in addition to .py files as a
convenience.
Most distutils command implementations are subclasses of the
distutils.cmd.Command class. New commands may directly inherit from
Command, while replacements often derive from Command
indirectly, directly subclassing the command they are replacing. Commands are
required to derive from Command.
There are different ways to integrate new command implementations into
distutils. The most difficult is to lobby for the inclusion of the new features
in distutils itself, and wait for (and require) a version of Python that
provides that support. This is really hard for many reasons.
The most common, and possibly the most reasonable for most needs, is to include
the new implementations with your setup.py script, and cause the
distutils.core.setup() function to use them:
from distutils.command.build_py import build_py as _build_py
from distutils.core import setup

class build_py(_build_py):
    """Specialized Python source builder."""

    # implement whatever needs to be different...

setup(cmdclass={'build_py': build_py},
      ...)
This approach is most valuable when the new implementations must be used in
order to install a particular package, since everyone interested in the package
will need to have the new command implementation.
Beginning with Python 2.4, a third option is available, intended to allow new
commands to be added which can support existing setup.py scripts without
requiring modifications to the Python installation. This is expected to allow
third-party extensions to provide support for additional packaging systems, but
the commands can be used for anything distutils commands can be used for. A new
configuration option, command_packages (command-line option
--command-packages), can be used to specify additional packages to be
searched for modules implementing commands. Like all distutils options, this
can be specified on the command line or in a configuration file. This option
can only be set in the [global] section of a configuration file, or before
any commands on the command line. If set in a configuration file, it can be
overridden from the command line; setting it to an empty string on the command
line causes the default to be used. This should never be set in a configuration
file provided with a package.
This new option can be used to add any number of packages to the list of
packages searched for command implementations; multiple package names should be
separated by commas. When not specified, the search is only performed in the
distutils.command package. When setup.py is run with the option
--command-packages distcmds,buildcmds, however, the packages
distutils.command, distcmds, and buildcmds will be searched
in that order. New commands are expected to be implemented in modules of the
same name as the command by classes sharing the same name. Given the example
command line option above, the command bdist_openpkg could be
implemented by the class distcmds.bdist_openpkg.bdist_openpkg or
buildcmds.bdist_openpkg.bdist_openpkg.
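As an illustration, a skeleton for such a command module might look like this
(the bdist_openpkg name follows the example above; the body is a sketch, not a
real packaging implementation):
# distcmds/bdist_openpkg.py -- hypothetical module implementing the
# bdist_openpkg command from the example above; the body is a sketch.
from distutils.cmd import Command

class bdist_openpkg(Command):
    description = 'create an OpenPKG distribution (sketch only)'
    user_options = []          # no command-specific options in this sketch

    def initialize_options(self):
        pass

    def finalize_options(self):
        pass

    def run(self):
        # a real implementation would create the distribution file here,
        # then register it so the upload command can find it:
        # self.distribution.dist_files.append(('bdist_openpkg', filename))
        self.announce('bdist_openpkg: not implemented in this sketch')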
Commands that create distributions (files in the dist/ directory) need
to add (command, filename) pairs to self.distribution.dist_files so that
upload can upload it to PyPI. The filename in the pair contains no
path information, only the name of the file itself. In dry-run mode, pairs
should still be added to represent what would have been created.
This command installs all (Python) scripts in the distribution.
Creating a source distribution: the sdist command
The manifest template commands are:
Command                              Description
include pat1 pat2 ...                include all files matching any of the
                                     listed patterns
exclude pat1 pat2 ...                exclude all files matching any of the
                                     listed patterns
recursive-include dir pat1 pat2 ...  include all files under dir matching
                                     any of the listed patterns
recursive-exclude dir pat1 pat2 ...  exclude all files under dir matching
                                     any of the listed patterns
global-include pat1 pat2 ...         include all files anywhere in the
                                     source tree matching any of the
                                     listed patterns
global-exclude pat1 pat2 ...         exclude all files anywhere in the
                                     source tree matching any of the
                                     listed patterns
prune dir                            exclude all files under dir
graft dir                            include all files under dir
The patterns here are Unix-style “glob” patterns: * matches any sequence of
regular filename characters, ? matches any single regular filename
character, and [range] matches any of the characters in range (e.g.,
a-z, a-zA-Z, a-f0-9_.). The definition of “regular filename
character” is platform-specific: on Unix it is anything except slash; on Windows
anything except backslash or colon.
The distutils.core module is the only module that needs to be installed
to use the Distutils. It provides the setup() function (which is called from
the setup script) and indirectly provides the distutils.dist.Distribution
and distutils.cmd.Command classes.
run_setup(script_name[, script_args=None, stop_after='run'])
Run a setup script in a somewhat controlled environment, and return the
distutils.dist.Distribution instance that drives things. This is
useful if you need to find out the distribution meta-data (passed as keyword
args from script to setup()), or the contents of the config files or
command-line.
script_name is a file that will be read and run with exec(). sys.argv[0]
will be replaced with script for the duration of the call. script_args is a
list of strings; if supplied, sys.argv[1:] will be replaced by script_args
for the duration of the call.
stop_after tells setup() when to stop processing; possible values:
value        description
init         Stop after the Distribution instance has been
             created and populated with the keyword arguments
             to setup()
config       Stop after config files have been parsed (and
             their data stored in the Distribution instance)
commandline  Stop after the command-line (sys.argv[1:] or
             script_args) have been parsed (and the data stored
             in the Distribution instance)
run          Stop after all commands have been run (the same as
             if setup() had been called in the usual way). This
             is the default value.
In addition, the distutils.core module exposes a number of classes that
live elsewhere.
The Extension class describes a single C or C++ extension module in a setup
script. It accepts the following keyword arguments in its constructor:
argument name         value                                        type
name                  the full name of the extension, including
                      any packages — ie. not a filename or
                      pathname, but Python dotted name             string
sources               list of source filenames, relative to the
                      distribution root (where the setup script
                      lives), in Unix form (slash-separated) for
                      portability. Source files may be C, C++,
                      SWIG (.i), platform-specific resource
                      files, or whatever else is recognized by
                      the build_ext command as source for a
                      Python extension.                            list of strings
include_dirs          list of directories to search for C/C++
                      header files (in Unix form for
                      portability)                                 list of strings
define_macros         list of macros to define; each macro is
                      defined using a 2-tuple (name, value),
                      where value is either the string to
                      define it to or None to define it without
                      a particular value (equivalent of
                      #define FOO in source or -DFOO on Unix C
                      compiler command line)                       list of (string, string)
                                                                   tuples or (name, None)
undef_macros          list of macros to undefine explicitly        list of strings
library_dirs          list of directories to search for C/C++
                      libraries at link time                       list of strings
libraries             list of library names (not filenames or
                      paths) to link against                       list of strings
runtime_library_dirs  list of directories to search for C/C++
                      libraries at run time (for shared
                      extensions, this is when the extension is
                      loaded)                                      list of strings
extra_objects         list of extra files to link with (eg.
                      object files not implied by ‘sources’,
                      static library that must be explicitly
                      specified, binary resource files, etc.)      list of strings
extra_compile_args    any extra platform- and compiler-specific
                      information to use when compiling the
                      source files in ‘sources’. For platforms
                      and compilers where a command line makes
                      sense, this is typically a list of
                      command-line arguments, but for other
                      platforms it could be anything.              list of strings
extra_link_args       any extra platform- and compiler-specific
                      information to use when linking object
                      files together to create the extension
                      (or to create a new static Python
                      interpreter). Similar interpretation as
                      for ‘extra_compile_args’.                    list of strings
export_symbols        list of symbols to be exported from a
                      shared extension. Not used on all
                      platforms, and not generally necessary
                      for Python extensions, which typically
                      export exactly one symbol:
                      init + extension_name.                       list of strings
depends               list of files that the extension depends
                      on                                           list of strings
language              extension language (i.e. 'c', 'c++',
                      'objc'). Will be detected from the source
                      extensions if not provided.                  string
optional              specifies that a build failure in the
                      extension should not abort the build
                      process, but simply skip the extension.      boolean
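For illustration, here is a sketch combining several of these arguments (the
package name, paths, and macros are invented for the example):
from distutils.core import setup
from distutils.extension import Extension

setup(name='foobar',
      version='1.0',
      ext_modules=[Extension('foopkg.foo', ['src/foo.c'],
                             include_dirs=['include'],
                             define_macros=[('NDEBUG', '1'),
                                            ('FOO', None)],  # -DNDEBUG=1 -DFOO
                             libraries=['m'],                # link against libm
                             optional=True)],                # skip on build failure
      )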
This module provides the abstract base class for the CCompiler
classes. A CCompiler instance can be used for all the compile and
link steps needed to build a single project. Methods are provided to set
options for the compiler — macro definitions, include directories, link path,
libraries and the like.
gen_lib_options(compiler, library_dirs, runtime_library_dirs, libraries)
Generate linker options for searching library directories and linking with
specific libraries. libraries and library_dirs are, respectively, lists of
library names (not filenames!) and search directories. Returns a list of
command-line options suitable for use with some compiler (depending on the two
format strings passed in).
gen_preprocess_options(macros, include_dirs)
Generate C pre-processor options (-D, -U, -I) as
used by at least two types of compilers: the typical Unix compiler and Visual
C++. macros is the usual thing, a list of 1- or 2-tuples, where (name,)
means undefine (-U) macro name, and (name, value) means define
(-D) macro name to value. include_dirs is just a list of
directory names to be added to the header file search path (-I).
Returns a list of command-line options suitable for either Unix compilers or
Visual C++.
get_default_compiler(osname, platform)
Determine the default compiler to use for the given platform.
osname should be one of the standard Python OS names (i.e. the ones returned
by os.name) and platform the common value returned by sys.platform for
the platform in question.
The default values are os.name and sys.platform in case the parameters
are not given.
new_compiler(plat=None, compiler=None, verbose=0, dry_run=0, force=0)
Factory function to generate an instance of some CCompiler subclass for the
supplied platform/compiler combination. plat defaults to os.name (eg.
'posix', 'nt'), and compiler defaults to the default compiler for
that platform. Currently only 'posix' and 'nt' are supported, and the
default compilers are “traditional Unix interface” (UnixCCompiler
class) and Visual C++ (MSVCCompiler class). Note that it’s perfectly
possible to ask for a Unix compiler object under Windows, and a Microsoft
compiler object under Unix—if you supply a value for compiler, plat is
ignored.
show_compilers()
Print the list of available compilers (used by the --help-compiler options
to build, build_ext, build_clib).
class distutils.ccompiler.CCompiler([verbose=0, dry_run=0, force=0])
The abstract base class CCompiler defines the interface that must be
implemented by real compiler classes. The class also has some utility methods
used by several compiler classes.
The basic idea behind a compiler abstraction class is that each instance can be
used for all the compile/link steps in building a single project. Thus,
attributes common to all of those compile and link steps — include
directories, macros to define, libraries to link against, etc. — are
attributes of the compiler instance. To allow for variability in how individual
files are treated, most of those attributes may be varied on a per-compilation
or per-link basis.
The constructor for each subclass creates an instance of the Compiler object.
Flags are verbose (show verbose output), dry_run (don’t actually execute the
steps) and force (rebuild everything, regardless of dependencies). All of
these flags default to 0 (off). Note that you probably don’t want to
instantiate CCompiler or one of its subclasses directly - use the
distutils.ccompiler.new_compiler() factory function instead.
The following methods allow you to manually alter compiler options for the
instance of the Compiler class.
add_include_dir(dir)
Add dir to the list of directories that will be searched for header files.
The compiler is instructed to search directories in the order in which they are
supplied by successive calls to add_include_dir().
set_include_dirs(dirs)
Set the list of directories that will be searched to dirs (a list of strings).
Overrides any preceding calls to add_include_dir(); subsequent calls to
add_include_dir() add to the list passed to set_include_dirs().
This does not affect any list of standard include directories that the compiler
may search by default.
add_library(libname)
Add libname to the list of libraries that will be included in all links driven
by this compiler object. Note that libname should not be the name of a
file containing a library, but the name of the library itself: the actual
filename will be inferred by the linker, the compiler, or the compiler class
(depending on the platform).
The linker will be instructed to link against libraries in the order they were
supplied to add_library() and/or set_libraries(). It is perfectly
valid to duplicate library names; the linker will be instructed to link against
libraries as many times as they are mentioned.
set_libraries(libnames)
Set the list of libraries to be included in all links driven by this compiler
object to libnames (a list of strings). This does not affect any standard
system libraries that the linker may include by default.
set_library_dirs(dirs)
Set the list of library search directories to dirs (a list of strings). This
does not affect any standard library search path that the linker may search by
default.
set_runtime_library_dirs(dirs)
Set the list of directories to search for shared libraries at runtime to dirs
(a list of strings). This does not affect any standard search path that the
runtime linker may search by default.
define_macro(name[, value])
Define a preprocessor macro for all compilations driven by this compiler object.
The optional parameter value should be a string; if it is not supplied, then
the macro will be defined without an explicit value and the exact outcome
depends on the compiler used.
undefine_macro(name)
Undefine a preprocessor macro for all compilations driven by this compiler
object. If the same macro is defined by define_macro() and
undefined by undefine_macro() the last call takes precedence
(including multiple redefinitions or undefinitions). If the macro is
redefined/undefined on a per-compilation basis (ie. in the call to
compile()), then that takes precedence.
add_link_object(object)
Add object to the list of object files (or analogues, such as explicitly named
library files or the output of “resource compilers”) to be included in every
link driven by this compiler object.
set_link_objects(objects)
Set the list of object files (or analogues) to be included in every link to
objects. This does not affect any standard object files that the linker may
include by default (such as system libraries).
The following methods provide autodetection of compiler options,
offering some functionality similar to GNU autoconf.
detect_language(sources)
Detect the language of a given file, or list of files. Uses the instance
attributes language_map (a dictionary), and language_order (a
list) to do the job.
find_library_file(dirs, lib[, debug=0])
Search the specified list of directories for a static or shared library file
lib and return the full path to that file. If debug is true, look for a
debugging version (if that makes sense on the current platform). Return
None if lib wasn’t found in any of the specified directories.
has_function(funcname[, includes=None, include_dirs=None, libraries=None, library_dirs=None])
Return a boolean indicating whether funcname is supported on the current
platform. The optional arguments can be used to augment the compilation
environment by providing additional include files and paths and libraries and
paths.
set_executables(**args)
Define the executables (and options for them) that will be run to perform the
various stages of compilation. The exact set of executables that may be
specified here depends on the compiler class (via the ‘executables’ class
attribute), but most will have:
attribute   description
compiler    the C/C++ compiler
linker_so   linker used to create shared objects and libraries
linker_exe  linker used to create binary executables
archiver    static library creator
On platforms with a command-line (Unix, DOS/Windows), each of these is a string
that will be split into executable name and (optional) list of arguments.
(Splitting the string is done similarly to how Unix shells operate: words are
delimited by spaces, but quotes and backslashes can override this. See
distutils.util.split_quoted().)
The following methods invoke stages in the build process.
compile(sources[, output_dir=None, macros=None, include_dirs=None, debug=0, extra_preargs=None, extra_postargs=None, depends=None])
Compile one or more source files. Generates object files (e.g. transforms a
.c file to a .o file.)
sources must be a list of filenames, most likely C/C++ files, but in reality
anything that can be handled by a particular compiler and compiler class (eg.
MSVCCompiler can handle resource files in sources). Return a list of
object filenames, one per source filename in sources. Depending on the
implementation, not all source files will necessarily be compiled, but all
corresponding object filenames will be returned.
If output_dir is given, object files will be put under it, while retaining
their original path component. That is, foo/bar.c normally compiles to
foo/bar.o (for a Unix implementation); if output_dir is build, then
it would compile to build/foo/bar.o.
macros, if given, must be a list of macro definitions. A macro definition is
either a (name,value) 2-tuple or a (name,) 1-tuple. The former defines
a macro; if the value is None, the macro is defined without an explicit
value. The 1-tuple case undefines a macro. Later
definitions/redefinitions/undefinitions take precedence.
include_dirs, if given, must be a list of strings, the directories to add to
the default include file search path for this compilation only.
debug is a boolean; if true, the compiler will be instructed to output debug
symbols in (or alongside) the object file(s).
extra_preargs and extra_postargs are implementation-dependent. On platforms
that have the notion of a command-line (e.g. Unix, DOS/Windows), they are most
likely lists of strings: extra command-line arguments to prepend/append to the
compiler command line. On other platforms, consult the implementation class
documentation. In any event, they are intended as an escape hatch for those
occasions when the abstract compiler framework doesn’t cut the mustard.
depends, if given, is a list of filenames that all targets depend on. If a
source file is older than any file in depends, then the source file will be
recompiled. This supports dependency tracking, but only at a coarse
granularity.
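A sketch of a typical compile() call, with hypothetical source files:

from distutils.ccompiler import new_compiler

cc = new_compiler()
objects = cc.compile(['foo.c', 'bar/baz.c'],
                     output_dir='build',
                     macros=[('NDEBUG', None),  # define NDEBUG, no value
                             ('DEBUG',)],       # undefine DEBUG
                     include_dirs=['include'])
# On a Unix implementation this would typically return
# ['build/foo.o', 'build/bar/baz.o'].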
Link a bunch of stuff together to create a static library file. The “bunch of
stuff” consists of the list of object files supplied as objects, the extra
object files supplied to add_link_object() and/or
set_link_objects(), the libraries supplied to add_library() and/or
set_libraries(), and the libraries supplied as libraries (if any).
output_libname should be a library name, not a filename; the filename will be
inferred from the library name. output_dir is the directory where the library
file will be put.
debug is a boolean; if true, debugging information will be included in the
library (note that on most platforms, it is the compile step where this matters:
the debug flag is included here just for consistency).
target_lang is the target language for which the given objects are being
compiled. This allows specific linkage time treatment of certain languages.
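Continuing in the same vein, a sketch that archives freshly compiled objects
into a static library (file names hypothetical):

from distutils.ccompiler import new_compiler

cc = new_compiler()
objects = cc.compile(['foo.c'], output_dir='build')
# The filename is inferred from the library name,
# e.g. build/libfoo.a on Unix.
cc.create_static_lib(objects, 'foo', output_dir='build')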
Link a bunch of stuff together to create an executable or shared library file.
The “bunch of stuff” consists of the list of object files supplied as objects.
output_filename should be a filename. If output_dir is supplied,
output_filename is relative to it (i.e. output_filename can provide
directory components if needed).
libraries is a list of libraries to link against. These are library names,
not filenames, since they’re translated into filenames in a platform-specific
way (eg. foo becomes libfoo.a on Unix and foo.lib on
DOS/Windows). However, they can include a directory component, which means the
linker will look in that specific directory rather than searching all the normal
locations.
library_dirs, if supplied, should be a list of directories to search for
libraries that were specified as bare library names (ie. no directory
component). These are on top of the system default and those supplied to
add_library_dir() and/or set_library_dirs(). runtime_library_dirs
is a list of directories that will be embedded into the shared library and used
to search for other shared libraries that *it* depends on at run-time. (This
may only be relevant on Unix.)
export_symbols is a list of symbols that the shared library will export.
(This appears to be relevant only on Windows.)
debug is as for compile() and create_static_lib(), with the
slight distinction that it actually matters on most platforms (as opposed to
create_static_lib(), which includes a debug flag mostly for form’s
sake).
extra_preargs and extra_postargs are as for compile() (except of
course that they supply command-line arguments for the particular linker being
used).
target_lang is the target language for which the given objects are being
compiled. This allows specific linkage time treatment of certain languages.
Link an executable. output_progname is the name of the file executable, while
objects are a list of object filenames to link in. Other arguments are as for
the link() method.
Link a shared library. output_libname is the name of the output library,
while objects is a list of object filenames to link in. Other arguments are
as for the link() method.
Link a shared object. output_filename is the name of the shared object that
will be created, while objects is a list of object filenames to link in.
Other arguments are as for the link() method.
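A sketch of linking a shared object from compiled objects (the source file
and library list are hypothetical):

from distutils.ccompiler import new_compiler

cc = new_compiler()
objects = cc.compile(['foo.c'], output_dir='build')
cc.link_shared_object(objects, 'build/foo.so',
                      libraries=['m'],                # link against libm
                      library_dirs=['/usr/local/lib'])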
Preprocess a single C/C++ source file, named in source. Output will be written
to file named output_file, or stdout if output_file not supplied.
macros is a list of macro definitions as for compile(), which will
augment the macros set with define_macro() and undefine_macro().
include_dirs is a list of directory names that will be added to the default
list, in the same way as add_include_dir().
Raises PreprocessError on failure.
The following utility methods are defined by the CCompiler class, for
use by the various concrete subclasses.
Returns the filename of the executable for the given basename. Typically for
non-Windows platforms this is the same as the basename, while Windows will get
a .exe added.
Returns the filename for the given library name on the current platform. On Unix
a library with lib_type of 'static' will typically be of the form
liblibname.a, while a lib_type of 'shared' will be of the form
liblibname.so.
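For example, on a typical Unix compiler class these helpers behave roughly as
follows:

from distutils.ccompiler import new_compiler

cc = new_compiler()
cc.executable_filename('app')                  # 'app' on Unix, 'app.exe' on Windows
cc.library_filename('foo', lib_type='static')  # e.g. 'libfoo.a' on Unix
cc.library_filename('foo', lib_type='shared')  # e.g. 'libfoo.so' on Unix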
Invokes distutils.util.execute(): this method invokes a Python function
func with the given arguments args, after logging and taking into account
the dry_run flag.
This module provides MSVCCompiler, an implementation of the abstract
CCompiler class for Microsoft Visual Studio. Typically, extension
modules need to be compiled with the same compiler that was used to compile
Python. For Python 2.3 and earlier, the compiler was Visual Studio 6. For Python
2.4 and 2.5, the compiler is Visual Studio .NET 2003. The AMD64 and Itanium
binaries are created using the Platform SDK.
MSVCCompiler will normally choose the right compiler, linker etc. on
its own. To override this choice, the environment variables DISTUTILS_USE_SDK
and MSSdk must be both set. MSSdk indicates that the current environment has
been setup by the SDK’s SetEnv.Cmd script, or that the environment variables
had been registered when the SDK was installed; DISTUTILS_USE_SDK indicates
that the distutils user has made an explicit choice to override the compiler
selection by MSVCCompiler.
This module provides the CygwinCCompiler class, a subclass of
UnixCCompiler that handles the Cygwin port of the GNU C compiler to
Windows. It also contains the Mingw32CCompiler class which handles the mingw32
port of GCC (same as cygwin in no-cygwin mode).
Create an archive file (eg. zip or tar). base_name is the name of
the file to create, minus any format-specific extension; format is the
archive format: one of zip, tar, ztar, or gztar. root_dir is
a directory that will be the root directory of the archive; ie. we typically
chdir into root_dir before creating the archive. base_dir is the
directory where we start archiving from; ie. base_dir will be the common
prefix of all files and directories in the archive. root_dir and base_dir
both default to the current directory. Returns the name of the archive file.
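A sketch, assuming a hypothetical dist/foo-1.0 directory holding the files
to ship:

from distutils.archive_util import make_archive

# Creates foo-1.0.tar.gz whose members all start with foo-1.0/.
make_archive('foo-1.0', 'gztar', root_dir='dist', base_dir='foo-1.0')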
Create an (optionally compressed) archive as a tar file from all files in and
under base_dir. compress must be 'gzip' (the default), 'compress',
'bzip2', or None. Both tar and the compression utility named
by compress must be on the default program search path, so this is probably
Unix-specific. The output tar file will be named base_dir.tar,
possibly plus the appropriate compression extension (.gz, .bz2
or .Z). Return the output filename.
Create a zip file from all files in and under base_dir. The output zip file
will be named base_name + .zip. Uses either the zipfile Python
module (if available) or the InfoZIP zip utility (if installed and
found on the default search path). If neither tool is available, raises
DistutilsExecError. Returns the name of the output zip file.
This module provides functions for performing simple, timestamp-based
dependency analysis on files and groups of files; also, functions based
entirely on such timestamp dependency analysis.
Return true if source exists and is more recently modified than target, or
if source exists and target doesn’t. Return false if both exist and target
is the same age or newer than source. Raise DistutilsFileError if
source does not exist.
Walk two filename lists in parallel, testing if each source is newer than its
corresponding target. Return a pair of lists (sources, targets) where
source is newer than target, according to the semantics of newer().
Return true if target is out-of-date with respect to any file listed in
sources. In other words, if target exists and is newer than every file in
sources, return false; otherwise return true. missing controls what we do
when a source file is missing; the default ('error') is to blow up with an
OSError from inside os.stat(); if it is 'ignore', we silently
drop any missing source files; if it is 'newer', any missing source files
make us assume that target is out-of-date (this is handy in “dry-run” mode:
it’ll make you pretend to carry out commands that wouldn’t work because inputs
are missing, but that doesn’t matter because you’re not actually going to run
the commands).
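A sketch of both dependency helpers, with hypothetical file names:

from distutils.dep_util import newer, newer_group

if newer('foo.c', 'foo.o'):
    print('foo.o is out of date')

# Treat a missing source as "newer", forcing a rebuild (handy in dry-run mode).
if newer_group(['foo.c', 'foo.h'], 'foo.o', missing='newer'):
    print('foo.o must be rebuilt')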
Create a directory and any missing ancestor directories. If the directory
already exists (or if name is the empty string, which means the current
directory, which of course exists), then do nothing. Raise
DistutilsFileError if unable to create some directory along the way (eg.
some sub-path exists, but is a file rather than a directory). If verbose is
true, print a one-line summary of each mkdir to stdout. Return the list of
directories actually created.
Create all the empty directories under base_dir needed to put files there.
base_dir is just the name of a directory which doesn’t necessarily exist
yet; files is a list of filenames to be interpreted relative to base_dir.
base_dir + the directory portion of every file in files will be created if
it doesn’t already exist. mode, verbose and dry_run flags are as for
mkpath().
Copy an entire directory tree src to a new location dst. Both src and
dst must be directory names. If src is not a directory, raise
DistutilsFileError. If dst does not exist, it is created with
mkpath(). The end result of the copy is that every file in src is
copied to dst, and directories under src are recursively copied to dst.
Return the list of files that were copied or might have been copied, using their
output name. The return value is unaffected by update or dry_run: it is
simply the list of all files under src, with the names changed to be under
dst.
preserve_mode and preserve_times are the same as for copy_file() in
distutils.file_util; note that they only apply to regular files, not to
directories. If preserve_symlinks is true, symlinks will be copied as
symlinks (on platforms that support them!); otherwise (the default), the
destination of the symlink will be copied. update and verbose are the same
as for copy_file().
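A sketch of the two most common directory operations (paths hypothetical):

from distutils.dir_util import mkpath, copy_tree

mkpath('build/pkg/data')                    # creates missing ancestors too
copied = copy_tree('src/pkg', 'build/pkg',  # list of destination file names
                   update=1)                # skip up-to-date files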
Recursively remove directory and all files and directories underneath it. Any
errors are ignored (apart from being reported to sys.stdout if verbose is
true).
Copy file src to dst. If dst is a directory, then src is copied there
with the same name; otherwise, it must be a filename. (If the file exists, it
will be ruthlessly clobbered.) If preserve_mode is true (the default), the
file’s mode (type and permission bits, or whatever is analogous on the
current platform) is copied. If preserve_times is true (the default), the
last-modified and last-access times are copied as well. If update is true,
src will only be copied if dst does not exist, or if dst does exist but
is older than src.
link allows you to make hard links (using os.link()) or symbolic links
(using os.symlink()) instead of copying: set it to 'hard' or
'sym'; if it is None (the default), files are copied. Don’t set link
on systems that don’t support it: copy_file() doesn’t check if hard or
symbolic linking is available. It uses _copy_file_contents() to copy file
contents.
Return a tuple (dest_name, copied): dest_name is the actual name of the
output file, and copied is true if the file was copied (or would have been
copied, if dry_run true).
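For instance, an update-only copy (file names hypothetical):

from distutils.file_util import copy_file

# Copies only if build/README.txt is missing or older than the source.
dest, copied = copy_file('README.txt', 'build', update=1)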
Move file src to dst. If dst is a directory, the file will be moved into
it with the same name; otherwise, src is just renamed to dst. Returns the
new full name of the file.
Warning
On Unix, cross-device moves are handled with copy_file(); the behaviour on
other systems is unspecified.
Return a string that identifies the current platform. This is used mainly to
distinguish platform-specific build directories and platform-specific built
distributions. Typically includes the OS name and version and the architecture
(as supplied by ‘os.uname()’), although the exact information included depends
on the OS; eg. for IRIX the architecture isn’t particularly important (IRIX only
runs on SGI hardware), but for Linux the kernel version isn’t particularly
important.
Examples of returned values:
linux-i586
linux-alpha
solaris-2.6-sun4u
irix-5.3
irix64-6.2
For non-POSIX platforms, currently just returns sys.platform.
For Mac OS X systems the OS version reflects the minimal version on which
binaries will run (that is, the value of MACOSX_DEPLOYMENT_TARGET
during the build of Python), not the OS version of the current system.
For universal binary builds on Mac OS X the architecture value reflects
the universal binary status instead of the architecture of the current
processor. For 32-bit universal binaries the architecture is fat,
for 64-bit universal binaries the architecture is fat64, and
for 4-way universal binaries the architecture is universal. Starting
from Python 2.7 and Python 3.2 the architecture fat3 is used for
a 3-way universal build (ppc, i386, x86_64) and intel is used for
a universal build with the i386 and x86_64 architectures.
Return ‘pathname’ as a name that will work on the native filesystem, i.e. split
it on ‘/’ and put it back together again using the current directory separator.
Needed because filenames in the setup script are always supplied in Unix style,
and have to be converted to the local convention before we can actually use them
in the filesystem. Raises ValueError on non-Unix-ish systems if
pathname either starts or ends with a slash.
Return pathname with new_root prepended. If pathname is relative, this is
equivalent to os.path.join(new_root, pathname). Otherwise, it requires making
pathname relative and then joining the two, which is tricky on DOS/Windows.
Ensure that ‘os.environ’ has all the environment variables we guarantee that
users can use in config files, command-line options, etc. Currently this
includes:
HOME - user’s home directory (Unix only)
PLAT - description of the current platform, including hardware and
OS (see get_platform())
Perform shell/Perl-style variable substitution on s. Every occurrence of
$ followed by a name is considered a variable, and the variable is substituted
by the value found in the local_vars dictionary, or in os.environ if it’s
not in local_vars. os.environ is first checked/augmented to guarantee that
it contains certain values: see check_environ(). Raise ValueError
for any variables not found in either local_vars or os.environ.
Note that this is not a fully-fledged string interpolation function. A valid
$variable can consist only of upper and lower case letters, numbers and an
underscore. No { } or ( ) style quoting is available.
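A small illustration (the $VERSION variable is hypothetical):

from distutils.util import subst_vars

subst_vars('lib/python$VERSION/site-packages', {'VERSION': '3.2'})
# -> 'lib/python3.2/site-packages'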
Generate a useful error message from an EnvironmentError (IOError
or OSError) exception object. Handles Python 1.5.1 and later styles,
and does what it can to deal with exception objects that don’t have a filename
(which happens when the error is due to a two-file operation, such as
rename() or link()). Returns the error message as a string
prefixed with prefix.
Split a string up according to Unix shell-like rules for quotes and backslashes.
In short: words are delimited by spaces, as long as those spaces are not escaped
by a backslash, or inside a quoted string. Single and double quotes are
equivalent, and the quote characters can be backslash-escaped. The backslash is
stripped from any two-character escape sequence, leaving only the escaped
character. The quote characters are stripped from any quoted string. Returns a
list of words.
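For example:

from distutils.util import split_quoted

split_quoted('gcc -DMSG="hello world" -O2')
# -> ['gcc', '-DMSG=hello world', '-O2']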
Perform some action that affects the outside world (for instance, writing to the
filesystem). Such actions are special because they are disabled by the
dry_run flag. This method takes care of all that bureaucracy for you; all
you have to do is supply the function to call and an argument tuple for it (to
embody the “external action” being performed), and an optional message to print.
Byte-compile a collection of Python source files to either .pyc or
.pyo files in the same directory. py_files is a list of files to
compile; any files that don’t end in .py are silently skipped.
optimize must be one of the following:
0 - don’t optimize (generate .pyc)
1 - normal optimization (like python -O)
2 - extra optimization (like python -OO)
If force is true, all files are recompiled regardless of timestamps.
The source filename encoded in each bytecode file defaults to the filenames
listed in py_files; you can modify these with prefix and basedir.
prefix is a string that will be stripped off of each source filename, and
base_dir is a directory name that will be prepended (after prefix is
stripped). You can supply either or both (or neither) of prefix and
base_dir, as you wish.
If dry_run is true, doesn’t actually do anything that would affect the
filesystem.
Byte-compilation is either done directly in this interpreter process with the
standard py_compile module, or indirectly by writing a temporary script
and executing it. Normally, you should let byte_compile() figure out to
use direct compilation or not (see the source for details). The direct flag
is used by the script generated in indirect mode; unless you know what you’re
doing, leave it set to None.
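A sketch, assuming a module built under build/lib that will eventually be
installed under a hypothetical site-packages path:

from distutils.util import byte_compile

byte_compile(['build/lib/foo.py'],
             optimize=0,
             prefix='build/lib/',   # stripped from the recorded source name
             base_dir='/usr/lib/python3.2/site-packages')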
Return a version of header escaped for inclusion in an RFC 822 header, by
ensuring there are 8 spaces after each newline. Note that it does no other
modification of the string.
Provides exceptions used by the Distutils modules. Note that Distutils modules
may raise standard exceptions; in particular, SystemExit is usually raised for
errors that are obviously the end-user’s fault (eg. bad command-line arguments).
This module is safe to use in from ... import * mode; it only exports
symbols whose names start with Distutils and end with Error.
This module provides a wrapper around the standard getopt module that
provides the following additional features:
short and long options are tied together
options have help strings, so fancy_getopt() could potentially create a
complete usage summary
options set attributes of a passed-in object
boolean options can have “negative aliases” — eg. if --quiet is
the “negative alias” of --verbose, then --quiet on the
command line sets verbose to false.
Wrapper function. options is a list of (long_option, short_option,
help_string) 3-tuples as described in the constructor for
FancyGetopt. negative_opt should be a dictionary mapping option names
to option names, both the key and value should be in the options list.
object is an object which will be used to store values (see the getopt()
method of the FancyGetopt class). args is the argument list. Will use
sys.argv[1:] if you pass None as args.
class distutils.fancy_getopt.FancyGetopt([option_table=None])
The option_table is a list of 3-tuples: (long_option, short_option, help_string).
If an option takes an argument, its long_option should have '=' appended;
short_option should just be a single character, no ':' in any case.
short_option should be None if a long_option doesn’t have a
corresponding short_option. All option tuples must have long options.
The FancyGetopt class provides the following methods:
Parse command-line options in args. Store as attributes on object.
If args is None or not supplied, uses sys.argv[1:]. If object is
None or not supplied, creates a new OptionDummy instance, stores
option values there, and returns a tuple (args,object). If object is
supplied, it is modified in place and getopt() just returns args; in
both cases, the returned args is a modified copy of the passed-in args list,
which is left untouched.
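A sketch of the class in action (the option table is hypothetical):

from distutils.fancy_getopt import FancyGetopt

options = [('verbose', 'v', 'run verbosely'),
           ('output=', 'o', 'write results to FILE')]  # '=' means takes a value
parser = FancyGetopt(options)
# No object supplied, so getopt() returns (remaining_args, OptionDummy).
args, opts = parser.getopt(['-v', '--output', 'out.txt', 'extra'])
print(args)         # ['extra']
print(opts.output)  # 'out.txt'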
This module provides the spawn() function, a front-end to various
platform-specific functions for launching another program in a sub-process.
Also provides find_executable() to search the path for a given executable
name.
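For example (the program name is hypothetical; spawn() raises
DistutilsExecError on failure):

from distutils.spawn import find_executable, spawn

print(find_executable('gcc'))  # e.g. '/usr/bin/gcc', or None if not found
spawn(['gcc', '--version'])    # run the command in a sub-process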
The distutils.sysconfig module provides access to Python’s low-level
configuration information. The specific configuration variables available
depend heavily on the platform and configuration. The specific variables depend
on the build process for the specific version of Python being run; the variables
are those found in the Makefile and configuration header that are
installed with Python on Unix systems. The configuration header is called
pyconfig.h for Python versions starting with 2.2, and config.h
for earlier versions of Python.
Some additional functions are provided which perform some useful manipulations
for other parts of the distutils package.
Return a set of variable definitions. If there are no arguments, this returns a
dictionary mapping names of configuration variables to values. If arguments are
provided, they should be strings, and the return value will be a sequence giving
the associated values. If a given name does not have a corresponding value,
None will be included for that variable.
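For example, on a Unix build one might see something like:

from distutils import sysconfig

sysconfig.get_config_vars('CC', 'SO')   # e.g. ['gcc', '.so']
all_vars = sysconfig.get_config_vars()  # dictionary of every variable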
Return the full path name of the configuration header. For Unix, this will be
the header generated by the configure script; for other platforms the
header will have been supplied directly by the Python source distribution. The
file is a platform-specific text file.
Return the full path name of the Makefile used to build Python. For
Unix, this will be a file generated by the configure script; the
meaning for other platforms will vary. The file is a platform-specific text
file, if it exists. This function is only useful on POSIX platforms.
Return the directory for either the general or platform-dependent C include
files. If plat_specific is true, the platform-dependent include directory is
returned; if false or omitted, the platform-independent directory is returned.
If prefix is given, it is used as either the prefix instead of
PREFIX, or as the exec-prefix instead of EXEC_PREFIX if
plat_specific is true.
Return the directory for either the general or platform-dependent library
installation. If plat_specific is true, the platform-dependent include
directory is returned; if false or omitted, the platform-independent directory
is returned. If prefix is given, it is used as either the prefix instead of
PREFIX, or as the exec-prefix instead of EXEC_PREFIX if
plat_specific is true. If standard_lib is true, the directory for the
standard library is returned rather than the directory for the installation of
third-party extensions.
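A quick illustration of the two directory queries (the returned paths vary by
installation):

from distutils.sysconfig import get_python_inc, get_python_lib

get_python_inc()                    # e.g. '/usr/include/python3.2'
get_python_lib(plat_specific=True)  # site-packages used for extension modules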
The following function is only intended for use within the distutils
package.
This function is only needed on Unix at this time, but should be called
consistently to support forward-compatibility. It inserts the information that
varies across Unix flavors and is stored in Python’s Makefile. This
information includes the selected compiler, compiler and linker options, and the
extension used by the linker for shared objects.
This function is even more special-purpose, and should only be used from
Python’s own build procedures.
Inform the distutils.sysconfig module that it is being used as part of
the build process for Python. This changes a lot of relative locations for
files, allowing them to be located in the build area rather than in an installed
Python.
This module provides the TextFile class, which gives an interface to
text files that (optionally) takes care of stripping comments, ignoring blank
lines, and joining lines with backslashes.
class distutils.text_file.TextFile([filename=None, file=None, **options])
This class provides a file-like object that takes care of all the things you
commonly want to do when processing a text file that has some line-by-line
syntax: strip comments (as long as # is your comment character), skip blank
lines, join adjacent lines by escaping the newline (ie. backslash at end of
line), strip leading and/or trailing whitespace. All of these are optional and
independently controllable.
The class provides a warn() method so you can generate warning messages
that report physical line number, even if the logical line in question spans
multiple physical lines. Also provides unreadline() for implementing
line-at-a-time lookahead.
TextFile instances are created with either filename, file, or both.
RuntimeError is raised if both are None. filename should be a
string, and file a file object (or something that provides readline()
and close() methods). It is recommended that you supply at least
filename, so that TextFile can include it in warning messages. If
file is not supplied, TextFile creates its own using the
open() built-in function.
The options are all boolean, and affect the values returned by readline():

strip_comments (default: true)
    Strip from '#' to end-of-line, as well as any whitespace leading up to
    the '#', unless the '#' is escaped by a backslash.
lstrip_ws (default: false)
    Strip leading whitespace from each line before returning it.
rstrip_ws (default: true)
    Strip trailing whitespace (including the line terminator!) from each
    line before returning it.
skip_blanks (default: true)
    Skip lines that are empty *after* stripping comments and whitespace.
    (If both lstrip_ws and rstrip_ws are false, then some lines may consist
    of solely whitespace: these will *not* be skipped, even if skip_blanks
    is true.)
join_lines (default: false)
    If a backslash is the last non-newline character on a line after
    stripping comments and whitespace, join the following line to it to
    form one logical line; if N consecutive lines end with a backslash,
    then N+1 physical lines will be joined to form one logical line.
collapse_join (default: false)
    Strip leading whitespace from lines that are joined to their
    predecessor; only matters if (join_lines and not lstrip_ws).
Note that since rstrip_ws can strip the trailing newline, the semantics of
readline() must differ from those of the built-in file object’s
readline() method! In particular, readline() returns None for
end-of-file: an empty string might just be a blank line (or an all-whitespace
line), if rstrip_ws is true but skip_blanks is not.
Print (to stderr) a warning message tied to the current logical line in the
current file. If the current logical line in the file spans multiple physical
lines, the warning refers to the whole range, such as "lines3-5". If
line is supplied, it overrides the current line number; it may be a list or
tuple to indicate a range of physical lines, or an integer for a single
physical line.
Read and return a single logical line from the current file (or from an internal
buffer if lines have previously been “unread” with unreadline()). If the
join_lines option is true, this may involve reading multiple physical lines
concatenated into a single string. Updates the current line number, so calling
warn() after readline() emits a warning about the physical line(s)
just read. Returns None on end-of-file, since the empty string can occur
if rstrip_ws is true but skip_blanks is not.
Push line (a string) onto an internal buffer that will be checked by future
readline() calls. Handy for implementing a parser with line-at-a-time
lookahead. Note that lines that are “unread” with unreadline() are not
subsequently re-cleansed (whitespace stripped, or whatever) when read with
readline(). If multiple calls are made to unreadline() before a call
to readline(), the lines will be returned in most-recently-unread-first order.
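A sketch of a typical read loop (the file name is hypothetical):

from distutils.text_file import TextFile

f = TextFile('setup.cfg', strip_comments=1, skip_blanks=1, join_lines=1)
while True:
    line = f.readline()
    if line is None:   # None, not '', signals end-of-file
        break
    print(line)
f.close()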
Abstract base class for defining command classes, the “worker bees” of the
Distutils. A useful analogy for command classes is to think of them as
subroutines with local variables called options. The options are declared
in initialize_options() and defined (given their final values) in
finalize_options(), both of which must be defined by every command
class. The distinction between the two is necessary because option values
might come from the outside world (command line, config file, ...), and any
options dependent on other options must be computed after these outside
influences have been processed — hence finalize_options(). The body
of the subroutine, where it does all its work based on the values of its
options, is the run() method, which must also be implemented by every
command class.
The class constructor takes a single argument dist, a Distribution
instance.
This section outlines the steps to create a new Distutils command.
A new command lives in a module in the distutils.command package. There
is a sample template in that directory called command_template. Copy
this file to a new module with the same name as the new command you’re
implementing. This module should implement a class with the same name as the
module (and the command). So, for instance, to create the command
peel_banana (so that users can run setup.py peel_banana), you’d copy
command_template to distutils/command/peel_banana.py, then edit
it so that it’s implementing the class peel_banana, a subclass of
distutils.cmd.Command.
Subclasses of Command must define the following methods.
Set default values for all the options that this command supports. Note that
these defaults may be overridden by other commands, by the setup script, by
config files, or by the command-line. Thus, this is not the place to code
dependencies between options; generally, initialize_options()
implementations are just a bunch of self.foo = None assignments.
Set final values for all the options that this command supports. This is
always called as late as possible, ie. after any option assignments from the
command-line or from other commands have been done. Thus, this is the place
to code option dependencies: if foo depends on bar, then it is safe to
set foo from bar as long as foo still has the same value it was
assigned in initialize_options().
A command’s raison d’etre: carry out the action it exists to perform, controlled
by the options initialized in initialize_options(), customized by other
commands, the setup script, the command-line, and config files, and finalized in
finalize_options(). All terminal output and filesystem interaction should
be done by run().
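A minimal sketch of the pattern, reusing the peel_banana example from above
with a hypothetical --flavor option:

from distutils.cmd import Command

class peel_banana(Command):
    description = 'peel a banana'
    user_options = [('flavor=', 'f', 'which flavor of banana to peel')]

    def initialize_options(self):
        self.flavor = None               # declare, but don't compute

    def finalize_options(self):
        if self.flavor is None:          # compute final values late
            self.flavor = 'plain'

    def run(self):
        self.announce('peeling a %s banana' % self.flavor)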
sub_commands formalizes the notion of a “family” of commands,
e.g. install as the parent with sub-commands install_lib,
install_headers, etc. The parent of a family of commands defines
sub_commands as a class attribute; it’s a list of 2-tuples
(command_name, predicate), where command_name is a string and predicate is
a function, a string or None. predicate is a method of the parent command that
determines whether the corresponding command is applicable in the current
situation. (E.g. install_headers is only applicable if we have any C
header files to install.) If predicate is None, that command is always
applicable.
sub_commands is usually defined at the end of a class, because
predicates can be methods of the class, so they must already have been
defined. The canonical example is the install command.
In most cases, the bdist_msi installer is a better choice than the
bdist_wininst installer, because it provides better support for
Win64 platforms, allows administrators to perform non-interactive
installations, and allows installation through group policies.
Alternative implementation of build_py which also runs the
2to3 conversion library on each .py file that is going to be
installed. To use this in a setup.py file for a distribution
that is designed to run with both Python 2.x and 3.x, add:
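try:
    from distutils.command.build_py import build_py_2to3 as build_py
except ImportError:
    from distutils.command.build_py import build_py

and then pass the selected class to setup() through its cmdclass argument:

cmdclass = {'build_py': build_py}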
The check command performs some tests on the meta-data of a package.
For example, it verifies that all required meta-data are provided as
the arguments passed to the setup() function.
This document describes the Python Distribution Utilities (“Distutils”) from the
end-user’s point-of-view, describing how to extend the capabilities of a
standard Python installation by building and installing third-party Python
modules and extensions.
Although Python’s extensive standard library covers many programming needs,
there often comes a time when you need to add some new functionality to your
Python installation in the form of third-party modules. This might be necessary
to support your own programming, or to support an application that you want to
use and that happens to be written in Python.
In the past, there has been little support for adding third-party modules to an
existing Python installation. With the introduction of the Python Distribution
Utilities (Distutils for short) in Python 2.0, this changed.
This document is aimed primarily at the people who need to install third-party
Python modules: end-users and system administrators who just need to get some
Python application running, and existing Python programmers who want to add some
new goodies to their toolbox. You don’t need to know Python to read this
document; there will be some brief forays into using Python’s interactive mode
to explore your installation, but that’s it. If you’re looking for information
on how to distribute your own Python modules so that others may use them, see
the Distributing Python Modules manual.
In the best case, someone will have prepared a special version of the module
distribution you want to install that is targeted specifically at your platform
and is installed just like any other software on your platform. For example,
the module developer might make an executable installer available for Windows
users, an RPM package for users of RPM-based Linux systems (Red Hat, SuSE,
Mandrake, and many others), a Debian package for users of Debian-based Linux
systems, and so forth.
In that case, you would download the installer appropriate to your platform and
do the obvious thing with it: run it if it’s an executable installer,
rpm --install it if it’s an RPM, etc. You don’t need to run Python or a setup
script, you don’t need to compile anything—you might not even need to read any
instructions (although it’s always a good idea to do so anyway).
Of course, things will not always be that easy. You might be interested in a
module distribution that doesn’t have an easy-to-use installer for your
platform. In that case, you’ll have to start with the source distribution
released by the module’s author/maintainer. Installing from a source
distribution is not too hard, as long as the modules are packaged in the
standard way. The bulk of this document is about building and installing
modules from standard source distributions.
If you download a module source distribution, you can tell pretty quickly if it
was packaged and distributed in the standard way, i.e. using the Distutils.
First, the distribution’s name and version number will be featured prominently
in the name of the downloaded archive, e.g. foo-1.0.tar.gz or
widget-0.9.7.zip. Next, the archive will unpack into a similarly-named
directory: foo-1.0 or widget-0.9.7. Additionally, the
distribution will contain a setup script setup.py, and a file named
README.txt or possibly just README, which should explain that
building and installing the module distribution is a simple matter of running
one command from a terminal:
python setup.py install
For Windows, this command should be run from a command prompt window (“DOS
box”):
setup.py install
If all these things are true, then you already know how to build and install the
modules you’ve just downloaded: Run the command above. Unless you need to
install things in a non-standard way or customize the build process, you don’t
really need this manual. Or rather, the above command is everything you need to
get out of this manual.
As described in section The new standard: Distutils, building and installing a module
distribution using the Distutils is usually one simple command to run from a
terminal:

python setup.py install
You should always run the setup command from the distribution root directory,
i.e. the top-level subdirectory that the module source distribution unpacks
into. For example, if you’ve just downloaded a module source distribution
foo-1.0.tar.gz onto a Unix system, the normal thing to do is:
gunzip -c foo-1.0.tar.gz | tar xf - # unpacks into directory foo-1.0
cd foo-1.0
python setup.py install
On Windows, you’d probably download foo-1.0.zip. If you downloaded the
archive file to C:\Temp, then it would unpack into
C:\Temp\foo-1.0; you can use either an archive manipulator with a
graphical user interface (such as WinZip) or a command-line tool (such as
unzip or pkunzip) to unpack the archive. Then, open a
command prompt window (“DOS box”), and run:

setup.py install

Running setup.py install builds and installs all modules in one run. If you
prefer to work incrementally—especially useful if you want to customize the
build process, or if things are going wrong—you can use the setup script to do
one thing at a time. This is particularly helpful when the build and install
will be done by different users—for example, you might want to build a module
distribution and hand it off to a system administrator for installation (or do
it yourself, with super-user privileges).
For example, you can build everything in one step, and then install everything
in a second step, by invoking the setup script twice:
python setup.py build
python setup.py install
If you do this, you will notice that running the install command
first runs the build command, which—in this case—quickly notices
that it has nothing to do, since everything in the build directory is
up-to-date.
You may not need this ability to break things down often if all you do is
install modules downloaded off the ‘net, but it’s very handy for more advanced
tasks. If you get into distributing your own Python modules and extensions,
you’ll run lots of individual Distutils commands on their own.
As implied above, the build command is responsible for putting the
files to install into a build directory. By default, this is build
under the distribution root; if you’re excessively concerned with speed, or want
to keep the source tree pristine, you can change the build directory with the
--build-base option. For example:

python setup.py build --build-base=/path/to/pybuild/foo-1.0
(Or you could do this permanently with a directive in your system or personal
Distutils configuration file; see section Distutils Configuration Files.) Normally, this
isn’t necessary.
The default layout for the build tree is as follows:
--- build/ --- lib/
or
--- build/ --- lib.<plat>/
               temp.<plat>/
where <plat> expands to a brief description of the current OS/hardware
platform and Python version. The first form, with just a lib directory,
is used for “pure module distributions”—that is, module distributions that
include only pure Python modules. If a module distribution contains any
extensions (modules written in C/C++), then the second form, with two <plat>
directories, is used. In that case, the temp.plat directory holds
temporary files generated by the compile/link process that don’t actually get
installed. In either case, the lib (or lib.plat) directory
contains all Python modules (pure Python and extensions) that will be installed.
In the future, more directories will be added to handle Python scripts,
documentation, binary executables, and whatever else is needed to handle the job
of installing Python modules and applications.
After the build command runs (whether you run it explicitly, or the
install command does it for you), the work of the install
command is relatively simple: all it has to do is copy everything under
build/lib (or build/lib.plat) to your chosen installation
directory.
If you don’t choose an installation directory—i.e., if you just run setup.py install—then the install command installs to the standard
location for third-party Python modules. This location varies by platform and
by how you built/installed Python itself. On Unix (and Mac OS X, which is also
Unix-based), it also depends on whether the module distribution being installed
is pure Python or contains extensions (“non-pure”):
Platform          Standard installation location             Default value                             Notes
Unix (pure)       prefix/lib/pythonX.Y/site-packages         /usr/local/lib/pythonX.Y/site-packages    (1)
Unix (non-pure)   exec-prefix/lib/pythonX.Y/site-packages    /usr/local/lib/pythonX.Y/site-packages    (1)
Windows           prefix\Lib\site-packages                   C:\PythonXY\Lib\site-packages             (2)
Notes:
Most Linux distributions include Python as a standard part of the system, so
prefix and exec-prefix are usually both /usr on
Linux. If you build Python yourself on Linux (or any Unix-like system), the
default prefix and exec-prefix are /usr/local.
The default installation directory on Windows was C:\Program Files\Python under Python 1.6a1, 1.5.2, and earlier.
prefix and exec-prefix stand for the directories that Python
is installed to, and where it finds its libraries at run-time. They are always
the same under Windows, and very often the same under Unix and Mac OS X. You
can find out what your Python installation uses for prefix and
exec-prefix by running Python in interactive mode and typing a few
simple commands. Under Unix, just type python at the shell prompt. Under
Windows, choose Start ‣ Programs ‣ Python X.Y ‣
Python (command line). Once the interpreter is started, you type Python code
at the prompt. For example, on my Linux system, I type the three Python
statements shown below, and get the output as shown, to find out my
prefix and exec-prefix:
Python 2.4 (#26, Aug 7 2004, 17:19:02)
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.prefix
'/usr'
>>> sys.exec_prefix
'/usr'
A few other placeholders are used in this document: X.Y stands for the
version of Python, for example 3.2; abiflags will be replaced by
the value of sys.abiflags or the empty string for platforms which don’t
define ABI flags; distname will be replaced by the name of the module
distribution being installed. Dots and capitalization are important in the
paths; for example, a value that uses python3.2 on UNIX will typically use
Python32 on Windows.
If you don’t want to install modules to the standard location, or if you don’t
have permission to write there, then you need to read about alternate
installations in section Alternate Installation. If you want to customize your
installation directories more heavily, see section Custom Installation on
custom installations.
Often, it is necessary or desirable to install modules to a location other than
the standard location for third-party Python modules. For example, on a Unix
system you might not have permission to write to the standard third-party module
directory. Or you might wish to try out a module before making it a standard
part of your local Python installation. This is especially true when upgrading
a distribution already present: you want to make sure your existing base of
scripts still works with the new version before actually upgrading.
The Distutils install command is designed to make installing module
distributions to an alternate location simple and painless. The basic idea is
that you supply a base directory for the installation, and the
install command picks a set of directories (called an installation
scheme) under this base directory in which to install files. The details
differ across platforms, so read whichever of the following sections applies to
you.
Note that the various alternate installation schemes are mutually exclusive: you
can pass --user, or --home, or --prefix and --exec-prefix, or
--install-base and --install-platbase, but you can’t mix from these
groups.
This scheme is designed to be the most convenient solution for users that don’t
have write permission to the global site-packages directory or don’t want to
install into it. It is enabled with a simple option:
python setup.py install --user
Files will be installed into subdirectories of site.USER_BASE (written
as userbase hereafter). This scheme installs pure Python modules and
extension modules in the same location (also known as site.USER_SITE).
Here are the values for UNIX, including Mac OS X:
Type of file   Installation directory
modules        userbase/lib/pythonX.Y/site-packages
scripts        userbase/bin
data           userbase
C headers      userbase/include/pythonX.Yabiflags/distname
And here are the values used on Windows:
Type of file   Installation directory
modules        userbase\PythonXY\site-packages
scripts        userbase\Scripts
data           userbase
C headers      userbase\PythonXY\Include\distname
The advantage of using this scheme compared to the other ones described below is
that the user site-packages directory is under normal conditions always included
in sys.path (see site for more information), which means that
there is no additional step to perform after running the setup.py script
to finalize the installation.
The build_ext command also has a --user option to add
userbase/include to the compiler search path for header files and
userbase/lib to the compiler search path for libraries as well as to
the runtime search path for shared C libraries (rpath).
The idea behind the “home scheme” is that you build and maintain a personal
stash of Python modules. This scheme’s name is derived from the idea of a
“home” directory on Unix, since it’s not unusual for a Unix user to make their
home directory have a layout similar to /usr/ or /usr/local/.
This scheme can be used by anyone, regardless of the operating system they
are installing for.
Installing a new module distribution is as simple as
python setup.py install --home=<dir>
where you can supply any directory you like for the --home option. On
Unix, lazy typists can just type a tilde (~); the install command
will expand this to your home directory:

python setup.py install --home=~
The “prefix scheme” is useful when you wish to use one Python installation to
perform the build/install (i.e., to run the setup script), but install modules
into the third-party module directory of a different Python installation (or
something that looks like a different Python installation). If this sounds a
trifle unusual, it is—that’s why the user and home schemes come before. However,
there are at least two known cases where the prefix scheme will be useful.
First, consider that many Linux distributions put Python in /usr, rather
than the more traditional /usr/local. This is entirely appropriate,
since in those cases Python is part of “the system” rather than a local add-on.
However, if you are installing Python modules from source, you probably want
them to go in /usr/local/lib/python2.X rather than
/usr/lib/python2.X. This can be done with

/usr/bin/python setup.py install --prefix=/usr/local
Another possibility is a network filesystem where the name used to write to a
remote directory is different from the name used to read it: for example, the
Python interpreter accessed as /usr/local/bin/python might search for
modules in /usr/local/lib/python2.X, but those modules would have to
be installed to, say, /mnt/@server/export/lib/python2.X. This could
be done with

/usr/local/bin/python setup.py install --prefix=/mnt/@server/export
In either case, the --prefix option defines the installation base, and
the --exec-prefix option defines the platform-specific installation
base, which is used for platform-specific files. (Currently, this just means
non-pure module distributions, but could be expanded to C libraries, binary
executables, etc.) If --exec-prefix is not supplied, it defaults to
--prefix. Files are installed as follows:
Type of file        Installation directory
Python modules      prefix/lib/pythonX.Y/site-packages
extension modules   exec-prefix/lib/pythonX.Y/site-packages
scripts             prefix/bin
data                prefix
C headers           prefix/include/pythonX.Yabiflags/distname
There is no requirement that --prefix or --exec-prefix
actually point to an alternate Python installation; if the directories listed
above do not already exist, they are created at installation time.
Incidentally, the real reason the prefix scheme is important is simply that a
standard Unix installation uses the prefix scheme, but with --prefix
and --exec-prefix supplied by Python itself as sys.prefix and
sys.exec_prefix. Thus, you might think you’ll never use the prefix scheme,
but every time you run python setup.py install without any other options,
you’re using it.
Note that installing extensions to an alternate Python installation has no
effect on how those extensions are built: in particular, the Python header files
(Python.h and friends) installed with the Python interpreter used to run
the setup script will be used in compiling extensions. It is your
responsibility to ensure that the interpreter used to run extensions installed
in this way is compatible with the interpreter used to build them. The best way
to do this is to ensure that the two interpreters are the same version of Python
(possibly different builds, or possibly copies of the same build). (Of course,
if your --prefix and --exec-prefix don’t even point to an
alternate Python installation, this is immaterial.)
Alternate installation: Windows (the prefix scheme)
Windows has no concept of a user’s home directory, and since the standard Python
installation under Windows is simpler than under Unix, the --prefix
option has traditionally been used to install additional packages in separate
locations on Windows. For example, you can enter:
python setup.py install --prefix="\Temp\Python"
to install modules to the \Temp\Python directory on the current drive.
The installation base is defined by the --prefix option; the
--exec-prefix option is not supported under Windows, which means that
pure Python modules and extension modules are installed into the same location.
Files are installed as follows:

Type of file                    Installation directory
modules (pure and extension)    prefix\Lib\site-packages
scripts                         prefix\Scripts
data                            prefix
C headers                       prefix\Include\distname
Sometimes, the alternate installation schemes described in section
Alternate Installation just don’t do what you want. You might want to tweak just
one or two directories while keeping everything under the same base directory,
or you might want to completely redefine the installation scheme. In either
case, you’re creating a custom installation scheme.
To create a custom installation scheme, you start with one of the alternate
schemes and override some of the installation directories used for the various
types of files, using these options:
Type of file        Override option
Python modules      --install-purelib
extension modules   --install-platlib
all modules         --install-lib
scripts             --install-scripts
data                --install-data
C headers           --install-headers
These override options can be relative, absolute,
or explicitly defined in terms of one of the installation base directories.
(There are two installation base directories, and they are normally the same—
they only differ when you use the Unix “prefix scheme” and supply different
--prefix and --exec-prefix options; using --install-lib will
override values computed or given for --install-purelib and
--install-platlib, and is recommended for schemes that don’t distinguish
between Python and extension modules.)
For example, say you’re installing a module distribution to your home directory
under Unix—but you want scripts to go in ~/scripts rather than
~/bin. As you might expect, you can override this directory with the
--install-scripts option; in this case, it makes most sense to supply
a relative path, which will be interpreted relative to the installation base
directory (your home directory, in this case):

python setup.py install --home=~ --install-scripts=scripts
Another Unix example: suppose your Python installation was built and installed
with a prefix of /usr/local/python, so under a standard installation
scripts will wind up in /usr/local/python/bin. If you want them in
/usr/local/bin instead, you would supply this absolute directory for the
--install-scripts option:

python setup.py install --install-scripts=/usr/local/bin
(This performs an installation using the “prefix scheme,” where the prefix is
whatever your Python interpreter was installed with— /usr/local/python
in this case.)
If you maintain Python on Windows, you might want third-party modules to live in
a subdirectory of prefix, rather than right in prefix
itself. This is almost as easy as customizing the script installation directory
—you just have to remember that there are two types of modules to worry about,
Python and extension modules, which can conveniently be both controlled by one
option:
python setup.py install --install-lib=Site
The specified installation directory is relative to prefix. Of
course, you also have to ensure that this directory is in Python’s module
search path, such as by putting a .pth file in a site directory (see
site). See section Modifying Python’s Search Path to find out how to modify
Python’s search path.
If you want to define an entire installation scheme, you just have to supply all
of the installation directory options. The recommended way to do this is to
supply relative paths; for example, if you want to maintain all Python
module-related files under python in your home directory, and you want a
separate directory for each platform that you use your home directory from, you
might define the following installation scheme:

python setup.py install --home=$HOME \
                        --install-purelib=python/lib \
                        --install-platlib=python/lib.$PLAT \
                        --install-scripts=python/scripts \
                        --install-data=python/data
$PLAT is not (necessarily) an environment variable—it will be expanded by
the Distutils as it parses your command line options, just as it does when
parsing your configuration file(s).
Obviously, specifying the entire installation scheme every time you install a
new module distribution would be very tedious. Thus, you can put these options
into your Distutils config file (see section Distutils Configuration Files):

[install]
install-base=$HOME
install-purelib=python/lib
install-platlib=python/lib.$PLAT
install-scripts=python/scripts
install-data=python/data

or, equivalently:

[install]
install-base=$HOME/python
install-purelib=lib
install-platlib=lib.$PLAT
install-scripts=scripts
install-data=data
Note that these two are not equivalent if you supply a different installation
base directory when you run the setup script. For example,
python setup.py install --install-base=/tmp
would install pure modules to /tmp/python/lib in the first case, and
to /tmp/lib in the second case. (For the second case, you probably
want to supply an installation base of /tmp/python.)
You probably noticed the use of $HOME and $PLAT in the sample
configuration file input. These are Distutils configuration variables, which
bear a strong resemblance to environment variables. In fact, you can use
environment variables in config files on platforms that have such a notion but
the Distutils additionally define a few extra variables that may not be in your
environment, such as $PLAT. (And of course, on systems that don’t have
environment variables, such as Mac OS 9, the configuration variables supplied by
the Distutils are the only ones you can use.) See section Distutils Configuration Files
for details.
When the Python interpreter executes an import statement, it searches
for both Python code and extension modules along a search path. A default value
for the path is configured into the Python binary when the interpreter is built.
You can determine the path by importing the sys module and printing the
value of sys.path.
$ python
Python 2.2 (#11, Oct 3 2002, 13:31:27)
[GCC 2.96 20000731 (Red Hat Linux 7.3 2.96-112)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.path
['', '/usr/local/lib/python2.3', '/usr/local/lib/python2.3/plat-linux2',
'/usr/local/lib/python2.3/lib-tk', '/usr/local/lib/python2.3/lib-dynload',
'/usr/local/lib/python2.3/site-packages']
>>>
The null string in sys.path represents the current working directory.
The expected convention for locally installed packages is to put them in the
.../site-packages/ directory, but you may want to install Python
modules into some arbitrary directory. For example, your site may have a
convention of keeping all software related to the web server under /www.
Add-on Python modules might then belong in /www/python, and in order to
import them, this directory must be added to sys.path. There are several
different ways to add the directory.
The most convenient way is to add a path configuration file to a directory
that’s already on Python’s path, usually to the .../site-packages/
directory. Path configuration files have an extension of .pth, and each
line must contain a single path that will be appended to sys.path. (Because
the new paths are appended to sys.path, modules in the added directories
will not override standard modules. This means you can’t use this mechanism for
installing fixed versions of standard modules.)
Paths can be absolute or relative, in which case they’re relative to the
directory containing the .pth file. See the documentation of
the site module for more information.
A slightly less convenient way is to edit the site.py file in Python’s
standard library, and modify sys.path. site.py is automatically
imported when the Python interpreter is executed, unless the -S switch
is supplied to suppress this behaviour. So you could simply edit
site.py and add two lines to it:
import sys
sys.path.append('/www/python/')
However, if you reinstall the same major version of Python (perhaps when
upgrading from 2.2 to 2.2.2, for example) site.py will be overwritten by
the stock version. You’d have to remember that it was modified and save a copy
before doing the installation.
There are two environment variables that can modify sys.path.
PYTHONHOME sets an alternate value for the prefix of the Python
installation. For example, if PYTHONHOME is set to /www/python,
the search path will be set to ['', '/www/python/lib/pythonX.Y/', '/www/python/lib/pythonX.Y/plat-linux2', ...].
The PYTHONPATH variable can be set to a list of paths that will be
added to the beginning of sys.path. For example, if PYTHONPATH is
set to /www/python:/opt/py, the search path will begin with
['/www/python', '/opt/py']. (Note that directories must exist in order to
be added to sys.path; the site module removes paths that don’t
exist.)
Finally, sys.path is just a regular Python list, so any Python application
can modify it by adding or removing entries.
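For example, a minimal sketch (the directory name is hypothetical):

import sys
sys.path.insert(0, '/www/python')  # consult this directory first
print(sys.path[0])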
As mentioned above, you can use Distutils configuration files to record personal
or site preferences for any Distutils options. That is, any option to any
command can be stored in one of two or three (depending on your platform)
configuration files, which will be consulted before the command-line is parsed.
This means that configuration files will override default values, and the
command-line will in turn override configuration files. Furthermore, if
multiple configuration files apply, values from “earlier” files are overridden
by “later” files.
The names and locations of the configuration files vary slightly across
platforms. On Unix and Mac OS X, the three configuration files (in the order
they are processed) are:
Type of file   Location and filename                           Notes
system         prefix/lib/pythonver/distutils/distutils.cfg    (1)
personal       $HOME/.pydistutils.cfg                          (2)
local          setup.cfg                                       (3)
And on Windows, the configuration files are:
Type of file   Location and filename                 Notes
system         prefix\Lib\distutils\distutils.cfg    (4)
personal       %HOME%\pydistutils.cfg                (5)
local          setup.cfg                             (3)
On all platforms, the “personal” file can be temporarily disabled by
passing the --no-user-cfg option.
Notes:
(1) Strictly speaking, the system-wide configuration file lives in the directory
    where the Distutils are installed; under Python 1.6 and later on Unix, this is
    as shown. For Python 1.5.2, the Distutils will normally be installed to
    prefix/lib/python1.5/site-packages/distutils, so the system
    configuration file should be put there under Python 1.5.2.
(2) On Unix, if the HOME environment variable is not defined, the user’s
    home directory will be determined with the getpwuid() function from the
    standard pwd module. This is done by the os.path.expanduser()
    function used by Distutils.
(3) I.e., in the current directory (usually the location of the setup script).
(4) (See also note (1).) Under Python 1.6 and later, Python’s default “installation
    prefix” is C:\Python, so the system configuration file is normally
    C:\Python\Lib\distutils\distutils.cfg. Under Python 1.5.2, the
    default prefix was C:\Program Files\Python, and the Distutils were not
    part of the standard library, so the system configuration file would be
    C:\Program Files\Python\distutils\distutils.cfg in a standard Python
    1.5.2 installation under Windows.
(5) On Windows, if the HOME environment variable is not defined,
    USERPROFILE then HOMEDRIVE and HOMEPATH will
    be tried. This is done by the os.path.expanduser() function used
    by Distutils.
The Distutils configuration files all have the same syntax. The config files
are grouped into sections. There is one section for each Distutils command,
plus a global section for global options that affect every command. Each
section consists of one option per line, specified as option=value.
For example, the following is a complete config file that just forces all
commands to run quietly by default:
[global]
verbose=0
If this is installed as the system config file, it will affect all processing of
any Python module distribution by any user on the current system. If it is
installed as your personal config file (on systems that support them), it will
affect only module distributions processed by you. And if it is used as the
setup.cfg for a particular module distribution, it affects only that
distribution.
You could override the default “build base” directory and make the
build* commands always forcibly rebuild all files with the
following:
[build]
build-base=blib
force=1
which corresponds to the command-line arguments
python setup.py build --build-base=blib --force
except that including the build command on the command-line means
that command will be run. Including a particular command in config files has no
such implication; it only means that if the command is run, the options in the
config file will apply. (Or if other commands that derive values from it are
run, they will use the values in the config file.)
You can find out the complete list of options for any command using the
--help option, e.g.:
python setup.py build --help
and you can find out the complete list of global options by using
--help without a command:
python setup.py --help
See also the “Reference” section of the “Distributing Python Modules” manual.
Whenever possible, the Distutils try to use the configuration information made
available by the Python interpreter used to run the setup.py script.
For example, the same compiler and linker flags used to compile Python will also
be used for compiling extensions. Usually this will work well, but in
complicated situations this might be inappropriate. This section discusses how
to override the usual Distutils behaviour.
Compiling a Python extension written in C or C++ will sometimes require
specifying custom flags for the compiler and linker in order to use a particular
library or produce a special kind of object code. This is especially true if the
extension hasn’t been tested on your platform, or if you’re trying to
cross-compile Python.
In the most general case, the extension author might have foreseen that
compiling the extensions would be complicated, and provided a Setup file
for you to edit. This will likely only be done if the module distribution
contains many separate extension modules, or if they often require elaborate
sets of compiler flags in order to work.
A Setup file, if present, is parsed in order to get a list of extensions
to build. Each line in a Setup describes a single module. Lines have
the following structure:

module ... [sourcefile ...] [cpparg ...] [library ...]
module is the name of the extension module to be built, and should be a
valid Python identifier. You can’t just change this in order to rename a module
(edits to the source code would also be needed), so this should be left alone.
sourcefile is anything that’s likely to be a source code file, at least
judging by the filename. Filenames ending in .c are assumed to be
written in C, filenames ending in .C, .cc, and .c++ are
assumed to be C++, and filenames ending in .m or .mm are assumed
to be in Objective C.
cpparg is an argument for the C preprocessor, and is anything starting with
-I, -D, -U or -C.
library is anything ending in .a or beginning with -l or
-L.
If a module requires a special library on your platform, you can
add it by editing the Setup file and running python setup.py build.
For example, if the module defined by the line
foo foomodule.c
must be linked with the math library libm.a on your platform, simply add
-lm to the line:
foo foomodule.c -lm
Arbitrary switches intended for the compiler or the linker can be supplied with
the -Xcompiler arg and -Xlinker arg options:

foo foomodule.c -Xcompiler -o32 -Xlinker -shared

The next option after -Xcompiler and -Xlinker will be
appended to the proper command line, so in the above example the compiler will
be passed the -o32 option, and the linker will be passed
-shared. If a compiler option requires an argument, you’ll have to
supply multiple -Xcompiler options; for example, to pass -x c++
the Setup file would have to contain -Xcompiler -x -Xcompiler c++.
Compiler flags can also be supplied through setting the CFLAGS
environment variable. If set, the contents of CFLAGS will be added to
the compiler flags specified in the Setup file.
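For example (the flags shown are illustrative only):

CFLAGS='-g -O0' python setup.py build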
This subsection describes the necessary steps to use Distutils with the Borland
C++ compiler version 5.5. First you have to know that Borland’s object file
format (OMF) is different from the format used by the Python version you can
download from the Python or ActiveState Web site. (Python is built with
Microsoft Visual C++, which uses COFF as the object file format.) For this
reason you have to convert Python’s library python25.lib into the
Borland format. You can do this as follows:
coff2omf python25.lib python25_bcpp.lib
The coff2omf program comes with the Borland compiler. The file
python25.lib is in the Libs directory of your Python
installation. If your extension uses other libraries (zlib, ...) you have to
convert them too.
The converted files have to reside in the same directories as the normal
libraries.
How does Distutils manage to use these libraries with their changed names? If
the extension needs a library (e.g. foo) Distutils checks first if it
finds a library with suffix _bcpp (e.g. foo_bcpp.lib) and then
uses this library. If it doesn’t find such a special library it uses
the default name (foo.lib). [1]
To let Distutils compile your extension with Borland C++ you now have to type:
python setup.py build --compiler=bcpp
If you want to use the Borland C++ compiler as the default, you could specify
this in your personal or system-wide configuration file for Distutils (see
section Distutils Configuration Files.)
This section describes the necessary steps to use Distutils with the GNU C/C++
compilers in their Cygwin and MinGW distributions. [2] For a Python interpreter
that was built with Cygwin, everything should work without any of these
following steps.
Not all extensions can be built with MinGW or Cygwin, but many can. Extensions
most likely to not work are those that use C++ or depend on Microsoft Visual C
extensions.
To let Distutils compile your extension with Cygwin you have to type:
python setup.py build --compiler=cygwin
and for Cygwin in no-cygwin mode [3] or for MinGW type:
python setup.py build --compiler=mingw32
If you want to use any of these options/compilers as default, you should
consider writing it in your personal or system-wide configuration file for
Distutils (see section Distutils Configuration Files.)
The following instructions only apply if you’re using a version of Python
older than 2.4.1 with a MinGW older than 3.0.0 (with
binutils-2.13.90-20030111-1).
These compilers require some special libraries. This task is more complex than
for Borland’s C++, because there is no program to convert the library. First
you have to create a list of symbols which the Python DLL exports. (You can find
a good program for this task at
http://www.emmestech.com/software/pexports-0.43/download_pexports.html).
pexports python25.dll >python25.def
The location of an installed python25.dll will depend on the
installation options and the version and language of Windows. In a “just for
me” installation, it will appear in the root of the installation directory. In
a shared installation, it will be located in the system directory.
Then you can create an import library for gcc from this information:

dlltool --dllname python25.dll --def python25.def --output-lib libpython25.a

The resulting library has to be placed in the same directory as
python25.lib. (Should be the libs directory under your Python
installation directory.)
If your extension uses other libraries (zlib,...) you might have to convert
them too. The converted files have to reside in the same directories as the
normal libraries do.
CPU
For “central processing unit.” Many style guides say this should be spelled
out on the first use (and if you must use it, do so!). For the Python
documentation, this abbreviation should be avoided since there’s no
reasonable way to predict which occurrence will be the first seen by the
reader. It is better to use the word “processor” instead.
POSIX
The name assigned to a particular group of standards. This is always
uppercase.
Python
The name of our favorite programming language is always capitalized.
Unicode
The name of a character set and matching encoding. This is always written
capitalized.
Unix
The name of the operating system developed at AT&T Bell Labs in the early
1970s.
* This is a bulleted list.
* It has two items, the second
item uses two lines.
1. This is a numbered list.
2. It has two items too.
#. This is a numbered list.
#. It has two items too.
Nested lists are possible, but be aware that they must be separated from
the parent list items by blank lines:
* this is
* a list
* with a nested list
* and some subitems
* and here the parent list continues
Definition lists are created as follows:
term (up to a line of text)
   Definition of the term, which must be indented

   and can even consist of multiple paragraphs

next term
   Description.
This is a normal text paragraph. The next paragraph is a code sample::

   It is not processed in any way, except
   that the indentation is removed.

   It can span multiple lines.
This is a normal text paragraph again.
The handling of the :: marker is smart:

If it occurs as a paragraph of its own, that paragraph is completely left
out of the document.
If it is preceded by whitespace, the marker is removed.
If it is preceded by non-whitespace, the marker is replaced by a single
colon.

That way, the example above would be rendered as “The next paragraph is a
code sample:”.
There are some problems one commonly runs into while authoring reST documents:
Separation of inline markup: As said above, inline markup spans must be
separated from the surrounding text by non-word characters; you have to use
an escaped space to get around that.
:mod:`parrot` -- Dead parrot access
===================================

.. module:: parrot
   :platform: Unix, Windows
   :synopsis: Analyze and reanimate dead parrots.
.. moduleauthor:: Eric Cleese <eric@python.invalid>
.. moduleauthor:: John Idle <john@python.invalid>

.. _my-reference-label:

Section to cross-reference
--------------------------
This is the text of the section.
It refers to the section itself, see :ref:`my-reference-label`.
CPython implementation detail: This describes some implementation detail.
More explanation.
or:

.. impl-detail:: This shortly mentions an implementation detail.

“CPython implementation detail:” is automatically prepended to the
content.
seealso
Many sections include a list of references to other documents. These are
placed in a seealso directive. The seealso directive is typically placed
just before any subsections of a section. In the HTML output, it is shown
boxed off from the main flow of the text.
The content of the seealso directive should be a reST definition list.
For example:
.. seealso::

   Module :mod:`zipfile`
      Documentation of the :mod:`zipfile` standard module.

   `GNU tar manual, Basic Tar Format <http://link>`_
      Documentation for tar archive files, including GNU tar extensions.
The classdesc* and excclassdesc environments have been dropped; the
class and exception directives support classes documented with and without
constructor arguments.
Multiple objects
The equivalent of the ...line commands is:

.. function:: do_foo(bar)
              do_bar(baz)

   Description of the functions.
Python HOWTOs are documents that cover a single, specific topic,
and attempt to cover it fairly completely. Modelled on the Linux
Documentation Project’s HOWTO collection, this collection is an
effort to foster documentation that’s more detailed than the
Python Library Reference.
It’s usually difficult to get your management to accept open source software,
and Python is no exception to this rule. This document discusses reasons to use
Python, strategies for winning acceptance, facts and arguments you can use, and
cases where you shouldn’t try to use Python.
There are several reasons to incorporate a scripting language into your
development process, and this section will discuss them, and why Python has some
properties that make it a particularly good choice.
Programs are often organized in a modular fashion. Lower-level operations are
grouped together, and called by higher-level functions, which may in turn be
used as basic operations by still further upper levels.
For example, the lowest level might define a very low-level set of functions for
accessing a hash table. The next level might use hash tables to store the
headers of a mail message, mapping a header name like Date to a value such
as Tue, 13 May 1997 20:00:54 -0400. A yet higher level may operate on
message objects, without knowing or caring that message headers are stored in a
hash table, and so forth.
Often, the lowest levels do very simple things; they implement a data structure
such as a binary tree or hash table, or they perform some simple computation,
such as converting a date string to a number. The higher levels then contain
logic connecting these primitive operations. Using this approach, the primitives
can be seen as basic building blocks which are then glued together to produce
the complete product.
Why is this design approach relevant to Python? Because Python is well suited
to functioning as such a glue language. A common approach is to write a Python
module that implements the lower level operations; for the sake of speed, the
implementation might be in C, Java, or even Fortran. Once the primitives are
available to Python programs, the logic underlying higher level operations is
written in the form of Python code. The high-level logic is then more
understandable, and easier to modify.
John Ousterhout wrote a paper that explains this idea at greater length,
entitled “Scripting: Higher Level Programming for the 21st Century”. I
recommend that you read this paper; see the references for the URL. Ousterhout
is the inventor of the Tcl language, and therefore argues that Tcl should be
used for this purpose; he only briefly refers to other languages such as Python,
Perl, and Lisp/Scheme, but in reality, Ousterhout’s argument applies to
scripting languages in general, since you could equally write extensions for any
of the languages mentioned above.
In The Mythical Man-Month, Frederick Brooks suggests the following rule when
planning software projects: “Plan to throw one away; you will anyway.” Brooks
is saying that the first attempt at a software design often turns out to be
wrong; unless the problem is very simple or you’re an extremely good designer,
you’ll find that new requirements and features become apparent once development
has actually started. If these new requirements can’t be cleanly incorporated
into the program’s structure, you’re presented with two unpleasant choices:
hammer the new features into the program somehow, or scrap everything and write
a new version of the program, taking the new features into account from the
beginning.
Python provides you with a good environment for quickly developing an initial
prototype. That lets you get the overall program structure and logic right, and
you can fine-tune small details in the fast development cycle that Python
provides. Once you’re satisfied with the GUI interface or program output, you
can translate the Python code into C++, Fortran, Java, or some other compiled
language.
Prototyping means you have to be careful not to use too many Python features
that are hard to implement in your other language. Using eval(), or regular
expressions, or the pickle module, means that you’re going to need C or
Java libraries for formula evaluation, regular expressions, and serialization,
for example. But it’s not hard to avoid such tricky code, and in the end the
translation usually isn’t very difficult. The resulting code can be rapidly
debugged, because any serious logical errors will have been removed from the
prototype, leaving only more minor slip-ups in the translation to track down.
This strategy builds on the earlier discussion of programmability. Using Python
as glue to connect lower-level components has obvious relevance for constructing
prototype systems. In this way Python can help you with development, even if
end users never come in contact with Python code at all. If the performance of
the Python version is adequate and corporate politics allow it, you may not need
to do a translation into C or Java, but it can still be faster to develop a
prototype and then translate it, instead of attempting to produce the final
version immediately.
One example of this development strategy is Microsoft Merchant Server. Version
1.0 was written in pure Python, by a company that subsequently was purchased by
Microsoft. Version 2.0 began to translate the code into C++, shipping with some
C++ code and some Python code. Version 3.0 didn’t contain any Python at all; all
the code had been translated into C++. Even though the product doesn’t contain
a Python interpreter, the Python language has still served a useful purpose by
speeding up development.
This is a very common use for Python. Past conference papers have also
described this approach for developing high-level numerical algorithms; see
David M. Beazley and Peter S. Lomdahl’s paper “Feeding a Large-scale Physics
Application to Python” in the references for a good example. If an algorithm’s
basic operations are things like “Take the inverse of this 4000x4000 matrix”,
and are implemented in some lower-level language, then Python has almost no
additional performance cost; the extra time required for Python to evaluate an
expression like m.invert() is dwarfed by the cost of the actual computation.
It’s particularly good for applications where seemingly endless tweaking is
required to get things right. GUI interfaces and Web sites are prime examples.
The Python code is also shorter and faster to write (once you’re familiar with
Python), so it’s easier to throw it away if you decide your approach was wrong;
if you’d spent two weeks working on it instead of just two hours, you might
waste time trying to patch up what you’ve got out of a natural reluctance to
admit that those two weeks were wasted. Truthfully, those two weeks haven’t
been wasted, since you’ve learnt something about the problem and the technology
you’re using to solve it, but it’s human nature to view this as a failure of
some sort.
Python is definitely not a toy language that’s only usable for small tasks.
The language features are general and powerful enough to enable it to be used
for many different purposes. It’s useful at the small end, for 10- or 20-line
scripts, but it also scales up to larger systems that contain thousands of lines
of code.
However, this expressiveness doesn’t come at the cost of an obscure or tricky
syntax. While Python has some dark corners that can lead to obscure code, there
are relatively few such corners, and proper design can isolate their use to only
a few classes or modules. It’s certainly possible to write confusing code by
using too many features with too little concern for clarity, but most Python
code can look a lot like a slightly-formalized version of human-understandable
pseudocode.
In The New Hacker’s Dictionary, Eric S. Raymond gives the following definition
for “compact”:
Compact adj. Of a design, describes the valuable property that it can all be
apprehended at once in one’s head. This generally means the thing created from
the design can be used with greater facility and fewer errors than an equivalent
tool that is not compact. Compactness does not imply triviality or lack of
power; for example, C is compact and FORTRAN is not, but C is more powerful than
FORTRAN. Designs become non-compact through accreting features and cruft that
don’t merge cleanly into the overall design scheme (thus, some fans of Classic C
maintain that ANSI C is no longer compact).
In this sense of the word, Python is quite compact, because the language has
just a few ideas, which are used in lots of places. Take namespaces, for
example. Import a module with import math, and you create a new namespace
called math. Classes are also namespaces that share many of the properties
of modules, and have a few of their own; for example, you can create instances
of a class. Instances? They’re yet another namespace. Namespaces are currently
implemented as Python dictionaries, so they have the same methods as the
standard dictionary data type: .keys() returns all the keys, and so forth.
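A small sketch of the idea (all names here are made up):

import math                        # a module is a namespace

class Tracker(object):             # a class is a namespace, too
    count = 0

t = Tracker()                      # and so is each instance
t.count = 1

print(math.__dict__['pi'])         # the underlying dictionaries are visible
print(Tracker.__dict__['count'])
print(t.__dict__['count'])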
This simplicity arises from Python’s development history. The language syntax
derives from different sources; ABC, a relatively obscure teaching language, is
one primary influence, and Modula-3 is another. (For more information about ABC
and Modula-3, consult their respective Web sites at http://www.cwi.nl/~steven/abc/
and http://www.m3.org.) Other features have come from C, Icon,
Algol-68, and even Perl. Python hasn’t really innovated very much, but instead
has tried to keep the language small and easy to learn, building on ideas that
have been tried in other languages and found useful.
Simplicity is a virtue that should not be underestimated. It lets you learn the
language more quickly, and then rapidly write code – code that often works the
first time you run it.
If you’re working with Java, Jython (http://www.jython.org/) is definitely worth
your attention. Jython is a re-implementation of Python in Java that compiles
Python code into Java bytecodes. The resulting environment has very tight,
almost seamless, integration with Java. It’s trivial to access Java classes
from Python, and you can write Python classes that subclass Java classes.
Jython can be used for prototyping Java applications in much the same way
CPython is used, and it can also be used for test suites for Java code, or
embedded in a Java application to add scripting capabilities.
Let’s say that you’ve decided upon Python as the best choice for your
application. How can you convince your management, or your fellow developers,
to use Python? This section lists some common arguments against using Python,
and provides some possible rebuttals.
Python is freely available software that doesn’t cost anything. How good can
it be?
Very good, indeed. These days Linux and Apache, two other pieces of open source
software, are becoming more respected as alternatives to commercial software,
but Python hasn’t had all the publicity.
Python has been around for several years, with many users and developers.
Accordingly, the interpreter has been used by many people, and has gotten most
of the bugs shaken out of it. While bugs are still discovered at intervals,
they’re usually either quite obscure (they’d have to be, for no one to have run
into them before) or they involve interfaces to external libraries. The
internals of the language itself are quite stable.
Having the source code should be viewed as making the software available for
peer review; people can examine the code, suggest (and implement) improvements,
and track down bugs. To find out more about the idea of open source code, along
with arguments and case studies supporting it, go to http://www.opensource.org.
Who’s going to support it?
Python has a sizable community of developers, and the number is still growing.
The Internet community surrounding the language is an active one, and is worth
being considered another one of Python’s advantages. Most questions posted to
the comp.lang.python newsgroup are quickly answered by someone.
Should you need to dig into the source code, you’ll find it’s clear and
well-organized, so it’s not very difficult to write extensions and track down
bugs yourself. If you’d prefer to pay for support, there are companies and
individuals who offer commercial support for Python.
Who uses Python for serious work?
Lots of people; one interesting thing about Python is the surprising diversity
of applications that it’s been used for. People are using Python to:
Run Web sites
Write GUI interfaces
Control number-crunching code on supercomputers
Make a commercial application scriptable by embedding the Python interpreter
inside it
Process large XML data sets
Build test suites for C or Java code
Whatever your application domain is, there’s probably someone who’s used Python
for something similar. Yet, despite being usable for such high-end
applications, Python’s still simple enough to use for little jobs.
They’re practically nonexistent. Consult the Misc/COPYRIGHT file in the
source distribution, or the section History and License for the full
language, but it boils down to three conditions:
You have to leave the copyright notice on the software; if you don’t include
the source code in a product, you have to put the copyright notice in the
supporting documentation.
Don’t claim that the institutions that have developed Python endorse your
product in any way.
If something goes wrong, you can’t sue for damages. Practically all software
licenses contain this condition.
Notice that you don’t have to provide source code for anything that contains
Python or is built with it. Also, the Python interpreter and accompanying
documentation can be modified and redistributed in any way you like, and you
don’t have to pay anyone any licensing fees at all.
Why should we use an obscure language like Python instead of well-known
language X?
I hope this HOWTO, and the documents listed in the final section, will help
convince you that Python isn’t obscure, and has a healthily growing user base.
One word of advice: always present Python’s positive advantages, instead of
concentrating on language X’s failings. People want to know why a solution is
good, rather than why all the other solutions are bad. So instead of attacking
a competing solution on various grounds, simply show how Python’s virtues can
help.
John Ousterhout’s white paper on scripting is a good argument for the utility of
scripting languages, though naturally enough, he emphasizes Tcl, the language he
developed. Most of the arguments would apply to any scripting language.
The authors, David M. Beazley and Peter S. Lomdahl, describe their use of
Python at Los Alamos National Laboratory. It’s another good example of how
Python can help get real work done. This quotation from the paper has been
echoed by many people:
Originally developed as a large monolithic application for massively parallel
processing systems, we have used Python to transform our application into a
flexible, highly modular, and extremely powerful system for performing
simulation, data analysis, and visualization. In addition, we describe how
Python has solved a number of important problems related to the development,
debugging, deployment, and maintenance of scientific software.
This interview with Andy Feit, discussing Infoseek’s use of Python, can be used
to show that choosing Python didn’t introduce any difficulties into a company’s
development process, and provided some substantial benefits.
Management may be doubtful of the reliability and usefulness of software that
wasn’t written commercially. This site presents arguments that show how open
source software can have considerable advantages over closed-source software.
The Linux Advocacy mini-HOWTO was the inspiration for this document, and is also
well worth reading for general suggestions on winning acceptance for a new
technology, such as Linux or Python. In general, you won’t make much progress
by simply attacking existing systems and complaining about their inadequacies;
this often ends up looking like unfocused whining. It’s much better to point
out some of the many areas where Python is an improvement over other systems.
With Python 3 being the future of Python while Python 2 is still in active
use, it is good to have your project available for both major releases of
Python. This guide is meant to help you choose which strategy works best
for your project to support both Python 2 & 3 along with how to execute
that strategy.
When a project makes the decision that it’s time to support both Python 2 & 3,
a decision needs to be made as to how to go about accomplishing that goal.
The chosen strategy will depend on how large the project’s existing
codebase is and how much divergence you want between your Python 2 codebase
and your Python 3 one (e.g., starting a new version with Python 3).
If your project is brand-new or does not have a large codebase, then you may
want to consider writing/porting all of your code for Python 3
and use 3to2 to port your code for Python 2.
If you would prefer to maintain a codebase which is semantically and
syntactically compatible with Python 2 & 3 simultaneously, you can write
Python 2/3 Compatible Source. While this tends to lead to somewhat non-idiomatic
code, it does mean you keep a rapid development process as the developer.
Finally, you do have the option of using 2to3 to translate
Python 2 code into Python 3 code (with some manual help). This can take the
form of branching your code and using 2to3 to start a Python 3 branch. You can
also have users perform the translation at installation time automatically so
that you only have to maintain a Python 2 codebase.
Regardless of which approach you choose, porting is not as hard or
time-consuming as you might initially think. You can also tackle the problem
piecemeal, as a good portion of porting is simply updating your code to follow
current best practices in a Python 2/3 compatible way.
Regardless of what strategy you pick, there are a few things you should
consider.
One is to make sure you have a robust test suite. You need to make sure everything
continues to work, just like when you support a new minor version of Python.
This means making sure your test suite is thorough and is ported properly
between Python 2 & 3. You will also most likely want to use something like tox
to automate testing between both a Python 2 and Python 3 VM.
Two, once your project has Python 3 support, make sure to add the proper
classifier for the Cheeseshop (PyPI):

setup(
    name='Your Library',
    version='1.0',
    classifiers=[
        # make sure to use :: Python *and* :: Python :: 3 so
        # that pypi can list the package on the python 3 page
        'Programming Language :: Python',
        'Programming Language :: Python :: 3'
    ],
    packages=['yourlibrary'],
    # make sure to add custom_fixers to the MANIFEST.in
    include_package_data=True,
    # ...
)
Doing so will cause your project to show up in the
Python 3 packages list. You will know
you set the classifier properly as visiting your project page on the Cheeseshop
will show a Python 3 logo in the upper-left corner of the page.
Three, the six project provides a library which helps iron out differences
between Python 2 & 3. If you find there is a sticky point that is a continual
point of contention in your translation or maintenance of code, consider using
a source-compatible solution relying on six. If you have to create your own
Python 2/3 compatible solution, you can use sys.version_info[0] >= 3 as a
guard.
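For example, a minimal sketch of such a guard:

import sys

if sys.version_info[0] >= 3:
    text_type = str        # in Python 3, str is the text type
else:
    text_type = unicode    # in Python 2, unicode is the text type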
Four, read all the approaches. Just because some bit of advice applies to one
approach more than another doesn’t mean that some advice doesn’t apply to other
strategies.
Five, drop support for older Python versions if possible. Python 2.5
introduced a lot of useful syntax and libraries which have become idiomatic
in Python 3. Python 2.6 introduced future statements which make
compatibility much easier if you are going from Python 2 to 3.
Python 2.7 continues the trend in the stdlib. So choose the newest version
of Python which you believe can be your minimum support version
and work from there.
If you are starting a new project or your codebase is small enough, you may
want to consider writing your code for Python 3 and backporting to Python 2
using 3to2. Thanks to Python 3 being more strict about things than Python 2
(e.g., bytes vs. strings), the source translation can be easier and more
straightforward than from Python 2 to 3. Plus it gives you more direct
experience developing in Python 3 which, since it is the future of Python, is a
good thing long-term.
A drawback of this approach is that 3to2 is a third-party project. This means
that the Python core developers (and thus this guide) can make no promises
about how well 3to2 works at any time. There is nothing to suggest, though,
that 3to2 is not a high-quality project.
Included with Python since 2.6, the 2to3 tool (and lib2to3 module)
helps with porting Python 2 to Python 3 by performing various source
translations. This is a perfect solution for projects which wish to branch
their Python 3 code from their Python 2 codebase and maintain them as
independent codebases. You can even begin preparing to use this approach
today by writing future-compatible Python code which works cleanly in
Python 2 in conjunction with 2to3; all steps outlined below will work
with Python 2 code up to the point when the actual use of 2to3 occurs.
Use of 2to3 as an on-demand translation step at install time is also possible,
preventing the need to maintain a separate Python 3 codebase, but this approach
does come with some drawbacks. While users will only have to pay the
translation cost once at installation, you as a developer will need to pay the
cost regularly during development. If your codebase is sufficiently large
then the translation step ends up acting like a compilation step,
robbing you of the rapid development process you are used to with Python.
Obviously the time required to translate a project will vary, so do an
experimental translation just to see how long it takes, and then evaluate
whether you prefer this approach compared to using Python 2/3 Compatible
Source or simply keeping a separate Python 3 codebase.
Below are the typical steps taken by a project which uses a 2to3-based approach
to supporting Python 2 & 3.
As a first step, make sure that your project is compatible with Python 2.7.
This is just good to do as Python 2.7 is the last release of Python 2 and thus
will be used for a rather long time. It also allows for use of the -3 flag
to Python to help discover places in your code which 2to3 cannot handle but are
known to cause issues.
While not possible for all projects, if you can support Python 2.6 and newer
only, your life will be much easier. Various future statements, stdlib
additions, etc. exist only in Python 2.6 and later which greatly assist in
porting to Python 3. But if your project must keep support for Python 2.5 (or
even Python 2.4) then it is still possible to port to Python 3.
Below are the benefits you gain if you only have to support Python 2.6 and
newer. Some of these options are personal choice while others are
strongly recommended (the ones that are more for personal choice are
labeled as such). If you continue to support older versions of Python then you
at least need to watch out for situations that these solutions fix.
from __future__ import print_function
This is a personal choice. 2to3 handles the translation from the print
statement to the print function rather well, so this is an optional step. This
future statement does help, though, with getting used to typing
print('Hello, World') instead of print 'Hello, World'.
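For example:

from __future__ import print_function

print('Hello, World')  # the function form now works the same in Python 2.6+ and 3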
from __future__ import unicode_literals
Another personal choice. You can always mark what you want to be a (unicode)
string with a u prefix to get the same effect. But regardless of whether
you use this future statement or not, you must make sure you know exactly
which Python 2 strings you want to be bytes, and which are to be text. This
means you should, at minimum, mark all strings that are meant to be text
strings with a u prefix if you do not use this future statement.
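A short sketch of the future-statement style (the literal values are arbitrary):

from __future__ import unicode_literals  # must appear at the top of the module

title = 'spam'      # unicode in Python 2 as well, thanks to the future statement
data = b'\x00\x01'  # still bytes; the b prefix is unaffected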
Bytes literals
This is a very important one. The ability to prefix Python 2 strings that
are meant to contain bytes with a b prefix helps to very clearly delineate
what is and is not a Python 3 string. When you run 2to3 on code, all Python 2
strings become Python 3 strings unless they are prefixed with b.
There are some differences between byte literals in Python 2 and those in
Python 3 thanks to the bytes type just being an alias to str in Python 2.
Probably the biggest “gotcha” is that indexing results in different values. In
Python 2, the value of b'py'[1] is 'y', while in Python 3 it’s 121.
You can avoid this disparity by always slicing at the size of a single element:
b'py'[1:2] is 'y' in Python 2 and b'y' in Python 3 (i.e., close
enough).
You cannot concatenate bytes and strings in Python 3. But since Python
2 has bytes aliased to str, it will succeed: b'a' + u'b' works in
Python 2, but b'a' + 'b' in Python 3 is a TypeError. A similar issue
also comes about when doing comparisons between bytes and strings.
from __future__ import absolute_import
Implicit relative imports (e.g., importing spam.bacon from within
spam.eggs with the statement import bacon) do not work in Python 3.
This future statement moves away from that and allows the use of explicit
relative imports (e.g., from . import bacon).
In Python 2.5 you must use
the __future__ statement to be able to use explicit relative imports and to
prevent implicit ones. In Python 2.6 explicit relative imports are available
without the statement, but you still want the __future__ statement to prevent
implicit relative imports. In Python 2.7 the __future__ statement is not
needed. In other words, unless you are only supporting Python 2.7 or a version
earlier than Python 2.5, use the __future__ statement.
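For example, inside a hypothetical spam/eggs.py:

from __future__ import absolute_import

from . import bacon  # explicit relative import of spam.bacon
import string        # now guaranteed to be the top-level stdlib module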
There are a few things that just consistently come up as sticking points for
people which 2to3 cannot handle automatically or can easily be done in Python 2
to help modernize your code.
from __future__ import division
While the exact same outcome can be had by using the -Qnew argument to
Python, using this future statement lifts the requirement that your users use
the flag to get the expected behavior of division in Python 3
(e.g., 1 / 2 == 0.5; 1 // 2 == 0).
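For example:

from __future__ import division

print(1 / 2)   # 0.5 (true division, as in Python 3)
print(1 // 2)  # 0 (floor division)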
Unless you have been working on Windows, there is a chance you have not always
bothered to add the b mode when opening a binary file (e.g., rb for
binary reading). Under Python 3, binary files and text files are clearly
distinct and mutually incompatible; see the io module for details.
Therefore, you must decide whether a file will be used for
binary access (allowing bytes data to be read and/or written) or text access
(allowing unicode data to be read and/or written).
Text files created using open() under Python 2 return byte strings,
while under Python 3 they return unicode strings. Depending on your porting
strategy, this can be an issue.
If you want text files to return unicode strings in Python 2, you have two
possibilities:
Under Python 2.6 and higher, use io.open(). Since io.open()
is essentially the same function in both Python 2 and Python 3, it will
help iron out any issues that might arise.
If pre-2.6 compatibility is needed, then you should use codecs.open()
instead. This will make sure that you get back unicode strings in Python 2.
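A sketch of the io.open() option (the file name and encoding are illustrative):

import io

with io.open('notes.txt', encoding='utf-8') as f:
    text = f.read()  # unicode in Python 2, str in Python 3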
New-style classes have been around since Python 2.2. You need to make sure
you are subclassing from object to avoid odd edge cases involving method
resolution order, etc. This continues to be totally valid in Python 3 (although
unneeded as all classes implicitly inherit from object).
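For example:

class Account(object):  # explicitly subclass object so Python 2 uses a new-style class
    pass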
One of the biggest issues people have when porting code to Python 3 is handling
the bytes/string dichotomy. Because Python 2 allowed the str type to hold
textual data, people have over the years been rather loose in their delineation
of what str instances held text compared to bytes. In Python 3 you cannot
be so carefree anymore and need to properly handle the difference. The key to
handling this issue is to make sure that every string literal in your
Python 2 code is either syntactically or functionally marked as either bytes or
text data. After this is done you then need to make sure your APIs are designed
to either handle a specific type or are made to be properly polymorphic.
The first thing you must do is designate every single string literal in Python 2
as either textual or bytes data. If you are only supporting Python 2.6 or
newer, this can be accomplished by marking bytes literals with a b prefix
and then designating textual data with a u prefix or using the
unicode_literals future statement.
If your project supports versions of Python pre-dating 2.6, then you should use
the six project and its b() function to denote bytes literals. For text
literals you can either use six’s u() function or use a u prefix.
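A sketch using six (six is a third-party package and must be installed separately):

from six import b, u

data = b('raw bytes')  # bytes on both Python 2 and 3
text = u('some text')  # unicode on both Python 2 and 3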
In Python 2 it was very easy to accidentally create an API that accepted both
bytes and textual data. But in Python 3, thanks to the more strict handling of
disparate types, this loose usage of bytes and text together tends to fail.
Take the dict {b'a': 'bytes', u'a': 'text'} in Python 2.6. It creates the
dict {u'a': 'text'} since b'a' == u'a'. But in Python 3 the equivalent
dict creates {b'a': 'bytes', 'a': 'text'}, i.e., no lost data. Similar
issues can crop up when transitioning Python 2 code to Python 3.
This means you need to choose what an API is going to accept and create and
consistently stick to that API in both Python 2 and 3.
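For example, a minimal sketch of an API that commits to accepting only text:

def store_key(key):
    # Reject bytes explicitly so misuse fails the same way on Python 2 and 3.
    if isinstance(key, bytes):
        raise TypeError('store_key() expects text, not bytes')
    return key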
In Python 3, mixing bytes and unicode is forbidden in most situations; it
will raise a TypeError where Python 2 would have attempted an implicit
coercion between types. However, there is one case where it doesn’t and
it can be very misleading:
>>> b"" == ""
False
This is because an equality comparison is required by the language to always
succeed (and return False for incompatible types). However, this also
means that code incorrectly ported to Python 3 can display buggy behaviour
if such comparisons are silently executed. To detect such situations,
Python 3 has a -b flag that will display a warning:
$ python3 -b
>>> b"" == ""
__main__:1: BytesWarning: Comparison between bytes and string
False
To turn the warning into an exception, use the -bb flag instead:
$ python3 -bb
>>> b"" == ""
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
BytesWarning: Comparison between bytes and string
Another potentially surprising change is the indexing behaviour of bytes
objects in Python 3:
>>> b"xyz"[0]
120
Indeed, Python 3 bytes objects (as well as bytearray objects)
are sequences of integers. But code converted from Python 2 will often
assume that indexing a bytestring produces another bytestring, not an
integer. To reconcile both behaviours, use slicing:
>>> b"xyz"[0:1]
b'x'
>>> n = 1
>>> b"xyz"[n:n+1]
b'y'
The only remaining gotcha is that an out-of-bounds slice returns an empty
bytes object instead of raising IndexError:
>>> b"xyz"[3]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: index out of range
>>> b"xyz"[3:4]
b''
In Python 2, objects can specify both a string and unicode representation of
themselves. In Python 3, though, there is only a string representation. This
becomes an issue as people can inadvertently do things in their __str__()
methods which have unpredictable results (e.g., infinite recursion if you
happen to use the unicode(self).encode('utf8') idiom as the body of your
__str__() method).
There are two ways to solve this issue. One is to use a custom 2to3 fixer. The
blog post at http://lucumr.pocoo.org/2011/1/22/forwards-compatible-python/
specifies how to do this. That will allow 2to3 to change all instances of
def __unicode__(self): ... to def __str__(self): .... This does require that
you define your __str__() method in Python 2 before your __unicode__()
method. The other option is to use a mixin class that defines __str__() in
terms of __unicode__() for you:
import sys
class UnicodeMixin(object):
"""Mixin class to handle defining the proper __str__/__unicode__
methods in Python 2 or 3."""
if sys.version_info[0] >= 3: # Python 3
def __str__(self):
return self.__unicode__()
else: # Python 2
def __str__(self):
return self.__unicode__().encode('utf8')
class Spam(UnicodeMixin):
def __unicode__(self):
return u'spam-spam-bacon-spam' # 2to3 will remove the 'u' prefix
In Python 2 you could index directly on an exception to get at the arguments
it was created with. But in Python 3, indexing directly on an exception is an
error. You need to make sure to only index on the BaseException.args
attribute, which is a sequence containing all arguments passed to the
__init__() method. Even better is to use the documented attributes the
exception provides.
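For example:

try:
    raise Exception('spam', 'eggs')
except Exception:
    import sys
    exc = sys.exc_info()[1]
    first = exc.args[0]  # portable; exc[0] fails on Python 3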
2to3 will attempt to generate fixes for doctests that it comes across. It’s
not perfect, though. If you wrote a monolithic set of doctests (e.g., a single
docstring containing all of your doctests), you should at least consider
breaking the doctests up into smaller pieces to make it more manageable to fix.
Otherwise it might very well be worth your time and effort to port your tests
to unittest.
When you run your application’s test suite, run it using the -3 flag passed
to Python. This will cause various warnings to be raised during execution about
things that 2to3 cannot handle automatically (e.g., modules that have been
removed). Try to eliminate those warnings to make your code even more portable
to Python 3.
To manually convert source code using 2to3, you use the 2to3 script that
is installed with Python 2.6 and later:
2to3 <directory or file to convert>
This will cause 2to3 to write out a diff with all of the fixers applied for the
converted source code. If you would like 2to3 to go ahead and apply the changes
you can pass it the -w flag:
2to3 -w <stuff to convert>
There are other flags available to control exactly which fixers are applied,
etc.
When a user installs your project for Python 3, you can have either
distutils or Distribute run 2to3 on your behalf.
For distutils, use the following idiom:
try:  # Python 3
    from distutils.command.build_py import build_py_2to3 as build_py
except ImportError:  # Python 2
    from distutils.command.build_py import build_py

setup(cmdclass={'build_py': build_py},
      # ...
)
For Distribute:
setup(use_2to3=True,
      # ...
)
This will allow you to not have to distribute a separate Python 3 version of
your project. It does require, though, that when you perform development that
you at least build your project and use the built Python 3 source for testing.
At this point you should (hopefully) have your project converted in such a way
that it works in Python 3. Verify it by running your unit tests and making sure
nothing has gone awry. If you miss something then figure out how to fix it in
Python 3, backport to your Python 2 code, and run your code through 2to3 again
to verify the fix transforms properly.
While it may seem counter-intuitive, you can write Python code which is
source-compatible between Python 2 & 3. It does lead to code that is not
entirely idiomatic Python (e.g., having to extract the currently raised
exception from sys.exc_info()[1]), but it can be run under Python 2
and Python 3 without using 2to3 as a translation step (although the tool
should be used to help find potential portability problems). This allows you to
continue to have a rapid development process regardless of whether you are
developing under Python 2 or Python 3. Whether this approach or using
Python 2 and 2to3 works best for you will be a per-project decision.
All of the steps outlined in how to
port Python 2 code with 2to3 apply
to creating a Python 2/3 codebase. This includes trying to support only Python
2.6 or newer (the __future__ statements work in Python 3 without issue),
eliminating warnings that are triggered by -3, etc.
You should even consider running 2to3 over your code (without committing the
changes). This will let you know where potential pain points are within your
code so that you can fix them properly before they become an issue.
The six project contains many things to help you write portable Python code.
You should make sure to read its documentation from beginning to end and use
any and all features it provides. That way you will minimize any mistakes you
might make in writing cross-version code.
One change between Python 2 and 3 that will require changing how you code (if
you support Python 2.5 and earlier) is
accessing the currently raised exception. In Python 2.5 and earlier the syntax
to access the current exception is:
try:
raise Exception()
except Exception, exc:
# Current exception is 'exc'
pass
This syntax changed in Python 3 (and backported to Python 2.6 and later)
to:
try:
raise Exception()
except Exception as exc:
# Current exception is 'exc'
# In Python 3, 'exc' is restricted to the block; Python 2.6 will "leak"
pass
Because of this syntax change, you must change how you capture the current
exception to:
try:
raise Exception()
except Exception:
import sys
exc = sys.exc_info()[1]
# Current exception is 'exc'
pass
You can get more information about the raised exception from
sys.exc_info() than simply the current exception instance, but you most
likely don’t need it.
Note
In Python 3, the traceback is attached to the exception instance
through the __traceback__ attribute. If the instance is saved in
a local variable that persists outside of the except block, the
traceback will create a reference cycle with the current frame and its
dictionary of local variables. This will delay reclaiming dead
resources until the next cyclic garbage collection pass.
In Python 2, this problem only occurs if you save the traceback itself
(e.g. the third element of the tuple returned by sys.exc_info())
in a variable.
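A sketch of defensively dropping the reference (the handling code is hypothetical):

import sys

try:
    raise Exception('boom')
except Exception:
    exc = sys.exc_info()[1]
    try:
        print(exc.args)
    finally:
        exc = None  # let Python 3's __traceback__ cycle be collected sooner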
The authors of the following blog posts, wiki pages, and books deserve special
thanks for making public their tips for porting Python 2 code to Python 3 (and
thus helping provide information for this document):
Although changing the C-API was not one of Python 3.0’s objectives, the many
Python level changes made leaving 2.x’s API intact impossible. In fact, some
changes such as int() and long() unification are more obvious on
the C level. This document endeavors to document incompatibilities and how
they can be worked around.
Python 3.0’s str() type (PyUnicode_* functions in C) is equivalent to
2.x’s unicode(). The old 8-bit string type (PyString_* in 2.x) has become
bytes() (PyBytes_*). Python 2.6 and later provide a compatibility header,
bytesobject.h, mapping PyBytes names to PyString ones. For best
compatibility with 3.0, PyUnicode should be used for textual data and
PyBytes for binary data. It’s also important to remember that
PyBytes and PyUnicode in 3.0 are not interchangeable like
PyString and PyUnicode are in 2.x. The following example
shows best practices with regards to PyUnicode, PyString,
and PyBytes.
#include "stdlib.h"
#include "Python.h"
#include "bytesobject.h"

/* text example */
static PyObject *
say_hello(PyObject *self, PyObject *args) {
    PyObject *name, *result;

    if (!PyArg_ParseTuple(args, "U:say_hello", &name))
        return NULL;

    result = PyUnicode_FromFormat("Hello, %S!", name);
    return result;
}

/* just a forward */
static char * do_encode(PyObject *);

/* bytes example */
static PyObject *
encode_object(PyObject *self, PyObject *args) {
    char *encoded;
    PyObject *result, *myobj;

    if (!PyArg_ParseTuple(args, "O:encode_object", &myobj))
        return NULL;

    encoded = do_encode(myobj);
    if (encoded == NULL)
        return NULL;
    result = PyBytes_FromString(encoded);
    free(encoded);
    return result;
}
In Python 3.0, there is only one integer type. It is called int() on the
Python level, but actually corresponds to 2.x’s long() type. In the
C-API, PyInt_* functions are replaced by their PyLong_* neighbors. The
best course of action here is using the PyInt_* functions aliased to
PyLong_* found in intobject.h. The abstract PyNumber_* APIs
can also be used in some cases.
Python 3.0 has a revamped extension module initialization system. (See
PEP 3121.) Instead of storing module state in globals, it should be stored
in an interpreter-specific structure. Creating modules that act correctly in
both 2.x and 3.0 is tricky; see PEP 3121 for a simple example of how to do it.
If you are writing a new extension module, you might consider Cython. It translates a Python-like language to C. The
extension modules it creates are compatible with Python 3.x and 2.x.
The curses library supplies a terminal-independent screen-painting and
keyboard-handling facility for text-based terminals; such terminals include
VT100s, the Linux console, and the simulated terminal provided by X11 programs
such as xterm and rxvt. Display terminals support various control codes to
perform common operations such as moving the cursor, scrolling the screen, and
erasing areas. Different terminals use widely differing codes, and often have
their own minor quirks.
In a world of X displays, one might ask “why bother”? It’s true that
character-cell display terminals are an obsolete technology, but there are
niches in which being able to do fancy things with them is still valuable. One
is on small-footprint or embedded Unixes that don’t carry an X server. Another
is for tools like OS installers and kernel configurators that may have to run
before X is available.
The curses library hides all the details of different terminals, and provides
the programmer with an abstraction of a display, containing multiple
non-overlapping windows. The contents of a window can be changed in various
ways (adding text, erasing it, changing its appearance) and the curses library
will automagically figure out what control codes need to be sent to the terminal
to produce the right output.
The curses library was originally written for BSD Unix; the later System V
versions of Unix from AT&T added many enhancements and new functions. BSD curses
is no longer maintained, having been replaced by ncurses, which is an
open-source implementation of the AT&T interface. If you’re using an
open-source Unix such as Linux or FreeBSD, your system almost certainly uses
ncurses. Since most current commercial Unix versions are based on System V
code, all the functions described here will probably be available. The older
versions of curses carried by some proprietary Unixes may not support
everything, though.
No one has made a Windows port of the curses module. On a Windows platform, try
the Console module written by Fredrik Lundh. The Console module provides
cursor-addressable text output, plus full support for mouse and keyboard input,
and is available from http://effbot.org/zone/console-index.htm.
The Python module is a fairly simple wrapper over the C functions provided by
curses; if you’re already familiar with curses programming in C, it’s really
easy to transfer that knowledge to Python. The biggest difference is that the
Python interface makes things simpler, by merging different C functions such as
addstr(), mvaddstr(), mvwaddstr(), into a single
addstr() method. You’ll see this covered in more detail later.
This HOWTO is simply an introduction to writing text-mode programs with curses
and Python. It doesn’t attempt to be a complete guide to the curses API; for
that, see the Python library guide’s section on ncurses, and the C manual pages
for ncurses. It will, however, give you the basic ideas.
Before doing anything, curses must be initialized. This is done by calling the
initscr() function, which will determine the terminal type, send any
required setup codes to the terminal, and create various internal data
structures. If successful, initscr() returns a window object representing
the entire screen; this is usually called stdscr, after the name of the
corresponding C variable.
import curses
stdscr = curses.initscr()
Usually curses applications turn off automatic echoing of keys to the screen, in
order to be able to read keys and only display them under certain circumstances.
This requires calling the noecho() function.
curses.noecho()
Applications will also commonly need to react to keys instantly, without
requiring the Enter key to be pressed; this is called cbreak mode, as opposed to
the usual buffered input mode.
curses.cbreak()
Terminals usually return special keys, such as the cursor keys or navigation
keys such as Page Up and Home, as a multibyte escape sequence. While you could
write your application to expect such sequences and process them accordingly,
curses can do it for you, returning a special value such as
curses.KEY_LEFT. To get curses to do the job, you’ll have to enable
keypad mode.
stdscr.keypad(1)
Terminating a curses application is much easier than starting one. You’ll need
to call
curses.nocbreak(); stdscr.keypad(0); curses.echo()
to reverse the curses-friendly terminal settings. Then call the endwin()
function to restore the terminal to its original operating mode.
curses.endwin()
A common problem when debugging a curses application is to get your terminal
messed up when the application dies without restoring the terminal to its
previous state. In Python this commonly happens when your code is buggy and
raises an uncaught exception. Keys are no longer echoed to the screen when
you type them, for example, which makes using the shell difficult.
In Python you can avoid these complications and make debugging much easier by
importing the module curses.wrapper. It supplies a wrapper()
function that takes a callable. It does the initializations described above,
and also initializes colors if color support is present. It then runs your
provided callable and finally deinitializes appropriately. The callable is
called inside a try-except clause which catches exceptions, performs curses
deinitialization, and then passes the exception upwards. Thus, your terminal
won’t be left in a funny state on exception.
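A minimal sketch of this pattern (the main() name is arbitrary):

from curses import wrapper

def main(stdscr):
    # wrapper() has already called initscr(), noecho(), and cbreak(),
    # and enabled keypad mode; it undoes all of this even if main() raises.
    stdscr.clear()
    stdscr.addstr(0, 0, "Hello from curses")
    stdscr.refresh()
    stdscr.getkey()

wrapper(main)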
Windows are the basic abstraction in curses. A window object represents a
rectangular area of the screen, and supports various methods to display text,
erase it, allow the user to input strings, and so forth.
The stdscr object returned by the initscr() function is a window
object that covers the entire screen. Many programs may need only this single
window, but you might wish to divide the screen into smaller windows, in order
to redraw or clear them separately. The newwin() function creates a new
window of a given size, returning the new window object.
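For example, a sketch that creates a 5-row, 40-column window whose top-left
corner is at screen coordinate (7, 20); note the y-first argument order
discussed below:

begin_x = 20; begin_y = 7
height = 5; width = 40
win = curses.newwin(height, width, begin_y, begin_x)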
A word about the coordinate system used in curses: coordinates are always passed
in the order y,x, and the top-left corner of a window is coordinate (0,0).
This breaks a common convention for handling coordinates, where the x
coordinate usually comes first. This is an unfortunate difference from most
other computer applications, but it’s been part of curses since it was first
written, and it’s too late to change things now.
When you call a method to display or erase text, the effect doesn’t immediately
show up on the display. This is because curses was originally written with slow
300-baud terminal connections in mind; with these terminals, minimizing the time
required to redraw the screen is very important. This lets curses accumulate
changes to the screen, and display them in the most efficient manner. For
example, if your program displays some characters in a window, and then clears
the window, there’s no need to send the original characters because they’d never
be visible.
Accordingly, curses requires that you explicitly tell it to redraw windows,
using the refresh() method of window objects. In practice, this doesn’t
really complicate programming with curses much. Most programs go into a flurry
of activity, and then pause waiting for a keypress or some other action on the
part of the user. All you have to do is to be sure that the screen has been
redrawn before pausing to wait for user input, by simply calling
stdscr.refresh() or the refresh() method of some other relevant
window.
A pad is a special case of a window; it can be larger than the actual display
screen, and only a portion of it displayed at a time. Creating a pad simply
requires the pad’s height and width, while refreshing a pad requires giving the
coordinates of the on-screen area where a subsection of the pad will be
displayed.
pad = curses.newpad(100, 100)
# These loops fill the pad with letters; this is
# explained in the next section
for y in range(0, 100):
    for x in range(0, 100):
        try:
            pad.addch(y, x, ord('a') + (x*x + y*y) % 26)
        except curses.error:
            pass

# Displays a section of the pad in the middle of the screen
pad.refresh(0, 0, 5, 5, 20, 75)
The refresh() call displays a section of the pad in the rectangle
extending from coordinate (5,5) to coordinate (20,75) on the screen; the upper
left corner of the displayed section is coordinate (0,0) on the pad. Beyond
that difference, pads are exactly like ordinary windows and support the same
methods.
If you have multiple windows and pads on screen there is a more efficient way to
go, which will prevent annoying screen flicker at refresh time. Use the
noutrefresh() method of each window to update the data structure
representing the desired state of the screen; then change the physical screen to
match the desired state in one go with the function doupdate(). The
normal refresh() method calls doupdate() as its last act.
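A sketch, assuming two windows win1 and win2 already exist:

win1.noutrefresh()    # update the virtual screen only
win2.noutrefresh()
curses.doupdate()     # repaint the physical screen in one go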
From a C programmer’s point of view, curses may sometimes look like a twisty
maze of functions, all subtly different. For example, addstr() displays a
string at the current cursor location in the stdscr window, while
mvaddstr() moves to a given y,x coordinate first before displaying the
string. waddstr() is just like addstr(), but allows specifying a
window to use, instead of using stdscr by default. mvwaddstr() follows
similarly.
Fortunately the Python interface hides all these details; stdscr is a window
object like any other, and methods like addstr() accept multiple argument
forms. Usually there are four different forms.
Form                       Description
str or ch                  Display the string str or character ch at
                           the current position
str or ch, attr            Display the string str or character ch,
                           using attribute attr, at the current
                           position
y, x, str or ch            Move to position y,x within the window, and
                           display str or ch
y, x, str or ch, attr      Move to position y,x within the window, and
                           display str or ch, using attribute attr
Attributes allow displaying text in highlighted forms, such as in boldface,
underline, reverse code, or in color. They’ll be explained in more detail in
the next subsection.
The addstr() function takes a Python string as the value to be displayed,
while the addch() functions take a character, which can be either a Python
string of length 1 or an integer. If it’s a string, you’re limited to
displaying characters between 0 and 255. SVr4 curses provides constants for
extension characters; these constants are integers greater than 255. For
example, ACS_PLMINUS is a +/- symbol, and ACS_ULCORNER is the
upper left corner of a box (handy for drawing borders).
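For example, a sketch drawing those two characters (after initscr() has run,
so the ACS_* constants are defined):

stdscr.addch(0, 0, curses.ACS_ULCORNER)
stdscr.addch(0, 1, curses.ACS_PLMINUS)
stdscr.refresh()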
Windows remember where the cursor was left after the last operation, so if you
leave out the y,x coordinates, the string or character will be displayed
wherever the last operation left off. You can also move the cursor with the
move(y,x) method. Because some terminals always display a flashing cursor,
you may want to ensure that the cursor is positioned in some location where it
won’t be distracting; it can be confusing to have the cursor blinking at some
apparently random location.
If your application doesn’t need a blinking cursor at all, you can call
curs_set(0) to make it invisible. Equivalently, and for compatibility with
older curses versions, there’s a leaveok(bool) function. When bool is
true, the curses library will attempt to suppress the flashing cursor, and you
won’t need to worry about leaving it in odd locations.
Characters can be displayed in different ways. Status lines in a text-based
application are commonly shown in reverse video; a text viewer may need to
highlight certain words. curses supports this by allowing you to specify an
attribute for each cell on the screen.
An attribute is an integer, each bit representing a different attribute. You can
try to display text with multiple attribute bits set, but curses doesn’t
guarantee that all the possible combinations are available, or that they’re all
visually distinct. That depends on the ability of the terminal being used, so
it’s safest to stick to the most commonly available attributes, listed here.
Attribute       Description
A_BLINK         Blinking text
A_BOLD          Extra bright or bold text
A_DIM           Half bright text
A_REVERSE       Reverse-video text
A_STANDOUT      The best highlighting mode available
A_UNDERLINE     Underlined text
So, to display a reverse-video status line on the top line of the screen, you
could code:
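stdscr.addstr(0, 0, "Current mode: Typing mode", curses.A_REVERSE)
stdscr.refresh()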
The curses library also supports color on those terminals that provide it. The
most common such terminal is probably the Linux console, followed by color
xterms.
To use color, you must call the start_color() function soon after calling
initscr(), to initialize the default color set (the
curses.wrapper.wrapper() function does this automatically). Once that’s
done, the has_colors() function returns TRUE if the terminal in use can
actually display color. (Note: curses uses the American spelling ‘color’,
instead of the Canadian/British spelling ‘colour’. If you’re used to the
British spelling, you’ll have to resign yourself to misspelling it for the sake
of these functions.)
The curses library maintains a finite number of color pairs, containing a
foreground (or text) color and a background color. You can get the attribute
value corresponding to a color pair with the color_pair() function; this
can be bitwise-OR’ed with other attributes such as A_REVERSE, but
again, such combinations are not guaranteed to work on all terminals.
An example, which displays a line of text using color pair 1:
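stdscr.addstr("Pretty text", curses.color_pair(1))
stdscr.refresh()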
As I said before, a color pair consists of a foreground and background color.
start_color() initializes 8 basic colors when it activates color mode.
They are: 0:black, 1:red, 2:green, 3:yellow, 4:blue, 5:magenta, 6:cyan, and
7:white. The curses module defines named constants for each of these colors:
curses.COLOR_BLACK, curses.COLOR_RED, and so forth.
The init_pair(n,f,b) function changes the definition of color pair n, to
foreground color f and background color b. Color pair 0 is hard-wired to white
on black, and cannot be changed.
Let’s put all this together. To change color 1 to red text on a white
background, you would call:
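curses.init_pair(1, curses.COLOR_RED, curses.COLOR_WHITE)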
When you change a color pair, any text already displayed using that color pair
will change to the new colors. You can also display new text in this color
with:
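stdscr.addstr(0, 0, "RED ALERT!", curses.color_pair(1))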
Very fancy terminals can change the definitions of the actual colors to a given
RGB value. This lets you change color 1, which is usually red, to purple or
blue or any other color you like. Unfortunately, the Linux console doesn’t
support this, so I’m unable to try it out, and can’t provide any examples. You
can check if your terminal can do this by calling can_change_color(),
which returns TRUE if the capability is there. If you’re lucky enough to have
such a talented terminal, consult your system’s man pages for more information.
The curses library itself offers only very simple input mechanisms. Python’s
support adds a text-input widget that makes up for some of the lack.
The most common way to get input to a window is to use its getch() method.
getch() pauses and waits for the user to hit a key, displaying it if
echo() has been called earlier. You can optionally specify a coordinate
to which the cursor should be moved before pausing.
It’s possible to change this behavior with the method nodelay(). After
nodelay(1), getch() for the window becomes non-blocking and returns
curses.ERR (a value of -1) when no input is ready. There’s also a
halfdelay() function, which can be used to (in effect) set a timer on each
getch(); if no input becomes available within a specified
delay (measured in tenths of a second), curses raises an exception.
The getch() method returns an integer; if it’s between 0 and 255, it
represents the ASCII code of the key pressed. Values greater than 255 are
special keys such as Page Up, Home, or the cursor keys. You can compare the
value returned to constants such as curses.KEY_PPAGE,
curses.KEY_HOME, or curses.KEY_LEFT. Usually the main loop of
your program will look something like this:
while True:
    c = stdscr.getch()
    if c == ord('p'):
        PrintDocument()
    elif c == ord('q'):
        break                      # Exit the while loop
    elif c == curses.KEY_HOME:
        x = y = 0
The curses.ascii module supplies ASCII class membership functions that
take either integer or 1-character-string arguments; these may be useful in
writing more readable tests for your command interpreters. It also supplies
conversion functions that take either integer or 1-character-string arguments
and return the same type. For example, curses.ascii.ctrl() returns the
control character corresponding to its argument.
There’s also a method to retrieve an entire string, getstr(). It isn’t
used very often, because its functionality is quite limited; the only editing
keys available are the backspace key and the Enter key, which terminates the
string. It can optionally be limited to a fixed number of characters.
curses.echo()            # Enable echoing of characters

# Get a 15-character string, with the cursor on the top line
s = stdscr.getstr(0, 0, 15)
The Python curses.textpad module supplies something better. With it, you
can turn a window into a text box that supports an Emacs-like set of
keybindings. Various methods of Textbox class support editing with
input validation and gathering the edit results either with or without trailing
spaces. See the library documentation on curses.textpad for the
details.
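A minimal sketch (the window size and position are illustrative):

import curses
from curses.textpad import Textbox

def gather_string(stdscr):
    win = curses.newwin(5, 60, 2, 1)   # a small editing window
    box = Textbox(win)
    box.edit()                         # let the user edit until Ctrl-G
    return box.gather()                # collect the contents as a string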
This HOWTO didn’t cover some advanced topics, such as screen-scraping or
capturing mouse events from an xterm instance. But the Python library page for
the curses module is now pretty complete. You should browse it next.
If you’re in doubt about the detailed behavior of any of the ncurses entry
points, consult the manual pages for your curses implementation, whether it’s
ncurses or a proprietary Unix vendor’s. The manual pages will document any
quirks, and provide complete lists of all the functions, attributes, and
ACS_* characters available to you.
Because the curses API is so large, some functions aren’t supported in the
Python interface, not because they’re difficult to implement, but because no one
has needed them yet. Feel free to add them and then submit a patch. Also, we
don’t yet have support for the menu library associated with
ncurses; feel free to add that.
If you write an interesting little program, feel free to contribute it as
another demo. We can always use more of them!
Defines descriptors, summarizes the protocol, and shows how descriptors are
called. Examines a custom descriptor and several built-in Python descriptors
including functions, properties, static methods, and class methods. Shows how
each works by giving a pure Python equivalent and a sample application.
Learning about descriptors not only provides access to a larger toolset, it
creates a deeper understanding of how Python works and an appreciation for the
elegance of its design.
In general, a descriptor is an object attribute with “binding behavior”, one
whose attribute access has been overridden by methods in the descriptor
protocol. Those methods are __get__(), __set__(), and
__delete__(). If any of those methods are defined for an object, it is
said to be a descriptor.
The default behavior for attribute access is to get, set, or delete the
attribute from an object’s dictionary. For instance, a.x has a lookup chain
starting with a.__dict__['x'], then type(a).__dict__['x'], and
continuing through the base classes of type(a) excluding metaclasses. If the
looked-up value is an object defining one of the descriptor methods, then Python
may override the default behavior and invoke the descriptor method instead.
Where this occurs in the precedence chain depends on which descriptor methods
were defined. Note that descriptors are only invoked for new style objects or
classes (a class is new style if it inherits from object or
type).
Descriptors are a powerful, general purpose protocol. They are the mechanism
behind properties, methods, static methods, class methods, and super().
They are used throughout Python itself to implement the new style classes
introduced in version 2.2. Descriptors simplify the underlying C-code and offer
a flexible set of new tools for everyday Python programs.
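The descriptor protocol is:

descr.__get__(self, obj, type=None) --> value
descr.__set__(self, obj, value) --> None
descr.__delete__(self, obj) --> None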
That is all there is to it. Define any of these methods and an object is
considered a descriptor and can override default behavior upon being looked up
as an attribute.
If an object defines both __get__() and __set__(), it is considered
a data descriptor. Descriptors that only define __get__() are called
non-data descriptors (they are typically used for methods but other uses are
possible).
Data and non-data descriptors differ in how overrides are calculated with
respect to entries in an instance’s dictionary. If an instance’s dictionary
has an entry with the same name as a data descriptor, the data descriptor
takes precedence. If an instance’s dictionary has an entry with the same
name as a non-data descriptor, the dictionary entry takes precedence.
To make a read-only data descriptor, define both __get__() and
__set__() with the __set__() raising an AttributeError when
called. Defining the __set__() method with an exception raising
placeholder is enough to make it a data descriptor.
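For instance, a minimal sketch (the class name and message are illustrative):

class ReadOnly(object):
    "A sketch of a read-only data descriptor."
    def __init__(self, value):
        self.value = value
    def __get__(self, obj, objtype=None):
        return self.value
    def __set__(self, obj, value):
        raise AttributeError("read-only attribute")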
A descriptor can be called directly by its method name. For example,
d.__get__(obj).
Alternatively, it is more common for a descriptor to be invoked automatically
upon attribute access. For example, obj.d looks up d in the dictionary
of obj. If d defines the method __get__(), then d.__get__(obj)
is invoked according to the precedence rules listed below.
The details of invocation depend on whether obj is an object or a class.
Either way, descriptors only work for new style objects and classes. A class is
new style if it is a subclass of object.
For objects, the machinery is in object.__getattribute__() which
transforms b.x into type(b).__dict__['x'].__get__(b,type(b)). The
implementation works through a precedence chain that gives data descriptors
priority over instance variables, instance variables priority over non-data
descriptors, and assigns lowest priority to __getattr__() if provided. The
full C implementation can be found in PyObject_GenericGetAttr() in
Objects/object.c.
For classes, the machinery is in type.__getattribute__() which transforms
B.x into B.__dict__['x'].__get__(None,B). In pure Python, it looks
like:
def __getattribute__(self, key):
    "Emulate type_getattro() in Objects/typeobject.c"
    v = object.__getattribute__(self, key)
    if hasattr(v, '__get__'):
        return v.__get__(None, self)
    return v
To summarize the precedence rules: data descriptors always override instance
dictionaries, while non-data descriptors may be overridden by instance
dictionaries.
The object returned by super() also has a custom __getattribute__()
method for invoking descriptors. The call super(B,obj).m() searches
obj.__class__.__mro__ for the base class A immediately following B
and then returns A.__dict__['m'].__get__(obj,A). If not a descriptor,
m is returned unchanged. If not in the dictionary, m reverts to a
search using object.__getattribute__().
Note, in Python 2.2, super(B,obj).m() would only invoke __get__() if
m was a data descriptor. In Python 2.3, non-data descriptors also get
invoked unless an old-style class is involved. The implementation details are
in super_getattro() in
Objects/typeobject.c
and a pure Python equivalent can be found in Guido’s Tutorial.
The details above show that the mechanism for descriptors is embedded in the
__getattribute__() methods for object, type, and
super(). Classes inherit this machinery when they derive from
object or if they have a meta-class providing similar functionality.
Likewise, classes can turn off descriptor invocation by overriding
__getattribute__().
The following code creates a class whose objects are data descriptors which
print a message for each get or set. Overriding __getattribute__() is an
alternate approach that could do this for every attribute. However, this
descriptor is useful for monitoring just a few chosen attributes:
class RevealAccess(object):
    """A data descriptor that sets and returns values
       normally and prints a message logging their access.
    """

    def __init__(self, initval=None, name='var'):
        self.val = initval
        self.name = name

    def __get__(self, obj, objtype):
        print('Retrieving', self.name)
        return self.val

    def __set__(self, obj, val):
        print('Updating', self.name)
        self.val = val
>>> class MyClass(object):
...     x = RevealAccess(10, 'var "x"')
...     y = 5
...
>>> m = MyClass()
>>> m.x
Retrieving var "x"
10
>>> m.x = 20
Updating var "x"
>>> m.x
Retrieving var "x"
20
>>> m.y
5
The protocol is simple and offers exciting possibilities. Several use cases are
so common that they have been packaged into individual function calls.
Properties, bound and unbound methods, static methods, and class methods are all
based on the descriptor protocol.
The documentation shows a typical use to define a managed attribute x:
class C(object):
    def getx(self): return self.__x
    def setx(self, value): self.__x = value
    def delx(self): del self.__x
    x = property(getx, setx, delx, "I'm the 'x' property.")
To see how property() is implemented in terms of the descriptor protocol,
here is a pure Python equivalent:
class Property(object):
    "Emulate PyProperty_Type() in Objects/descrobject.c"

    def __init__(self, fget=None, fset=None, fdel=None, doc=None):
        self.fget = fget
        self.fset = fset
        self.fdel = fdel
        self.__doc__ = doc

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        if self.fget is None:
            raise AttributeError("unreadable attribute")
        return self.fget(obj)

    def __set__(self, obj, value):
        if self.fset is None:
            raise AttributeError("can't set attribute")
        self.fset(obj, value)

    def __delete__(self, obj):
        if self.fdel is None:
            raise AttributeError("can't delete attribute")
        self.fdel(obj)
The property() builtin helps whenever a user interface has granted
attribute access and then subsequent changes require the intervention of a
method.
For instance, a spreadsheet class may grant access to a cell value through
Cell('b10').value. Subsequent improvements to the program require the cell
to be recalculated on every access; however, the programmer does not want to
affect existing client code accessing the attribute directly. The solution is
to wrap access to the value attribute in a property data descriptor:
class Cell(object):
    ...
    def getvalue(self):
        "Recalculate the cell before returning its value"
        self.recalc()
        return self._value
    value = property(getvalue)
Python’s object oriented features are built upon a function based environment.
Using non-data descriptors, the two are merged seamlessly.
Class dictionaries store methods as functions. In a class definition, methods
are written using def and lambda, the usual tools for
creating functions. The only difference from regular functions is that the
first argument is reserved for the object instance. By Python convention, the
instance reference is called self but may be called this or any other
variable name.
To support method calls, functions include the __get__() method for
binding methods during attribute access. This means that all functions are
non-data descriptors which return bound or unbound methods depending on whether
they are invoked from an object or a class. In pure python, it works like
this:
class Function(object):
    ...
    def __get__(self, obj, objtype=None):
        "Simulate func_descr_get() in Objects/funcobject.c"
        return types.MethodType(self, obj, objtype)
Running the interpreter shows how the function descriptor works in practice:
>>> class D(object):
...     def f(self, x):
...         return x
...
>>> d = D()
>>> D.__dict__['f'] # Stored internally as a function
<function f at 0x00C45070>
>>> D.f # Get from a class becomes an unbound method
<unbound method D.f>
>>> d.f # Get from an instance becomes a bound method
<bound method D.f of <__main__.D object at 0x00B18C90>>
The output suggests that bound and unbound methods are two different types.
While they could have been implemented that way, the actual C implementation of
PyMethod_Type in
Objects/classobject.c
is a single object with two different representations depending on whether the
im_self field is set or is NULL (the C equivalent of None).
Likewise, the effects of calling a method object depend on the im_self
field. If set (meaning bound), the original function (stored in the
im_func field) is called as expected with the first argument set to the
instance. If unbound, all of the arguments are passed unchanged to the original
function. The actual C implementation of instancemethod_call() is only
slightly more complex in that it includes some type checking.
Non-data descriptors provide a simple mechanism for variations on the usual
patterns of binding functions into methods.
To recap, functions have a __get__() method so that they can be converted
to a method when accessed as attributes. The non-data descriptor transforms an
obj.f(*args) call into f(obj, *args). Calling klass.f(*args)
becomes f(*args).
This chart summarizes the binding and its two most useful variants:
Transformation    Called from an Object     Called from a Class
function          f(obj, *args)             f(*args)
staticmethod      f(*args)                  f(*args)
classmethod       f(type(obj), *args)       f(klass, *args)
Static methods return the underlying function without changes. Calling either
c.f or C.f is the equivalent of a direct lookup into
object.__getattribute__(c,"f") or object.__getattribute__(C,"f"). As a
result, the function becomes identically accessible from either an object or a
class.
Good candidates for static methods are methods that do not reference the
self variable.
For instance, a statistics package may include a container class for
experimental data. The class provides normal methods for computing the average,
mean, median, and other descriptive statistics that depend on the data. However,
there may be useful functions which are conceptually related but do not depend
on the data. For instance, erf(x) is a handy conversion routine that comes up
in statistical work but does not directly depend on a particular dataset.
It can be called either from an object or the class: s.erf(1.5) --> .9332 or
Sample.erf(1.5) --> .9332.
Since staticmethods return the underlying function with no changes, the example
calls are unexciting:
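>>> class E(object):
...     def f(x):              # illustrative; note there is no 'self'
...         print(x)
...     f = staticmethod(f)
...
>>> E.f(3)
3
>>> E().f(3)
3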
Using the non-data descriptor protocol, a pure Python version of
staticmethod() would look like this:
class StaticMethod(object):
    "Emulate PyStaticMethod_Type() in Objects/funcobject.c"

    def __init__(self, f):
        self.f = f

    def __get__(self, obj, objtype=None):
        return self.f
Unlike static methods, class methods prepend the class reference to the
argument list before calling the function. This format is the same whether
the caller is an object or a class:
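>>> class E(object):
...     def f(klass, x):       # illustrative example
...         return klass.__name__, x
...     f = classmethod(f)
...
>>> E.f(3)
('E', 3)
>>> E().f(3)
('E', 3)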
This behavior is useful whenever the function only needs to have a class
reference and does not care about any underlying data. One use for classmethods
is to create alternate class constructors. In Python 2.3, the classmethod
dict.fromkeys() creates a new dictionary from a list of keys. The pure
Python equivalent is:
class Dict:
    ...
    def fromkeys(klass, iterable, value=None):
        "Emulate dict_fromkeys() in Objects/dictobject.c"
        d = klass()
        for key in iterable:
            d[key] = value
        return d
    fromkeys = classmethod(fromkeys)
Now a new dictionary of unique keys can be constructed like this:
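Assuming Dict supports item assignment (elided above), something like:

>>> Dict.fromkeys('abracadabra')     # key order may vary
{'a': None, 'r': None, 'b': None, 'c': None, 'd': None}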
Using the non-data descriptor protocol, a pure Python version of
classmethod() would look like this:
class ClassMethod(object):
    "Emulate PyClassMethod_Type() in Objects/funcobject.c"

    def __init__(self, f):
        self.f = f

    def __get__(self, obj, klass=None):
        if klass is None:
            klass = type(obj)
        def newfunc(*args):
            return self.f(klass, *args)
        return newfunc
In this document, we’ll take a tour of Python’s features suitable for
implementing programs in a functional style. After an introduction to the
concepts of functional programming, we’ll look at language features such as
iterators and generators and relevant library modules such as
itertools and functools.
This section explains the basic concept of functional programming; if you’re
just interested in learning about Python language features, skip to the next
section.
Programming languages support decomposing problems in several different ways:
Most programming languages are procedural: programs are lists of
instructions that tell the computer what to do with the program’s input. C,
Pascal, and even Unix shells are procedural languages.
In declarative languages, you write a specification that describes the
problem to be solved, and the language implementation figures out how to
perform the computation efficiently. SQL is the declarative language you’re
most likely to be familiar with; a SQL query describes the data set you want
to retrieve, and the SQL engine decides whether to scan tables or use indexes,
which subclauses should be performed first, etc.
Object-oriented programs manipulate collections of objects. Objects have
internal state and support methods that query or modify this internal state in
some way. Smalltalk and Java are object-oriented languages. C++ and Python
are languages that support object-oriented programming, but don’t force the
use of object-oriented features.
Functional programming decomposes a problem into a set of functions.
Ideally, functions only take inputs and produce outputs, and don’t have any
internal state that affects the output produced for a given input. Well-known
functional languages include the ML family (Standard ML, OCaml, and other
variants) and Haskell.
The designers of some computer languages choose to emphasize one
particular approach to programming. This often makes it difficult to
write programs that use a different approach. Other languages are
multi-paradigm languages that support several different approaches.
Lisp, C++, and Python are multi-paradigm; you can write programs or
libraries that are largely procedural, object-oriented, or functional
in all of these languages. In a large program, different sections
might be written using different approaches; the GUI might be
object-oriented while the processing logic is procedural or
functional, for example.
In a functional program, input flows through a set of functions. Each function
operates on its input and produces some output. Functional style discourages
functions with side effects that modify internal state or make other changes
that aren’t visible in the function’s return value. Functions that have no side
effects at all are called purely functional. Avoiding side effects means
not using data structures that get updated as a program runs; every function’s
output must only depend on its input.
Some languages are very strict about purity and don’t even have assignment
statements such as a = 3 or c = a + b, but it’s difficult to avoid all
side effects. Printing to the screen or writing to a disk file are side
effects, for example. In Python, calls to the print() or
time.sleep() functions return no useful value; they’re only called for
their side effects of sending some text to the screen or pausing execution for a
second.
Python programs written in functional style usually won’t go to the extreme of
avoiding all I/O or all assignments; instead, they’ll provide a
functional-appearing interface but will use non-functional features internally.
For example, the implementation of a function will still use assignments to
local variables, but won’t modify global variables or have other side effects.
Functional programming can be considered the opposite of object-oriented
programming. Objects are little capsules containing some internal state along
with a collection of method calls that let you modify this state, and programs
consist of making the right set of state changes. Functional programming wants
to avoid state changes as much as possible and works with data flowing between
functions. In Python you might combine the two approaches by writing functions
that take and return instances representing objects in your application (e-mail
messages, transactions, etc.).
Functional design may seem like an odd constraint to work under. Why should you
avoid objects and side effects? There are theoretical and practical advantages
to the functional style:
A theoretical benefit is that it’s easier to construct a mathematical proof that
a functional program is correct.
For a long time researchers have been interested in finding ways to
mathematically prove programs correct. This is different from testing a program
on numerous inputs and concluding that its output is usually correct, or reading
a program’s source code and concluding that the code looks right; the goal is
instead a rigorous proof that a program produces the right result for all
possible inputs.
The technique used to prove programs correct is to write down invariants,
properties of the input data and of the program’s variables that are always
true. For each line of code, you then show that if invariants X and Y are true
before the line is executed, the slightly different invariants X’ and Y’ are
true after the line is executed. This continues until you reach the end of
the program, at which point the invariants should match the desired conditions
on the program’s output.
Functional programming’s avoidance of assignments arose because assignments are
difficult to handle with this technique; assignments can break invariants that
were true before the assignment without producing any new invariants that can be
propagated onward.
Unfortunately, proving programs correct is largely impractical and not relevant
to Python software. Even trivial programs require proofs that are several pages
long; the proof of correctness for a moderately complicated program would be
enormous, and few or none of the programs you use daily (the Python interpreter,
your XML parser, your web browser) could be proven correct. Even if you wrote
down or generated a proof, there would then be the question of verifying the
proof; maybe there’s an error in it, and you wrongly believe you’ve proved the
program correct.
A more practical benefit of functional programming is that it forces you to
break apart your problem into small pieces. Programs are more modular as a
result. It’s easier to specify and write a small function that does one thing
than a large function that performs a complicated transformation. Small
functions are also easier to read and to check for errors.
Testing and debugging a functional-style program is easier.
Debugging is simplified because functions are generally small and clearly
specified. When a program doesn’t work, each function is an interface point
where you can check that the data are correct. You can look at the intermediate
inputs and outputs to quickly isolate the function that’s responsible for a bug.
Testing is easier because each function is a potential subject for a unit test.
Functions don’t depend on system state that needs to be replicated before
running a test; instead you only have to synthesize the right input and then
check that the output matches expectations.
As you work on a functional-style program, you’ll write a number of functions
with varying inputs and outputs. Some of these functions will be unavoidably
specialized to a particular application, but others will be useful in a wide
variety of programs. For example, a function that takes a directory path and
returns all the XML files in the directory, or a function that takes a filename
and returns its contents, can be applied to many different situations.
Over time you’ll form a personal library of utilities. Often you’ll assemble
new programs by arranging existing functions in a new configuration and writing
a few functions specialized for the current task.
I’ll start by looking at a Python language feature that’s an important
foundation for writing functional-style programs: iterators.
An iterator is an object representing a stream of data; this object returns the
data one element at a time. A Python iterator must support a method called
__next__() that takes no arguments and always returns the next element of
the stream. If there are no more elements in the stream, __next__() must
raise the StopIteration exception. Iterators don’t have to be finite,
though; it’s perfectly reasonable to write an iterator that produces an infinite
stream of data.
The built-in iter() function takes an arbitrary object and tries to return
an iterator that will return the object’s contents or elements, raising
TypeError if the object doesn’t support iteration. Several of Python’s
built-in data types support iteration, the most common being lists and
dictionaries. An object is called an iterable object if you can get an
iterator for it.
You can experiment with the iteration interface manually:
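>>> L = [1, 2, 3]
>>> it = iter(L)
>>> it.__next__()
1
>>> next(it)
2
>>> next(it)
3
>>> next(it)
Traceback (most recent call last):
  ...
StopIteration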
Python expects iterable objects in several different contexts, the most
important being the for statement. In the statement for X in Y, Y must
be an iterator or some object for which iter() can create an iterator.
These two statements are equivalent:
for i in iter(obj):
    print(i)

for i in obj:
    print(i)
Iterators can be materialized as lists or tuples by using the list() or
tuple() constructor functions:
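>>> L = [1, 2, 3]
>>> iterator = iter(L)
>>> t = tuple(iterator)
>>> t
(1, 2, 3)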
Built-in functions such as max() and min() can take a single
iterator argument and will return the largest or smallest element. The "in"
and "not in" operators also support iterators: X in iterator is true if
X is found in the stream returned by the iterator. You’ll run into obvious
problems if the iterator is infinite; max() and min() will never
return, and if the element X never appears in the stream, the "in" and
"not in" operators won’t return either.
Note that you can only go forward in an iterator; there’s no way to get the
previous element, reset the iterator, or make a copy of it. Iterator objects
can optionally provide these additional capabilities, but the iterator protocol
only specifies the __next__() method. Functions may therefore consume all of
the iterator’s output, and if you need to do something different with the same
stream, you’ll have to create a new iterator.
We’ve already seen how lists and tuples support iterators. In fact, any Python
sequence type, such as strings, will automatically support creation of an
iterator.
Calling iter() on a dictionary returns an iterator that will loop over the
dictionary’s keys:
>>> m = {'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4, 'May': 5, 'Jun': 6,
... 'Jul': 7, 'Aug': 8, 'Sep': 9, 'Oct': 10, 'Nov': 11, 'Dec': 12}
>>> for key in m:
... print(key, m[key])
Mar 3
Feb 2
Aug 8
Sep 9
Apr 4
Jun 6
Jul 7
Jan 1
May 5
Nov 11
Dec 12
Oct 10
Note that the order is essentially random, because it’s based on the hash
ordering of the objects in the dictionary.
Applying iter() to a dictionary always loops over the keys, but
dictionaries have methods that return other iterators. If you want to iterate
over values or key/value pairs, you can explicitly call the
values() or items() methods to get an appropriate iterator.
The dict() constructor can accept an iterator that returns a finite stream
of (key,value) tuples:
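>>> L = [('Italy', 'Rome'), ('France', 'Paris'), ('US', 'Washington DC')]
>>> dict(iter(L))                    # key order may vary
{'Italy': 'Rome', 'France': 'Paris', 'US': 'Washington DC'}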
Files also support iteration by calling the readline() method until there
are no more lines in the file. This means you can read each line of a file like
this:
for line in file:
    # do something for each line
    ...
Sets can take their contents from an iterable and let you iterate over the set’s
elements:
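S = set(range(5))        # build a set from an iterable
for i in S:
    print(i)             # elements come out in arbitrary order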
Two common operations on an iterator’s output are 1) performing some operation
for every element, 2) selecting a subset of elements that meet some condition.
For example, given a list of strings, you might want to strip off trailing
whitespace from each line or extract all the strings containing a given
substring.
List comprehensions and generator expressions (short form: “listcomps” and
“genexps”) are a concise notation for such operations, borrowed from the
functional programming language Haskell (http://www.haskell.org/). You can strip
all the whitespace from a stream of strings with the following code:
line_list = [' line 1\n', 'line 2 \n', ...]
# Generator expression -- returns iterator
stripped_iter = (line.strip() for line in line_list)
# List comprehension -- returns list
stripped_list = [line.strip() for line in line_list]
You can select only certain elements by adding an "if" condition:
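stripped_list = [line.strip() for line in line_list
                 if line != ""]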
With a list comprehension, you get back a Python list; stripped_list is a
list containing the resulting lines, not an iterator. Generator expressions
return an iterator that computes the values as necessary, not needing to
materialize all the values at once. This means that list comprehensions aren’t
useful if you’re working with iterators that return an infinite stream or a very
large amount of data. Generator expressions are preferable in these situations.
Generator expressions are surrounded by parentheses (“()”) and list
comprehensions are surrounded by square brackets (“[]”). Generator expressions
have the form:
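( expression for expr in sequence1
             if condition1
             for expr2 in sequence2
             if condition2 ...
             for exprN in sequenceN
             if conditionN )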
Again, for a list comprehension only the outside brackets are different (square
brackets instead of parentheses).
The elements of the generated output will be the successive values of
expression. The if clauses are all optional; if present, expression
is only evaluated and added to the result when condition is true.
Generator expressions always have to be written inside parentheses, but the
parentheses signalling a function call also count. If you want to create an
iterator that will be immediately passed to a function you can write:
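obj_total = sum(obj.count for obj in list_all_objects())   # list_all_objects() is a hypothetical helper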
The for...in clauses contain the sequences to be iterated over. The
sequences do not have to be the same length, because they are iterated over from
left to right, not in parallel. For each element in sequence1,
sequence2 is looped over from the beginning. sequence3 is then looped
over for each resulting pair of elements from sequence1 and sequence2.
To put it another way, a list comprehension or generator expression is
equivalent to the following Python code:
for expr1 in sequence1:
    if not (condition1):
        continue          # Skip this element
    for expr2 in sequence2:
        if not (condition2):
            continue      # Skip this element
        ...
        for exprN in sequenceN:
            if not (conditionN):
                continue  # Skip this element

            # Output the value of
            # the expression.
This means that when there are multiple for...in clauses but no if
clauses, the length of the resulting output will be equal to the product of the
lengths of all the sequences. If you have two lists of length 3, the output
list is 9 elements long:
>>> seq1 = 'abc'
>>> seq2 = (1,2,3)
>>> [(x,y) for x in seq1 for y in seq2]
[('a', 1), ('a', 2), ('a', 3),
('b', 1), ('b', 2), ('b', 3),
('c', 1), ('c', 2), ('c', 3)]
To avoid introducing an ambiguity into Python’s grammar, if expression is
creating a tuple, it must be surrounded with parentheses. The first list
comprehension below is a syntax error, while the second one is correct:
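# Syntax error
[x, y for x in seq1 for y in seq2]
# Correct
[(x, y) for x in seq1 for y in seq2]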
Generators are a special class of functions that simplify the task of writing
iterators. Regular functions compute a value and return it, but generators
return an iterator that returns a stream of values.
You’re doubtless familiar with how regular function calls work in Python or C.
When you call a function, it gets a private namespace where its local variables
are created. When the function reaches a return statement, the local
variables are destroyed and the value is returned to the caller. A later call
to the same function creates a new private namespace and a fresh set of local
variables. But, what if the local variables weren’t thrown away on exiting a
function? What if you could later resume the function where it left off? This
is what generators provide; they can be thought of as resumable functions.
Here’s the simplest example of a generator function:
def generate_ints(N):
    for i in range(N):
        yield i
Any function containing a yield keyword is a generator function; this is
detected by Python’s bytecode compiler which compiles the function
specially as a result.
When you call a generator function, it doesn’t return a single value; instead it
returns a generator object that supports the iterator protocol. On executing
the yield expression, the generator outputs the value of i, similar to a
return statement. The big difference between yield and a return
statement is that on reaching a yield the generator’s state of execution is
suspended and local variables are preserved. On the next call to the
generator’s .__next__() method, the function will resume executing.
Here’s a sample usage of the generate_ints() generator:
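>>> gen = generate_ints(3)
>>> gen
<generator object generate_ints at ...>
>>> next(gen)
0
>>> next(gen)
1
>>> next(gen)
2
>>> next(gen)
Traceback (most recent call last):
  ...
StopIteration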
You could equally write for i in generate_ints(5), or a, b, c = generate_ints(3).
Inside a generator function, the return statement can only be used without a
value, and signals the end of the procession of values; after executing a
return the generator cannot return any further values. return with a
value, such as return 5, is a syntax error inside a generator function. The
end of the generator’s results can also be indicated by raising
StopIteration manually, or by just letting the flow of execution fall off
the bottom of the function.
You could achieve the effect of generators manually by writing your own class
and storing all the local variables of the generator as instance variables. For
example, returning a list of integers could be done by setting self.count to
0, and having the __next__() method increment self.count and return it.
However, for a moderately complicated generator, writing a corresponding class
can be much messier.
The test suite included with Python’s library, test_generators.py, contains
a number of more interesting examples. Here’s one generator that implements an
in-order traversal of a tree using generators recursively.
# A recursive generator that generates Tree leaves in in-order.
def inorder(t):
    if t:
        for x in inorder(t.left):
            yield x
        yield t.label
        for x in inorder(t.right):
            yield x
Two other examples in test_generators.py produce solutions for the N-Queens
problem (placing N queens on an NxN chess board so that no queen threatens
another) and the Knight’s Tour (finding a route that takes a knight to every
square of an NxN chessboard without visiting any square twice).
In Python 2.4 and earlier, generators only produced output. Once a generator’s
code was invoked to create an iterator, there was no way to pass any new
information into the function when its execution is resumed. You could hack
together this ability by making the generator look at a global variable or by
passing in some mutable object that callers then modify, but these approaches
are messy.
In Python 2.5 there’s a simple way to pass values into a generator.
yield became an expression, returning a value that can be assigned to
a variable or otherwise operated on:
val = (yield i)
I recommend that you always put parentheses around a yield expression
when you’re doing something with the returned value, as in the above example.
The parentheses aren’t always necessary, but it’s easier to always add them
instead of having to remember when they’re needed.
(PEP 342 explains the exact rules, which are that a yield-expression must
always be parenthesized except when it occurs at the top-level expression on the
right-hand side of an assignment. This means you can write val = yield i
but have to use parentheses when there’s an operation, as in val = (yield i) + 12.)
Values are sent into a generator by calling its send(value) method. This
method resumes the generator’s code and the yield expression returns the
specified value. If the regular __next__() method is called, the yield
returns None.
Here’s a simple counter that increments by 1 and allows changing the value of
the internal counter.
def counter(maximum):
    i = 0
    while i < maximum:
        val = (yield i)
        # If value provided, change counter
        if val is not None:
            i = val
        else:
            i += 1
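A sketch of driving this generator with next() and send():

>>> it = counter(10)
>>> next(it)
0
>>> next(it)
1
>>> it.send(8)
8
>>> next(it)
9
>>> next(it)
Traceback (most recent call last):
  ...
StopIteration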
Because yield will often be returning None, you should always check for
this case. Don’t just use its value in expressions unless you’re sure that the
send() method will be the only method used to resume your generator function.
In addition to send(), there are two other new methods on generators:
throw(type, value=None, traceback=None) is used to raise an exception
inside the generator; the exception is raised by the yield expression
where the generator’s execution is paused.
close() raises a GeneratorExit exception inside the generator to
terminate the iteration. On receiving this exception, the generator’s code
must either raise GeneratorExit or StopIteration; catching the
exception and doing anything else is illegal and will trigger a
RuntimeError. close() will also be called by Python’s garbage
collector when the generator is garbage-collected.
If you need to run cleanup code when a GeneratorExit occurs, I suggest
using a try:...finally: suite instead of catching GeneratorExit.
The cumulative effect of these changes is to turn generators from one-way
producers of information into both producers and consumers.
Generators also become coroutines, a more generalized form of subroutines.
Subroutines are entered at one point and exited at another point (the top of the
function, and a return statement), but coroutines can be entered, exited,
and resumed at many different points (the yield statements).
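The map(f, iterA, iterB, ...) built-in applies f to every element of its
iterable arguments and returns an iterator over the results. The session
below assumes a small helper:

>>> def upper(s):
...     return s.upper()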
>>> list(map(upper, ['sentence', 'fragment']))
['SENTENCE', 'FRAGMENT']
>>> [upper(s) for s in ['sentence', 'fragment']]
['SENTENCE', 'FRAGMENT']
You can of course achieve the same effect with a list comprehension.
filter(predicate, iter) returns an iterator over all the sequence elements
that meet a certain condition, and is similarly duplicated by list
comprehensions. A predicate is a function that returns the truth value of
some condition; for use with filter(), the predicate must take a single
value.
>>> def is_even(x):
...     return (x % 2) == 0

>>> list(filter(is_even, range(10)))
[0, 2, 4, 6, 8]
This can also be written as a list comprehension:
>>> list(x for x in range(10) if is_even(x))
[0, 2, 4, 6, 8]
enumerate(iter) counts off the elements in the iterable, returning 2-tuples
containing the count and each element.
>>> for item in enumerate(['subject', 'verb', 'object']):
... print(item)
(0, 'subject')
(1, 'verb')
(2, 'object')
enumerate() is often used when looping through a list and recording the
indexes at which certain conditions are met:
f = open('data.txt', 'r')
for i, line in enumerate(f):
    if line.strip() == '':
        print('Blank line at line #%i' % i)
sorted(iterable, [key=None], [reverse=False]) collects all the elements of
the iterable into a list, sorts the list, and returns the sorted result. The
key and reverse arguments are passed through to the constructed list’s
.sort() method.
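For example:

>>> sorted([9, 4, 7, 1])
[1, 4, 7, 9]
>>> sorted([9, 4, 7, 1], reverse=True)
[9, 7, 4, 1]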
The any(iter) and all(iter) built-ins look at the truth values of an
iterable’s contents. any() returns True if any element in the iterable is
a true value, and all() returns True if all of the elements are true
values:
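>>> any([0, 1, 0])
True
>>> any([0, 0, 0])
False
>>> any([1, 1, 1])
True
>>> all([0, 1, 0])
False
>>> all([0, 0, 0])
False
>>> all([1, 1, 1])
True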
zip(iterA, iterB, ...) takes one element from each iterable and returns
them in a tuple. It doesn’t construct an in-memory list and exhaust all the
input iterators before returning; instead tuples are constructed and returned
only if they’re requested. (The technical term for this behaviour is lazy
evaluation.)
This iterator is intended to be used with iterables that are all of the same
length. If the iterables are of different lengths, the resulting stream will be
the same length as the shortest iterable.
zip(['a', 'b'], (1, 2, 3)) =>
  ('a', 1), ('b', 2)
You should avoid doing this, though, because an element may be taken from the
longer iterators and discarded. This means you can’t go on to use the iterators
further because you risk skipping a discarded element.
The itertools module contains a number of commonly-used iterators as well
as functions for combining several iterators. This section will introduce the
module’s contents by showing small examples.
The module’s functions fall into a few broad classes:
Functions that create a new iterator based on an existing iterator.
Functions for treating an iterator’s elements as function arguments.
Functions for selecting portions of an iterator’s output.
itertools.count(n) returns an infinite stream of integers, increasing by 1
each time. You can optionally supply the starting number, which defaults to 0:
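itertools.count() =>
  0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ...
itertools.count(10) =>
  10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ...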
itertools.cycle(iter) saves a copy of the contents of a provided iterable
and returns a new iterator that returns its elements from first to last. The
new iterator will repeat these elements infinitely.
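For example:

itertools.cycle([1, 2, 3, 4, 5]) =>
  1, 2, 3, 4, 5, 1, 2, 3, 4, 5, ...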
itertools.chain(iterA, iterB, ...) takes an arbitrary number of iterables as
input, and returns all the elements of the first iterator, then all the elements
of the second, and so on, until all of the iterables have been exhausted.
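For example:

itertools.chain(['a', 'b', 'c'], (1, 2, 3)) =>
  a, b, c, 1, 2, 3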
itertools.islice(iter, [start], stop, [step]) returns a stream that’s a
slice of the iterator. With a single stop argument, it will return the
first stop elements. If you supply a starting index, you’ll get
stop-start elements, and if you supply a value for step, elements will
be skipped accordingly. Unlike Python’s string and list slicing, you can’t use
negative values for start, stop, or step.
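For example:

itertools.islice(range(10), 8) =>
  0, 1, 2, 3, 4, 5, 6, 7
itertools.islice(range(10), 2, 8) =>
  2, 3, 4, 5, 6, 7
itertools.islice(range(10), 2, 8, 2) =>
  2, 4, 6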
itertools.tee(iter, [n]) replicates an iterator; it returns n
independent iterators that will all return the contents of the source iterator.
If you don’t supply a value for n, the default is 2. Replicating iterators
requires saving some of the contents of the source iterator, so this can consume
significant memory if the iterator is large and one of the new iterators is
consumed more than the others.
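For example:

itertools.tee(itertools.count()) =>
  iterA, iterB

where iterA -> 0, 1, 2, 3, 4, 5, ...
  and iterB -> 0, 1, 2, 3, 4, 5, ...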
The operator module contains a set of functions corresponding to Python’s
operators. Some examples are operator.add(a, b) (adds two values),
operator.ne(a, b) (same as a != b), and operator.attrgetter('id')
(returns a callable that fetches the "id" attribute).
itertools.starmap(func, iter) assumes that the iterable will return a stream
of tuples, and calls func() using these tuples as the arguments:
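import os
itertools.starmap(os.path.join,
                  [('/bin', 'python'), ('/usr', 'bin', 'java'),
                   ('/usr', 'bin', 'perl'), ('/usr', 'bin', 'ruby')]) =>
  /bin/python, /usr/bin/java, /usr/bin/perl, /usr/bin/ruby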
itertools.takewhile(predicate, iter) returns elements for as long as the
predicate returns true. Once the predicate returns false, the iterator will
signal the end of its results.
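For example:

def less_than_10(x):
    return x < 10

itertools.takewhile(less_than_10, itertools.count()) =>
  0, 1, 2, 3, 4, 5, 6, 7, 8, 9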
The last function I’ll discuss, itertools.groupby(iter, key_func=None), is
the most complicated. key_func(elem) is a function that can compute a key
value for each element returned by the iterable. If you don’t supply a key
function, the key is simply each element itself.
groupby() collects all the consecutive elements from the underlying iterable
that have the same key value, and returns a stream of 2-tuples containing a key
value and an iterator for the elements with that key.
groupby() assumes that the underlying iterable’s contents will already be
sorted based on the key. Note that the returned iterators also use the
underlying iterable, so you have to consume the results of iterator-1 before
requesting iterator-2 and its corresponding key.
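For example, a sketch that groups a list of (city, state) pairs by state
(the data is illustrative, and already sorted by the key):

city_list = [('Decatur', 'AL'), ('Huntsville', 'AL'), ('Selma', 'AL'),
             ('Anchorage', 'AK'), ('Nome', 'AK'),
             ('Flagstaff', 'AZ'), ('Phoenix', 'AZ'), ('Tucson', 'AZ')]

def get_state(city_state):
    return city_state[1]

itertools.groupby(city_list, get_state) =>
  ('AL', iterator-1),
  ('AK', iterator-2),
  ('AZ', iterator-3)

where iterator-1 => ('Decatur', 'AL'), ('Huntsville', 'AL'), ('Selma', 'AL')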
The functools module in Python 2.5 contains some higher-order functions.
A higher-order function takes one or more functions as input and returns a
new function. The most useful tool in this module is the
functools.partial() function.
For programs written in a functional style, you’ll sometimes want to construct
variants of existing functions that have some of the parameters filled in.
Consider a Python function f(a,b,c); you may wish to create a new function
g(b,c) that’s equivalent to f(1,b,c); you’re filling in a value for
one of f()‘s parameters. This is called “partial function application”.
The constructor for partial takes the arguments (function, arg1, arg2, ...,
kwarg1=value1, kwarg2=value2). The resulting object is callable, so you
can just call it to invoke function with the filled-in arguments.
Here’s a small but realistic example:
import functools
def log(message, subsystem):
    "Write the contents of 'message' to the specified subsystem."
    print('%s: %s' % (subsystem, message))
...
server_log = functools.partial(log, subsystem='server')
server_log('Unable to open socket')
functools.reduce(func, iter, [initial_value]) cumulatively performs an
operation on all the iterable’s elements and, therefore, can’t be applied to
infinite iterables. (Note it is not in builtins, but in the
functools module.) func must be a function that takes two elements
and returns a single value. functools.reduce() takes the first two
elements A and B returned by the iterator and calculates func(A, B). It
then requests the third element, C, calculates func(func(A, B), C), combines
this result with the fourth element returned, and continues until the iterable
is exhausted. If the iterable returns no values at all, a TypeError
exception is raised. If the initial value is supplied, it’s used as a starting
point and func(initial_value, A) is the first calculation.
>>> import operator, functools
>>> functools.reduce(operator.concat, ['A', 'BB', 'C'])
'ABBC'
>>> functools.reduce(operator.concat, [])
Traceback (most recent call last):
...
TypeError: reduce() of empty sequence with no initial value
>>> functools.reduce(operator.mul, [1,2,3], 1)
6
>>> functools.reduce(operator.mul, [], 1)
1
If you use operator.add() with functools.reduce(), you’ll add up all the
elements of the iterable. This case is so common that there’s a special
built-in called sum() to compute it:
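>>> import functools, operator
>>> functools.reduce(operator.add, [1, 2, 3, 4])
10
>>> sum([1, 2, 3, 4])
10
>>> sum([])
0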
The operator module was mentioned earlier. It contains a set of
functions corresponding to Python’s operators. These functions are often useful
in functional-style code because they save you from writing trivial functions
that perform a single operation.
Some of the functions in this module are:
Math operations: add(), sub(), mul(), floordiv(), abs(), ...
Logical operations: not_(), truth().
Bitwise operations: and_(), or_(), invert().
Comparisons: eq(), ne(), lt(), le(), gt(), and ge().
Object identity: is_(), is_not().
Consult the operator module’s documentation for a complete list.
Collin Winter’s functional module
provides a number of more advanced tools for functional programming. It also
reimplements several Python built-ins, trying to make them more intuitive to
those used to functional programming in other languages.
This section contains an introduction to some of the most important functions in
functional; full documentation can be found at the project’s website.
compose(outer,inner,unpack=False)
The compose() function implements function composition. In other words, it
returns a wrapper around the outer and inner callables, such that the
return value from inner is fed directly to outer.
The unpack keyword is provided to work around the fact that Python functions
are not always fully curried. By
default, it is expected that the inner function will return a single object
and that the outer function will take a single argument. Setting the
unpack argument causes compose to expect a tuple from inner which
will be expanded before being passed to outer. Put simply,
compose(f, g)(5, 6)
is equivalent to:
f(g(5, 6))
while
compose(f, g, unpack=True)(5, 6)
is equivalent to:
f(*g(5, 6))
Even though compose() only accepts two functions, it’s trivial to build up a
version that will compose any number of functions. We’ll use
functools.reduce(), compose() and partial() (the last of which is
provided by both functional and functools).
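A sketch of that construction (assuming the third-party functional module is
installed):

from functional import compose, partial
import functools

# reduce(compose, [f, g, h]) builds compose(compose(f, g), h),
# i.e. a function computing f(g(h(x)))
multi_compose = partial(functools.reduce, compose)

print(multi_compose([str.strip, str.lower])('  Foo '))  # 'foo'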
foldl() takes a binary function, a starting value (usually some kind of
‘zero’), and an iterable. The function is applied to the starting value and the
first element of the list, then the result of that and the second element of the
list, then the result of that and the third element of the list, and so on.
This means that a call such as:
foldl(f,0,[1,2,3])
is equivalent to:
f(f(f(0,1),2),3)
foldl() is roughly equivalent to the following recursive function:
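def foldl(func, start, iterable):
    if len(iterable) == 0:
        return start
    else:
        return foldl(func, func(start, iterable[0]), iterable[1:])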
If the function you need doesn’t exist, you need to write it. One way to write
small functions is to use the lambda statement. lambda takes a number
of parameters and an expression combining these parameters, and creates a small
function that returns the value of the expression:
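adder = lambda x, y: x + y

print_assign = lambda name, value: name + '=' + str(value)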
Which alternative is preferable? That’s a style question; my usual course is to
avoid using lambda.
One reason for my preference is that lambda is quite limited in the
functions it can define. The result has to be computable as a single
expression, which means you can’t have multiway if...elif...else
comparisons or try...except statements. If you try to do too much in a
lambda statement, you’ll end up with an overly complicated expression that’s
hard to read. Quick, what’s the following code doing?
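import functools
total = functools.reduce(lambda a, b: (0, a[1] + b[1]), items)[1]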
You can figure it out, but it takes time to disentangle the expression to figure
out what’s going on. Using a short nested def statement makes things a
little bit better:
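import functools

def combine(a, b):
    return 0, a[1] + b[1]

total = functools.reduce(combine, items)[1]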
The author would like to thank the following people for offering suggestions,
corrections and assistance with various drafts of this article: Ian Bicking,
Nick Coghlan, Nick Efford, Raymond Hettinger, Jim Jewett, Mike Krell, Leandro
Lameiro, Jussi Salmela, Collin Winter, Blake Winton.
Version 0.1: posted June 30 2006.
Version 0.11: posted July 1 2006. Typo fixes.
Version 0.2: posted July 10 2006. Merged genexp and listcomp sections into one.
Typo fixes.
Version 0.21: Added more references suggested on the tutor mailing list.
Version 0.30: Adds a section on the functional module written by Collin
Winter; adds short section on the operator module; a few other edits.
Structure and Interpretation of Computer Programs, by Harold Abelson and
Gerald Jay Sussman with Julie Sussman. Full text at
http://mitpress.mit.edu/sicp/. In this classic textbook of computer science,
chapters 2 and 3 discuss the use of sequences and streams to organize the data
flow inside a program. The book uses Scheme for its examples, but many of the
design approaches described in these chapters are applicable to functional-style
Python code.
http://gnosis.cx/TPiP/: The first chapter of David Mertz’s book
Text Processing in Python discusses functional programming
for text processing, in the section titled “Utilizing Higher-Order Functions in
Text Processing”.
Mertz also wrote a 3-part series of articles on functional programming
for IBM’s DeveloperWorks site; see
part 1,
part 2, and
part 3.
Logging is a means of tracking events that happen when some software runs. The
software’s developer adds logging calls to their code to indicate that certain
events have occurred. An event is described by a descriptive message which can
optionally contain variable data (i.e. data that is potentially different for
each occurrence of the event). Events also have an importance which the
developer ascribes to the event; the importance can also be called the level
or severity.
Logging provides a set of convenience functions for simple logging usage. These
are debug(), info(), warning(), error() and
critical(). To determine when to use logging, see the table below, which
states, for each of a set of common tasks, the best tool to use for it.
Task you want to perform                            The best tool for the task
Display console output for ordinary usage of a      print()
command line script or program
The logging functions are named after the level or severity of the events
they are used to track. The standard levels and their applicability are
described below (in increasing order of severity):
Level      When it’s used
DEBUG      Detailed information, typically of interest only when
           diagnosing problems.
INFO       Confirmation that things are working as expected.
WARNING    An indication that something unexpected happened, or indicative
           of some problem in the near future (e.g. ‘disk space low’). The
           software is still working as expected.
ERROR      Due to a more serious problem, the software has not been able
           to perform some function.
CRITICAL   A serious error, indicating that the program itself may be
           unable to continue running.
The default level is WARNING, which means that only events of this level
and above will be tracked, unless the logging package is configured to do
otherwise.
Events that are tracked can be handled in different ways. The simplest way of
handling tracked events is to print them to the console. Another common way
is to write them to a disk file.
import logging
logging.warning('Watch out!') # will print a message to the console
logging.info('I told you so') # will not print anything
If you type these lines into a script and run it, you’ll see:
WARNING:root:Watch out!
printed out on the console. The INFO message doesn’t appear because the
default level is WARNING. The printed message includes the indication of
the level and the description of the event provided in the logging call, i.e.
‘Watch out!’. Don’t worry about the ‘root’ part for now: it will be explained
later. The actual output can be formatted quite flexibly if you need that;
formatting options will also be explained later.
A very common situation is that of recording logging events in a file, so let’s
look at that next:
import logging
logging.basicConfig(filename='example.log',level=logging.DEBUG)
logging.debug('This message should go to the log file')
logging.info('So should this')
logging.warning('And this, too')
And now if we open the file and look at what we have, we should find the log
messages:
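DEBUG:root:This message should go to the log file
INFO:root:So should this
WARNING:root:And this, too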
This example also shows how you can set the logging level which acts as the
threshold for tracking. In this case, because we set the threshold to
DEBUG, all of the messages were printed.
If you want to set the logging level from a command-line option such as:
--log=INFO
and you have the value of the parameter passed for --log in some variable
loglevel, you can use:
getattr(logging, loglevel.upper())
to get the value which you’ll pass to basicConfig() via the level
argument. You may want to error check any user input value, perhaps as in the
following example:
# assuming loglevel is bound to the string value obtained from the
# command line argument. Convert to upper case to allow the user to
# specify --log=DEBUG or --log=debug
numeric_level = getattr(logging, loglevel.upper(), None)
if not isinstance(numeric_level, int):
    raise ValueError('Invalid log level: %s' % loglevel)
logging.basicConfig(level=numeric_level, ...)
The call to basicConfig() should come before any calls to debug(),
info() etc. As it’s intended as a one-off simple configuration facility,
only the first call will actually do anything: subsequent calls are effectively
no-ops.
If you run the above script several times, the messages from successive runs
are appended to the file example.log. If you want each run to start afresh,
not remembering the messages from earlier runs, you can specify the filemode
argument, by changing the call in the above example to:
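logging.basicConfig(filename='example.log', filemode='w', level=logging.DEBUG)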
The log file is then truncated at each run, so messages from earlier runs are
lost. You can generalize this to multiple modules, using the pattern in
mylib.py. Note that for this simple
usage pattern, you won’t know, by looking in the log file, where in your
application your messages came from, apart from looking at the event
description. If you want to track the location of your messages, you’ll need
to refer to the documentation beyond the tutorial level – see
Advanced Logging Tutorial.
To log variable data, use a format string for the event description message and
append the variable data as arguments. For example:
import logging
logging.warning('%s before you %s', 'Look', 'leap!')
will display:
WARNING:root:Look before you leap!
As you can see, merging of variable data into the event description message
uses the old, %-style of string formatting. This is for backwards
compatibility: the logging package pre-dates newer formatting options such as
str.format() and string.Template. These newer formatting
options are supported, but exploring them is outside the scope of this
tutorial.
To change the format which is used to display messages, you need to
specify the format you want to use:
import logging
logging.basicConfig(format='%(levelname)s:%(message)s', level=logging.DEBUG)
logging.debug('This message should appear on the console')
logging.info('So should this')
logging.warning('And this, too')
Notice that the ‘root’ which appeared in earlier examples has disappeared. For
a full set of things that can appear in format strings, you can refer to the
documentation for LogRecord attributes, but for simple usage, you just
need the levelname (severity), message (event description, including
variable data) and perhaps to display when the event occurred. This is
described in the next section.
To display the date and time of an event, you would place ‘%(asctime)s’ in
your format string:
import logging
logging.basicConfig(format='%(asctime)s %(message)s')
logging.warning('is when this event was logged.')
which should print something like this:
2010-12-12 11:41:42,612 is when this event was logged.
The default format for date/time display (shown above) is ISO8601. If you need
more control over the formatting of the date/time, provide a datefmt
argument to basicConfig, as in this example:
import logging
logging.basicConfig(format='%(asctime)s %(message)s', datefmt='%m/%d/%Y %I:%M:%S %p')
logging.warning('is when this event was logged.')
which would display something like this:
12/12/2010 11:46:36 AM is when this event was logged.
The format of the datefmt argument is the same as supported by
time.strftime().
That concludes the basic tutorial. It should be enough to get you up and
running with logging. There’s a lot more that the logging package offers, but
to get the best out of it, you’ll need to invest a little more of your time in
reading the following sections. If you’re ready for that, grab some of your
favourite beverage and carry on.
If your logging needs are simple, then use the above examples to incorporate
logging into your own scripts, and if you run into problems or don’t
understand something, please post a question on the comp.lang.python Usenet
group (available at http://groups.google.com/group/comp.lang.python) and you
should receive help before too long.
Still here? You can carry on reading the next few sections, which provide a
slightly more advanced/in-depth tutorial than the basic one above. After that,
you can take a look at the Logging Cookbook.
The logging library takes a modular approach and offers several categories
of components: loggers, handlers, filters, and formatters.
Loggers expose the interface that application code directly uses.
Handlers send the log records (created by loggers) to the appropriate
destination.
Filters provide a finer grained facility for determining which log records
to output.
Formatters specify the layout of log records in the final output.
Logging is performed by calling methods on instances of the Logger
class (hereafter called loggers). Each instance has a name, and they are
conceptually arranged in a namespace hierarchy using dots (periods) as
separators. For example, a logger named ‘scan’ is the parent of loggers
‘scan.text’, ‘scan.html’ and ‘scan.pdf’. Logger names can be anything you want,
and indicate the area of an application in which a logged message originates.
A good convention to use when naming loggers is to use a module-level logger,
in each module which uses logging, named as follows:
logger = logging.getLogger(__name__)
This means that logger names track the package/module hierarchy, and it’s
intuitively obvious where events are logged just from the logger name.
The root of the hierarchy of loggers is called the root logger. That’s the
logger used by the functions debug(), info(), warning(),
error() and critical(), which just call the same-named method of
the root logger. The functions and the methods have the same signatures. The
root logger’s name is printed as ‘root’ in the logged output.
It is, of course, possible to log messages to different destinations. Support
is included in the package for writing log messages to files, HTTP GET/POST
locations, email via SMTP, generic sockets, queues, or OS-specific logging
mechanisms such as syslog or the Windows NT event log. Destinations are served
by handler classes. You can create your own log destination class if
you have special requirements not met by any of the built-in handler classes.
By default, no destination is set for any logging messages. You can specify
a destination (such as console or file) by using basicConfig() as in the
tutorial examples. If you call the functions debug(), info(),
warning(), error() and critical(), they will check to see
if no destination is set; and if one is not set, they will set a destination
of the console (sys.stderr) and a default format for the displayed
message before delegating to the root logger to do the actual message output.
The default format set by basicConfig() for messages is:
severity:logger name:message
You can change this by passing a format string to basicConfig() with the
format keyword argument. For all options regarding how a format string is
constructed, see Formatter Objects.
Logger objects have a threefold job. First, they expose several
methods to application code so that applications can log messages at runtime.
Second, logger objects determine which log messages to act upon based upon
severity (the default filtering facility) or filter objects. Third, logger
objects pass along relevant log messages to all interested log handlers.
The most widely used methods on logger objects fall into two categories:
configuration and message sending.
These are the most common configuration methods:
Logger.setLevel() specifies the lowest-severity log message a logger
will handle, where debug is the lowest built-in severity level and critical
is the highest built-in severity. For example, if the severity level is
INFO, the logger will handle only INFO, WARNING, ERROR, and CRITICAL messages
and will ignore DEBUG messages.
You don’t always need to call these methods on every logger you create. See the
last two paragraphs in this section.
With the logger object configured, the following methods create log messages:
Logger.debug(), Logger.info(), Logger.warning(),
Logger.error(), and Logger.critical() all create log records with
a message and a level that corresponds to their respective method names. The
message is actually a format string, which may contain the standard string
substitution syntax of %s, %d, %f, and so on. The
rest of their arguments is a list of objects that correspond with the
substitution fields in the message. With regard to **kwargs, the
logging methods care only about a keyword of exc_info and use it to
determine whether to log exception information.
Logger.log() takes a log level as an explicit argument. This is a
little more verbose for logging messages than using the log level convenience
methods listed above, but this is how to log at custom log levels.
getLogger() returns a reference to a logger instance with the specified
name if it is provided, or root if not. The names are period-separated
hierarchical structures. Multiple calls to getLogger() with the same name
will return a reference to the same logger object. Loggers that are further
down in the hierarchical list are children of loggers higher up in the list.
For example, given a logger with a name of foo, loggers with names of
foo.bar, foo.bar.baz, and foo.bam are all descendants of foo.
Loggers have a concept of effective level. If a level is not explicitly set
on a logger, the level of its parent is used instead as its effective level.
If the parent has no explicit level set, its parent is examined, and so on -
all ancestors are searched until an explicitly set level is found. The root
logger always has an explicit level set (WARNING by default). When deciding
whether to process an event, the effective level of the logger is used to
determine whether the event is passed to the logger’s handlers.
Child loggers propagate messages up to the handlers associated with their
ancestor loggers. Because of this, it is unnecessary to define and configure
handlers for all the loggers an application uses. It is sufficient to
configure handlers for a top-level logger and create child loggers as needed.
(You can, however, turn off propagation by setting the propagate
attribute of a logger to False.)
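For example, a minimal sketch of this arrangement (the ‘myapp’ names are
invented for illustration):

import logging
import sys

# configure a handler only on the top-level logger
top = logging.getLogger('myapp')
top.setLevel(logging.DEBUG)
top.addHandler(logging.StreamHandler(sys.stderr))

# child loggers need no handlers of their own: their records
# propagate up and are emitted by the 'myapp' handler
logging.getLogger('myapp.db').warning('connection pool exhausted')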
Handler objects are responsible for dispatching the
appropriate log messages (based on the log messages’ severity) to the handler’s
specified destination. Logger objects can add zero or more handler objects to
themselves with an addHandler() method. As an example scenario, an
application may want to send all log messages to a log file, all log messages
of error or higher to stdout, and all messages of critical to an email address.
This scenario requires three individual handlers where each handler is
responsible for sending messages of a specific severity to a specific location.
There are very few methods in a handler for application developers to concern
themselves with. The only handler methods that seem relevant for application
developers who are using the built-in handler objects (that is, not creating
custom handlers) are the following configuration methods:
The Handler.setLevel() method, just as in logger objects, specifies the
lowest severity that will be dispatched to the appropriate destination. Why
are there two setLevel() methods? The level set in the logger
determines which severity of messages it will pass to its handlers. The level
set in each handler determines which messages that handler will send on.
setFormatter() selects a Formatter object for this handler to use.
addFilter() and removeFilter() respectively configure and
deconfigure filter objects on handlers.
Application code should not directly instantiate and use instances of
Handler. Instead, the Handler class is a base class that
defines the interface that all handlers should have and establishes some
default behavior that child classes can use (or override).
Formatter objects configure the final order, structure, and contents of the log
message. Unlike the base logging.Handler class, application code may
instantiate formatter classes, although you could likely subclass the formatter
if your application needs special behavior. The constructor takes three
optional arguments – a message format string, a date format string and a style
indicator.
If there is no message format string, the default is to use the
raw message. If there is no date format string, the default date format is:
%Y-%m-%d %H:%M:%S
with the milliseconds tacked on at the end. The style is one of ‘%’, ‘{’
or ‘$’. If one of these is not specified, then ‘%’ will be used.
If the style is ‘%’, the message format string uses
%(<dictionarykey>)s styled string substitution; the possible keys are
documented in LogRecord attributes. If the style is ‘{‘, the message
format string is assumed to be compatible with str.format() (using
keyword arguments), while if the style is ‘$’ then the message format string
should conform to what is expected by string.Template.substitute().
Changed in version 3.2: Added the style parameter.
The following message format string will log the time in a human-readable
format, the severity of the message, and the contents of the message, in that
order:
'%(asctime)s - %(levelname)s - %(message)s'
Formatters use a user-configurable function to convert the creation time of a
record to a tuple. By default, time.localtime() is used; to change this
for a particular formatter instance, set the converter attribute of the
instance to a function with the same signature as time.localtime() or
time.gmtime(). To change it for all formatters, for example if you want
all logging times to be shown in GMT, set the converter attribute in the
Formatter class (to time.gmtime for GMT display).
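For example, a quick sketch of switching a single formatter instance to GMT:

import logging
import time

handler = logging.StreamHandler()
formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s')
formatter.converter = time.gmtime  # this instance now renders times in GMT
handler.setFormatter(formatter)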
Creating loggers, handlers, and formatters explicitly using Python
code that calls the configuration methods listed above.
Creating a logging config file and reading it using the fileConfig()
function.
Creating a dictionary of configuration information and passing it
to the dictConfig() function.
For the reference documentation on the last two options, see
Configuration functions. The following example configures a very simple
logger, a console handler, and a simple formatter using Python code:
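import logging

# create logger
logger = logging.getLogger('simple_example')
logger.setLevel(logging.DEBUG)

# create console handler and set level to debug
ch = logging.StreamHandler()
ch.setLevel(logging.DEBUG)

# create formatter
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')

# add formatter to ch
ch.setFormatter(formatter)

# add ch to logger
logger.addHandler(ch)

# 'application' code
logger.debug('debug message')
logger.info('info message')
logger.warning('warn message')
logger.error('error message')
logger.critical('critical message')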
The following Python module creates a logger, handler, and formatter nearly
identical to those in the example listed above, with the only difference being
the names of the objects:
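import logging
import logging.config

logging.config.fileConfig('logging.conf')

# create logger
logger = logging.getLogger('simpleExample')

# 'application' code
logger.debug('debug message')
logger.info('info message')
logger.warning('warn message')
logger.error('error message')
logger.critical('critical message')

Here is a logging.conf file which produces that configuration:

[loggers]
keys=root,simpleExample

[handlers]
keys=consoleHandler

[formatters]
keys=simpleFormatter

[logger_root]
level=DEBUG
handlers=consoleHandler

[logger_simpleExample]
level=DEBUG
handlers=consoleHandler
qualname=simpleExample
propagate=0

[handler_consoleHandler]
class=StreamHandler
level=DEBUG
formatter=simpleFormatter
args=(sys.stdout,)

[formatter_simpleFormatter]
format=%(asctime)s - %(name)s - %(levelname)s - %(message)s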
You can see that the config file approach has a few advantages over the Python
code approach, mainly separation of configuration and code and the ability of
noncoders to easily modify the logging properties.
Note that the class names referenced in config files need to be either relative
to the logging module, or absolute values which can be resolved using normal
import mechanisms. Thus, you could use either
WatchedFileHandler (relative to the logging module) or
mypackage.mymodule.MyHandler (for a class defined in package mypackage
and module mymodule, where mypackage is available on the Python import
path).
In Python 3.2, a new means of configuring logging has been introduced, using
dictionaries to hold configuration information. This provides a superset of the
functionality of the config-file-based approach outlined above, and is the
recommended configuration method for new applications and deployments. Because
a Python dictionary is used to hold configuration information, and since you
can populate that dictionary using different means, you have more options for
configuration. For example, you can use a configuration file in JSON format,
or, if you have access to YAML processing functionality, a file in YAML
format, to populate the configuration dictionary. Or, of course, you can
construct the dictionary in Python code, receive it in pickled form over a
socket, or use whatever approach makes sense for your application.
Here’s an example of the same configuration as above, in YAML format for
the new dictionary-based approach:
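version: 1
formatters:
  simple:
    format: '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
handlers:
  console:
    class: logging.StreamHandler
    level: DEBUG
    formatter: simple
    stream: ext://sys.stdout
loggers:
  simpleExample:
    level: DEBUG
    handlers: [console]
    propagate: no
root:
  level: DEBUG
  handlers: [console]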
If no logging configuration is provided, it is possible to have a situation
where a logging event needs to be output, but no handlers can be found to
output the event. The behaviour of the logging package in these
circumstances is dependent on the Python version.
For versions of Python prior to 3.2, the behaviour is as follows:
If logging.raiseExceptions is False (production mode), the event is
silently dropped.
If logging.raiseExceptions is True (development mode), a message
‘No handlers could be found for logger X.Y.Z’ is printed once.
In Python 3.2 and later, the behaviour is as follows:
The event is output using a ‘handler of last resort’, stored in
logging.lastResort. This internal handler is not associated with any
logger, and acts like a StreamHandler which writes the
event description message to the current value of sys.stderr (therefore
respecting any redirections which may be in effect). No formatting is
done on the message - just the bare event description message is printed.
The handler’s level is set to WARNING, so all events at this and
greater severities will be output.
To obtain the pre-3.2 behaviour, logging.lastResort can be set to None.
When developing a library which uses logging, you should take care to
document how the library uses logging - for example, the names of loggers
used. Some consideration also needs to be given to its logging configuration.
If the using application does not use logging, and library code makes logging
calls, then (as described in the previous section) events of severity
WARNING and greater will be printed to sys.stderr. This is regarded as
the best default behaviour.
If for some reason you don’t want these messages printed in the absence of
any logging configuration, you can attach a do-nothing handler to the top-level
logger for your library. This avoids the message being printed, since a handler
will always be found for the library’s events: it just doesn’t produce any
output. If the library user configures logging for application use, presumably
that configuration will add some handlers, and if levels are suitably
configured then logging calls made in library code will send output to those
handlers, as normal.
A do-nothing handler is included in the logging package:
NullHandler (since Python 3.1). An instance of this handler
could be added to the top-level logger of the logging namespace used by the
library (if you want to prevent your library’s logged events being output to
sys.stderr in the absence of logging configuration). If all logging by a
library foo is done using loggers with names matching ‘foo.x’, ‘foo.x.y’,
etc. then the code:
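import logging
logging.getLogger('foo').addHandler(logging.NullHandler())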
should have the desired effect. If an organisation produces a number of
libraries, then the logger name specified can be ‘orgname.foo’ rather than
just ‘foo’.
PLEASE NOTE: It is strongly advised that you do not add any handlers other
than NullHandler to your library’s loggers. This is
because the configuration of handlers is the prerogative of the application
developer who uses your library. The application developer knows their target
audience and what handlers are most appropriate for their application: if you
add handlers ‘under the hood’, you might well interfere with their ability to
carry out unit tests and deliver logs which suit their requirements.
The numeric values of logging levels are given in the following table. These are
primarily of interest if you want to define your own levels, and need them to
have specific values relative to the predefined levels. If you define a level
with the same numeric value, it overwrites the predefined value; the predefined
name is lost.
Level      Numeric value
CRITICAL   50
ERROR      40
WARNING    30
INFO       20
DEBUG      10
NOTSET     0
Levels can also be associated with loggers, being set either by the developer or
through loading a saved logging configuration. When a logging method is called
on a logger, the logger compares its own level with the level associated with
the method call. If the logger’s level is higher than the method call’s, no
logging message is actually generated. This is the basic mechanism controlling
the verbosity of logging output.
Logging messages are encoded as instances of the LogRecord
class. When a logger decides to actually log an event, a
LogRecord instance is created from the logging message.
Logging messages are subjected to a dispatch mechanism through the use of
handlers, which are instances of subclasses of the Handler
class. Handlers are responsible for ensuring that a logged message (in the form
of a LogRecord) ends up in a particular location (or set of locations)
which is useful for the target audience for that message (such as end users,
support desk staff, system administrators, developers). Handlers are passed
LogRecord instances intended for particular destinations. Each logger
can have zero, one or more handlers associated with it (via the
addHandler() method of Logger). In addition to any
handlers directly associated with a logger, all handlers associated with all
ancestors of the logger are called to dispatch the message (unless the
propagate flag for a logger is set to a false value, at which point the
passing to ancestor handlers stops).
Just as for loggers, handlers can have levels associated with them. A handler’s
level acts as a filter in the same way as a logger’s level does. If a handler
decides to actually dispatch an event, the emit() method is used
to send the message to its destination. Most user-defined subclasses of
Handler will need to override this emit().
Defining your own levels is possible, but should not be necessary, as the
existing levels have been chosen on the basis of practical experience.
However, if you are convinced that you need custom levels, great care should
be exercised when doing this, and it is possibly a very bad idea to define
custom levels if you are developing a library. That’s because if multiple
library authors all define their own custom levels, there is a chance that
the logging output from such multiple libraries used together will be
difficult for the using developer to control and/or interpret, because a
given numeric value might mean different things for different libraries.
In addition to the base Handler class, many useful subclasses are
provided:
StreamHandler instances send messages to streams (file-like
objects).
FileHandler instances send messages to disk files.
BaseRotatingHandler is the base class for handlers that
rotate log files at a certain point. It is not meant to be instantiated
directly. Instead, use RotatingFileHandler or
TimedRotatingFileHandler.
RotatingFileHandler instances send messages to disk
files, with support for maximum log file sizes and log file rotation.
TimedRotatingFileHandler instances send messages to
disk files, rotating the log file at certain timed intervals.
SocketHandler instances send messages to TCP/IP
sockets.
SMTPHandler instances send messages to a designated
email address.
SysLogHandler instances send messages to a Unix
syslog daemon, possibly on a remote machine.
NTEventLogHandler instances send messages to a
Windows NT/2000/XP event log.
MemoryHandler instances send messages to a buffer
in memory, which is flushed whenever specific criteria are met.
HTTPHandler instances send messages to an HTTP
server using either GET or POST semantics.
WatchedFileHandler instances watch the file they are
logging to. If the file changes, it is closed and reopened using the file
name. This handler is only useful on Unix-like systems; Windows does not
support the underlying mechanism used.
NullHandler instances do nothing with error messages. They are used
by library developers who want to use logging, but want to avoid the ‘No
handlers could be found for logger XXX’ message which can be displayed if
the library user has not configured logging. See Configuring Logging for a Library for
more information.
Logged messages are formatted for presentation through instances of the
Formatter class. They are initialized with a format string suitable for
use with the % operator and a dictionary.
For formatting multiple messages in a batch, instances of
BufferingFormatter can be used. In addition to the format string (which
is applied to each message in the batch), there is provision for header and
trailer format strings.
When filtering based on logger level and/or handler level is not enough,
instances of Filter can be added to both Logger and
Handler instances (through their addFilter() method). Before
deciding to process a message further, both loggers and handlers consult all
their filters for permission. If any filter returns a false value, the message
is not processed further.
The basic Filter functionality allows filtering by specific logger
name. If this feature is used, messages sent to the named logger and its
children are allowed through the filter, and all others dropped.
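For example, a minimal sketch (the ‘myapp.ui’ name is invented for
illustration):

import logging

handler = logging.StreamHandler()
# only records from the 'myapp.ui' logger and its children pass
handler.addFilter(logging.Filter('myapp.ui'))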
The logging package is designed to swallow exceptions which occur while logging
in production. This is so that errors which occur while handling logging events
- such as logging misconfiguration, network or other similar errors - do not
cause the application using logging to terminate prematurely.
SystemExit and KeyboardInterrupt exceptions are never
swallowed. Other exceptions which occur during the emit() method of a
Handler subclass are passed to its handleError() method.
The default implementation of handleError() in Handler checks
to see if a module-level variable, raiseExceptions, is set. If set, a
traceback is printed to sys.stderr. If not set, the exception is swallowed.
Note: The default value of raiseExceptions is True. This is because
during development, you typically want to be notified of any exceptions that
occur. It’s advised that you set raiseExceptions to False for production
usage.
In the preceding sections and examples, it has been assumed that the message
passed when logging the event is a string. However, this is not the only
possibility. You can pass an arbitrary object as a message, and its
__str__() method will be called when the logging system needs to convert
it to a string representation. In fact, if you want to, you can avoid
computing a string representation altogether - for example, the
SocketHandler emits an event by pickling it and sending it over the
wire.
Formatting of message arguments is deferred until it cannot be avoided.
However, computing the arguments passed to the logging method can also be
expensive, and you may want to avoid doing it if the logger will just throw
away your event. To decide what to do, you can call the isEnabledFor()
method which takes a level argument and returns true if the event would be
created by the Logger for that level of call. You can write code like this:
if logger.isEnabledFor(logging.DEBUG):
    logger.debug('Message with %s, %s', expensive_func1(),
                 expensive_func2())
so that if the logger’s threshold is set above DEBUG, the calls to
expensive_func1() and expensive_func2() are never made.
There are other optimizations which can be made for specific applications which
need more precise control over what logging information is collected. Here’s a
list of things you can do to avoid processing during logging which you don’t
need:
What you don’t want to collect                  How to avoid collecting it
Information about where calls were made from.   Set logging._srcfile to None.
Threading information.                          Set logging.logThreads to 0.
Process information.                            Set logging.logProcesses to 0.
Also note that the core logging module only includes the basic handlers. If
you don’t import logging.handlers and logging.config, they won’t
take up any memory.
Multiple calls to logging.getLogger('someLogger') return a reference to the
same logger object. This is true not only within the same module, but also
across modules as long as it is in the same Python interpreter process. It is
true for references to the same object; additionally, application code can
define and configure a parent logger in one module and create (but not
configure) a child logger in a separate module, and all logger calls to the
child will pass up to the parent. Here is a main module:
import logging
import auxiliary_module
# create logger with 'spam_application'
logger = logging.getLogger('spam_application')
logger.setLevel(logging.DEBUG)
# create file handler which logs even debug messages
fh = logging.FileHandler('spam.log')
fh.setLevel(logging.DEBUG)
# create console handler with a higher log level
ch = logging.StreamHandler()
ch.setLevel(logging.ERROR)
# create formatter and add it to the handlers
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
fh.setFormatter(formatter)
ch.setFormatter(formatter)
# add the handlers to the logger
logger.addHandler(fh)
logger.addHandler(ch)
logger.info('creating an instance of auxiliary_module.Auxiliary')
a = auxiliary_module.Auxiliary()
logger.info('created an instance of auxiliary_module.Auxiliary')
logger.info('calling auxiliary_module.Auxiliary.do_something')
a.do_something()
logger.info('finished auxiliary_module.Auxiliary.do_something')
logger.info('calling auxiliary_module.some_function()')
auxiliary_module.some_function()
logger.info('done with auxiliary_module.some_function()')
Here is the auxiliary module:
import logging
# create logger
module_logger = logging.getLogger('spam_application.auxiliary')
class Auxiliary:
    def __init__(self):
        self.logger = logging.getLogger('spam_application.auxiliary.Auxiliary')
        self.logger.info('creating an instance of Auxiliary')

    def do_something(self):
        self.logger.info('doing something')
        a = 1 + 1
        self.logger.info('done doing something')

def some_function():
    module_logger.info('received a call to "some_function"')
The output looks like this:
2005-03-23 23:47:11,663 - spam_application - INFO -
creating an instance of auxiliary_module.Auxiliary
2005-03-23 23:47:11,665 - spam_application.auxiliary.Auxiliary - INFO -
creating an instance of Auxiliary
2005-03-23 23:47:11,665 - spam_application - INFO -
created an instance of auxiliary_module.Auxiliary
2005-03-23 23:47:11,668 - spam_application - INFO -
calling auxiliary_module.Auxiliary.do_something
2005-03-23 23:47:11,668 - spam_application.auxiliary.Auxiliary - INFO -
doing something
2005-03-23 23:47:11,669 - spam_application.auxiliary.Auxiliary - INFO -
done doing something
2005-03-23 23:47:11,670 - spam_application - INFO -
finished auxiliary_module.Auxiliary.do_something
2005-03-23 23:47:11,671 - spam_application - INFO -
calling auxiliary_module.some_function()
2005-03-23 23:47:11,672 - spam_application.auxiliary - INFO -
received a call to 'some_function'
2005-03-23 23:47:11,673 - spam_application - INFO -
done with auxiliary_module.some_function()
Loggers are plain Python objects. The addHandler() method has no minimum
or maximum quota for the number of handlers you may add. Sometimes it will be
beneficial for an application to log all messages of all severities to a text
file while simultaneously logging errors or above to the console. To set this
up, simply configure the appropriate handlers. The logging calls in the
application code will remain unchanged. Here is a slight modification to the
previous simple module-based configuration example:
import logging
logger = logging.getLogger('simple_example')
logger.setLevel(logging.DEBUG)
# create file handler which logs even debug messages
fh = logging.FileHandler('spam.log')
fh.setLevel(logging.DEBUG)
# create console handler with a higher log level
ch = logging.StreamHandler()
ch.setLevel(logging.ERROR)
# create formatter and add it to the handlers
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
ch.setFormatter(formatter)
fh.setFormatter(formatter)
# add the handlers to logger
logger.addHandler(ch)
logger.addHandler(fh)
# 'application' code
logger.debug('debug message')
logger.info('info message')
logger.warning('warn message')
logger.error('error message')
logger.critical('critical message')
Notice that the ‘application’ code does not care about multiple handlers. All
that changed was the addition and configuration of a new handler named fh.
The ability to create new handlers with higher- or lower-severity filters can be
very helpful when writing and testing an application. Instead of using many
print statements for debugging, use logger.debug: Unlike the print
statements, which you will have to delete or comment out later, the logger.debug
statements can remain intact in the source code and remain dormant until you
need them again. At that time, the only change that needs to happen is to
modify the severity level of the logger and/or handler to debug.
Let’s say you want to log to console and file with different message formats and
in differing circumstances. Say you want to log messages with levels of DEBUG
and higher to file, and those messages at level INFO and higher to the console.
Let’s also assume that the file should contain timestamps, but the console
messages should not. Here’s how you can achieve this:
import logging
# set up logging to file - see previous section for more details
logging.basicConfig(level=logging.DEBUG,
format='%(asctime)s %(name)-12s %(levelname)-8s %(message)s',
datefmt='%m-%d %H:%M',
filename='/temp/myapp.log',
filemode='w')
# define a Handler which writes INFO messages or higher to the sys.stderr
console = logging.StreamHandler()
console.setLevel(logging.INFO)
# set a format which is simpler for console use
formatter = logging.Formatter('%(name)-12s: %(levelname)-8s %(message)s')
# tell the handler to use this format
console.setFormatter(formatter)
# add the handler to the root logger
logging.getLogger('').addHandler(console)
# Now, we can log to the root logger, or any other logger. First the root...
logging.info('Jackdaws love my big sphinx of quartz.')
# Now, define a couple of other loggers which might represent areas in your
# application:
logger1 = logging.getLogger('myapp.area1')
logger2 = logging.getLogger('myapp.area2')
logger1.debug('Quick zephyrs blow, vexing daft Jim.')
logger1.info('How quickly daft jumping zebras vex.')
logger2.warning('Jail zesty vixen who grabbed pay from quack.')
logger2.error('The five boxing wizards jump quickly.')
Here is an example of a module using the logging configuration server:
import logging
import logging.config
import time
import os
# read initial config file
logging.config.fileConfig('logging.conf')
# create and start listener on port 9999
t = logging.config.listen(9999)
t.start()
logger = logging.getLogger('simpleExample')
try:
    # loop through logging calls to see the difference
    # new configurations make, until Ctrl+C is pressed
    while True:
        logger.debug('debug message')
        logger.info('info message')
        logger.warning('warn message')
        logger.error('error message')
        logger.critical('critical message')
        time.sleep(5)
except KeyboardInterrupt:
    # cleanup
    logging.config.stopListening()
    t.join()
And here is a script that takes a filename and sends that file to the server,
properly preceded with the binary-encoded length, as the new logging
configuration:
#!/usr/bin/env python
import socket, sys, struct
with open(sys.argv[1], 'rb') as f:
    data_to_send = f.read()
HOST = 'localhost'
PORT = 9999
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
print('connecting...')
s.connect((HOST, PORT))
print('sending config...')
s.send(struct.pack('>L', len(data_to_send)))
s.send(data_to_send)
s.close()
print('complete')
Sometimes you have to get your logging handlers to do their work without
blocking the thread you’re logging from. This is common in Web applications,
though of course it also occurs in other scenarios.
A common culprit which demonstrates sluggish behaviour is the
SMTPHandler: sending emails can take a long time, for a
number of reasons outside the developer’s control (for example, a poorly
performing mail or network infrastructure). But almost any network-based
handler can block: Even a SocketHandler operation may do a
DNS query under the hood which is too slow (and this query can be deep in the
socket library code, below the Python layer, and outside your control).
One solution is to use a two-part approach. For the first part, attach only a
QueueHandler to those loggers which are accessed from
performance-critical threads. They simply write to their queue, which can be
sized to a large enough capacity or initialized with no upper bound on its
size. The write to the queue will typically be accepted quickly, though you
will probably need to catch the queue.Full exception as a precaution
in your code. If you are a library developer who has performance-critical
threads in your code, be sure to document this (together with a suggestion to
attach only QueueHandlers to your loggers) for the benefit of other
developers who will use your code.
The second part of the solution is QueueListener, which has been
designed as the counterpart to QueueHandler. A
QueueListener is very simple: it’s passed a queue and some handlers,
and it fires up an internal thread which listens to its queue for LogRecords
sent from QueueHandlers (or any other source of LogRecords, for that
matter). The LogRecords are removed from the queue and passed to the
handlers for processing.
The advantage of having a separate QueueListener class is that you
can use the same instance to service multiple QueueHandlers. This is more
resource-friendly than, say, having threaded versions of the existing handler
classes, which would eat up one thread per handler for no particular benefit.
An example of using these two classes follows (imports omitted):
que = queue.Queue(-1) # no limit on size
queue_handler = QueueHandler(que)
handler = logging.StreamHandler()
listener = QueueListener(que, handler)
root = logging.getLogger()
root.addHandler(queue_handler)
formatter = logging.Formatter('%(threadName)s: %(message)s')
handler.setFormatter(formatter)
listener.start()
# The log output will display the thread which generated
# the event (the main thread) rather than the internal
# thread which monitors the internal queue. This is what
# you want to happen.
root.warning('Look out!')
listener.stop()
which, when run, will produce:
MainThread: Look out!
Sending and receiving logging events across a network
Let’s say you want to send logging events across a network, and handle them at
the receiving end. A simple way of doing this is attaching a
SocketHandler instance to the root logger at the sending end:
import logging, logging.handlers
rootLogger = logging.getLogger('')
rootLogger.setLevel(logging.DEBUG)
socketHandler = logging.handlers.SocketHandler('localhost',
logging.handlers.DEFAULT_TCP_LOGGING_PORT)
# don't bother with a formatter, since a socket handler sends the event as
# an unformatted pickle
rootLogger.addHandler(socketHandler)
# Now, we can log to the root logger, or any other logger. First the root...
logging.info('Jackdaws love my big sphinx of quartz.')
# Now, define a couple of other loggers which might represent areas in your
# application:
logger1 = logging.getLogger('myapp.area1')
logger2 = logging.getLogger('myapp.area2')
logger1.debug('Quick zephyrs blow, vexing daft Jim.')
logger1.info('How quickly daft jumping zebras vex.')
logger2.warning('Jail zesty vixen who grabbed pay from quack.')
logger2.error('The five boxing wizards jump quickly.')
At the receiving end, you can set up a receiver using the socketserver
module. Here is a basic working example:
import pickle
import logging
import logging.handlers
import socketserver
import struct
class LogRecordStreamHandler(socketserver.StreamRequestHandler):
    """Handler for a streaming logging request.

    This basically logs the record using whatever logging policy is
    configured locally.
    """

    def handle(self):
        """
        Handle multiple requests - each expected to be a 4-byte length,
        followed by the LogRecord in pickle format. Logs the record
        according to whatever policy is configured locally.
        """
        while True:
            chunk = self.connection.recv(4)
            if len(chunk) < 4:
                break
            slen = struct.unpack('>L', chunk)[0]
            chunk = self.connection.recv(slen)
            while len(chunk) < slen:
                chunk = chunk + self.connection.recv(slen - len(chunk))
            obj = self.unPickle(chunk)
            record = logging.makeLogRecord(obj)
            self.handleLogRecord(record)

    def unPickle(self, data):
        return pickle.loads(data)

    def handleLogRecord(self, record):
        # if a name is specified, we use the named logger rather than the one
        # implied by the record.
        if self.server.logname is not None:
            name = self.server.logname
        else:
            name = record.name
        logger = logging.getLogger(name)
        # N.B. EVERY record gets logged. This is because Logger.handle
        # is normally called AFTER logger-level filtering. If you want
        # to do filtering, do it at the client end to save wasting
        # cycles and network bandwidth!
        logger.handle(record)

class LogRecordSocketReceiver(socketserver.ThreadingTCPServer):
    """
    Simple TCP socket-based logging receiver suitable for testing.
    """

    allow_reuse_address = 1

    def __init__(self, host='localhost',
                 port=logging.handlers.DEFAULT_TCP_LOGGING_PORT,
                 handler=LogRecordStreamHandler):
        socketserver.ThreadingTCPServer.__init__(self, (host, port), handler)
        self.abort = 0
        self.timeout = 1
        self.logname = None

    def serve_until_stopped(self):
        import select
        abort = 0
        while not abort:
            rd, wr, ex = select.select([self.socket.fileno()],
                                       [], [],
                                       self.timeout)
            if rd:
                self.handle_request()
            abort = self.abort

def main():
    logging.basicConfig(
        format='%(relativeCreated)5d %(name)-15s %(levelname)-8s %(message)s')
    tcpserver = LogRecordSocketReceiver()
    print('About to start TCP server...')
    tcpserver.serve_until_stopped()

if __name__ == '__main__':
    main()
First run the server, and then the client. On the client side, nothing is
printed on the console; on the server side, you should see something like:
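About to start TCP server...
   59 root            INFO     Jackdaws love my big sphinx of quartz.
   59 myapp.area1     DEBUG    Quick zephyrs blow, vexing daft Jim.
   69 myapp.area1     INFO     How quickly daft jumping zebras vex.
   69 myapp.area2     WARNING  Jail zesty vixen who grabbed pay from quack.
   69 myapp.area2     ERROR    The five boxing wizards jump quickly.
(the relativeCreated timings at the start of each line will vary from run
to run).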
Note that there are some security issues with pickle in some scenarios. If
these affect you, you can use an alternative serialization scheme by overriding
the makePickle() method and implementing your alternative there, as
well as adapting the above script to use your alternative serialization.
Adding contextual information to your logging output
Sometimes you want logging output to contain contextual information in
addition to the parameters passed to the logging call. For example, in a
networked application, it may be desirable to log client-specific information
in the log (e.g. remote client’s username, or IP address). Although you could
use the extra parameter to achieve this, it’s not always convenient to pass
the information in this way. While it might be tempting to create
Logger instances on a per-connection basis, this is not a good idea
because these instances are not garbage collected. While this is not a problem
in practice, when the number of Logger instances is dependent on the
level of granularity you want to use in logging an application, it could
be hard to manage if the number of Logger instances becomes
effectively unbounded.
Using LoggerAdapters to impart contextual information
An easy way in which you can pass contextual information to be output along
with logging event information is to use the LoggerAdapter class.
This class is designed to look like a Logger, so that you can call
debug(), info(), warning(), error(),
exception(), critical() and log(). These methods have the
same signatures as their counterparts in Logger, so you can use the
two types of instances interchangeably.
When you create an instance of LoggerAdapter, you pass it a
Logger instance and a dict-like object which contains your contextual
information. When you call one of the logging methods on an instance of
LoggerAdapter, it delegates the call to the underlying instance of
Logger passed to its constructor, and arranges to pass the contextual
information in the delegated call. Here’s a snippet from the code of
LoggerAdapter:
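def debug(self, msg, *args, **kwargs):
    """
    Delegate a debug call to the underlying logger, after adding
    contextual information from this adapter instance.
    """
    msg, kwargs = self.process(msg, kwargs)
    self.logger.debug(msg, *args, **kwargs)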
The process() method of LoggerAdapter is where the contextual
information is added to the logging output. It’s passed the message and
keyword arguments of the logging call, and it passes back (potentially)
modified versions of these to use in the call to the underlying logger. The
default implementation of this method leaves the message alone, but inserts
an ‘extra’ key in the keyword argument whose value is the dict-like object
passed to the constructor. Of course, if you had passed an ‘extra’ keyword
argument in the call to the adapter, it will be silently overwritten.
The advantage of using ‘extra’ is that the values in the dict-like object are
merged into the LogRecord instance’s __dict__, allowing you to use
customized strings with your Formatter instances which know about
the keys of the dict-like object. If you need a different method, e.g. if you
want to prepend or append the contextual information to the message string,
you just need to subclass LoggerAdapter and override process()
to do what you need. Here’s an example script which uses this class, which
also illustrates what dict-like behaviour is needed from an arbitrary
‘dict-like’ object for use in the constructor:
import logging

class ConnInfo:
    """
    An example class which shows how an arbitrary class can be used as
    the 'extra' context information repository passed to a LoggerAdapter.
    """

    def __getitem__(self, name):
        """
        To allow this instance to look like a dict.
        """
        from random import choice
        if name == 'ip':
            result = choice(['127.0.0.1', '192.168.0.1'])
        elif name == 'user':
            result = choice(['jim', 'fred', 'sheila'])
        else:
            result = self.__dict__.get(name, '?')
        return result

    def __iter__(self):
        """
        To allow iteration over keys, which will be merged into
        the LogRecord dict before formatting and output.
        """
        keys = ['ip', 'user']
        keys.extend(self.__dict__.keys())
        return keys.__iter__()

if __name__ == '__main__':
    from random import choice
    levels = (logging.DEBUG, logging.INFO, logging.WARNING, logging.ERROR, logging.CRITICAL)
    a1 = logging.LoggerAdapter(logging.getLogger('a.b.c'),
                               {'ip': '123.231.231.123', 'user': 'sheila'})
    logging.basicConfig(level=logging.DEBUG,
                        format='%(asctime)-15s %(name)-5s %(levelname)-8s IP: %(ip)-15s User: %(user)-8s %(message)s')
    a1.debug('A debug message')
    a1.info('An info message with %s', 'some parameters')
    a2 = logging.LoggerAdapter(logging.getLogger('d.e.f'), ConnInfo())
    for x in range(10):
        lvl = choice(levels)
        lvlname = logging.getLevelName(lvl)
        a2.log(lvl, 'A message at %s level with %d %s', lvlname, 2, 'parameters')
When this script is run, the output should look something like this:
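2008-01-18 14:49:54,023 a.b.c DEBUG    IP: 123.231.231.123 User: sheila   A debug message
2008-01-18 14:49:54,023 a.b.c INFO     IP: 123.231.231.123 User: sheila   An info message with some parameters
2008-01-18 14:49:54,024 d.e.f CRITICAL IP: 192.168.0.1     User: jim      A message at CRITICAL level with 2 parameters
with the dates, IP addresses, users and levels varying from run to run.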
You can also add contextual information to log output using a user-defined
Filter. Filter instances are allowed to modify the LogRecords
passed to them, including adding additional attributes which can then be output
using a suitable format string, or if needed a custom Formatter.
For example in a web application, the request being processed (or at least,
the interesting parts of it) can be stored in a threadlocal
(threading.local) variable, and then accessed from a Filter to
add, say, information from the request - say, the remote IP address and remote
user’s username - to the LogRecord, using the attribute names ‘ip’ and
‘user’ as in the LoggerAdapter example above. In that case, the same format
string can be used to get similar output to that shown above. Here’s an example
script:
import logging
from random import choice

class ContextFilter(logging.Filter):
    """
    This is a filter which injects contextual information into the log.

    Rather than use actual contextual information, we just use random
    data in this demo.
    """

    USERS = ['jim', 'fred', 'sheila']
    IPS = ['123.231.231.123', '127.0.0.1', '192.168.0.1']

    def filter(self, record):
        record.ip = choice(ContextFilter.IPS)
        record.user = choice(ContextFilter.USERS)
        return True

if __name__ == '__main__':
    levels = (logging.DEBUG, logging.INFO, logging.WARNING, logging.ERROR, logging.CRITICAL)
    logging.basicConfig(level=logging.DEBUG,
                        format='%(asctime)-15s %(name)-5s %(levelname)-8s IP: %(ip)-15s User: %(user)-8s %(message)s')
    a1 = logging.getLogger('a.b.c')
    a2 = logging.getLogger('d.e.f')
    f = ContextFilter()
    a1.addFilter(f)
    a2.addFilter(f)
    a1.debug('A debug message')
    a1.info('An info message with %s', 'some parameters')
    for x in range(10):
        lvl = choice(levels)
        lvlname = logging.getLevelName(lvl)
        a2.log(lvl, 'A message at %s level with %d %s', lvlname, 2, 'parameters')
Although logging is thread-safe, and logging to a single file from multiple
threads in a single process is supported, logging to a single file from
multiple processes is not supported, because there is no standard way to
serialize access to a single file across multiple processes in Python. If you
need to log to a single file from multiple processes, one way of doing this is
to have all the processes log to a SocketHandler, and have a separate
process which implements a socket server which reads from the socket and logs
to file. (If you prefer, you can dedicate one thread in one of the existing
processes to perform this function.) The following section documents this
approach in more detail and includes a working socket receiver which can be
used as a starting point for you to adapt in your own applications.
If you are using a recent version of Python which includes the
multiprocessing module, you could write your own handler which uses the
Lock class from this module to serialize access to the file from
your processes. The existing FileHandler and subclasses do not make
use of multiprocessing at present, though they may do so in the future.
Note that at present, the multiprocessing module does not provide
working lock functionality on all platforms (see
http://bugs.python.org/issue3770).
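A minimal sketch of such a handler might look like the following (the class
name and constructor arguments here are illustrative, not part of the standard
library; the lock must be created before the worker processes are spawned so
that they all share it):

import logging
import multiprocessing

class LockedFileHandler(logging.FileHandler):
    """
    Hypothetical handler which serializes writes across processes
    using a shared multiprocessing.Lock supplied by the application.
    """
    def __init__(self, filename, lock, mode='a'):
        super().__init__(filename, mode)
        self._lock = lock  # a multiprocessing.Lock shared by all processes

    def emit(self, record):
        with self._lock:  # only one process writes at a time
            super().emit(record)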
Alternatively, you can use a Queue and a QueueHandler to send
all logging events to one of the processes in your multi-process application.
The following example script demonstrates how you can do this; in the example
a separate listener process listens for events sent by other processes and logs
them according to its own logging configuration. Although the example only
demonstrates one way of doing it (for example, you may want to use a listener
thread rather than a separate listener process – the implementation would be
analogous) it does allow for completely different logging configurations for
the listener and the other processes in your application, and can be used as
the basis for code meeting your own specific requirements:
# You'll need these imports in your own code
import logging
import logging.handlers
import multiprocessing

# Next two import lines for this demo only
from random import choice, random
import time

#
# Because you'll want to define the logging configurations for listener and workers, the
# listener and worker process functions take a configurer parameter which is a callable
# for configuring logging for that process. These functions are also passed the queue,
# which they use for communication.
#
# In practice, you can configure the listener however you want, but note that in this
# simple example, the listener does not apply level or filter logic to received records.
# In practice, you would probably want to do this logic in the worker processes, to avoid
# sending events which would be filtered out between processes.
#
# The size of the rotated files is made small so you can see the results easily.
def listener_configurer():
    root = logging.getLogger()
    h = logging.handlers.RotatingFileHandler('mptest.log', 'a', 300, 10)
    f = logging.Formatter('%(asctime)s %(processName)-10s %(name)s %(levelname)-8s %(message)s')
    h.setFormatter(f)
    root.addHandler(h)

# This is the listener process top-level loop: wait for logging events
# (LogRecords) on the queue and handle them, quit when you get a None for a
# LogRecord.
def listener_process(queue, configurer):
    configurer()
    while True:
        try:
            record = queue.get()
            if record is None:  # We send this as a sentinel to tell the listener to quit.
                break
            logger = logging.getLogger(record.name)
            logger.handle(record)  # No level or filter logic applied - just do it!
        except (KeyboardInterrupt, SystemExit):
            raise
        except:
            import sys, traceback
            print('Whoops! Problem:', file=sys.stderr)
            traceback.print_exc(file=sys.stderr)

# Arrays used for random selections in this demo
LEVELS = [logging.DEBUG, logging.INFO, logging.WARNING,
          logging.ERROR, logging.CRITICAL]

LOGGERS = ['a.b.c', 'd.e.f']

MESSAGES = [
    'Random message #1',
    'Random message #2',
    'Random message #3',
]

# The worker configuration is done at the start of the worker process run.
# Note that on Windows you can't rely on fork semantics, so each process
# will run the logging configuration code when it starts.
def worker_configurer(queue):
    h = logging.handlers.QueueHandler(queue)  # Just the one handler needed
    root = logging.getLogger()
    root.addHandler(h)
    root.setLevel(logging.DEBUG)  # send all messages, for demo; no other level or filter logic applied.

# This is the worker process top-level loop, which just logs ten events with
# random intervening delays before terminating.
# The print messages are just so you know it's doing something!
def worker_process(queue, configurer):
    configurer(queue)
    name = multiprocessing.current_process().name
    print('Worker started: %s' % name)
    for i in range(10):
        time.sleep(random())
        logger = logging.getLogger(choice(LOGGERS))
        level = choice(LEVELS)
        message = choice(MESSAGES)
        logger.log(level, message)
    print('Worker finished: %s' % name)

# Here's where the demo gets orchestrated. Create the queue, create and start
# the listener, create ten workers and start them, wait for them to finish,
# then send a None to the queue to tell the listener to finish.
def main():
    queue = multiprocessing.Queue(-1)
    listener = multiprocessing.Process(target=listener_process,
                                       args=(queue, listener_configurer))
    listener.start()
    workers = []
    for i in range(10):
        worker = multiprocessing.Process(target=worker_process,
                                         args=(queue, worker_configurer))
        workers.append(worker)
        worker.start()
    for w in workers:
        w.join()
    queue.put_nowait(None)
    listener.join()

if __name__ == '__main__':
    main()
A variant of the above script keeps the logging in the main process, in a
separate thread:
import logging
import logging.config
import logging.handlers
from multiprocessing import Process, Queue
import random
import threading
import time

def logger_thread(q):
    while True:
        record = q.get()
        if record is None:
            break
        logger = logging.getLogger(record.name)
        logger.handle(record)

def worker_process(q):
    qh = logging.handlers.QueueHandler(q)
    root = logging.getLogger()
    root.setLevel(logging.DEBUG)
    root.addHandler(qh)
    levels = [logging.DEBUG, logging.INFO, logging.WARNING, logging.ERROR,
              logging.CRITICAL]
    loggers = ['foo', 'foo.bar', 'foo.bar.baz',
               'spam', 'spam.ham', 'spam.ham.eggs']
    for i in range(100):
        lvl = random.choice(levels)
        logger = logging.getLogger(random.choice(loggers))
        logger.log(lvl, 'Message no. %d', i)

if __name__ == '__main__':
    q = Queue()
    d = {
        'version': 1,
        'formatters': {
            'detailed': {
                'class': 'logging.Formatter',
                'format': '%(asctime)s %(name)-15s %(levelname)-8s %(processName)-10s %(message)s'
            }
        },
        'handlers': {
            'console': {
                'class': 'logging.StreamHandler',
                'level': 'INFO',
            },
            'file': {
                'class': 'logging.FileHandler',
                'filename': 'mplog.log',
                'mode': 'w',
                'formatter': 'detailed',
            },
            'foofile': {
                'class': 'logging.FileHandler',
                'filename': 'mplog-foo.log',
                'mode': 'w',
                'formatter': 'detailed',
            },
            'errors': {
                'class': 'logging.FileHandler',
                'filename': 'mplog-errors.log',
                'mode': 'w',
                'level': 'ERROR',
                'formatter': 'detailed',
            },
        },
        'loggers': {
            'foo': {
                'handlers': ['foofile']
            }
        },
        'root': {
            'level': 'DEBUG',
            'handlers': ['console', 'file', 'errors']
        },
    }
    workers = []
    for i in range(5):
        wp = Process(target=worker_process, name='worker %d' % (i + 1), args=(q,))
        workers.append(wp)
        wp.start()
    logging.config.dictConfig(d)
    lp = threading.Thread(target=logger_thread, args=(q,))
    lp.start()
    # At this point, the main process could do some useful work of its own
    # Once it's done that, it can wait for the workers to terminate...
    for wp in workers:
        wp.join()
    # And now tell the logging thread to finish up, too
    q.put(None)
    lp.join()
This variant shows how you can apply configuration for particular loggers
- e.g. the foo logger has a special handler which stores all events in the
foo subsystem in a file mplog-foo.log. This will be used by the logging
machinery in the main process (even though the logging events are generated in
the worker processes) to direct the messages to the appropriate destinations.
Sometimes you want to let a log file grow to a certain size, then open a new
file and log to that. You may want to keep a certain number of these files, and
when that many files have been created, rotate the files so that the number of
files and the size of the files both remain bounded. For this usage pattern, the
logging package provides a RotatingFileHandler:
import glob
import logging
import logging.handlers

LOG_FILENAME = 'logging_rotatingfile_example.out'

# Set up a specific logger with our desired output level
my_logger = logging.getLogger('MyLogger')
my_logger.setLevel(logging.DEBUG)

# Add the log message handler to the logger
handler = logging.handlers.RotatingFileHandler(
    LOG_FILENAME, maxBytes=20, backupCount=5)
my_logger.addHandler(handler)

# Log some messages
for i in range(20):
    my_logger.debug('i = %d' % i)

# See what files are created
logfiles = glob.glob('%s*' % LOG_FILENAME)
for filename in logfiles:
    print(filename)
The result should be 6 separate files, each with part of the log history for the
application:
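logging_rotatingfile_example.out
logging_rotatingfile_example.out.1
logging_rotatingfile_example.out.2
logging_rotatingfile_example.out.3
logging_rotatingfile_example.out.4
logging_rotatingfile_example.out.5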
The most current file is always logging_rotatingfile_example.out,
and each time it reaches the size limit it is renamed with the suffix
.1. Each of the existing backup files is renamed to increment the suffix
(.1 becomes .2, etc.) and the oldest file, .5, is erased.
Obviously this example sets the log size much too small as an extreme
example. You would want to set maxBytes to an appropriate value.
You can use a QueueHandler subclass to send messages to other kinds
of queues, for example a ZeroMQ ‘publish’ socket. In the example below, the
socket is created separately and passed to the handler (as its ‘queue’):
import zmq   # using pyzmq, the Python binding for ZeroMQ
import json  # for serializing records portably

from logging.handlers import QueueHandler

ctx = zmq.Context()
sock = zmq.Socket(ctx, zmq.PUB)  # or zmq.PUSH, or other suitable value
sock.bind('tcp://*:5556')        # or wherever

class ZeroMQSocketHandler(QueueHandler):
    def enqueue(self, record):
        data = json.dumps(record.__dict__)
        self.queue.send(data)

handler = ZeroMQSocketHandler(sock)
Of course there are other ways of organizing this, for example passing in the
data needed by the handler to create the socket itself. A possible sketch
(the constructor parameters here are illustrative):
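import json
import zmq
from logging.handlers import QueueHandler

class ZeroMQSocketHandler(QueueHandler):
    # A sketch: the handler creates and owns its socket; the uri,
    # socktype and ctx parameters here are illustrative.
    def __init__(self, uri, socktype=zmq.PUB, ctx=None):
        self.ctx = ctx or zmq.Context()
        socket = zmq.Socket(self.ctx, socktype)
        socket.bind(uri)
        super().__init__(socket)

    def enqueue(self, record):
        self.queue.send_json(record.__dict__)

    def close(self):
        self.queue.close()
        super().close()

handler = ZeroMQSocketHandler('tcp://*:5556')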
This document is an introductory tutorial to using regular expressions in Python
with the re module. It provides a gentler introduction than the
corresponding section in the Library Reference.
Regular expressions (called REs, or regexes, or regex patterns) are essentially
a tiny, highly specialized programming language embedded inside Python and made
available through the re module. Using this little language, you specify
the rules for the set of possible strings that you want to match; this set might
contain English sentences, or e-mail addresses, or TeX commands, or anything you
like. You can then ask questions such as “Does this string match the pattern?”,
or “Is there a match for the pattern anywhere in this string?”. You can also
use REs to modify a string or to split it apart in various ways.
Regular expression patterns are compiled into a series of bytecodes which are
then executed by a matching engine written in C. For advanced use, it may be
necessary to pay careful attention to how the engine will execute a given RE,
and write the RE in a certain way in order to produce bytecode that runs faster.
Optimization isn’t covered in this document, because it requires that you have a
good understanding of the matching engine’s internals.
The regular expression language is relatively small and restricted, so not all
possible string processing tasks can be done using regular expressions. There
are also tasks that can be done with regular expressions, but the expressions
turn out to be very complicated. In these cases, you may be better off writing
Python code to do the processing; while Python code will be slower than an
elaborate regular expression, it will also probably be more understandable.
We’ll start by learning about the simplest possible regular expressions. Since
regular expressions are used to operate on strings, we’ll begin with the most
common task: matching characters.
For a detailed explanation of the computer science underlying regular
expressions (deterministic and non-deterministic finite automata), you can refer
to almost any textbook on writing compilers.
Most letters and characters will simply match themselves. For example, the
regular expression test will match the string test exactly. (You can
enable a case-insensitive mode that would let this RE match Test or TEST
as well; more about this later.)
There are exceptions to this rule; some characters are special
metacharacters, and don’t match themselves. Instead, they signal that
some out-of-the-ordinary thing should be matched, or they affect other portions
of the RE by repeating them or changing their meaning. Much of this document is
devoted to discussing various metacharacters and what they do.
Here’s a complete list of the metacharacters; their meanings will be discussed
in the rest of this HOWTO.
. ^ $ * + ? { } [ ] \ | ( )
The first metacharacters we’ll look at are [ and ]. They’re used for
specifying a character class, which is a set of characters that you wish to
match. Characters can be listed individually, or a range of characters can be
indicated by giving two characters and separating them by a '-'. For
example, [abc] will match any of the characters a, b, or c; this
is the same as [a-c], which uses a range to express the same set of
characters. If you wanted to match only lowercase letters, your RE would be
[a-z].
Metacharacters are not active inside classes. For example, [akm$] will
match any of the characters 'a', 'k', 'm', or '$'; '$' is
usually a metacharacter, but inside a character class it’s stripped of its
special nature.
You can match the characters not listed within the class by complementing
the set. This is indicated by including a '^' as the first character of the
class; '^' outside a character class will simply match the '^'
character. For example, [^5] will match any character except '5'.
Perhaps the most important metacharacter is the backslash, \. As in Python
string literals, the backslash can be followed by various characters to signal
various special sequences. It’s also used to escape all the metacharacters so
you can still match them in patterns; for example, if you need to match a [
or \, you can precede them with a backslash to remove their special
meaning: \[ or \\.
Some of the special sequences beginning with '\' represent predefined sets
of characters that are often useful, such as the set of digits, the set of
letters, or the set of anything that isn’t whitespace. The following predefined
special sequences are a subset of those available. The equivalent classes
shown are those for bytes patterns. For a complete list of sequences and
expanded class definitions for Unicode string patterns, see the last part of
Regular Expression Syntax.
\d
Matches any decimal digit; this is equivalent to the class [0-9].
\D
Matches any non-digit character; this is equivalent to the class [^0-9].
\s
Matches any whitespace character; this is equivalent to the class [\t\n\r\f\v].
\S
Matches any non-whitespace character; this is equivalent to the class [^\t\n\r\f\v].
\w
Matches any alphanumeric character; this is equivalent to the class
[a-zA-Z0-9_].
\W
Matches any non-alphanumeric character; this is equivalent to the class
[^a-zA-Z0-9_].
These sequences can be included inside a character class. For example,
[\s,.] is a character class that will match any whitespace character, or
',' or '.'.
The final metacharacter in this section is .. It matches anything except a
newline character, and there’s an alternate mode (re.DOTALL) where it will
match even a newline. '.' is often used where you want to match “any
character”.
Being able to match varying sets of characters is the first thing regular
expressions can do that isn’t already possible with the methods available on
strings. However, if that was the only additional capability of regexes, they
wouldn’t be much of an advance. Another capability is that you can specify that
portions of the RE must be repeated a certain number of times.
The first metacharacter for repeating things that we’ll look at is *. *
doesn’t match the literal character *; instead, it specifies that the
previous character can be matched zero or more times, instead of exactly once.
For example, ca*t will match ct (0 a characters), cat (1 a),
caaat (3 a characters), and so forth. The RE engine has various
internal limitations stemming from the size of C’s int type that will
prevent it from matching over 2 billion a characters; you probably don’t
have enough memory to construct a string that large, so you shouldn’t run into
that limit.
Repetitions such as * are greedy; when repeating a RE, the matching
engine will try to repeat it as many times as possible. If later portions of the
pattern don’t match, the matching engine will then back up and try again with
fewer repetitions.
A step-by-step example will make this more obvious. Let’s consider the
expression a[bcd]*b. This matches the letter 'a', zero or more letters
from the class [bcd], and finally ends with a 'b'. Now imagine matching
this RE against the string abcbd.
Step   Matched   Explanation
1      a         The a in the RE matches.
2      abcbd     The engine matches [bcd]*, going as far as it can, which is
                 to the end of the string.
3      Failure   The engine tries to match b, but the current position is at
                 the end of the string, so it fails.
4      abcb      Back up, so that [bcd]* matches one less character.
5      Failure   Try b again, but the current position is at the last
                 character, which is a 'd'.
6      abc       Back up again, so that [bcd]* is only matching bc.
7      abcb      Try b again. This time the character at the current position
                 is 'b', so it succeeds.
The end of the RE has now been reached, and it has matched abcb. This
demonstrates how the matching engine goes as far as it can at first, and if no
match is found it will then progressively back up and retry the rest of the RE
again and again. It will back up until it has tried zero matches for
[bcd]*, and if that subsequently fails, the engine will conclude that the
string doesn’t match the RE at all.
Another repeating metacharacter is +, which matches one or more times. Pay
careful attention to the difference between * and +; * matches
zero or more times, so whatever’s being repeated may not be present at all,
while + requires at least one occurrence. To use a similar example,
ca+t will match cat (1 a), caaat (3 a characters), but won’t match
ct.
There are two more repeating qualifiers. The question mark character, ?,
matches either once or zero times; you can think of it as marking something as
being optional. For example, home-?brew matches either homebrew or
home-brew.
The most complicated repeated qualifier is {m,n}, where m and n are
decimal integers. This qualifier means there must be at least m repetitions,
and at most n. For example, a/{1,3}b will match a/b, a//b, and
a///b. It won’t match ab, which has no slashes, or a////b, which
has four.
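For example, using re.match(), which is introduced in the next section:

>>> import re
>>> re.match('a/{1,3}b', 'a//b').group()
'a//b'
>>> print(re.match('a/{1,3}b', 'a////b'))
None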
You can omit either m or n; in that case, a reasonable value is assumed for
the missing value. Omitting m is interpreted as a lower limit of 0, while
omitting n results in an upper bound of infinity — actually, the upper bound
is the 2-billion limit mentioned earlier, but that might as well be infinity.
Readers of a reductionist bent may notice that the three other qualifiers can
all be expressed using this notation. {0,} is the same as *, {1,}
is equivalent to +, and {0,1} is the same as ?. It’s better to use
*, +, or ? when you can, simply because they’re shorter and easier
to read.
Now that we’ve looked at some simple regular expressions, how do we actually use
them in Python? The re module provides an interface to the regular
expression engine, allowing you to compile REs into objects and then perform
matches with them.
Regular expressions are compiled into pattern objects, which have
methods for various operations such as searching for pattern matches or
performing string substitutions.
>>> import re
>>> p = re.compile('ab*')
>>> p
<_sre.SRE_Pattern object at 0x...>
re.compile() also accepts an optional flags argument, used to enable
various special features and syntax variations. We’ll go over the available
settings later, but for now a single example will do:
>>> p = re.compile('ab*', re.IGNORECASE)
The RE is passed to re.compile() as a string. REs are handled as strings
because regular expressions aren’t part of the core Python language, and no
special syntax was created for expressing them. (There are applications that
don’t need REs at all, so there’s no need to bloat the language specification by
including them.) Instead, the re module is simply a C extension module
included with Python, just like the socket or zlib modules.
Putting REs in strings keeps the Python language simpler, but has one
disadvantage which is the topic of the next section.
As stated earlier, regular expressions use the backslash character ('\') to
indicate special forms or to allow special characters to be used without
invoking their special meaning. This conflicts with Python’s usage of the same
character for the same purpose in string literals.
Let’s say you want to write a RE that matches the string \section, which
might be found in a LaTeX file. To figure out what to write in the program
code, start with the desired string to be matched. Next, you must escape any
backslashes and other metacharacters by preceding them with a backslash,
resulting in the string \\section. The string passed to re.compile()
must therefore be \\section. However, to express this as a
Python string literal, both backslashes must be escaped again.
In short, to match a literal backslash, one has to write '\\\\' as the RE
string, because the regular expression must be \\, and each backslash must
be expressed as \\ inside a regular Python string literal. In REs that
feature backslashes repeatedly, this leads to lots of repeated backslashes and
makes the resulting strings difficult to understand.
The solution is to use Python’s raw string notation for regular expressions;
backslashes are not handled in any special way in a string literal prefixed with
'r', so r"\n" is a two-character string containing '\' and 'n',
while "\n" is a one-character string containing a newline. Regular
expressions will often be written in Python code using this raw string notation.
Once you have an object representing a compiled regular expression, what do you
do with it? Pattern objects have several methods and attributes.
Only the most significant ones will be covered here; consult the re docs
for a complete listing.
Method/Attribute   Purpose
match()            Determine if the RE matches at the beginning of the string.
search()           Scan through a string, looking for any location where this
                   RE matches.
findall()          Find all substrings where the RE matches, and return them
                   as a list.
finditer()         Find all substrings where the RE matches, and return them
                   as an iterator.
match() and search() return None if no match can be found. If
they’re successful, a MatchObject instance is returned, containing
information about the match: where it starts and ends, the substring it matched,
and more.
You can learn about this by interactively experimenting with the re
module. If you have tkinter available, you may also want to look at
Tools/demo/redemo.py, a demonstration program included with the
Python distribution. It allows you to enter REs and strings, and displays
whether the RE matches or fails. redemo.py can be quite useful when
trying to debug a complicated RE. Phil Schwartz’s Kodos is also an interactive tool for developing and
testing RE patterns.
This HOWTO uses the standard Python interpreter for its examples. First, run the
Python interpreter, import the re module, and compile a RE:
>>> import re
>>> p = re.compile('[a-z]+')
>>> p
<_sre.SRE_Pattern object at 0x...>
Now, you can try matching various strings against the RE [a-z]+. An empty
string shouldn’t match at all, since + means ‘one or more repetitions’.
match() should return None in this case, which will cause the
interpreter to print no output. You can explicitly print the result of
match() to make this clear.
>>>p.match("")>>>print(p.match(""))None
Now, let’s try it on a string that it should match, such as tempo. In this
case, match() will return a MatchObject, so you should store the
result in a variable for later use.
>>> m = p.match('tempo')
>>> m
<_sre.SRE_Match object at 0x...>
Now you can query the MatchObject for information about the matching
string. MatchObject instances also have several methods and
attributes; the most important ones are:
Method/Attribute   Purpose
group()            Return the string matched by the RE
start()            Return the starting position of the match
end()              Return the ending position of the match
span()             Return a tuple containing the (start, end) positions of
                   the match
Trying these methods will soon clarify their meaning:
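>>> m.group()
'tempo'
>>> m.start(), m.end()
(0, 5)
>>> m.span()
(0, 5)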
group() returns the substring that was matched by the RE. start()
and end() return the starting and ending index of the match. span()
returns both start and end indexes in a single tuple. Since the match()
method only checks if the RE matches at the start of a string, start()
will always be zero. However, the search() method of patterns
scans through the string, so the match may not start at zero in that
case.
findall() has to create the entire list before it can be returned as the
result. The finditer() method returns a sequence of MatchObject
instances as an iterator:
>>> iterator = p.finditer('12 drummers drumming, 11 ... 10 ...')
>>> iterator
<callable_iterator object at 0x...>
>>> for match in iterator:
... print(match.span())
...
(0, 2)
(22, 24)
(29, 31)
You don’t have to create a pattern object and call its methods; the
re module also provides top-level functions called match(),
search(), findall(), sub(), and so forth. These functions
take the same arguments as the corresponding pattern method, with
the RE string added as the first argument, and still return either None or a
MatchObject instance.
>>> print(re.match(r'From\s+', 'Fromage amk'))
None
>>> re.match(r'From\s+', 'From amk Thu May 14 19:12:10 1998')
<_sre.SRE_Match object at 0x...>
Under the hood, these functions simply create a pattern object for you
and call the appropriate method on it. They also store the compiled object in a
cache, so future calls using the same RE are faster.
Should you use these module-level functions, or should you get the
pattern and call its methods yourself? That choice depends on how
frequently the RE will be used, and on your personal coding style. If the RE is
being used at only one point in the code, then the module functions are probably
more convenient. If a program contains a lot of regular expressions, or re-uses
the same ones in several locations, then it might be worthwhile to collect all
the definitions in one place, in a section of code that compiles all the REs
ahead of time. To take an example from the standard library, here’s an extract
from the now deprecated xmllib.py:
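ref = re.compile( ... )
entityref = re.compile( ... )
charref = re.compile( ... )
starttagopen = re.compile( ... )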
Compilation flags let you modify some aspects of how regular expressions work.
Flags are available in the re module under two names, a long name such as
IGNORECASE and a short, one-letter form such as I. (If you’re
familiar with Perl’s pattern modifiers, the one-letter forms use the same
letters; the short form of re.VERBOSE is re.X, for example.)
Multiple flags can be specified by bitwise OR-ing them; re.I|re.M sets
both the I and M flags, for example.
Here’s a table of the available flags, followed by a more detailed explanation
of each one.
Flag            Meaning
DOTALL, S       Make . match any character, including newlines
IGNORECASE, I   Do case-insensitive matches
LOCALE, L       Do a locale-aware match
MULTILINE, M    Multi-line matching, affecting ^ and $
VERBOSE, X      Enable verbose REs, which can be organized more cleanly and
                understandably.
ASCII, A        Makes several escapes like \w, \b, \s and \d match only on
                ASCII characters with the respective property.
I, IGNORECASE
Perform case-insensitive matching; character classes and literal strings will
match letters by ignoring case. For example, [A-Z] will match lowercase
letters, too, and Spam will match Spam, spam, or spAM. This
lowercasing doesn’t take the current locale into account; it will if you also
set the LOCALE flag.
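For example:

>>> p = re.compile('spam', re.IGNORECASE)
>>> p.findall('Spam, spam, sPAM')
['Spam', 'spam', 'sPAM']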
L, LOCALE
Make \w, \W, \b, and \B, dependent on the current locale.
Locales are a feature of the C library intended to help in writing programs that
take account of language differences. For example, if you’re processing French
text, you’d want to be able to write \w+ to match words, but \w only
matches the character class [A-Za-z]; it won’t match 'é' or 'ç'. If
your system is configured properly and a French locale is selected, certain C
functions will tell the program that 'é' should also be considered a letter.
Setting the LOCALE flag when compiling a regular expression will cause
the resulting compiled object to use these C functions for \w; this is
slower, but also enables \w+ to match French words as you’d expect.
M, MULTILINE
(^ and $ haven’t been explained yet; they’ll be introduced in section
More Metacharacters.)
Usually ^ matches only at the beginning of the string, and $ matches
only at the end of the string and immediately before the newline (if any) at the
end of the string. When this flag is specified, ^ matches at the beginning
of the string and at the beginning of each line within the string, immediately
following each newline. Similarly, the $ metacharacter matches both at
the end of the string and at the end of each line (immediately preceding each
newline).
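For example:

>>> p = re.compile('^From', re.MULTILINE)
>>> p.findall('From here\nto there\nFrom everywhere')
['From', 'From']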
S, DOTALL
Makes the '.' special character match any character at all, including a
newline; without this flag, '.' will match anything except a newline.
A, ASCII
Make \w, \W, \b, \B, \s and \S perform ASCII-only
matching instead of full Unicode matching. This is only meaningful for
Unicode patterns, and is ignored for byte patterns.
X, VERBOSE
This flag allows you to write regular expressions that are more readable by
granting you more flexibility in how you can format them. When this flag has
been specified, whitespace within the RE string is ignored, except when the
whitespace is in a character class or preceded by an unescaped backslash; this
lets you organize and indent the RE more clearly. This flag also lets you put
comments within a RE that will be ignored by the engine; comments are marked by
a '#' that’s neither in a character class nor preceded by an unescaped
backslash.
For example, here’s a RE that uses re.VERBOSE; see how much easier it
is to read?
charref = re.compile(r"""
&[#] # Start of a numeric entity reference
(
0[0-7]+ # Octal form
| [0-9]+ # Decimal form
| x[0-9a-fA-F]+ # Hexadecimal form
)
; # Trailing semicolon
""", re.VERBOSE)
Without the verbose setting, the RE would look like this:
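charref = re.compile("&#(0[0-7]+"
                     "|[0-9]+"
                     "|x[0-9a-fA-F]+);")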
In the above example, Python’s automatic concatenation of string literals has
been used to break up the RE into smaller pieces, but it’s still more difficult
to understand than the version using re.VERBOSE.
So far we’ve only covered a part of the features of regular expressions. In
this section, we’ll cover some new metacharacters, and how to use groups to
retrieve portions of the text that was matched.
There are some metacharacters that we haven’t covered yet. Most of them will be
covered in this section.
Some of the remaining metacharacters to be discussed are zero-width
assertions. They don’t cause the engine to advance through the string;
instead, they consume no characters at all, and simply succeed or fail. For
example, \b is an assertion that the current position is located at a word
boundary; the position isn’t changed by the \b at all. This means that
zero-width assertions should never be repeated, because if they match once at a
given location, they can obviously be matched an infinite number of times.
|
Alternation, or the “or” operator. If A and B are regular expressions,
A|B will match any string that matches either A or B. | has very
low precedence in order to make it work reasonably when you’re alternating
multi-character strings. Crow|Servo will match either Crow or Servo,
not Cro, a 'w' or an 'S', and ervo.
To match a literal '|', use \|, or enclose it inside a character class,
as in [|].
^
Matches at the beginning of lines. Unless the MULTILINE flag has been
set, this will only match at the beginning of the string. In MULTILINE
mode, this also matches immediately after each newline within the string.
For example, if you wish to match the word From only at the beginning of a
line, the RE to use is ^From.
>>> print(re.search('^From', 'From Here to Eternity'))
<_sre.SRE_Match object at 0x...>
>>> print(re.search('^From', 'Reciting From Memory'))
None
$
Matches at the end of a line, which is defined as either the end of the string,
or any location followed by a newline character.
>>> print(re.search('}$', '{block}'))
<_sre.SRE_Match object at 0x...>
>>> print(re.search('}$', '{block} '))
None
>>> print(re.search('}$', '{block}\n'))
<_sre.SRE_Match object at 0x...>
To match a literal '$', use \$ or enclose it inside a character class,
as in [$].
\A
Matches only at the start of the string. When not in MULTILINE mode,
\A and ^ are effectively the same. In MULTILINE mode, they’re
different: \A still matches only at the beginning of the string, but ^
may match at any location inside the string that follows a newline character.
\Z
Matches only at the end of the string.
\b
Word boundary. This is a zero-width assertion that matches only at the
beginning or end of a word. A word is defined as a sequence of alphanumeric
characters, so the end of a word is indicated by whitespace or a
non-alphanumeric character.
The following example matches class only when it’s a complete word; it won’t
match when it’s contained inside another word.
>>> p = re.compile(r'\bclass\b')
>>> print(p.search('no class at all'))
<_sre.SRE_Match object at 0x...>
>>> print(p.search('the declassified algorithm'))
None
>>> print(p.search('one subclass is'))
None
There are two subtleties you should remember when using this special sequence.
First, this is the worst collision between Python’s string literals and regular
expression sequences. In Python’s string literals, \b is the backspace
character, ASCII value 8. If you’re not using raw strings, then Python will
convert the \b to a backspace, and your RE won’t match as you expect it to.
The following example looks the same as our previous RE, but omits the 'r'
in front of the RE string.
>>> p = re.compile('\bclass\b')
>>> print(p.search('no class at all'))
None
>>> print(p.search('\b' + 'class' + '\b') )
<_sre.SRE_Match object at 0x...>
Second, inside a character class, where there’s no use for this assertion,
\b represents the backspace character, for compatibility with Python’s
string literals.
\B
Another zero-width assertion, this is the opposite of \b, only matching when
the current position is not at a word boundary.
Frequently you need to obtain more information than just whether the RE matched
or not. Regular expressions are often used to dissect strings by writing a RE
divided into several subgroups which match different components of interest.
For example, an RFC-822 header line is divided into a header name and a value,
separated by a ':', like this:
From: author@example.com
User-Agent: Thunderbird 1.5.0.9 (X11/20061227)
MIME-Version: 1.0
To: editor@example.com
This can be handled by writing a regular expression which matches an entire
header line, and has one group which matches the header name, and another group
which matches the header’s value.
Groups are marked by the '(', ')' metacharacters. '(' and ')'
have much the same meaning as they do in mathematical expressions; they group
together the expressions contained inside them, and you can repeat the contents
of a group with a repeating qualifier, such as *, +, ?, or
{m,n}. For example, (ab)* will match zero or more repetitions of
ab.
>>> p = re.compile('(ab)*')
>>> print(p.match('ababababab').span())
(0, 10)
Groups indicated with '(', ')' also capture the starting and ending
index of the text that they match; this can be retrieved by passing an argument
to group(), start(), end(), and span(). Groups are
numbered starting with 0. Group 0 is always present; it’s the whole RE, so
MatchObject methods all have group 0 as their default argument. Later
we’ll see how to express groups that don’t capture the span of text that they
match.
>>> p = re.compile('(a)b')
>>> m = p.match('ab')
>>> m.group()
'ab'
>>> m.group(0)
'ab'
Subgroups are numbered from left to right, from 1 upward. Groups can be nested;
to determine the number, just count the opening parenthesis characters, going
from left to right.
>>> p = re.compile('(a(b)c)d')
>>> m = p.match('abcd')
>>> m.group(0)
'abcd'
>>> m.group(1)
'abc'
>>> m.group(2)
'b'
group() can be passed multiple group numbers at a time, in which case it
will return a tuple containing the corresponding values for those groups.
>>> m.group(2,1,2)
('b', 'abc', 'b')
The groups() method returns a tuple containing the strings for all the
subgroups, from 1 up to however many there are.
>>> m.groups()
('abc', 'b')
Backreferences in a pattern allow you to specify that the contents of an earlier
capturing group must also be found at the current location in the string. For
example, \1 will succeed if the exact contents of group 1 can be found at
the current position, and fails otherwise. Remember that Python’s string
literals also use a backslash followed by numbers to allow including arbitrary
characters in a string, so be sure to use a raw string when incorporating
backreferences in a RE.
For example, the following RE detects doubled words in a string.
>>> p = re.compile(r'(\b\w+)\s+\1')
>>> p.search('Paris in the the spring').group()
'the the'
Backreferences like this aren’t often useful for just searching through a string
— there are few text formats which repeat data in this way — but you’ll soon
find out that they’re very useful when performing string substitutions.
Elaborate REs may use many groups, both to capture substrings of interest, and
to group and structure the RE itself. In complex REs, it becomes difficult to
keep track of the group numbers. There are two features which help with this
problem. Both of them use a common syntax for regular expression extensions, so
we’ll look at that first.
Perl 5 added several additional features to standard regular expressions, and
the Python re module supports most of them. It would have been
difficult to choose new single-keystroke metacharacters or new special sequences
beginning with \ to represent the new features without making Perl’s regular
expressions confusingly different from standard REs. If & had been chosen as a
new metacharacter, for example, old expressions would have assumed that & was
a regular character and wouldn’t have escaped it by writing \& or [&].
The solution chosen by the Perl developers was to use (?...) as the
extension syntax. ? immediately after a parenthesis was a syntax error
because the ? would have nothing to repeat, so this didn’t introduce any
compatibility problems. The characters immediately after the ? indicate
what extension is being used, so (?=foo) is one thing (a positive lookahead
assertion) and (?:foo) is something else (a non-capturing group containing
the subexpression foo).
Python adds an extension syntax to Perl’s extension syntax. If the first
character after the question mark is a P, you know that it’s an extension
that’s specific to Python. Currently there are two such extensions:
(?P<name>...) defines a named group, and (?P=name) is a backreference to
a named group. If future versions of Perl 5 add similar features using a
different syntax, the re module will be changed to support the new
syntax, while preserving the Python-specific syntax for compatibility’s sake.
Now that we’ve looked at the general extension syntax, we can return to the
features that simplify working with groups in complex REs. Since groups are
numbered from left to right and a complex expression may use many groups, it can
become difficult to keep track of the correct numbering. Modifying such a
complex RE is annoying, too: insert a new group near the beginning and you
change the numbers of everything that follows it.
Sometimes you’ll want to use a group to collect a part of a regular expression,
but aren’t interested in retrieving the group’s contents. You can make this fact
explicit by using a non-capturing group: (?:...), where you can replace the
... with any other regular expression.
Except for the fact that you can’t retrieve the contents of what the group
matched, a non-capturing group behaves exactly the same as a capturing group;
you can put anything inside it, repeat it with a repetition metacharacter such
as *, and nest it within other groups (capturing or non-capturing).
(?:...) is particularly useful when modifying an existing pattern, since you
can add new groups without changing how all the other groups are numbered. It
should be mentioned that there’s no performance difference in searching between
capturing and non-capturing groups; neither form is any faster than the other.
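For example:

>>> m = re.match('([ab])(?:[cd])([ef])', 'ace')
>>> m.groups()
('a', 'e')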
A more significant feature is named groups: instead of referring to them by
numbers, groups can be referenced by a name.
The syntax for a named group is one of the Python-specific extensions:
(?P<name>...). name is, obviously, the name of the group. Named groups
also behave exactly like capturing groups, and additionally associate a name
with a group. The MatchObject methods that deal with capturing groups
all accept either integers that refer to the group by number or strings that
contain the desired group’s name. Named groups are still given numbers, so you
can retrieve information about a group in two ways:
>>> p = re.compile(r'(?P<word>\b\w+\b)')
>>> m = p.search( '(((( Lots of punctuation )))' )
>>> m.group('word')
'Lots'
>>> m.group(1)
'Lots'
Named groups are handy because they let you use easily-remembered names, instead
of having to remember numbers. Here’s an example RE from the imaplib
module:
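InternalDate = re.compile(r'INTERNALDATE "'
        r'(?P<day>[ 123][0-9])-(?P<mon>[A-Z][a-z][a-z])-'
        r'(?P<year>[0-9][0-9][0-9][0-9])'
        r' (?P<hour>[0-9][0-9]):(?P<min>[0-9][0-9]):(?P<sec>[0-9][0-9])'
        r' (?P<zonen>[-+])(?P<zoneh>[0-9][0-9])(?P<zonem>[0-9][0-9])'
        r'"')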
It’s obviously much easier to retrieve m.group('zonem'), instead of having
to remember to retrieve group 9.
The syntax for backreferences in an expression such as (...)\1 refers to the
number of the group. There’s naturally a variant that uses the group name
instead of the number. This is another Python extension: (?P=name) indicates
that the contents of the group called name should again be matched at the
current point. The regular expression for finding doubled words,
(\b\w+)\s+\1 can also be written as (?P<word>\b\w+)\s+(?P=word):
>>> p = re.compile(r'(?P<word>\b\w+)\s+(?P=word)')
>>> p.search('Paris in the the spring').group()
'the the'
Another zero-width assertion is the lookahead assertion. Lookahead assertions
are available in both positive and negative form, and look like this:
(?=...)
Positive lookahead assertion. This succeeds if the contained regular
expression, represented here by ..., successfully matches at the current
location, and fails otherwise. But, once the contained expression has been
tried, the matching engine doesn’t advance at all; the rest of the pattern is
tried right where the assertion started.
(?!...)
Negative lookahead assertion. This is the opposite of the positive assertion;
it succeeds if the contained expression doesn’t match at the current position
in the string.
To make this concrete, let’s look at a case where a lookahead is useful.
Consider a simple pattern to match a filename and split it apart into a base
name and an extension, separated by a .. For example, in news.rc,
news is the base name, and rc is the filename’s extension.
The pattern to match this is quite simple:
.*[.].*$
Notice that the . needs to be treated specially because it’s a
metacharacter; I’ve put it inside a character class. Also notice the trailing
$; this is added to ensure that all the rest of the string must be included
in the extension. This regular expression matches foo.bar and
autoexec.bat and sendmail.cf and printers.conf.
Now, consider complicating the problem a bit; what if you want to match
filenames where the extension is not bat? Some incorrect attempts:
.*[.][^b].*$

The first attempt above tries to exclude bat by requiring
that the first character of the extension is not a b. This is wrong,
because the pattern also doesn’t match foo.bar.
.*[.]([^b]..|.[^a].|..[^t])$
The expression gets messier when you try to patch up the first solution by
requiring one of the following cases to match: the first character of the
extension isn’t b; the second character isn’t a; or the third character
isn’t t. This accepts foo.bar and rejects autoexec.bat, but it
requires a three-letter extension and won’t accept a filename with a two-letter
extension such as sendmail.cf. We’ll complicate the pattern again in an
effort to fix it.
.*[.]([^b].?.?|.[^a]?.?|..?[^t]?)$
In the third attempt, the second and third letters are all made optional in
order to allow matching extensions shorter than three characters, such as
sendmail.cf.
The pattern’s getting really complicated now, which makes it hard to read and
understand. Worse, if the problem changes and you want to exclude both bat
and exe as extensions, the pattern would get even more complicated and
confusing.
A negative lookahead cuts through all this confusion:
.*[.](?!bat$).*$

The negative lookahead means: if the expression bat
doesn’t match at this point, try the rest of the pattern; if bat$ does
match, the whole pattern will fail. The trailing $ is required to ensure
that something like sample.batch, where the extension only starts with
bat, will be allowed.
Excluding another filename extension is now easy; simply add it as an
alternative inside the assertion. The following pattern excludes filenames that
end in either bat or exe:
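.*[.](?!bat$|exe$).*$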
Up to this point, we’ve simply performed searches against a static string.
Regular expressions are also commonly used to modify strings in various ways,
using the following pattern methods:
Method/Attribute   Purpose
split()            Split the string into a list, splitting it wherever the RE
                   matches
sub()              Find all substrings where the RE matches, and replace them
                   with a different string
subn()             Does the same thing as sub(), but returns the new string
                   and the number of replacements
The split() method of a pattern splits a string apart
wherever the RE matches, returning a list of the pieces. It’s similar to the
split() method of strings but provides much more generality in the
delimiters that you can split by; split() only supports splitting by
whitespace or by a fixed string. As you’d expect, there’s a module-level
re.split() function, too.
.split(string[, maxsplit=0])
Split string by the matches of the regular expression. If capturing
parentheses are used in the RE, then their contents will also be returned as
part of the resulting list. If maxsplit is nonzero, at most maxsplit splits
are performed.
You can limit the number of splits made, by passing a value for maxsplit.
When maxsplit is nonzero, at most maxsplit splits will be made, and the
remainder of the string is returned as the final element of the list. In the
following example, the delimiter is any sequence of non-alphanumeric characters.
>>> p = re.compile(r'\W+')
>>> p.split('This is a test, short and sweet, of split().')
['This', 'is', 'a', 'test', 'short', 'and', 'sweet', 'of', 'split', '']
>>> p.split('This is a test, short and sweet, of split().', 3)
['This', 'is', 'a', 'test, short and sweet, of split().']
Sometimes you’re not only interested in what the text between delimiters is, but
also need to know what the delimiter was. If capturing parentheses are used in
the RE, then their values are also returned as part of the list. Compare the
following calls:
>>> p = re.compile(r'\W+')
>>> p2 = re.compile(r'(\W+)')
>>> p.split('This... is a test.')
['This', 'is', 'a', 'test', '']
>>> p2.split('This... is a test.')
['This', '... ', 'is', ' ', 'a', ' ', 'test', '.', '']
The module-level function re.split() adds the RE to be used as the first
argument, but is otherwise the same.
Another common task is to find all the matches for a pattern, and replace them
with a different string. The sub() method takes a replacement value,
which can be either a string or a function, and the string to be processed.
.sub(replacement, string[, count=0])
Returns the string obtained by replacing the leftmost non-overlapping
occurrences of the RE in string by the replacement replacement. If the
pattern isn’t found, string is returned unchanged.
The optional argument count is the maximum number of pattern occurrences to be
replaced; count must be a non-negative integer. The default value of 0 means
to replace all occurrences.
Here’s a simple example of using the sub() method. It replaces colour
names with the word colour:
>>> p = re.compile( '(blue|white|red)')
>>> p.sub( 'colour', 'blue socks and red shoes')
'colour socks and colour shoes'
>>> p.sub( 'colour', 'blue socks and red shoes', count=1)
'colour socks and red shoes'
The subn() method does the same work, but returns a 2-tuple containing the
new string value and the number of replacements that were performed:
>>> p = re.compile( '(blue|white|red)')
>>> p.subn( 'colour', 'blue socks and red shoes')
('colour socks and colour shoes', 2)
>>> p.subn( 'colour', 'no colours at all')
('no colours at all', 0)
Empty matches are replaced only when they’re not adjacent to a previous match.
>>> p = re.compile('x*')
>>> p.sub('-', 'abxd')
'-a-b-d-'
If replacement is a string, any backslash escapes in it are processed. That
is, \n is converted to a single newline character, \r is converted to a
carriage return, and so forth. Unknown escapes such as \j are left alone.
Backreferences, such as \6, are replaced with the substring matched by the
corresponding group in the RE. This lets you incorporate portions of the
original text in the resulting replacement string.
This example matches the word section followed by a string enclosed in
{, }, and changes section to subsection:
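>>> p = re.compile('section{ ( [^}]* ) }', re.VERBOSE)
>>> p.sub(r'subsection{\1}','section{First} section{second}')
'subsection{First} subsection{second}'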
There’s also a syntax for referring to named groups as defined by the
(?P<name>...) syntax. \g<name> will use the substring matched by the
group named name, and \g<number> uses the corresponding group number.
\g<2> is therefore equivalent to \2, but isn’t ambiguous in a
replacement string such as \g<2>0. (\20 would be interpreted as a
reference to group 20, not a reference to group 2 followed by the literal
character '0'.) The following substitutions are all equivalent, but use all
three variations of the replacement string.
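>>> p = re.compile('section{ (?P<name> [^}]* ) }', re.VERBOSE)
>>> p.sub(r'subsection{\1}','section{First}')
'subsection{First}'
>>> p.sub(r'subsection{\g<1>}','section{First}')
'subsection{First}'
>>> p.sub(r'subsection{\g<name>}','section{First}')
'subsection{First}'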
replacement can also be a function, which gives you even more control. If
replacement is a function, the function is called for every non-overlapping
occurrence of pattern. On each call, the function is passed a
MatchObject argument for the match and can use this information to
compute the desired replacement string and return it.
In the following example, the replacement function translates decimals into
hexadecimal:
>>> def hexrepl( match ):
... "Return the hex string for a decimal number"
... value = int( match.group() )
... return hex(value)
...
>>> p = re.compile(r'\d+')
>>> p.sub(hexrepl, 'Call 65490 for printing, 49152 for user code.')
'Call 0xffd2 for printing, 0xc000 for user code.'
When using the module-level re.sub() function, the pattern is passed as
the first argument. The pattern may be provided as an object or as a string; if
you need to specify regular expression flags, you must either use a
pattern object as the first parameter, or use embedded modifiers in the
pattern string, e.g. sub("(?i)b+", "x", "bbbbBBBB") returns 'x'.
Regular expressions are a powerful tool for some applications, but in some ways
their behaviour isn’t intuitive and at times they don’t behave the way you may
expect them to. This section will point out some of the most common pitfalls.
Sometimes using the re module is a mistake. If you’re matching a fixed
string, or a single character class, and you’re not using any re features
such as the IGNORECASE flag, then the full power of regular expressions
may not be required. Strings have several methods for performing operations with
fixed strings and they’re usually much faster, because the implementation is a
single small C loop that’s been optimized for the purpose, instead of the large,
more generalized regular expression engine.
One example might be replacing a single fixed string with another one; for
example, you might replace word with deed. re.sub() seems like the
function to use for this, but consider the replace() method. Note that
replace() will also replace word inside words, turning swordfish
into sdeedfish, but the naive RE word would have done that, too. (To
avoid performing the substitution on parts of words, the pattern would have to
be \bword\b, in order to require that word have a word boundary on
either side. This takes the job beyond replace()’s abilities.)
Another common task is deleting every occurrence of a single character from a
string or replacing it with another single character. You might do this with
something like re.sub('\n','',S), but translate() is capable of
doing both tasks and will be faster than any regular expression operation can
be.
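For example, deleting every newline with translate() rather than re.sub():

>>> 'one\ntwo\nthree\n'.translate({ord('\n'): None})
'onetwothree'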
In short, before turning to the re module, consider whether your problem
can be solved with a faster and simpler string method.
The match() function only checks if the RE matches at the beginning of the
string while search() will scan forward through the string for a match.
It’s important to keep this distinction in mind. Remember, match() will
only report a successful match which will start at 0; if the match wouldn’t
start at zero, match() will not report it.
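For example:

>>> print(re.match('super', 'superstition').span())
(0, 5)
>>> print(re.match('super', 'insuperable'))
None
>>> re.search('super', 'insuperable').span()
(2, 7)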
Sometimes you’ll be tempted to keep using re.match(), and just add .*
to the front of your RE. Resist this temptation and use re.search()
instead. The regular expression compiler does some analysis of REs in order to
speed up the process of looking for a match. One such analysis figures out what
the first character of a match must be; for example, a pattern starting with
Crow must match starting with a 'C'. The analysis lets the engine
quickly scan through the string looking for the starting character, only trying
the full match if a 'C' is found.
Adding .* defeats this optimization, requiring scanning to the end of the
string and then backtracking to find a match for the rest of the RE. Use
re.search() instead.
When repeating a regular expression, as in a*, the resulting action is to
consume as much of the pattern as possible. This fact often bites you when
you’re trying to match a pair of balanced delimiters, such as the angle brackets
surrounding an HTML tag. The naive pattern for matching a single HTML tag
doesn’t work because of the greedy nature of .*.
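>>> s = '<html><head><title>Title</title>'
>>> len(s)
32
>>> print(re.match('<.*>', s).span())
(0, 32)
>>> print(re.match('<.*>', s).group())
<html><head><title>Title</title>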
The RE matches the '<' in <html>, and the .* consumes the rest of
the string. There’s still more left in the RE, though, and the > can’t
match at the end of the string, so the regular expression engine has to
backtrack character by character until it finds a match for the >. The
final match extends from the '<' in <html> to the '>' in
</title>, which isn’t what you want.
In this case, the solution is to use the non-greedy qualifiers *?, +?,
??, or {m,n}?, which match as little text as possible. In the above
example, the '>' is tried immediately after the first '<' matches, and
when it fails, the engine advances a character at a time, retrying the '>'
at every step. This produces just the right result:
>>> print(re.match('<.*?>', s).group())
<html>
(Note that parsing HTML or XML with regular expressions is painful.
Quick-and-dirty patterns will handle common cases, but HTML and XML have special
cases that will break the obvious regular expression; by the time you’ve written
a regular expression that handles all of the possible cases, the patterns will
be very complicated. Use an HTML or XML parser module for such tasks.)
By now you’ve probably noticed that regular expressions are a very compact
notation, but they’re not terribly readable. REs of moderate complexity can
become lengthy collections of backslashes, parentheses, and metacharacters,
making them difficult to read and understand.
For such REs, specifying the re.VERBOSE flag when compiling the regular
expression can be helpful, because it allows you to format the regular
expression more clearly.
The re.VERBOSE flag has several effects. Whitespace in the regular
expression that isn’t inside a character class is ignored. This means that an
expression such as dog | cat is equivalent to the less readable dog|cat,
but [a b] will still match the characters 'a', 'b', or a space. In
addition, you can also put comments inside a RE; comments extend from a #
character to the next newline. When used with triple-quoted strings, this
enables REs to be formatted more neatly:
pat = re.compile(r"""
\s* # Skip leading whitespace
(?P<header>[^:]+) # Header name
\s* : # Whitespace, and a colon
(?P<value>.*?) # The header's value -- *? used to
# lose the following trailing whitespace
\s*$ # Trailing whitespace to end-of-line
""", re.VERBOSE)
Regular expressions are a complicated topic. Did this document help you
understand them? Were there parts that were unclear, or problems you
encountered that weren’t covered here? If so, please send suggestions for
improvements to the author.
The most complete book on regular expressions is almost certainly Jeffrey
Friedl’s Mastering Regular Expressions, published by O’Reilly. Unfortunately,
it exclusively concentrates on Perl and Java’s flavours of regular expressions,
and doesn’t contain any Python material at all, so it won’t be useful as a
reference for programming in Python. (The first edition covered Python’s
now-removed regex module, which won’t help you much.) Consider checking
it out from your library.
Sockets are used nearly everywhere, but are one of the most severely
misunderstood technologies around. This is a 10,000 foot overview of sockets.
It’s not really a tutorial - you’ll still have work to do in getting things
operational. It doesn’t cover the fine points (and there are a lot of them), but
I hope it will give you enough background to begin using them decently.
I’m only going to talk about INET sockets, but they account for at least 99% of
the sockets in use. And I’ll only talk about STREAM sockets - unless you really
know what you’re doing (in which case this HOWTO isn’t for you!), you’ll get
better behavior and performance from a STREAM socket than anything else. I will
try to clear up the mystery of what a socket is, as well as some hints on how to
work with blocking and non-blocking sockets. But I’ll start by talking about
blocking sockets. You’ll need to know how they work before dealing with
non-blocking sockets.
Part of the trouble with understanding these things is that “socket” can mean a
number of subtly different things, depending on context. So first, let’s make a
distinction between a “client” socket - an endpoint of a conversation, and a
“server” socket, which is more like a switchboard operator. The client
application (your browser, for example) uses “client” sockets exclusively; the
web server it’s talking to uses both “server” sockets and “client” sockets.
Of the various forms of IPC,
sockets are by far the most popular. On any given platform, there are
likely to be other forms of IPC that are faster, but for
cross-platform communication, sockets are about the only game in town.
They were invented in Berkeley as part of the BSD flavor of Unix. They spread
like wildfire with the Internet. With good reason — the combination of sockets
with INET makes talking to arbitrary machines around the world unbelievably easy
(at least compared to other schemes).
Roughly speaking, when you clicked on the link that brought you to this page,
your browser did something like the following:
import socket

# create an INET, STREAMing socket
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# now connect to the web server on port 80 - the normal http port
s.connect(("www.mcmillan-inc.com", 80))
When the connect completes, the socket s can be used to send
in a request for the text of the page. The same socket will read the
reply, and then be destroyed. That’s right, destroyed. Client sockets
are normally only used for one exchange (or a small set of sequential
exchanges).
What happens in the web server is a bit more complex. First, the web server
creates a “server socket”:
# create an INET, STREAMing socket
serversocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# bind the socket to a public host, and a well-known port
serversocket.bind((socket.gethostname(), 80))

# become a server socket
serversocket.listen(5)
A couple things to notice: we used socket.gethostname() so that the socket
would be visible to the outside world. If we had used s.bind(('',80)) or
s.bind(('localhost',80)) or s.bind(('127.0.0.1',80)) we would still
have a “server” socket, but one that was only visible within the same machine.
A second thing to note: low number ports are usually reserved for “well known”
services (HTTP, SNMP etc). If you’re playing around, use a nice high number (4
digits).
Finally, the argument to listen tells the socket library that we want it to
queue up as many as 5 connect requests (the normal max) before refusing outside
connections. If the rest of the code is written properly, that should be plenty.
Now that we have a “server” socket, listening on port 80, we can enter the
mainloop of the web server:
while True:
    # accept connections from outside
    (clientsocket, address) = serversocket.accept()
    # now do something with the clientsocket
    # in this case, we'll pretend this is a threaded server
    ct = client_thread(clientsocket)
    ct.run()
There are actually three general ways in which this loop could work: dispatching
a thread to handle clientsocket, creating a new process to handle
clientsocket, or restructuring this app to use non-blocking sockets and
multiplexing between our “server” socket and any active clientsockets using
select. More about that later. The important thing to understand now is
this: this is all a “server” socket does. It doesn’t send any data. It doesn’t
receive any data. It just produces “client” sockets. Each clientsocket is
created in response to some other “client” socket doing a connect() to the
host and port we’re bound to. As soon as we’ve created that clientsocket, we
go back to listening for more connections. The two “clients” are free to chat it
up - they are using some dynamically allocated port which will be recycled when
the conversation ends.
If you need fast IPC between two processes on one machine, you should look into
whatever form of shared memory the platform offers. A simple protocol based
around shared memory and locks or semaphores is by far the fastest technique.
If you do decide to use sockets, bind the “server” socket to 'localhost'. On
most platforms, this will take a shortcut around a couple of layers of network
code and be quite a bit faster.
The first thing to note, is that the web browser’s “client” socket and the web
server’s “client” socket are identical beasts. That is, this is a “peer to peer”
conversation. Or to put it another way, as the designer, you will have to
decide what the rules of etiquette are for a conversation. Normally, the
connecting socket starts the conversation, by sending in a request, or
perhaps a signon. But that’s a design decision - it’s not a rule of sockets.
Now there are two sets of verbs to use for communication. You can use send
and recv, or you can transform your client socket into a file-like beast and
use read and write. The latter is the way Java presents its sockets.
I’m not going to talk about it here, except to warn you that you need to use
flush on sockets. These are buffered “files”, and a common mistake is to
write something, and then read for a reply. Without a flush in
there, you may wait forever for the reply, because the request may still be in
your output buffer.
Now we come to the major stumbling block of sockets - send and recv operate
on the network buffers. They do not necessarily handle all the bytes you hand
them (or expect from them), because their major focus is handling the network
buffers. In general, they return when the associated network buffers have been
filled (send) or emptied (recv). They then tell you how many bytes they
handled. It is your responsibility to call them again until your message has
been completely dealt with.
When a recv returns 0 bytes, it means the other side has closed (or is in
the process of closing) the connection. You will not receive any more data on
this connection. Ever. You may be able to send data successfully; I’ll talk
about that some on the next page.
A protocol like HTTP uses a socket for only one transfer. The client sends a
request, then reads a reply. That’s it. The socket is discarded. This means that
a client can detect the end of the reply by receiving 0 bytes.
But if you plan to reuse your socket for further transfers, you need to realize
that there is no EOT on a socket. I repeat: if a socket
send or recv returns after handling 0 bytes, the connection has been
broken. If the connection has not been broken, you may wait on a recv
forever, because the socket will not tell you that there’s nothing more to
read (for now). Now if you think about that a bit, you’ll come to realize a
fundamental truth of sockets: messages must either be fixed length (yuck), or
be delimited (shrug), or indicate how long they are (much better), or end by
shutting down the connection. The choice is entirely yours, (but some ways are
righter than others).
Assuming you don’t want to end the connection, the simplest solution is a fixed
length message:
class mysocket:
    """demonstration class only
       - coded for clarity, not efficiency
    """

    def __init__(self, sock=None):
        if sock is None:
            self.sock = socket.socket(
                socket.AF_INET, socket.SOCK_STREAM)
        else:
            self.sock = sock

    def connect(self, host, port):
        self.sock.connect((host, port))

    def mysend(self, msg):
        # msg is a bytes object; MSGLEN is the agreed fixed message length
        totalsent = 0
        while totalsent < MSGLEN:
            sent = self.sock.send(msg[totalsent:])
            if sent == 0:
                raise RuntimeError("socket connection broken")
            totalsent = totalsent + sent

    def myreceive(self):
        chunks = []
        bytes_recd = 0
        while bytes_recd < MSGLEN:
            chunk = self.sock.recv(MSGLEN - bytes_recd)
            if chunk == b'':
                raise RuntimeError("socket connection broken")
            chunks.append(chunk)
            bytes_recd = bytes_recd + len(chunk)
        return b''.join(chunks)
The sending code here is usable for almost any messaging scheme - in Python you
send bytes objects, and you can use len() to determine their length (even if
they have embedded \0 bytes). It’s mostly the receiving code that gets more
complex. (And in C, it’s not much worse, except you can’t use strlen if the
message has embedded \0s.)
The easiest enhancement is to make the first character of the message an
indicator of message type, and have the type determine the length. Now you have
two recvs - the first to get (at least) that first character so you can
look up the length, and the second in a loop to get the rest. If you decide to
go the delimited route, you’ll be receiving in some arbitrary chunk size, (4096
or 8192 is frequently a good match for network buffer sizes), and scanning what
you’ve received for a delimiter.
One complication to be aware of: if your conversational protocol allows multiple
messages to be sent back to back (without some kind of reply), and you pass
recv an arbitrary chunk size, you may end up reading the start of a
following message. You’ll need to put that aside and hold onto it, until it’s
needed.
Prefixing the message with its length (say, as 5 numeric characters) gets more
complex, because (believe it or not), you may not get all 5 characters in one
recv. In playing around, you’ll get away with it; but in high network loads,
your code will very quickly break unless you use two recv loops - the first
to determine the length, the second to get the data part of the message. Nasty.
This is also when you’ll discover that send does not always manage to get
rid of everything in one pass. And despite having read this, you will eventually
get bit by it!
In the interests of space, building your character, (and preserving my
competitive position), these enhancements are left as an exercise for the
reader. Let’s move on to cleaning up.
It is perfectly possible to send binary data over a socket. The major problem is
that not all machines use the same formats for binary data. For example, a
Motorola chip will represent a 16 bit integer with the value 1 as the two hex
bytes 00 01. Intel and DEC, however, are byte-reversed - that same 1 is 01 00.
Socket libraries have calls for converting 16 and 32 bit integers - ntohl,
htonl, ntohs, htons - where “n” means network and “h” means host, “s” means
short and “l” means long. Where network order is host order, these do
nothing, but where the machine is byte-reversed, these swap the bytes around
appropriately.
In these days of 32 bit machines, the ASCII representation of binary data is
frequently smaller than the binary representation. That’s because a surprising
amount of the time, all those longs have the value 0, or maybe 1. The string “0”
would be two bytes, while binary is four. Of course, this doesn’t fit well with
fixed-length messages. Decisions, decisions.
Strictly speaking, you’re supposed to use shutdown on a socket before you
close it. The shutdown is an advisory to the socket at the other end.
Depending on the argument you pass it, it can mean “I’m not going to send
anymore, but I’ll still listen”, or “I’m not listening, good riddance!”. Most
socket libraries, however, are so used to programmers neglecting to use this
piece of etiquette that normally a close is the same as shutdown();
close(). So in most situations, an explicit shutdown is not needed.
One way to use shutdown effectively is in an HTTP-like exchange. The client
sends a request and then does a shutdown(1). This tells the server “This
client is done sending, but can still receive.” The server can detect “EOF” by
a receive of 0 bytes. It can assume it has the complete request. The server
sends a reply. If the send completes successfully then, indeed, the client
was still receiving.
Python takes the automatic shutdown a step further, and says that when a socket
is garbage collected, it will automatically do a close if it’s needed. But
relying on this is a very bad habit. If your socket just disappears without
doing a close, the socket at the other end may hang indefinitely, thinking
you’re just being slow. Please close your sockets when you’re done.
Probably the worst thing about using blocking sockets is what happens when the
other side comes down hard (without doing a close). Your socket is likely to
hang. TCP is a reliable protocol, and it will wait a long, long time
before giving up on a connection. If you’re using threads, the entire thread is
essentially dead. There’s not much you can do about it. As long as you aren’t
doing something dumb, like holding a lock while doing a blocking read, the
thread isn’t really consuming much in the way of resources. Do not try to kill
the thread - part of the reason that threads are more efficient than processes
is that they avoid the overhead associated with the automatic recycling of
resources. In other words, if you do manage to kill the thread, your whole
process is likely to be screwed up.
If you’ve understood the preceding, you already know most of what you need to
know about the mechanics of using sockets. You’ll still use the same calls, in
much the same ways. It’s just that, if you do it right, your app will be almost
inside-out.
In Python, you use socket.setblocking(0) to make it non-blocking. In C, it’s
more complex, (for one thing, you’ll need to choose between the BSD flavor
O_NONBLOCK and the almost indistinguishable Posix flavor O_NDELAY, which
is completely different from TCP_NODELAY), but it’s the exact same idea. You
do this after creating the socket, but before using it. (Actually, if you’re
nuts, you can switch back and forth.)
The major mechanical difference is that send, recv, connect and
accept can return without having done anything. You have (of course) a
number of choices. You can check return code and error codes and generally drive
yourself crazy. If you don’t believe me, try it sometime. Your app will grow
large, buggy and suck CPU. So let’s skip the brain-dead solutions and do it
right.
Use select.
In C, coding select is fairly complex. In Python, it’s a piece of cake, but
it’s close enough to the C version that if you understand select in Python,
you’ll have little trouble with it in C:
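In rough outline it looks like this (a sketch; potential_readers,
potential_writers, and potential_errs are illustrative names for the three
lists described next):

import select

ready_to_read, ready_to_write, in_error = select.select(
    potential_readers,   # sockets you might want to read from
    potential_writers,   # sockets you might want to write to
    potential_errs,      # sockets to check for errors
    timeout)             # optional timeout in seconds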
You pass select three lists: the first contains all sockets that you might
want to try reading; the second all the sockets you might want to try writing
to, and the last (normally left empty) those that you want to check for errors.
You should note that a socket can go into more than one list. The select
call is blocking, but you can give it a timeout. This is generally a sensible
thing to do - give it a nice long timeout (say a minute) unless you have good
reason to do otherwise.
In return, you will get three lists. They contain the sockets that are actually
readable, writable and in error. Each of these lists is a subset (possibly
empty) of the corresponding list you passed in.
If a socket is in the output readable list, you can be
as-close-to-certain-as-we-ever-get-in-this-business that a recv on that
socket will return something. Same idea for the writable list. You’ll be able
to send something. Maybe not all you want to, but something is better than
nothing. (Actually, any reasonably healthy socket will return as writable - it
just means outbound network buffer space is available.)
If you have a “server” socket, put it in the potential_readers list. If it comes
out in the readable list, your accept will (almost certainly) work. If you
have created a new socket to connect to someone else, put it in the
potential_writers list. If it shows up in the writable list, you have a decent
chance that it has connected.
One very nasty problem with select: if somewhere in those input lists of
sockets is one which has died a nasty death, the select will fail. You then
need to loop through every single damn socket in all those lists and do a
select([sock],[],[],0) until you find the bad one. That timeout of 0 means
it won’t take long, but it’s ugly.
Actually, select can be handy even with blocking sockets. It’s one way of
determining whether you will block - the socket returns as readable when there’s
something in the buffers. However, this still doesn’t help with the problem of
determining whether the other end is done, or just busy with something else.
Portability alert: On Unix, select works both with the sockets and
files. Don’t try this on Windows. On Windows, select works with sockets
only. Also note that in C, many of the more advanced socket options are done
differently on Windows. In fact, on Windows I usually use threads (which work
very, very well) with my sockets. Face it, if you want any kind of performance,
your code will look very different on Windows than on Unix.
There’s no question that the fastest sockets code uses non-blocking sockets and
select to multiplex them. You can put together something that will saturate a
LAN connection without putting any strain on the CPU. The trouble is that an app
written this way can’t do much of anything else - it needs to be ready to
shuffle bytes around at all times.
Assuming that your app is actually supposed to do something more than that,
threading is the optimal solution, (and using non-blocking sockets will be
faster than using blocking sockets). Unfortunately, threading support in Unixes
varies both in API and quality. So the normal Unix solution is to fork a
subprocess to deal with each connection. The overhead for this is significant
(and don’t do this on Windows - the overhead of process creation is enormous
there). It also means that unless each subprocess is completely independent,
you’ll need to use another form of IPC, say a pipe, or shared memory and
semaphores, to communicate between the parent and child processes.
Finally, remember that even though blocking sockets are somewhat slower than
non-blocking, in many cases they are the “right” solution. After all, if your
app is driven by the data it receives over a socket, there’s not much sense in
complicating the logic just so your app can wait on select instead of
recv.
Python lists have a built-in list.sort() method that modifies the list
in-place. There is also a sorted() built-in function that builds a new
sorted list from an iterable.
In this document, we explore the various techniques for sorting data using Python.
A simple ascending sort is very easy: just call the sorted() function. It
returns a new sorted list:
>>> sorted([5, 2, 3, 1, 4])
[1, 2, 3, 4, 5]
You can also use the list.sort() method. It modifies the list
in-place (and returns None to avoid confusion). Usually it’s less convenient
than sorted() - but if you don’t need the original list, it’s slightly
more efficient.
>>> a = [5, 2, 3, 1, 4]
>>> a.sort()
>>> a
[1, 2, 3, 4, 5]
Another difference is that the list.sort() method is only defined for
lists. In contrast, the sorted() function accepts any iterable.
Both list.sort() and sorted() have a key parameter to specify a
function to be called on each list element prior to making comparisons.
For example, here’s a case-insensitive string comparison:
>>> sorted("This is a test string from Andrew".split(), key=str.lower)
['a', 'Andrew', 'from', 'is', 'string', 'test', 'This']
The value of the key parameter should be a function that takes a single argument
and returns a key to use for sorting purposes. This technique is fast because
the key function is called exactly once for each input record.
A common pattern is to sort complex objects using some of the object’s indices
as keys. For example:
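A sketch with hypothetical student records stored as (name, grade, age) tuples:

>>> student_tuples = [
...     ('john', 'A', 15),
...     ('jane', 'B', 12),
...     ('dave', 'B', 10),
... ]
>>> sorted(student_tuples, key=lambda student: student[2])   # sort by age
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]

The same technique works for objects with named attributes:

>>> class Student:
...     def __init__(self, name, grade, age):
...         self.name = name
...         self.grade = grade
...         self.age = age
...     def __repr__(self):
...         return repr((self.name, self.grade, self.age))
>>> student_objects = [
...     Student('john', 'A', 15),
...     Student('jane', 'B', 12),
...     Student('dave', 'B', 10),
... ]
>>> sorted(student_objects, key=lambda student: student.age)   # sort by age
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]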
The key-function patterns shown above are very common, so Python provides
convenience functions to make accessor functions easier and faster. The
operator module has itemgetter(),
attrgetter(), and a methodcaller() function.
Using those functions, the above examples become simpler and faster:
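With the hypothetical student data from above, a sketch:

>>> from operator import itemgetter, attrgetter
>>> sorted(student_tuples, key=itemgetter(2))
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
>>> sorted(student_objects, key=attrgetter('age'))
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]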
Both list.sort() and sorted() accept a reverse parameter with a
boolean value. This is used to flag descending sorts. For example, to get the
student data in reverse age order:
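>>> sorted(student_objects, key=attrgetter('age'), reverse=True)
[('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)]

Sorts are also guaranteed to be stable: when multiple records share the same
key, their original order is preserved. A small sketch:

>>> data = [('red', 1), ('blue', 1), ('red', 2), ('blue', 2)]
>>> sorted(data, key=itemgetter(0))
[('blue', 1), ('blue', 2), ('red', 1), ('red', 2)]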
Notice how the two records for blue retain their original order so that
('blue',1) is guaranteed to precede ('blue',2).
This wonderful property lets you build complex sorts in a series of sorting
steps. For example, to sort the student data by descending grade and then
ascending age, do the age sort first and then sort again using grade:
>>> s = sorted(student_objects, key=attrgetter('age')) # sort on secondary key
>>> sorted(s, key=attrgetter('grade'), reverse=True) # now sort on primary key, descending
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
The Timsort algorithm used in Python
does multiple sorts efficiently because it can take advantage of any ordering
already present in a dataset.
This idiom is called Decorate-Sort-Undecorate after its three steps:
First, the initial list is decorated with new values that control the sort order.
Second, the decorated list is sorted.
Finally, the decorations are removed, creating a list that contains only the
initial values in the new order.
For example, to sort the student data by grade using the DSU approach:
>>> decorated = [(student.grade, i, student) for i, student in enumerate(student_objects)]
>>> decorated.sort()
>>> [student for grade, i, student in decorated] # undecorate
[('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)]
This idiom works because tuples are compared lexicographically; the first items
are compared; if they are the same then the second items are compared, and so
on.
It is not strictly necessary in all cases to include the index i in the
decorated list, but including it gives two benefits:
The sort is stable – if two items have the same key, their order will be
preserved in the sorted list.
The original items do not have to be comparable because the ordering of the
decorated tuples will be determined by at most the first two items. So for
example the original list could contain complex numbers which cannot be sorted
directly.
Another name for this idiom is
Schwartzian transform,
after Randal L. Schwartz, who popularized it among Perl programmers.
Now that Python sorting provides key-functions, this technique is not often needed.
Many constructs given in this HOWTO assume Python 2.4 or later. Before that,
there was no sorted() builtin and list.sort() took no keyword
arguments. Instead, all of the Py2.x versions supported a cmp parameter to
handle user specified comparison functions.
In Py3.0, the cmp parameter was removed entirely (as part of a larger effort to
simplify and unify the language, eliminating the conflict between rich
comparisons and the __cmp__() magic method).
In Py2.x, sort allowed an optional function which can be called for doing the
comparisons. That function should take two arguments to be compared and then
return a negative value for less-than, return zero if they are equal, or return
a positive value for greater-than. For example, we can do:
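A sketch of what that looked like (this only runs on Python 2; the cmp
parameter no longer exists in Python 3):

>>> def numeric_compare(x, y):
...     return x - y
>>> sorted([5, 2, 4, 1, 3], cmp=numeric_compare)  # Python 2 only
[1, 2, 3, 4, 5]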
When porting code from Python 2.x to 3.x, the situation can arise when you have
the user supplying a comparison function and you need to convert that to a key
function. The following wrapper makes that easy to do:
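One such wrapper, essentially the recipe behind functools.cmp_to_key()
(available since Python 2.7 and 3.2):

def cmp_to_key(mycmp):
    'Convert a cmp= function into a key= function'
    class K:
        def __init__(self, obj, *args):
            self.obj = obj
        def __lt__(self, other):
            return mycmp(self.obj, other.obj) < 0
        def __gt__(self, other):
            return mycmp(self.obj, other.obj) > 0
        def __eq__(self, other):
            return mycmp(self.obj, other.obj) == 0
        def __le__(self, other):
            return mycmp(self.obj, other.obj) <= 0
        def __ge__(self, other):
            return mycmp(self.obj, other.obj) >= 0
        def __ne__(self, other):
            return mycmp(self.obj, other.obj) != 0
    return K

To convert to a key function, just wrap the old comparison function:

>>> sorted([5, 2, 4, 1, 3], key=cmp_to_key(numeric_compare))
[1, 2, 3, 4, 5]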
The reverse parameter still maintains sort stability (so that records with
equal keys retain the original order). Interestingly, that effect can be
simulated without the parameter by using the builtin reversed() function
twice:
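A sketch using the color data from above:

>>> data = [('red', 1), ('blue', 1), ('red', 2), ('blue', 2)]
>>> standard_way = sorted(data, key=itemgetter(0), reverse=True)
>>> double_reversed = list(reversed(sorted(reversed(data), key=itemgetter(0))))
>>> assert standard_way == double_reversed
>>> standard_way
[('red', 1), ('red', 2), ('blue', 1), ('blue', 2)]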
The sort routines are guaranteed to use __lt__() when making comparisons
between two objects. So, it is easy to add a standard sort order to a class by
defining an __lt__() method:
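For example, giving the hypothetical Student class from above a default sort
order by age:

>>> Student.__lt__ = lambda self, other: self.age < other.age
>>> sorted(student_objects)
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]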
Key functions need not depend directly on the objects being sorted. A key
function can also access external resources. For instance, if the student grades
are stored in a dictionary, they can be used to sort a separate list of student
names:
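A sketch, assuming the grades live in a separate dictionary:

>>> students = ['dave', 'john', 'jane']
>>> newgrades = {'john': 'F', 'jane': 'A', 'dave': 'C'}
>>> sorted(students, key=newgrades.__getitem__)
['jane', 'dave', 'john']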
In 1968, the American Standard Code for Information Interchange, better known by
its acronym ASCII, was standardized. ASCII defined numeric codes for various
characters, with the numeric values running from 0 to 127. For example, the
lowercase letter ‘a’ is assigned 97 as its code value.
ASCII was an American-developed standard, so it only defined unaccented
characters. There was an ‘e’, but no ‘é’ or ‘Í’. This meant that languages
which required accented characters couldn’t be faithfully represented in ASCII.
(Actually the missing accents matter for English, too, which contains words such
as ‘naïve’ and ‘café’, and some publications have house styles which require
spellings such as ‘coöperate’.)
For a while people just wrote programs that didn’t display accents. I remember
looking at Apple ][ BASIC programs, published in French-language publications in
the mid-1980s, that had lines like these:
PRINT"FICHIER EST COMPLETE."PRINT"CARACTERE NON ACCEPTE."
Those messages should contain accents, and they just look wrong to someone who
can read French.
In the 1980s, almost all personal computers were 8-bit, meaning that bytes could
hold values ranging from 0 to 255. ASCII codes only went up to 127, so some
machines assigned values between 128 and 255 to accented characters. Different
machines had different codes, however, which led to problems exchanging files.
Eventually various commonly used sets of values for the 128–255 range emerged.
Some were true standards, defined by the International Standards Organization,
and some were de facto conventions that were invented by one company or
another and managed to catch on.
255 characters aren’t very many. For example, you can’t fit both the accented
characters used in Western Europe and the Cyrillic alphabet used for Russian
into the 128–255 range because there are more than 128 such characters.
You could write files using different codes (all your Russian files in a coding
system called KOI8, all your French files in a different coding system called
Latin1), but what if you wanted to write a French document that quotes some
Russian text? In the 1980s people began to want to solve this problem, and the
Unicode standardization effort began.
Unicode started out using 16-bit characters instead of 8-bit characters. 16
bits means you have 2^16 = 65,536 distinct values available, making it possible
to represent many different characters from many different alphabets; an initial
goal was to have Unicode contain the alphabets for every single human language.
It turns out that even 16 bits isn’t enough to meet that goal, and the modern
Unicode specification uses a wider range of codes, 0 through 1,114,111 (0x10ffff
in base 16).
There’s a related ISO standard, ISO 10646. Unicode and ISO 10646 were
originally separate efforts, but the specifications were merged with the 1.1
revision of Unicode.
(This discussion of Unicode’s history is highly simplified. I don’t think the
average Python programmer needs to worry about the historical details; consult
the Unicode consortium site listed in the References for more information.)
A character is the smallest possible component of a text. ‘A’, ‘B’, ‘C’,
etc., are all different characters. So are ‘È’ and ‘Í’. Characters are
abstractions, and vary depending on the language or context you’re talking
about. For example, the symbol for ohms (Ω) is usually drawn much like the
capital letter omega (Ω) in the Greek alphabet (they may even be the same in
some fonts), but these are two different characters that have different
meanings.
The Unicode standard describes how characters are represented by code
points. A code point is an integer value, usually denoted in base 16. In the
standard, a code point is written using the notation U+12ca to mean the
character with value 0x12ca (4,810 decimal). The Unicode standard contains a lot
of tables listing characters and their corresponding code points:
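A small illustrative excerpt from such a table:

0061    'a'; LATIN SMALL LETTER A
0007    '\x07'; BELL
0002    '\x02'; START OF TEXT
12ca    '\u12ca'; ETHIOPIC SYLLABLE WI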
Strictly, these definitions imply that it’s meaningless to say ‘this is
character U+12ca’. U+12ca is a code point, which represents some particular
character; in this case, it represents the character ‘ETHIOPIC SYLLABLE WI’. In
informal contexts, this distinction between code points and characters will
sometimes be forgotten.
A character is represented on a screen or on paper by a set of graphical
elements that’s called a glyph. The glyph for an uppercase A, for example,
is two diagonal strokes and a horizontal stroke, though the exact details will
depend on the font being used. Most Python code doesn’t need to worry about
glyphs; figuring out the correct glyph to display is generally the job of a GUI
toolkit or a terminal’s font renderer.
To summarize the previous section: a Unicode string is a sequence of code
points, which are numbers from 0 through 0x10ffff (1,114,111 decimal). This
sequence needs to be represented as a set of bytes (meaning, values
from 0 through 255) in memory. The rules for translating a Unicode string
into a sequence of bytes are called an encoding.
The first encoding you might think of is an array of 32-bit integers. In this
representation, the string “Python” would look like this:
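Sketched byte by byte, with each code point stored as four little-endian bytes
(the order would differ on a big-endian machine):

   P           y           t           h           o           n
0x50 00 00 00 79 00 00 00 74 00 00 00 68 00 00 00 6f 00 00 00 6e 00 00 00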
This representation is straightforward but using it presents a number of
problems.
It’s not portable; different processors order the bytes differently.
It’s very wasteful of space. In most texts, the majority of the code points
are less than 127, or less than 255, so a lot of space is occupied by zero
bytes. The above string takes 24 bytes compared to the 6 bytes needed for an
ASCII representation. Increased RAM usage doesn’t matter too much (desktop
computers have megabytes of RAM, and strings aren’t usually that large), but
expanding our usage of disk and network bandwidth by a factor of 4 is
intolerable.
It’s not compatible with existing C functions such as strlen(), so a new
family of wide string functions would need to be used.
Many Internet standards are defined in terms of textual data, and can’t
handle content with embedded zero bytes.
Generally people don’t use this encoding, instead choosing other
encodings that are more efficient and convenient. UTF-8 is probably
the most commonly supported encoding; it will be discussed below.
Encodings don’t have to handle every possible Unicode character, and most
encodings don’t. The rules for converting a Unicode string into the ASCII
encoding, for example, are simple; for each code point:
If the code point is < 128, each byte is the same as the value of the code
point.
If the code point is 128 or greater, the Unicode string can’t be represented
in this encoding. (Python raises a UnicodeEncodeError exception in this
case.)
Latin-1, also known as ISO-8859-1, is a similar encoding. Unicode code points
0–255 are identical to the Latin-1 values, so converting to this encoding simply
requires converting code points to byte values; if a code point larger than 255
is encountered, the string can’t be encoded into Latin-1.
Encodings don’t have to be simple one-to-one mappings like Latin-1. Consider
IBM’s EBCDIC, which was used on IBM mainframes. Letter values weren’t in one
block: ‘a’ through ‘i’ had values from 129 to 137, but ‘j’ through ‘r’ were 145
through 153. If you wanted to use EBCDIC as an encoding, you’d probably use
some sort of lookup table to perform the conversion, but this is largely an
internal detail.
UTF-8 is one of the most commonly used encodings. UTF stands for “Unicode
Transformation Format”, and the ‘8’ means that 8-bit numbers are used in the
encoding. (There’s also a UTF-16 encoding, but it’s less frequently used than
UTF-8.) UTF-8 uses the following rules:
If the code point is <128, it’s represented by the corresponding byte value.
If the code point is between 128 and 0x7ff, it’s turned into two byte values
between 128 and 255.
Code points >0x7ff are turned into three- or four-byte sequences, where each
byte of the sequence is between 128 and 255.
UTF-8 has several convenient properties:
It can handle any Unicode code point.
A Unicode string is turned into a string of bytes containing no embedded zero
bytes. This avoids byte-ordering issues, and means UTF-8 strings can be
processed by C functions such as strcpy() and sent through protocols that
can’t handle zero bytes.
A string of ASCII text is also valid UTF-8 text.
UTF-8 is fairly compact; the majority of code points are turned into two
bytes, and values less than 128 occupy only a single byte.
If bytes are corrupted or lost, it’s possible to determine the start of the
next UTF-8-encoded code point and resynchronize. It’s also unlikely that
random 8-bit data will look like valid UTF-8.
The Unicode Consortium site at <http://www.unicode.org> has character charts, a
glossary, and PDF versions of the Unicode specification. Be prepared for some
difficult reading. <http://www.unicode.org/history/> is a chronology of the
origin and development of Unicode.
Another good introductory article was written by Joel Spolsky
<http://www.joelonsoftware.com/articles/Unicode.html>.
If this introduction didn’t make things clear to you, you should try reading this
alternate article before continuing.
Since Python 3.0, the language features a str type that contains Unicode
characters, meaning any string created using "unicode rocks!", 'unicode
rocks!', or the triple-quoted string syntax is stored as Unicode.
To insert a Unicode character that is not part of ASCII, e.g., any letters with
accents, one can use escape sequences in string literals as such:
>>> "\N{GREEK CAPITAL LETTER DELTA}" # Using the character name
'\u0394'
>>> "\u0394" # Using a 16-bit hex value
'\u0394'
>>> "\U00000394" # Using a 32-bit hex value
'\u0394'
In addition, one can create a string using the decode() method of
bytes. This method takes an encoding, such as UTF-8, and, optionally,
an errors argument.
The errors argument specifies the response when the input string can’t be
converted according to the encoding’s rules. Legal values for this argument are
‘strict’ (raise a UnicodeDecodeError exception), ‘replace’ (use U+FFFD,
‘REPLACEMENT CHARACTER’), or ‘ignore’ (just leave the character out of the
Unicode result). The following examples show the differences:
>>> b'\x80abc'.decode("utf-8", "strict")
Traceback (most recent call last):
File "<stdin>", line 1, in ?
UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 0:
unexpected code byte
>>> b'\x80abc'.decode("utf-8", "replace")
'?abc'
>>> b'\x80abc'.decode("utf-8", "ignore")
'abc'
(In this code example, the Unicode replacement character has been replaced by
a question mark because it may not be displayed on some systems.)
Encodings are specified as strings containing the encoding’s name. Python 3.2
comes with roughly 100 different encodings; see the Python Library Reference at
Standard Encodings for a list. Some encodings have multiple names; for
example, ‘latin-1’, ‘iso_8859_1’ and ‘8859’ are all synonyms for the same
encoding.
One-character Unicode strings can also be created with the chr()
built-in function, which takes integers and returns a Unicode string of length 1
that contains the corresponding code point. The reverse operation is the
built-in ord() function that takes a one-character Unicode string and
returns the code point value:
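For example:

>>> chr(57344)
'\ue000'
>>> ord('\ue000')
57344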
Another important str method is .encode([encoding],[errors='strict']),
which returns a bytes representation of the Unicode string, encoded in the
requested encoding. The errors parameter is the same as the parameter of
the decode() method, with one additional possibility; as well as ‘strict’,
‘ignore’, and ‘replace’ (which in this case inserts a question mark instead of
the unencodable character), you can also pass ‘xmlcharrefreplace’ which uses
XML’s character references. The following example shows the different results:
>>> u = chr(40960) + 'abcd' + chr(1972)
>>> u.encode('utf-8')
b'\xea\x80\x80abcd\xde\xb4'
>>> u.encode('ascii')
Traceback (most recent call last):
File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character '\ua000' in
position 0: ordinal not in range(128)
>>> u.encode('ascii', 'ignore')
b'abcd'
>>> u.encode('ascii', 'replace')
b'?abcd?'
>>> u.encode('ascii', 'xmlcharrefreplace')
b'ꀀabcd޴'
The low-level routines for registering and accessing the available encodings are
found in the codecs module. However, the encoding and decoding functions
returned by this module are usually more low-level than is comfortable, so I’m
not going to describe the codecs module here. If you need to implement a
completely new encoding, you’ll need to learn about the codecs module
interfaces, but implementing encodings is a specialized task that also won’t be
covered here. Consult the Python documentation to learn more about this module.
In Python source code, specific Unicode code points can be written using the
\u escape sequence, which is followed by four hex digits giving the code
point. The \U escape sequence is similar, but expects eight hex digits,
not four:
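For example:

>>> s = "a\xac\u1234\u20ac\U00008000"
>>> # \xac is a two-digit hex escape; \u1234 is a four-digit Unicode
>>> # escape; \U00008000 is an eight-digit Unicode escape
>>> [ord(c) for c in s]
[97, 172, 4660, 8364, 32768]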
Using escape sequences for code points greater than 127 is fine in small doses,
but becomes an annoyance if you’re using many accented characters, as you would
in a program with messages in French or some other accent-using language. You
can also assemble strings using the chr() built-in function, but this is
even more tedious.
Ideally, you’d want to be able to write literals in your language’s natural
encoding. You could then edit Python source code with your favorite editor
which would display the accented characters naturally, and have the right
characters used at runtime.
Python supports writing source code in UTF-8 by default, but you can use almost
any encoding if you declare the encoding being used. This is done by including
a special comment as either the first or second line of the source file:
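For example (a sketch, assuming the source file is actually saved in Latin-1):

#!/usr/bin/env python
# -*- coding: latin-1 -*-

u = 'abcdé'
print(ord(u[-1]))   # prints 233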
The syntax is inspired by Emacs’s notation for specifying variables local to a
file. Emacs supports many different variables, but Python only supports
‘coding’. The -*- symbols indicate to Emacs that the comment is special;
they have no significance to Python but are a convention. Python looks for
coding:name or coding=name in the comment.
If you don’t include such a comment, the default encoding used will be UTF-8 as
already mentioned.
The Unicode specification includes a database of information about code points.
For each code point that’s defined, the information includes the character’s
name, its category, the numeric value if applicable (Unicode has characters
representing the Roman numerals and fractions such as one-third and
four-fifths). There are also properties related to the code point’s use in
bidirectional text and other display-related properties.
The following program displays some information about several characters, and
prints the numeric value of one particular character:
import unicodedata

u = chr(233) + chr(0x0bf2) + chr(3972) + chr(6000) + chr(13231)

for i, c in enumerate(u):
    print(i, '%04x' % ord(c), unicodedata.category(c), end=" ")
    print(unicodedata.name(c))

# Get numeric value of second character
print(unicodedata.numeric(u[1]))
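When run, this prints:

0 00e9 Ll LATIN SMALL LETTER E WITH ACUTE
1 0bf2 No TAMIL NUMBER ONE THOUSAND
2 0f84 Mn TIBETAN MARK HALANTA
3 1770 Lo TAGBANWA LETTER SA
4 33af So SQUARE RAD OVER S SQUARED
1000.0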
The category codes are abbreviations describing the nature of the character.
These are grouped into categories such as “Letter”, “Number”, “Punctuation”, or
“Symbol”, which in turn are broken up into subcategories. To take the codes
from the above output, 'Ll' means ‘Letter, lowercase’, 'No' means
“Number, other”, 'Mn' is “Mark, nonspacing”, and 'So' is “Symbol,
other”. See
<http://www.unicode.org/reports/tr44/#General_Category_Values> for a
list of category codes.
Marc-André Lemburg gave a presentation at EuroPython 2002 titled “Python and
Unicode”. A PDF version of his slides is available at
<http://downloads.egenix.com/python/Unicode-EPC2002-Talk.pdf>, and is an
excellent overview of the design of Python’s Unicode features (based on Python
2, where the Unicode string type is called unicode and literals start with
u).
Once you’ve written some code that works with Unicode data, the next problem is
input/output. How do you get Unicode strings into your program, and how do you
convert Unicode into a form suitable for storage or transmission?
It’s possible that you may not need to do anything depending on your input
sources and output destinations; you should check whether the libraries used in
your application support Unicode natively. XML parsers often return Unicode
data, for example. Many relational databases also support Unicode-valued
columns and can return Unicode values from an SQL query.
Unicode data is usually converted to a particular encoding before it gets
written to disk or sent over a socket. It’s possible to do all the work
yourself: open a file, read an 8-bit byte string from it, and convert the string
with str(bytes,encoding). However, the manual approach is not recommended.
One problem is the multi-byte nature of encodings; one Unicode character can be
represented by several bytes. If you want to read the file in arbitrary-sized
chunks (say, 1K or 4K), you need to write error-handling code to catch the case
where only part of the bytes encoding a single Unicode character are read at the
end of a chunk. One solution would be to read the entire file into memory and
then perform the decoding, but that prevents you from working with files that
are extremely large; if you need to read a 2 GB file, you need 2 GB of RAM.
(More, really, since for at least a moment you’d need to have both the encoded
string and its Unicode version in memory.)
The solution would be to use the low-level decoding interface to catch the case
of partial coding sequences. The work of implementing this has already been
done for you: the built-in open() function can return a file-like object
that assumes the file’s contents are in a specified encoding and accepts Unicode
parameters for methods such as .read() and .write(). This works through
open()'s encoding and errors parameters which are interpreted just
like those in string objects’ encode() and decode() methods.
Reading Unicode from a file is therefore simple:
with open('unicode.rst', encoding='utf-8') as f:
    for line in f:
        print(repr(line))
It’s also possible to open files in update mode, allowing both reading and
writing:
with open('test', encoding='utf-8', mode='w+') as f:
    f.write('\u4500 blah blah blah\n')
    f.seek(0)
    print(repr(f.readline()[:1]))
The Unicode character U+FEFF is used as a byte-order mark (BOM), and is often
written as the first character of a file in order to assist with autodetection
of the file’s byte ordering. Some encodings, such as UTF-16, expect a BOM to be
present at the start of a file; when such an encoding is used, the BOM will be
automatically written as the first character and will be silently dropped when
the file is read. There are variants of these encodings, such as ‘utf-16-le’
and ‘utf-16-be’ for little-endian and big-endian encodings, that specify one
particular byte ordering and don’t skip the BOM.
In some areas, it is also convention to use a “BOM” at the start of UTF-8
encoded files; the name is misleading since UTF-8 is not byte-order dependent.
The mark simply announces that the file is encoded in UTF-8. Use the
‘utf-8-sig’ codec to automatically skip the mark if present for reading such
files.
Most of the operating systems in common use today support filenames that contain
arbitrary Unicode characters. Usually this is implemented by converting the
Unicode string into some encoding that varies depending on the system. For
example, Mac OS X uses UTF-8 while Windows uses a configurable encoding; on
Windows, Python uses the name “mbcs” to refer to whatever the currently
configured encoding is. On Unix systems, there will only be a filesystem
encoding if you’ve set the LANG or LC_CTYPE environment variables; if
you haven’t, the default encoding is ASCII.
The sys.getfilesystemencoding() function returns the encoding to use on
your current system, in case you want to do the encoding manually, but there’s
not much reason to bother. When opening a file for reading or writing, you can
usually just provide the Unicode string as the filename, and it will be
automatically converted to the right encoding for you:
filename = 'filename\u4500abc'
with open(filename, 'w') as f:
    f.write('blah\n')
Functions in the os module such as os.stat() will also accept Unicode
filenames.
Function os.listdir(), which returns filenames, raises an issue: should it return
the Unicode version of filenames, or should it return byte strings containing
the encoded versions? os.listdir() will do both, depending on whether you
provided the directory path as a byte string or a Unicode string. If you pass a
Unicode string as the path, filenames will be decoded using the filesystem’s
encoding and a list of Unicode strings will be returned, while passing a byte
path will return the byte string versions of the filenames. For example,
assuming the default filesystem encoding is UTF-8, running the following
program:
fn = 'filename\u4500abc'
f = open(fn, 'w')
f.close()
import os
print(os.listdir(b'.'))
print(os.listdir('.'))
The first list contains UTF-8-encoded filenames, and the second list contains
the Unicode versions.
Note that in most occasions, the Unicode APIs should be used. The bytes APIs
should only be used on systems where undecodable file names can be present,
i.e. Unix systems.
This section provides some suggestions on writing software that deals with
Unicode.
The most important tip is:
Software should only work with Unicode strings internally, converting to a
particular encoding on output.
If you attempt to write processing functions that accept both Unicode and byte
strings, you will find your program vulnerable to bugs wherever you combine the
two different kinds of strings. There is no automatic encoding or decoding:
if you do e.g. str + bytes, a TypeError is raised for this expression.
When using data coming from a web browser or some other untrusted source, a
common technique is to check for illegal characters in a string before using the
string in a generated command line or storing it in a database. If you’re doing
this, be careful to check the string once it’s in the form that will be used or
stored; it’s possible for encodings to be used to disguise characters. This is
especially true if the input data also specifies the encoding; many encodings
leave the commonly checked-for characters alone, but Python includes some
encodings such as 'base64' that modify every single character.
For example, let’s say you have a content management system that takes a Unicode
filename, and you want to disallow paths with a ‘/’ character. You might write
this code:
def read_file(filename, encoding):
    if '/' in filename:
        raise ValueError("'/' not allowed in filenames")
    unicode_name = filename.decode(encoding)
    with open(unicode_name, 'r') as f:
        # ... return contents of file ...
However, if an attacker could specify the 'base64' encoding, they could pass
'L2V0Yy9wYXNzd2Q=', which is the base-64 encoded form of the string
'/etc/passwd', to read a system file. The above code looks for '/'
characters in the encoded form and misses the dangerous character in the
resulting decoded form.
Thanks to the following people who have noted errors or offered suggestions on
this article: Nicholas Bastin, Marius Gedminas, Kent Johnson, Ken Krugler,
Marc-André Lemburg, Martin von Löwis, Chad Whitacre.
HOWTO Fetch Internet Resources Using The urllib Package
A tutorial on Basic Authentication, with examples in Python.
urllib.request is a Python module for fetching URLs
(Uniform Resource Locators). It offers a very simple interface, in the form of
the urlopen function. This is capable of fetching URLs using a variety of
different protocols. It also offers a slightly more complex interface for
handling common situations - like basic authentication, cookies, proxies and so
on. These are provided by objects called handlers and openers.
urllib.request supports fetching URLs for many “URL schemes” (identified by the string
before the ":" in the URL - for example "ftp" is the URL scheme of
“ftp://python.org/”) using their associated network protocols (e.g. FTP, HTTP).
This tutorial focuses on the most common case, HTTP.
For straightforward situations urlopen is very easy to use. But as soon as you
encounter errors or non-trivial cases when opening HTTP URLs, you will need some
understanding of the HyperText Transfer Protocol. The most comprehensive and
authoritative reference to HTTP is RFC 2616. This is a technical document and
not intended to be easy to read. This HOWTO aims to illustrate using urllib,
with enough detail about HTTP to help you through. It is not intended to replace
the urllib.request docs, but is supplementary to them.
The simplest way to use urllib.request is as follows:
import urllib.request
response = urllib.request.urlopen('http://python.org/')
html = response.read()
Many uses of urllib will be that simple (note that instead of an ‘http:’ URL we
could have used a URL starting with ‘ftp:’, ‘file:’, etc.). However, it’s the
purpose of this tutorial to explain the more complicated cases, concentrating on
HTTP.
HTTP is based on requests and responses - the client makes requests and servers
send responses. urllib.request mirrors this with a Request object which represents
the HTTP request you are making. In its simplest form you create a Request
object that specifies the URL you want to fetch. Calling urlopen with this
Request object returns a response object for the URL requested. This response is
a file-like object, which means you can for example call .read() on the
response:
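For example (a minimal sketch):

import urllib.request

req = urllib.request.Request('http://python.org/')
response = urllib.request.urlopen(req)
the_page = response.read()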
In the case of HTTP, there are two extra things that Request objects allow you
to do: First, you can pass data to be sent to the server. Second, you can pass
extra information (“metadata”) about the data or about the request itself, to
the server - this information is sent as HTTP “headers”. Let’s look at each of
these in turn.
Sometimes you want to send data to a URL (often the URL will refer to a CGI
(Common Gateway Interface) script [1] or other web application). With HTTP,
this is often done using what’s known as a POST request. This is often what
your browser does when you submit an HTML form that you filled in on the web. Not
all POSTs have to come from forms: you can use a POST to transmit arbitrary data
to your own application. In the common case of HTML forms, the data needs to be
encoded in a standard way, and then passed to the Request object as the data
argument. The encoding is done using a function from the urllib.parse
library.
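A sketch of such a POST (the URL and form values are hypothetical):

import urllib.parse
import urllib.request

url = 'http://www.example.com/cgi-bin/register.cgi'
values = {'name': 'Michael Foord',
          'location': 'Northampton',
          'language': 'Python'}

data = urllib.parse.urlencode(values)  # encode the form values
data = data.encode('ascii')            # POST data should be bytes
req = urllib.request.Request(url, data)
response = urllib.request.urlopen(req)
the_page = response.read()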
Note that other encodings are sometimes required (e.g. for file upload from HTML
forms - see HTML Specification, Form Submission for more
details).
If you do not pass the data argument, urllib uses a GET request. One
way in which GET and POST requests differ is that POST requests often have
“side-effects”: they change the state of the system in some way (for example by
placing an order with the website for a hundredweight of tinned spam to be
delivered to your door). Though the HTTP standard makes it clear that POSTs are
intended to always cause side-effects, and GET requests never to cause
side-effects, nothing prevents a GET request from having side-effects, nor a
POST request from having no side-effects. Data can also be passed in an HTTP
GET request by encoding it in the URL itself.
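A sketch of the GET equivalent, with the encoded data appended to the URL
(again using a hypothetical URL):

>>> import urllib.request
>>> import urllib.parse
>>> data = {'name': 'Somebody Here',
...         'location': 'Northampton',
...         'language': 'Python'}
>>> url_values = urllib.parse.urlencode(data)
>>> print(url_values)  # the ordering of the pairs may vary
name=Somebody+Here&location=Northampton&language=Python
>>> url = 'http://www.example.com/example.cgi'
>>> full_url = url + '?' + url_values
>>> response = urllib.request.urlopen(full_url)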
We’ll discuss here one particular HTTP header, to illustrate how to add headers
to your HTTP request.
Some websites [2] dislike being browsed by programs, or send different versions
to different browsers [3] . By default urllib identifies itself as
Python-urllib/x.y (where x and y are the major and minor version
numbers of the Python release,
e.g. Python-urllib/2.5), which may confuse the site, or just plain
not work. The way a browser identifies itself is through the
User-Agent header [4]. When you create a Request object you can
pass a dictionary of headers in. The following example makes the same
request as above, but identifies itself as a version of Internet
Explorer [5].
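A sketch (the URL and form values are hypothetical):

import urllib.parse
import urllib.request

url = 'http://www.example.com/cgi-bin/register.cgi'
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
values = {'name': 'Michael Foord',
          'location': 'Northampton',
          'language': 'Python'}
headers = {'User-Agent': user_agent}

data = urllib.parse.urlencode(values).encode('ascii')
req = urllib.request.Request(url, data, headers)
response = urllib.request.urlopen(req)
the_page = response.read()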
urlopen raises URLError when it cannot handle a response (though as
usual with Python APIs, built-in exceptions such as ValueError,
TypeError etc. may also be raised).
HTTPError is the subclass of URLError raised in the specific case of
HTTP URLs.
The exception classes are exported from the urllib.error module.
Often, URLError is raised because there is no network connection (no route to
the specified server), or the specified server doesn’t exist. In this case, the
exception raised will have a ‘reason’ attribute, which is a tuple containing an
error code and a text error message.
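For example (a sketch; the host is deliberately nonexistent, and the exact
message varies by platform):

>>> req = urllib.request.Request('http://www.pretend_server.org')
>>> try:
...     urllib.request.urlopen(req)
... except urllib.error.URLError as e:
...     print(e.reason)
...
(4, 'getaddrinfo failed')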
Every HTTP response from the server contains a numeric “status code”. Sometimes
the status code indicates that the server is unable to fulfil the request. The
default handlers will handle some of these responses for you (for example, if
the response is a “redirection” that requests the client fetch the document from
a different URL, urllib will handle that for you). For those it can’t handle,
urlopen will raise an HTTPError. Typical errors include ‘404’ (page not
found), ‘403’ (request forbidden), and ‘401’ (authentication required).
See section 10 of RFC 2616 for a reference on all the HTTP error codes.
The HTTPError instance raised will have an integer ‘code’ attribute, which
corresponds to the error sent by the server.
Because the default handlers handle redirects (codes in the 300 range), and
codes in the 100-299 range indicate success, you will usually only see error
codes in the 400-599 range.
http.server.BaseHTTPRequestHandler.responses is a useful dictionary of
response codes that shows all the response codes used by RFC 2616. The
dictionary is reproduced here for convenience:
# Table mapping response codes to messages; entries have the
# form {code: (shortmessage, longmessage)}.
responses = {
    100: ('Continue', 'Request received, please continue'),
    101: ('Switching Protocols',
          'Switching to new protocol; obey Upgrade header'),

    200: ('OK', 'Request fulfilled, document follows'),
    201: ('Created', 'Document created, URL follows'),
    202: ('Accepted',
          'Request accepted, processing continues off-line'),
    203: ('Non-Authoritative Information', 'Request fulfilled from cache'),
    204: ('No Content', 'Request fulfilled, nothing follows'),
    205: ('Reset Content', 'Clear input form for further input.'),
    206: ('Partial Content', 'Partial content follows.'),

    300: ('Multiple Choices',
          'Object has several resources -- see URI list'),
    301: ('Moved Permanently', 'Object moved permanently -- see URI list'),
    302: ('Found', 'Object moved temporarily -- see URI list'),
    303: ('See Other', 'Object moved -- see Method and URL list'),
    304: ('Not Modified',
          'Document has not changed since given time'),
    305: ('Use Proxy',
          'You must use proxy specified in Location to access this '
          'resource.'),
    307: ('Temporary Redirect',
          'Object moved temporarily -- see URI list'),

    400: ('Bad Request',
          'Bad request syntax or unsupported method'),
    401: ('Unauthorized',
          'No permission -- see authorization schemes'),
    402: ('Payment Required',
          'No payment -- see charging schemes'),
    403: ('Forbidden',
          'Request forbidden -- authorization will not help'),
    404: ('Not Found', 'Nothing matches the given URI'),
    405: ('Method Not Allowed',
          'Specified method is invalid for this server.'),
    406: ('Not Acceptable', 'URI not available in preferred format.'),
    407: ('Proxy Authentication Required', 'You must authenticate with '
          'this proxy before proceeding.'),
    408: ('Request Timeout', 'Request timed out; try again later.'),
    409: ('Conflict', 'Request conflict.'),
    410: ('Gone',
          'URI no longer exists and has been permanently removed.'),
    411: ('Length Required', 'Client must specify Content-Length.'),
    412: ('Precondition Failed', 'Precondition in headers is false.'),
    413: ('Request Entity Too Large', 'Entity is too large.'),
    414: ('Request-URI Too Long', 'URI is too long.'),
    415: ('Unsupported Media Type', 'Entity body in unsupported format.'),
    416: ('Requested Range Not Satisfiable',
          'Cannot satisfy request range.'),
    417: ('Expectation Failed',
          'Expect condition could not be satisfied.'),

    500: ('Internal Server Error', 'Server got itself in trouble'),
    501: ('Not Implemented',
          'Server does not support this operation'),
    502: ('Bad Gateway', 'Invalid responses from another server/proxy.'),
    503: ('Service Unavailable',
          'The server cannot process the request due to a high load'),
    504: ('Gateway Timeout',
          'The gateway server did not receive a timely response'),
    505: ('HTTP Version Not Supported', 'Cannot fulfill request.'),
}
When an error is raised the server responds by returning an HTTP error code
and an error page. You can use the HTTPError instance as a response on the
page returned. This means that as well as the code attribute, it also has the
read, geturl, and info methods, as returned by the urllib.response module:
>>> req = urllib.request.Request('http://www.python.org/fish.html')
>>> try:
...     urllib.request.urlopen(req)
... except urllib.error.HTTPError as e:
...     print(e.code)
...     print(e.read())
...
404
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<?xml-stylesheet href="./css/ht2html.css"
type="text/css"?>
<html><head><title>Error 404: File Not Found</title>
...... etc...
from urllib.request import Request, urlopen
from urllib.error import URLError, HTTPError

req = Request(someurl)
try:
    response = urlopen(req)
except HTTPError as e:
    print('The server couldn\'t fulfill the request.')
    print('Error code: ', e.code)
except URLError as e:
    print('We failed to reach a server.')
    print('Reason: ', e.reason)
else:
    # everything is fine
    pass
Note
The except HTTPError must come first, otherwise except URLError
will also catch an HTTPError.
from urllib.request import Request, urlopen
from urllib.error import URLError

req = Request(someurl)
try:
    response = urlopen(req)
except URLError as e:
    if hasattr(e, 'reason'):
        print('We failed to reach a server.')
        print('Reason: ', e.reason)
    elif hasattr(e, 'code'):
        print('The server couldn\'t fulfill the request.')
        print('Error code: ', e.code)
else:
    # everything is fine
    pass
The response returned by urlopen (or the HTTPError instance) has two
useful methods, info() and geturl(), and is defined in the module
urllib.response.
geturl - this returns the real URL of the page fetched. This is useful
because urlopen (or the opener object used) may have followed a
redirect. The URL of the page fetched may not be the same as the URL requested.
info - this returns a dictionary-like object that describes the page
fetched, particularly the headers sent by the server. It is currently an
http.client.HTTPMessage instance.
Typical headers include ‘Content-length’, ‘Content-type’, and so on. See the
Quick Reference to HTTP Headers
for a useful listing of HTTP headers with brief explanations of their meaning
and use.
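A short sketch of both methods (python.org is just a convenient example host):
import urllib.request

response = urllib.request.urlopen('http://www.python.org/')
print(response.geturl())                 # the URL actually fetched, after any redirects
print(response.info())                   # the headers sent by the server
print(response.info()['Content-Type'])   # individual headers can be looked up by name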
When you fetch a URL you use an opener (an instance of the perhaps
confusingly-named urllib.request.OpenerDirector). Normally we have been using
the default opener - via urlopen - but you can create custom
openers. Openers use handlers. All the “heavy lifting” is done by the
handlers. Each handler knows how to open URLs for a particular URL scheme (http,
ftp, etc.), or how to handle an aspect of URL opening, for example HTTP
redirections or HTTP cookies.
You will want to create openers if you want to fetch URLs with specific handlers
installed, for example to get an opener that handles cookies, or to get an
opener that does not handle redirections.
To create an opener, instantiate an OpenerDirector, and then call
.add_handler(some_handler_instance) repeatedly.
Alternatively, you can use build_opener, which is a convenience function for
creating opener objects with a single function call. build_opener adds
several handlers by default, but provides a quick way to add more and/or
override the default handlers.
Other sorts of handlers you might want can handle proxies, authentication,
and other common but slightly specialised situations.
install_opener can be used to make an opener object the (global) default
opener. This means that calls to urlopen will use the opener you have
installed.
Opener objects have an open method, which can be called directly to fetch
URLs in the same way as the urlopen function: there’s no need to call
install_opener, except as a convenience.
To illustrate creating and installing a handler we will use the
HTTPBasicAuthHandler. For a more detailed discussion of this subject –
including an explanation of how Basic Authentication works - see the Basic
Authentication Tutorial.
When authentication is required, the server sends a header (as well as the 401
error code) requesting authentication. This specifies the authentication scheme
and a ‘realm’. The header looks like: WWW-Authenticate: SCHEME realm="REALM".
e.g.
WWW-Authenticate: Basic realm="cPanel Users"
The client should then retry the request with the appropriate name and password
for the realm included as a header in the request. This is ‘basic
authentication’. In order to simplify this process we can create an instance of
HTTPBasicAuthHandler and an opener to use this handler.
The HTTPBasicAuthHandler uses an object called a password manager to handle
the mapping of URLs and realms to passwords and usernames. If you know what the
realm is (from the authentication header sent by the server), then you can use
an HTTPPasswordMgr. Frequently one doesn’t care what the realm is. In that
case, it is convenient to use HTTPPasswordMgrWithDefaultRealm. This allows
you to specify a default username and password for a URL. This will be supplied
unless you provide an alternative combination for a specific realm. We
indicate this by providing None as the realm argument to the
add_password method.
The top-level URL is the first URL that requires authentication. URLs “deeper”
than the URL you pass to .add_password() will also match.
# create a password manager
password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()

# Add the username and password.
# If we knew the realm, we could use it instead of None.
top_level_url = "http://example.com/foo/"
password_mgr.add_password(None, top_level_url, username, password)

handler = urllib.request.HTTPBasicAuthHandler(password_mgr)

# create "opener" (OpenerDirector instance)
opener = urllib.request.build_opener(handler)

# use the opener to fetch a URL
opener.open(a_url)

# Install the opener.
# Now all calls to urllib.request.urlopen use our opener.
urllib.request.install_opener(opener)
Note
In the above example we only supplied our HTTPBasicAuthHandler to
build_opener. By default openers have the handlers for normal situations
– ProxyHandler, UnknownHandler, HTTPHandler,
HTTPDefaultErrorHandler, HTTPRedirectHandler, FTPHandler,
FileHandler, HTTPErrorProcessor.
top_level_url is in fact either a full URL (including the ‘http:’ scheme
component and the hostname and optionally the port number)
e.g. “http://example.com/” or an “authority” (i.e. the hostname,
optionally including the port number) e.g. “example.com” or “example.com:8080”
(the latter example includes a port number). The authority, if present, must
NOT contain the “userinfo” component - for example “joe:password@example.com” is
not correct.
urllib will auto-detect your proxy settings and use those. This is through
the ProxyHandler, which is part of the normal handler chain. Normally that’s
a good thing, but there are occasions when it may not be helpful [6]. One way
to disable automatic proxy detection is to set up our own ProxyHandler, with
no proxies defined. This is done using similar steps to setting up a Basic
Authentication handler:
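A sketch of those steps, mirroring the Basic Authentication example above
(an empty dictionary tells ProxyHandler to use no proxies at all):
>>> proxy_support = urllib.request.ProxyHandler({})
>>> opener = urllib.request.build_opener(proxy_support)
>>> urllib.request.install_opener(opener)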
Currently urllib.request does not support fetching of https locations
through a proxy. However, this can be enabled by extending urllib.request as
shown in the recipe [7].
The Python support for fetching resources from the web is layered. urllib uses
the http.client library, which in turn uses the socket library.
As of Python 2.3 you can specify how long a socket should wait for a response
before timing out. This can be useful in applications which have to fetch web
pages. By default the socket module has no timeout and can hang. Currently,
the socket timeout is not exposed at the http.client or urllib.request levels.
However, you can set the default timeout globally for all sockets using
import socket
import urllib.request
# timeout in seconds
timeout = 10
socket.setdefaulttimeout(timeout)
# this call to urllib.request.urlopen now uses the default timeout
# we have set in the socket module
req = urllib.request.Request('http://www.voidspace.org.uk')
response = urllib.request.urlopen(req)
[2] Like Google for example. The proper way to use Google from a program
is to use PyGoogle of course. See
Voidspace Google
for some examples of using the Google API.
[3] Browser sniffing is a very bad practice for website design - building
sites using web standards is much more sensible. Unfortunately a lot of
sites still send different versions to different browsers.
[6] In my case I have to use a proxy to access the internet at work. If you
attempt to fetch localhost URLs through this proxy it blocks them. IE
is set to use the proxy, which urllib picks up on. In order to test
scripts with a localhost server, I have to prevent urllib from using
the proxy.
This document shows how Python fits into the web. It presents some ways
to integrate Python with a web server, and general practices useful for
developing web sites.
Programming for the Web has become a hot topic since the rise of “Web 2.0”,
which focuses on user-generated content on web sites. It has always been
possible to use Python for creating web sites, but it was a rather tedious task.
Therefore, many frameworks and helper tools have been created to assist
developers in creating faster and more robust sites. This HOWTO describes
some of the methods used to combine Python with a web server to create
dynamic content. It is not meant as a complete introduction, as this topic is
far too broad to be covered in one single document. However, a short overview
of the most popular libraries is provided.
See also
While this HOWTO tries to give an overview of Python in the web, it cannot
always be as up to date as desired. Web development in Python is rapidly
moving forward, so the wiki page on Web Programming may be more in sync with
recent development.
When a user enters a web site, their browser makes a connection to the site’s
web server (this is called the request). The server looks up the file in the
file system and sends it back to the user’s browser, which displays it (this is
the response). This is roughly how the underlying protocol, HTTP, works.
Dynamic web sites are not based on files in the file system, but rather on
programs which are run by the web server when a request comes in, and which
generate the content that is returned to the user. They can do all sorts of
useful things, like display the postings of a bulletin board, show your email,
configure software, or just display the current time. These programs can be
written in any programming language the server supports. Since most servers
support Python, it is easy to use Python to create dynamic web sites.
Most HTTP servers are written in C or C++, so they cannot execute Python code
directly – a bridge is needed between the server and the program. These
bridges, or rather interfaces, define how programs interact with the server.
There have been numerous attempts to create the best possible interface, but
there are only a few worth mentioning.
Not every web server supports every interface. Many web servers only support
old, now-obsolete interfaces; however, they can often be extended using
third-party modules to support newer ones.
This interface, most commonly referred to as “CGI”, is the oldest, and is
supported by nearly every web server out of the box. Programs using CGI to
communicate with their web server need to be started by the server for every
request. So, every request starts a new Python interpreter – which takes some
time to start up – thus making the whole interface only usable for low load
situations.
The upside of CGI is that it is simple – writing a Python program which uses
CGI is a matter of about three lines of code. This simplicity comes at a
price: it does very few things to help the developer.
Writing CGI programs, while still possible, is no longer recommended. With
WSGI, a topic covered later in this document, it is possible to write
programs that emulate CGI, so they can be run as CGI if no better option is
available.
See also
The Python standard library includes some modules that are helpful for
creating plain CGI programs: cgi (for handling Common Gateway Interface
requests) and cgitb (for nicer tracebacks in CGI scripts).
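Below is a minimal CGI script of the kind the next paragraphs discuss (a
sketch; the encoding line and the exact output text are conventional choices):
#!/usr/bin/env python
# -*- coding: UTF-8 -*-

# enable nicer tracebacks in the browser while debugging
import cgitb
cgitb.enable()

# a CGI script writes an HTTP header, a blank line, then the body to stdout
print("Content-Type: text/plain;charset=utf-8")
print()
print("Hello World!")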
Depending on your web server configuration, you may need to save this code with
a .py or .cgi extension. Additionally, this file may also need to be
in a cgi-bin folder, for security reasons.
You might wonder what the cgitb line is about. This line makes it possible
to display a nice traceback instead of just crashing and displaying an “Internal
Server Error” in the user’s browser. This is useful for debugging, but it might
risk exposing some confidential data to the user. You should not use cgitb
in production code for this reason. You should always catch exceptions, and
display proper error pages – end-users don’t like to see nondescript “Internal
Server Errors” in their browsers.
If you don’t have your own web server, this does not apply to you. You can
check whether it works as-is, and if not you will need to talk to the
administrator of your web server. If it is a big host, you can try filing a
ticket asking for Python support.
If you are your own administrator or want to set up CGI for testing purposes on
your own computers, you have to configure it by yourself. There is no single
way to configure CGI, as there are many web servers with different
configuration options. Currently the most widely used free web server is
Apache HTTPd, or Apache for short. Apache can be
easily installed on nearly every system using the system’s package management
tool. lighttpd is another alternative and is
said to have better performance. On many systems this server can also be
installed using the package management tool, so manually compiling the web
server may not be needed.
On Apache you can take a look at the Dynamic Content with CGI tutorial, where everything
is described. Most of the time it is enough just to set +ExecCGI. The
tutorial also describes the most common gotchas that might arise.
On lighttpd you need to use the CGI module, which can be configured
in a straightforward way. It boils down to setting cgi.assign properly.
Using CGI sometimes leads to small annoyances while trying to get these
scripts to run. Sometimes a seemingly correct script does not work as
expected, the cause being some small hidden problem that’s difficult to spot.
Some of these potential problems are:
The Python script is not marked as executable. When CGI scripts are not
executable most web servers will let the user download it, instead of
running it and sending the output to the user. For CGI scripts to run
properly on Unix-like operating systems, the +x bit needs to be set.
Using chmod a+x your_script.py may solve this problem.
On a Unix-like system, the line endings in the program file must be Unix
style line endings. This is important because the web server checks the
first line of the script (called shebang) and tries to run the program
specified there. It gets easily confused by Windows line endings (Carriage
Return & Line Feed, also called CRLF), so you have to convert the file to
Unix line endings (only Line Feed, LF). This can be done automatically by
uploading the file via FTP in text mode instead of binary mode, but the
preferred way is just telling your editor to save the files with Unix line
endings. Most editors support this.
Your web server must be able to read the file, and you need to make sure the
permissions are correct. On unix-like systems, the server often runs as user
and group www-data, so it might be worth a try to change the file
ownership, or to make the file world readable using chmod a+r your_script.py.
The web server must know that the file you’re trying to access is a CGI script.
Check the configuration of your web server, as it may be configured
to expect a specific file extension for CGI scripts.
On Unix-like systems, the path to the interpreter in the shebang
(#!/usr/bin/env python) must be correct. This line calls
/usr/bin/env to find Python, but it will fail if there is no
/usr/bin/env, or if Python is not in the web server’s path. If you know
where your Python is installed, you can also use that full path. The
commands whereis python and type -p python could help you find
where it is installed. Once you know the path, you can change the shebang
accordingly: #!/usr/bin/python.
The file must not contain a BOM (Byte Order Mark). The BOM is meant for
determining the byte order of UTF-16 and UTF-32 encodings, but some editors
write this also into UTF-8 files. The BOM interferes with the shebang line,
so be sure to tell your editor not to write the BOM.
If the web server is using mod_python, mod_python may be having
problems. mod_python is able to handle CGI scripts by itself, but it can
also be a source of issues.
People coming from PHP often find it hard to grasp how to use Python in the web.
Their first thought is mostly mod_python,
because they think that this is the equivalent to mod_php. Actually, there
are many differences. What mod_python does is embed the interpreter into
the Apache process, thus speeding up requests by not having to start a Python
interpreter for each request. On the other hand, it is not “Python intermixed
with HTML” in the way that PHP is often intermixed with HTML. The Python
equivalent of that is a template engine. mod_python itself is much more
powerful and provides more access to Apache internals. It can emulate CGI,
work in a “Python Server Pages” mode (similar to JSP) which is “HTML
intermingled with Python”, and it has a “Publisher” which designates one file
to accept all requests and decide what to do with them.
mod_python does have some problems. Unlike the PHP interpreter, the Python
interpreter uses caching when executing files, so changes to a file will
require the web server to be restarted. Another problem is the basic concept
– Apache starts child processes to handle the requests, and unfortunately
every child process needs to load the whole Python interpreter even if it does
not use it. This makes the whole web server slower. Another problem is that,
because mod_python is linked against a specific version of libpython,
it is not possible to switch from an older version to a newer (e.g. 2.4 to 2.5)
without recompiling mod_python. mod_python is also bound to the Apache
web server, so programs written for mod_python cannot easily run on other
web servers.
These are the reasons why mod_python should be avoided when writing new
programs. In some circumstances it still might be a good idea to use
mod_python for deployment, but WSGI makes it possible to run WSGI programs
under mod_python as well.
FastCGI and SCGI try to solve the performance problem of CGI in another way.
Instead of embedding the interpreter into the web server, they create
long-running background processes. There is still a module in the web server
which makes it possible for the web server to “speak” with the background
process. As the background process is independent of the server, it can be
written in any language, including Python. The language just needs to have a
library which handles the communication with the webserver.
The difference between FastCGI and SCGI is very small, as SCGI is essentially
just a “simpler FastCGI”. As the web server support for SCGI is limited,
most people use FastCGI instead, which works the same way. Almost everything
that applies to SCGI also applies to FastCGI, so we’ll only cover
the latter.
These days, FastCGI is never used directly. Just like mod_python, it is only
used for the deployment of WSGI applications.
Apache has both mod_fastcgi and mod_fcgid. mod_fastcgi is the original one, but it
has some licensing issues, which is why it is sometimes considered non-free.
mod_fcgid is a smaller, compatible alternative. One of these modules needs
to be loaded by Apache.
Once you have installed and configured the module, you can test it with the
following WSGI-application:
#!/usr/bin/env python
# -*- coding: UTF-8 -*-

import sys, os
from html import escape
from flup.server.fcgi import WSGIServer

def app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/html')])

    yield '<h1>FastCGI Environment</h1>'
    yield '<table>'
    for k, v in sorted(environ.items()):
        yield '<tr><th>{0}</th><td>{1}</td></tr>'.format(
            escape(k), escape(v))
    yield '</table>'

WSGIServer(app).run()
This is a simple WSGI application, but you need to install flup first, as flup handles the low level
FastCGI access.
See also
There is some documentation on setting up Django with FastCGI, most of
which can be reused for other WSGI-compliant frameworks and libraries.
Only the manage.py part has to be changed; the example used here can be
used instead. Django does more or less the exact same thing.
mod_wsgi is an attempt to get rid of the
low level gateways. Given that FastCGI, SCGI, and mod_python are mostly used to
deploy WSGI applications, mod_wsgi was started to directly embed WSGI applications
into the Apache web server. mod_wsgi is specifically designed to host WSGI
applications. It makes the deployment of WSGI applications much easier than
deployment using other low level methods, which need glue code. The downside
is that mod_wsgi is limited to the Apache web server; other servers would need
their own implementations of mod_wsgi.
mod_wsgi supports two modes: embedded mode, in which it integrates with the
Apache process, and daemon mode, which is more FastCGI-like. Unlike FastCGI,
mod_wsgi handles the worker-processes by itself, which makes administration
easier.
WSGI has already been mentioned several times, so it has to be something
important. In fact it really is, and now it is time to explain it.
The Web Server Gateway Interface, or WSGI for short, is defined in
PEP 333 and is currently the best way to do Python web programming. While
it is great for programmers writing frameworks, a normal web developer does not
need to get in direct contact with it. When choosing a framework for web
development it is a good idea to choose one which supports WSGI.
The big benefit of WSGI is the unification of the application programming
interface. When your program is compatible with WSGI – which at the outer
level means that the framework you are using has support for WSGI – your
program can be deployed via any web server interface for which there are WSGI
wrappers. You do not need to care about whether the application user uses
mod_python or FastCGI or mod_wsgi – with WSGI your application will work on
any gateway interface. The Python standard library contains its own WSGI
server, wsgiref, which is a small web server that can be used for
testing.
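As a sketch of how little is needed, here is a minimal application served with
wsgiref (the host, port, and response text are arbitrary choices):
from wsgiref.simple_server import make_server

def app(environ, start_response):
    # a WSGI application receives the environment and a callable for
    # starting the response, and returns an iterable of bytes
    start_response('200 OK', [('Content-Type', 'text/plain; charset=utf-8')])
    return [b'Hello from WSGI!']

httpd = make_server('localhost', 8000, app)
print('Serving on http://localhost:8000/ ...')
httpd.serve_forever()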
A really great WSGI feature is middleware. Middleware is a layer around your
program which can add various functionality to it. There is quite a bit of
middleware already
available. For example, instead of writing your own session management (HTTP
is a stateless protocol, so to associate multiple HTTP requests with a single
user your application must create and manage such state via a session), you can
just download middleware which does that, plug it in, and get on with coding
the unique parts of your application. The same thing with compression – there
is existing middleware which handles compressing your HTML using gzip to save
on your server’s bandwidth. Authentication is another problem easily solved
using existing middleware.
Although WSGI may seem complex, the initial phase of learning can be very
rewarding because WSGI and the associated middleware already have solutions to
many problems that might arise while developing web sites.
The code that is used to connect to various low level gateways like CGI or
mod_python is called a WSGI server. One of these servers is flup, which
supports FastCGI and SCGI, as well as AJP. Some of these servers
are written in Python, as flup is, but there also exist others which are
written in C and can be used as drop-in replacements.
There are many servers already available, so a Python web application
can be deployed nearly anywhere. This is one big advantage that Python has
compared with other web technologies.
See also
A good overview of WSGI-related code can be found in the WSGI wiki, which contains an extensive list of WSGI servers which can be used by any application
supporting WSGI.
You might be interested in some WSGI-supporting modules already contained in
the standard library, namely:
wsgiref – some tiny utilities and servers for WSGI
What does WSGI give the web application developer? Let’s take a look at
an application that’s been around for a while, which was written in
Python without using WSGI.
One of the most widely used wiki software packages is MoinMoin. It was created in 2000, so it predates WSGI by about
three years. Older versions needed separate code to run on CGI, mod_python,
FastCGI and standalone.
It now includes support for WSGI. Using WSGI, it is possible to deploy
MoinMoin on any WSGI compliant server, with no additional glue code.
Unlike the pre-WSGI versions, this could include WSGI servers that the
authors of MoinMoin know nothing about.
The term MVC is often encountered in statements such as “framework foo
supports MVC”. MVC is more about the overall organization of code, rather than
any particular API. Many web frameworks use this model to help the developer
bring structure to their program. Bigger web applications can have lots of
code, so it is a good idea to have an effective structure right from the beginning.
That way, even users of other frameworks (or even other languages, since MVC is
not Python-specific) can easily understand the code, given that they are
already familiar with the MVC structure.
MVC stands for three components:
The model. This is the data that will be displayed and modified. In
Python frameworks, this component is often represented by the classes used by
an object-relational mapper.
The view. This component’s job is to display the data of the model to the
user. Typically this component is implemented via templates.
The controller. This is the layer between the user and the model. The
controller reacts to user actions (like opening some specific URL), tells
the model to modify the data if necessary, and tells the view code what to
display.
While one might think that MVC is a complex design pattern, in fact it is not.
It is used in Python because it has turned out to be useful for creating clean,
maintainable web sites.
Note
While not all Python frameworks explicitly support MVC, it is often trivial
to create a web site which uses the MVC pattern by separating the data logic
(the model) from the user interaction logic (the controller) and the
templates (the view). That’s why it is important not to write unnecessary
Python code in the templates – it works against the MVC model and creates
chaos in the code base, making it harder to understand and modify.
See also
The English Wikipedia has an article about the Model-View-Controller pattern. It includes a long
list of web frameworks for various programming languages.
Websites are complex constructs, so tools have been created to help web
developers make their code easier to write and more maintainable. Tools like
these exist for all web frameworks in all languages. Developers are not forced
to use these tools, and often there is no “best” tool. It is worth learning
about the available tools because they can greatly simplify the process of
developing a web site.
See also
There are far more components than can be presented here. The Python wiki
has a page about these components, called
Web Components.
Mixing of HTML and Python code is made possible by a few libraries. While
convenient at first, it leads to horribly unmaintainable code. That’s why
templates exist. Templates are, in the simplest case, just HTML files with
placeholders. The HTML is sent to the user’s browser after filling in the
placeholders.
Python already includes a way to build simple templates:
# a simple template
template = "<html><body><h1>Hello {who}!</h1></body></html>"
print(template.format(who="Reader"))
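The standard library also offers string.Template with a similar substitution
mechanism; this is an additional sketch, not part of the example above:
from string import Template

# $-based placeholders instead of {}-style format fields
template = Template("<html><body><h1>Hello $who!</h1></body></html>")
print(template.substitute(who="Reader"))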
To generate complex HTML based on non-trivial model data, conditional
and looping constructs like Python’s for and if are generally needed.
Template engines support templates of this complexity.
There are a lot of template engines available for Python which can be used with
or without a framework. Some of these define a plain-text programming
language which is easy to learn, partly because it is limited in scope.
Others use XML, and the template output is guaranteed to always be valid
XML. There are many other variations.
Some frameworks ship their own template engine or recommend one in
particular. In the absence of a reason to use a different template engine,
using the one provided by or recommended by the framework is a good idea.
There are many template engines competing for attention, because it is
pretty easy to create them in Python. The page Templating in the wiki lists a big,
ever-growing number of these. The three listed above are considered “second
generation” template engines and are a good place to start.
Data persistence, while sounding very complicated, is just about storing data.
This data might be the text of blog entries, the postings on a bulletin board or
the text of a wiki page. There are, of course, a number of different ways to store
information on a web server.
Often, relational database engines like MySQL or
PostgreSQL are used because of their good
performance when handling very large databases consisting of millions of
entries. There is also a small database engine called SQLite, which is bundled with Python in the sqlite3
module, and which uses only one file. It has no other dependencies. For
smaller sites SQLite is just enough.
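A minimal sqlite3 sketch (the file name and schema are illustrative):
import sqlite3

conn = sqlite3.connect('site.db')   # a single file, no server process needed
conn.execute('CREATE TABLE IF NOT EXISTS posts '
             '(id INTEGER PRIMARY KEY, body TEXT)')
conn.execute('INSERT INTO posts (body) VALUES (?)', ('Hello, web!',))
conn.commit()
for row in conn.execute('SELECT id, body FROM posts'):
    print(row)
conn.close()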
Relational databases are queried using a language called SQL. Python programmers in general do not
like SQL too much, as they prefer to work with objects. It is possible to save
Python objects into a database using a technology called ORM (Object Relational
Mapping). ORM translates all object-oriented access into SQL code under the
hood, so the developer does not need to think about it. Most frameworks use
ORMs, and it works quite well.
A second possibility is storing data in normal, plain text files (sometimes
called “flat files”). This is very easy for simple sites,
but can be difficult to get right if the web site is performing many
updates to the stored data.
A third possibility is object-oriented databases (also called “object
databases”). These databases store the object data in a form that closely
parallels the way the objects are structured in memory during program
execution. (By contrast, ORMs store the object data as rows of data in tables
and relations between those rows.) Storing the objects directly has the
advantage that nearly all objects can be saved in a straightforward way, unlike
in relational databases where some objects are very hard to represent.
Frameworks often give hints on which data storage method to choose. It is
usually a good idea to stick to the data store recommended by the framework
unless the application has special requirements better satisfied by an
alternate storage mechanism.
See also
Persistence Tools lists
possibilities on how to save data in the file system. Some of these
modules are part of the standard library.
The process of creating code to run web sites involves writing code to provide
various services. The code to provide a particular service often works the
same way regardless of the complexity or purpose of the web site in question.
Abstracting these common solutions into reusable code produces what are called
“frameworks” for web development. Perhaps the most well-known framework for
web development is Ruby on Rails, but Python has its own frameworks. Some of
these were partly inspired by Rails, or borrowed ideas from Rails, but many
existed a long time before Rails.
Originally Python web frameworks tended to incorporate all of the services
needed to develop web sites as a giant, integrated set of tools. No two web
frameworks were interoperable: a program developed for one could not be
deployed on a different one without considerable re-engineering work. This led
to the development of “minimalist” web frameworks that provided just the tools
to communicate between the Python code and the HTTP protocol, with all other
services to be added on top via separate components. Some ad hoc standards
were developed that allowed for limited interoperability between frameworks,
such as a standard that allowed different template engines to be used
interchangeably.
Since the advent of WSGI, the Python web framework world has been evolving
toward interoperability based on the WSGI standard. Now many web frameworks,
whether “full stack” (providing all the tools one needs to deploy the most
complex web sites) or minimalist, or anything in between, are built from
collections of reusable components that can be used with more than one
framework.
The majority of users will probably want to select a “full stack” framework
that has an active community. These frameworks tend to be well documented,
and provide the easiest path to producing a fully functional web site in
minimal time.
Django is a framework consisting of several
tightly coupled elements which were written from scratch and work together very
well. It includes an ORM which is quite powerful while being simple to use,
and has a great online administration interface which makes it possible to edit
the data in the database with a browser. The template engine is text-based and
is designed to be usable for page designers who cannot write Python. It
supports template inheritance and filters (which work like Unix pipes). Django
has many handy features bundled, such as creation of RSS feeds or generic views,
which make it possible to create web sites almost without writing any Python code.
It has a big, international community, the members of which have created many
web sites. There are also a lot of add-on projects which extend Django’s normal
functionality. This is partly due to Django’s well written online
documentation and the Django book.
Note
Although Django is an MVC-style framework, it names the elements
differently, which is described in the Django FAQ.
Another popular web framework for Python is TurboGears. TurboGears takes the approach of using already
existing components and combining them with glue code to create a seamless
experience. TurboGears gives the user flexibility in choosing components. For
example the ORM and template engine can be changed to use packages different
from those used by default.
The documentation can be found in the TurboGears wiki, where links to screencasts can be found.
TurboGears has also an active user community which can respond to most related
questions. There is also a TurboGears book
published, which is a good starting point.
The newest version of TurboGears, version 2.0, moves even further in the
direction of WSGI support and a component-based architecture.
the WSGI stack of another popular component-based web framework, Pylons.
The Zope framework is one of the “old original” frameworks. Its current
incarnation in Zope2 is a tightly integrated full-stack framework. One of its
most interesting features is its tight integration with a powerful object
database called the ZODB (Zope Object Database).
Because of its highly integrated nature, Zope wound up in a somewhat isolated
ecosystem: code written for Zope wasn’t very usable outside of Zope, and
vice-versa. To solve this problem the Zope 3 effort was started. Zope 3
re-engineers Zope as a set of more cleanly isolated components. This effort
was started before the advent of the WSGI standard, but there is WSGI support
for Zope 3 from the Repoze project. Zope components
have many years of production use behind them, and the Zope 3 project gives
access to these components to the wider Python community. There is even a
separate framework based on the Zope components: Grok.
Zope is also the infrastructure used by the Plone content
management system, one of the most powerful and popular content management
systems available.
Of course these are not the only frameworks that are available. There are
many other frameworks worth mentioning.
Another framework that’s already been mentioned is Pylons. Pylons is much
like TurboGears, but with an even stronger emphasis on flexibility, which comes
at the cost of being more difficult to use. Nearly every component can be
exchanged, which makes it necessary to use the documentation of every single
component, of which there are many. Pylons builds upon Paste, an extensive set of tools which are handy for WSGI.
And that’s still not everything. The most up-to-date information can always be
found in the Python wiki.
See also
The Python wiki contains an extensive list of web frameworks.
Most frameworks also have their own mailing lists and IRC channels, look out
for these on the projects’ web sites. There is also a general “Python in the
Web” IRC channel on freenode called #python.web.
Python is an interpreted, interactive, object-oriented programming language. It
incorporates modules, exceptions, dynamic typing, very high level dynamic data
types, and classes. Python combines remarkable power with very clear syntax.
It has interfaces to many system calls and libraries, as well as to various
window systems, and is extensible in C or C++. It is also usable as an
extension language for applications that need a programmable interface.
Finally, Python is portable: it runs on many Unix variants, on the Mac, and on
PCs under MS-DOS, Windows, Windows NT, and OS/2.
The Python Software Foundation is an independent non-profit organization that
holds the copyright on Python versions 2.1 and newer. The PSF’s mission is to
advance open source technology related to the Python programming language and to
publicize the use of Python. The PSF’s home page is at
http://www.python.org/psf/.
Donations to the PSF are tax-exempt in the US. If you use Python and find it
helpful, please contribute via the PSF donation page.
You can do anything you want with the source, as long as you leave the
copyrights in and display those copyrights in any documentation about Python
that you produce. If you honor the copyright rules, it’s OK to use Python for
commercial use, to sell copies of Python in source or binary form (modified or
unmodified), or to sell products that incorporate Python in some form. We would
still like to know about all commercial use of Python, of course.
See the PSF license page to find further
explanations and a link to the full text of the license.
The Python logo is trademarked, and in certain cases permission is required to
use it. Consult the Trademark Usage Policy for more information.
Here’s a very brief summary of what started it all, written by Guido van
Rossum:
I had extensive experience with implementing an interpreted language in the
ABC group at CWI, and from working with this group I had learned a lot about
language design. This is the origin of many Python features, including the
use of indentation for statement grouping and the inclusion of
very-high-level data types (although the details are all different in
Python).
I had a number of gripes about the ABC language, but also liked many of its
features. It was impossible to extend the ABC language (or its
implementation) to remedy my complaints – in fact its lack of extensibility
was one of its biggest problems. I had some experience with using Modula-2+
and talked with the designers of Modula-3 and read the Modula-3 report.
Modula-3 is the origin of the syntax and semantics used for exceptions, and
some other Python features.
I was working in the Amoeba distributed operating system group at CWI. We
needed a better way to do system administration than by writing either C
programs or Bourne shell scripts, since Amoeba had its own system call
interface which wasn’t easily accessible from the Bourne shell. My
experience with error handling in Amoeba made me acutely aware of the
importance of exceptions as a programming language feature.
It occurred to me that a scripting language with a syntax like ABC but with
access to the Amoeba system calls would fill the need. I realized that it
would be foolish to write an Amoeba-specific language, so I decided that I
needed a language that was generally extensible.
During the 1989 Christmas holidays, I had a lot of time on my hand, so I
decided to give it a try. During the next year, while still mostly working
on it in my own time, Python was used in the Amoeba project with increasing
success, and the feedback from colleagues made me add many early
improvements.
In February 1991, after just over a year of development, I decided to post to
USENET. The rest is in the Misc/HISTORY file.
Python is a high-level general-purpose programming language that can be applied
to many different classes of problems.
The language comes with a large standard library that covers areas such as
string processing (regular expressions, Unicode, calculating differences between
files), Internet protocols (HTTP, FTP, SMTP, XML-RPC, POP, IMAP, CGI
programming), software engineering (unit testing, logging, profiling, parsing
Python code), and operating system interfaces (system calls, filesystems, TCP/IP
sockets). Look at the table of contents for the Python Standard Library to get an idea
of what’s available. A wide variety of third-party extensions are also
available. Consult the Python Package Index to
find packages of interest to you.
Python versions are numbered A.B.C or A.B. A is the major version number – it
is only incremented for really major changes in the language. B is the minor
version number, incremented for less earth-shattering changes. C is the
micro-level – it is incremented for each bugfix release. See PEP 6 for more
information about bugfix releases.
Not all releases are bugfix releases. In the run-up to a new major release, a
series of development releases are made, denoted as alpha, beta, or release
candidate. Alphas are early releases in which interfaces aren’t yet finalized;
it’s not unexpected to see an interface change between two alpha releases.
Betas are more stable, preserving existing interfaces but possibly adding new
modules, and release candidates are frozen, making no changes except as needed
to fix critical bugs.
Alpha, beta and release candidate versions have an additional suffix. The
suffix for an alpha version is “aN” for some small number N, the suffix for a
beta version is “bN” for some small number N, and the suffix for a release
candidate version is “cN” for some small number N. In other words, all versions
labeled 2.0aN precede the versions labeled 2.0bN, which precede versions labeled
2.0cN, and those precede 2.0.
You may also find version numbers with a “+” suffix, e.g. “2.2+”. These are
unreleased versions, built directly from the Subversion trunk. In practice,
after a final minor release is made, the Subversion trunk is incremented to the
next minor version, which becomes the “a0” version,
e.g. “2.4a0”.
See also the documentation for sys.version, sys.hexversion, and
sys.version_info.
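For example, these values can be inspected from a running interpreter (the
output shown in the comments is illustrative):
import sys

print(sys.version)        # the human-readable version string
print(sys.version_info)   # a named tuple, e.g. (3, 2, 0, 'final', 0)
if sys.version_info >= (3, 2):    # tuples compare element by element
    print('running on Python 3.2 or newer')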
The source distribution is a gzipped tar file containing the complete C source,
Sphinx-formatted documentation, Python library modules, example programs, and
several useful pieces of freely distributable software. The source will compile
and run out of the box on most UNIX platforms.
Consult the Developer FAQ for more
information on getting the source code and compiling it.
The documentation is written in reStructuredText and processed by the Sphinx
documentation tool. The reStructuredText source
for the documentation is part of the Python source distribution.
There is a newsgroup, comp.lang.python, and a mailing list,
python-list. The
newsgroup and mailing list are gatewayed into each other – if you can read news
it’s unnecessary to subscribe to the mailing list.
comp.lang.python is high-traffic, receiving hundreds of postings
every day, and Usenet readers are often more able to cope with this volume.
Announcements of new software releases and events can be found in
comp.lang.python.announce, a low-traffic moderated list that receives about five
postings per day. It’s available as the python-announce mailing list.
Alpha and beta releases are available from http://www.python.org/download/. All
releases are announced on the comp.lang.python and comp.lang.python.announce
newsgroups and on the Python home page at http://www.python.org/; an RSS feed of
news is available.
You can also access the development version of Python through Subversion. See
http://www.python.org/dev/faq/ for details.
To report a bug or submit a patch, please use the Roundup installation at
http://bugs.python.org/.
You must have a Roundup account to report bugs; this makes it possible for us to
contact you if we have follow-up questions. It will also enable Roundup to send
you updates as we act on your bug. If you had previously used SourceForge to
report bugs to Python, you can obtain your Roundup password through Roundup’s
password reset procedure.
When he began implementing Python, Guido van Rossum was also reading the
published scripts from “Monty Python’s Flying Circus”, a BBC comedy series from the 1970s. Van Rossum
thought he needed a name that was short, unique, and slightly mysterious, so he
decided to call the language Python.
Very stable. New, stable releases have been coming out roughly every 6 to 18
months since 1991, and this seems likely to continue. Currently there are
usually around 18 months between major releases.
The developers issue “bugfix” releases of older versions, so the stability of
existing releases gradually improves. Bugfix releases, indicated by a third
component of the version number (e.g. 2.5.3, 2.6.2), are managed for stability;
only fixes for known problems are included in a bugfix release, and it’s
guaranteed that interfaces will remain the same throughout a series of bugfix
releases.
The latest stable releases can always be found on the Python download page. There are two recommended production-ready
versions at this point in time, because at the moment there are two branches of
stable releases: 2.x and 3.x. Python 3.x may be less useful than 2.x, since
currently there is more third party software available for Python 2 than for
Python 3. Python 2 code will generally not run unchanged in Python 3.
There are probably tens of thousands of users, though it’s difficult to obtain
an exact count.
Python is available for free download, so there are no sales figures, and it’s
available from many different sites and packaged with many Linux distributions,
so download statistics don’t tell the whole story either.
The comp.lang.python newsgroup is very active, but not all Python users post to
the group or even read it.
High-profile Python projects include the Mailman mailing list manager and the Zope application server. Several Linux distributions, most notably Red Hat, have written part or all of their installer and
system administration software in Python. Companies that use Python internally
include Google, Yahoo, and Lucasfilm Ltd.
See http://www.python.org/dev/peps/ for the Python Enhancement Proposals
(PEPs). PEPs are design documents describing a suggested new feature for Python,
providing a concise technical specification and a rationale. Look for a PEP
titled “Python X.Y Release Schedule”, where X.Y is a version that hasn’t been
publicly released yet.
In general, no. There are already millions of lines of Python code around the
world, so any change in the language that invalidates more than a very small
fraction of existing programs has to be frowned upon. Even if you can provide a
conversion program, there’s still the problem of updating all documentation;
many books have been written about Python, and we don’t want to invalidate them
all at a single stroke.
Providing a gradual upgrade path is necessary if a feature has to be changed.
PEP 5 describes the procedure followed for introducing backward-incompatible
changes while minimizing disruption for users.
As of August 2003 no major problems have been reported and Y2K compliance seems
to be a non-issue.
Python does very few date calculations and, for those it does perform, relies
on the C library functions. Python generally represents times either as seconds
since 1970 or as a (year,month,day,...) tuple where the year is expressed
with four digits, which makes Y2K bugs unlikely. So as long as your C library
is okay, Python should be okay. Of course, it’s possible that a particular
application written in Python makes assumptions about 2-digit years.
Because Python is available free of charge, there are no absolute guarantees.
If there are unforeseen problems, liability is the user’s problem rather than
the developers’, and there is nobody you can sue for damages. The Python
copyright notice contains the following disclaimer:
4. PSF is making Python 2.3 available to Licensee on an “AS IS”
basis. PSF MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR IMPLIED. BY
WAY OF EXAMPLE, BUT NOT LIMITATION, PSF MAKES NO AND DISCLAIMS ANY
REPRESENTATION OR WARRANTY OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR
PURPOSE OR THAT THE USE OF PYTHON 2.3 WILL NOT INFRINGE ANY THIRD PARTY
RIGHTS.
5. PSF SHALL NOT BE LIABLE TO LICENSEE OR ANY OTHER USERS OF PYTHON
2.3 FOR ANY INCIDENTAL, SPECIAL, OR CONSEQUENTIAL DAMAGES OR LOSS AS
A RESULT OF MODIFYING, DISTRIBUTING, OR OTHERWISE USING PYTHON 2.3,
OR ANY DERIVATIVE THEREOF, EVEN IF ADVISED OF THE POSSIBILITY THEREOF.
The good news is that if you encounter a problem, you have full source
available to track it down and fix it. This is one advantage of an open source
programming environment.
It is still common to start students with a procedural and statically typed
language such as Pascal, C, or a subset of C++ or Java. Students may be better
served by learning Python as their first language. Python has a very simple and
consistent syntax and a large standard library and, most importantly, using
Python in a beginning programming course lets students concentrate on important
programming skills such as problem decomposition and data type design. With
Python, students can be quickly introduced to basic concepts such as loops and
procedures. They can probably even work with user-defined objects in their very
first course.
For a student who has never programmed before, using a statically typed language
seems unnatural. It presents additional complexity that the student must master
and slows the pace of the course. The students are trying to learn to think
like a computer, decompose problems, design consistent interfaces, and
encapsulate data. While learning to use a statically typed language is
important in the long term, it is not necessarily the best topic to address in
the students’ first programming course.
Many other aspects of Python make it a good first language. Like Java, Python
has a large standard library so that students can be assigned programming
projects very early in the course that do something. Assignments aren’t
restricted to the standard four-function calculator and check balancing
programs. By using the standard library, students can gain the satisfaction of
working on realistic applications as they learn the fundamentals of programming.
Using the standard library also teaches students about code reuse. Third-party
modules such as PyGame are also helpful in extending the students’ reach.
Python’s interactive interpreter enables students to test language features
while they’re programming. They can keep a window with the interpreter running
while they enter their program’s source in another window. If they can’t
remember the methods for a list, they can do something like this:
>>> L = []
>>> dir(L)
['append', 'count', 'extend', 'index', 'insert', 'pop', 'remove',
'reverse', 'sort']
>>> help(L.append)
Help on built-in function append:
append(...)
L.append(object) -- append object to end
>>> L.append(1)
>>> L
[1]
With the interpreter, documentation is never far from the student as he’s
programming.
There are also good IDEs for Python. IDLE is a cross-platform IDE for Python
that is written in Python using Tkinter. PythonWin is a Windows-specific IDE.
Emacs users will be happy to know that there is a very good Python mode for
Emacs. All of these programming environments provide syntax highlighting,
auto-indenting, and access to the interactive interpreter while coding. Consult
http://www.python.org/editors/ for a full list of Python editing environments.
If you want to discuss Python’s use in education, you may be interested in
joining the edu-sig mailing list.
Starting with Python 2.3, the distribution includes the PyBSDDB package
(http://pybsddb.sf.net/) as a replacement for the old bsddb module. It
includes functions which provide backward compatibility at the API level, but
requires a newer version of the underlying Berkeley DB library. Files created with the older bsddb module
can’t be opened directly using the new module.
Using your old version of Python and a pair of scripts which are part of Python
2.3 (db2pickle.py and pickle2db.py, in the Tools/scripts directory) you can
convert your old database files to the new format. Using your old Python
version, run the db2pickle.py script to convert it to a pickle, e.g.:
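For instance, the round trip might look roughly like this (a sketch only;
script paths, file names, and options depend on your installation, as the
next paragraph notes):
# with the old Python: dump the old-format database to a pickle
python2.2 Tools/scripts/db2pickle.py old.db old.pickle
# with the new Python: load the pickle into a new-format database
python2.3 Tools/scripts/pickle2db.py new.db old.pickle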
The precise commands you use will vary depending on the particulars of your
installation. For full details about operation of these two scripts check the
doc string at the start of each one.
The pdb module is a simple but adequate console-mode debugger for Python. It is
part of the standard Python library, and is documented in the Library
Reference Manual. You can also write your own debugger by using the code
for pdb as an example.
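The two most common entry points are pdb.run() and pdb.set_trace()
(my_function below is a placeholder for your own code):
import pdb

pdb.run('my_function()')   # run a statement under debugger control

# or, placed at the point of interest inside your own code:
pdb.set_trace()            # drop into the debugger at this line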
The IDLE interactive development environment, which is part of the standard
Python distribution (normally available as Tools/scripts/idle), includes a
graphical debugger. There is documentation for the IDLE debugger at
http://www.python.org/idle/doc/idle2.html#Debugger.
PythonWin is a Python IDE that includes a GUI debugger based on pdb. The
Pythonwin debugger colors breakpoints and has quite a few cool features such as
debugging non-Pythonwin programs. Pythonwin is available as part of the Python
for Windows Extensions project and
as a part of the ActivePython distribution (see
http://www.activestate.com/Products/ActivePython/index.html).
Boa Constructor is an IDE and GUI
builder that uses wxWidgets. It offers visual frame creation and manipulation,
an object inspector, many views on the source like object browsers, inheritance
hierarchies, doc string generated html documentation, an advanced debugger,
integrated help, and Zope support.
Eric is an IDE built on PyQt
and the Scintilla editing component.
PyChecker is a static analysis tool that finds bugs in Python source code and
warns about code complexity and style. You can get PyChecker from
http://pychecker.sf.net.
Pylint is another tool that checks
if a module satisfies a coding standard, and also makes it possible to write
plug-ins to add a custom feature. In addition to the bug checking that
PyChecker performs, Pylint offers some additional features such as checking line
length, whether variable names are well-formed according to your coding
standard, whether declared interfaces are fully implemented, and more.
http://www.logilab.org/card/pylint_manual provides a full list of Pylint’s
features.
You don’t need the ability to compile Python to C code if all you want is a
stand-alone program that users can download and run without having to install
the Python distribution first. There are a number of tools that determine the
set of modules required by a program and bind these modules together with a
Python binary to produce a single executable.
One is to use the freeze tool, which is included in the Python source tree as
Tools/freeze. It converts Python byte code to C arrays; with a C compiler you
can embed all your modules into a new program, which is then linked with the
standard Python modules.
It works by scanning your source recursively for import statements (in both
forms) and looking for the modules in the standard Python path as well as in the
source directory (for built-in modules). It then turns the bytecode for modules
written in Python into C code (array initializers that can be turned into code
objects using the marshal module) and creates a custom-made config file that
only contains those built-in modules which are actually used in the program. It
then compiles the generated C code and links it with the rest of the Python
interpreter to form a self-contained binary which acts exactly like your script.
Obviously, freeze requires a C compiler. There are several other utilities
which don’t. One is Thomas Heller’s py2exe (Windows only).
Another is Christian Tismer’s SQFREEZE
which appends the byte code to a specially-prepared Python interpreter that can
find the byte code in the executable.
Other tools include Fredrik Lundh’s Squeeze and Anthony Tuininga’s
cx_Freeze.
That’s a tough one, in general. There are many tricks to speed up Python code;
consider rewriting parts in C as a last resort.
In some cases it’s possible to automatically translate Python to C or x86
assembly language, meaning that you don’t have to modify your code to gain
increased speed.
Cython and Pyrex
can compile a slightly modified version of Python code into a C extension, and
can be used on many different platforms.
Psyco is a just-in-time compiler that
translates Python code into x86 assembly language. If you can use it, Psyco can
provide dramatic speedups for critical functions.
The rest of this answer will discuss various tricks for squeezing a bit more
speed out of Python code. Never apply any optimization tricks unless you know
you need them, after profiling has indicated that a particular function is the
heavily executed hot spot in the code. Optimizations almost always make the
code less clear, and you shouldn’t pay the costs of reduced clarity (increased
development time, greater likelihood of bugs) unless the resulting performance
benefit is worth it.
One thing to notice is that function and (especially) method calls are rather
expensive; if you have designed a purely OO interface with lots of tiny
functions that don’t do much more than get or set an instance variable or call
another method, you might consider using a more direct way such as directly
accessing instance variables. Also see the standard module profile which
makes it possible to find out where your program is spending most of its time
(if you have some patience – the profiling itself can slow your program down by
an order of magnitude).
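For example, a minimal profiling session (assuming your program defines a
main() entry point of its own) might look like this:

import profile
profile.run('main()')   # run main() under the profiler and print a report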
Remember that many standard optimization heuristics you may know from other
programming experience may well apply to Python. For example it may be faster
to send output to output devices using larger writes rather than smaller ones in
order to reduce the overhead of kernel system calls. Thus CGI scripts that
write all output in “one shot” may be faster than those that write lots of small
pieces of output.
Also, be sure to use Python’s core features where appropriate. For example,
slicing allows programs to chop up lists and other sequence objects in a single
tick of the interpreter’s mainloop using highly optimized C implementations.
Thus to get the same effect as:
L2 = []
for i in range(3):
    L2.append(L1[i])
it is much shorter and far faster to use
L2 = list(L1[:3]) # "list" is redundant if L1 is a list.
Note that the functionally-oriented built-in functions such as map(),
zip(), and friends can be a convenient accelerator for loops that
perform a single task. For example to pair the elements of two lists
together:
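>>> list(zip([1, 2, 3], [4, 5, 6]))
[(1, 4), (2, 5), (3, 6)]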
For example if s1..s7 are large (10K+) strings then
"".join([s1,s2,s3,s4,s5,s6,s7]) may be far faster than the more obvious
s1+s2+s3+s4+s5+s6+s7, since the “summation” will compute many
subexpressions, whereas join() does all the copying in one pass. For
manipulating strings, use the replace() and format() methods
on string objects. Use regular expressions only when you’re
not dealing with constant string patterns.
Be sure to use the list.sort() built-in method to do sorting, and see the
sorting mini-HOWTO for examples
of moderately advanced usage. list.sort() beats other techniques for
sorting in all but the most extreme circumstances.
Another common trick is to “push loops into functions or methods.” For example
suppose you have a program that runs slowly and you use the profiler to
determine that a Python function ff() is being called lots of times. If you
notice that ff():
def ff(x):
    ...  # do something with x computing result...
    return result
tends to be called in loops like:
list = map(ff, oldlist)
or:
for x in sequence:
    value = ff(x)
    ...  # do something with value...
then you can often eliminate function call overhead by rewriting ff() to:
def ffseq(seq):
    resultseq = []
    for x in seq:
        ...  # do something with x computing result...
        resultseq.append(result)
    return resultseq
and rewrite the two examples to list = ffseq(oldlist) and to:
for value in ffseq(sequence):
    ...  # do something with value...
Single calls to ff(x) translate to ffseq([x])[0] with little penalty.
Of course this technique is not always appropriate and there are other variants
which you can figure out.
You can gain some performance by explicitly storing the results of a function or
method lookup into a local variable. A loop like:
for key in token:
    dict[key] = dict.get(key, 0) + 1
resolves dict.get every iteration. If the method isn’t going to change, a
slightly faster implementation is:
dict_get = dict.get  # look up the method once
for key in token:
    dict[key] = dict_get(key, 0) + 1
Default arguments can be used to determine values once, at compile time instead
of at run time. This can only be done for functions or objects which will not
be changed during program execution, such as replacing a repeated global lookup
with a value precomputed in a default argument.
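For example (a sketch; degree_sin is a hypothetical helper), one can replace:

import math

def degree_sin(deg):
    return math.sin(deg * math.pi / 180.0)

with:

def degree_sin(deg, factor=math.pi / 180.0, sin=math.sin):
    return sin(deg * factor)

so that the conversion factor is computed and math.sin is looked up only once,
when the function is defined.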
Because this trick uses default arguments for terms which should not be changed,
it should only be used when you are not concerned with presenting a possibly
confusing API to your users.
It can be a surprise to get the UnboundLocalError in previously working
code when it is modified by adding an assignment statement somewhere in
the body of a function.
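Consider, for instance, a function along these lines:

>>> x = 10
>>> def foo():
...     print(x)
...     x += 1
>>> foo()
Traceback (most recent call last):
  ...
UnboundLocalError: local variable 'x' referenced before assignment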
This is because when you make an assignment to a variable in a scope, that
variable becomes local to that scope and shadows any similarly named variable
in the outer scope. Since the last statement in foo assigns a new value to
x, the compiler recognizes it as a local variable. Consequently, when the
earlier print(x) attempts to print the uninitialized local variable, an
error results.
In the example above you can access the outer scope variable by declaring it
global:
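>>> x = 10
>>> def foobar():
...     global x
...     print(x)
...     x += 1
>>> foobar()
10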
This explicit declaration is required in order to remind you that (unlike the
superficially analogous situation with class and instance variables) you are
actually modifying the value of the variable in the outer scope:
>>> print(x)
11
You can do a similar thing in a nested scope using the nonlocal
keyword:
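>>> def foo():
...     x = 10
...     def bar():
...         nonlocal x
...         print(x)
...         x += 1
...     bar()
...     print(x)
>>> foo()
10
11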
In Python, variables that are only referenced inside a function are implicitly
global. If a variable is assigned a value anywhere within the function’s
body, it’s assumed to be a local unless explicitly declared as global.
Though a bit surprising at first, a moment’s consideration explains this. On
one hand, requiring global for assigned variables provides a bar
against unintended side-effects. On the other hand, if global was required
for all global references, you’d be using global all the time. You’d have
to declare as global every reference to a built-in function or to a component of
an imported module. This clutter would defeat the usefulness of the global
declaration for identifying side-effects.
The canonical way to share information across modules within a single program is
to create a special module (often called config or cfg). Just import the config
module in all modules of your application; the module then becomes available as
a global name. Because there is only one instance of each module, any changes
made to the module object get reflected everywhere. For example:
config.py:
x = 0 # Default value of the 'x' configuration setting
mod.py:
import config
config.x = 1
main.py:
import config
import mod
print(config.x)
Note that using a module is also the basis for implementing the Singleton design
pattern, for the same reason.
In general, don’t use from modulename import *. Doing so clutters the
importer’s namespace. Some people avoid this idiom even with the few modules
that were designed to be imported in this manner. Modules designed in this
manner include tkinter and threading.
Import modules at the top of a file. Doing so makes it clear what other modules
your code requires and avoids questions of whether the module name is in scope.
Using one import per line makes it easy to add and delete module imports, but
using multiple imports per line uses less screen space.
It’s good practice to import modules in the following order:

1. standard library modules – e.g. sys, os, getopt, re
2. third-party library modules (anything installed in Python’s site-packages
   directory) – e.g. mx.DateTime, ZODB, PIL.Image, etc.
3. locally-developed modules
Never use relative package imports. If you’re writing code that’s in the
package.sub.m1 module and want to import package.sub.m2, do not just
write from . import m2, even though it’s legal. Write
from package.sub import m2 instead. See PEP 328 for details.
It is sometimes necessary to move imports to a function or class to avoid
problems with circular imports. Gordon McMillan says:
Circular imports are fine where both modules use the “import <module>” form
of import. They fail when the 2nd module wants to grab a name out of the
first (“from module import name”) and the import is at the top level. That’s
because names in the 1st are not yet available, because the first module is
busy importing the 2nd.
In this case, if the second module is only used in one function, then the import
can easily be moved into that function. By the time the import is called, the
first module will have finished initializing, and the second module can do its
import.
It may also be necessary to move imports out of the top level of code if some of
the modules are platform-specific. In that case, it may not even be possible to
import all of the modules at the top of the file. In this case, importing the
correct modules in the corresponding platform-specific code is a good option.
Only move imports into a local scope, such as inside a function definition, if
it’s necessary to solve a problem such as avoiding a circular import or are
trying to reduce the initialization time of a module. This technique is
especially helpful if many of the imports are unnecessary depending on how the
program executes. You may also want to move imports into a function if the
modules are only ever used in that function. Note that loading a module the
first time may be expensive because of the one time initialization of the
module, but loading a module multiple times is virtually free, costing only a
couple of dictionary lookups. Even if the module name has gone out of scope,
the module is probably available in sys.modules.
If only instances of a specific class use a module, then it is reasonable to
import the module in the class’s __init__ method and then assign the module
to an instance variable so that the module is always available (via that
instance variable) during the life of the object. Note that to delay an import
until the class is instantiated, the import must be inside a method. Putting
the import inside the class but outside of any method still causes the import to
occur when the module is initialized.
Collect the arguments using the * and ** specifiers in the function’s
parameter list; this gives you the positional arguments as a tuple and the
keyword arguments as a dictionary. You can then pass these arguments when
calling another function by using * and **.
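A minimal sketch (g() here stands for whatever function you are forwarding to,
and the 'width' keyword is purely illustrative):

def f(x, *args, **kwargs):
    kwargs['width'] = '14.3c'   # adjust one keyword argument
    g(x, *args, **kwargs)       # pass everything else through unchanged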
Remember that arguments are passed by assignment in Python. Since assignment
just creates references to objects, there’s no alias between an argument name in
the caller and callee, and so no call-by-reference per se. You can achieve the
desired effect in a number of ways.
By returning a tuple of the results:
def func2(a, b):
    a = 'new-value'     # a and b are local names
    b = b + 1           # assigned to new objects
    return a, b         # return new values

x, y = 'old-value', 99
x, y = func2(x, y)
print(x, y)             # output: new-value 100
This is almost always the clearest solution.
By using global variables. This isn’t thread-safe, and is not recommended.
By passing a mutable (changeable in-place) object:
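def func1(a):
    a[0] = 'new-value'    # 'a' references a mutable list
    a[1] = a[1] + 1       # changes a shared object

args = ['old-value', 99]
func1(args)
print(args[0], args[1])   # output: new-value 100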
You have two choices: you can use nested scopes or you can use callable objects.
For example, suppose you wanted to define linear(a, b) which returns a
function f(x) that computes the value a*x + b. Using nested scopes:
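def linear(a, b):
    def result(x):
        return a * x + b
    return result

Or using a callable object:

class linear:

    def __init__(self, a, b):
        self.a, self.b = a, b

    def __call__(self, x):
        return self.a * x + self.b

In both cases,

taxes = linear(0.3, 2)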
gives a callable object where taxes(10e6) == 0.3 * 10e6 + 2.
The callable object approach has the disadvantage that it is a bit slower and
results in slightly longer code. However, note that a collection of callables
can share their signature via inheritance:
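class exponential(linear):
    # __init__ inherited
    def __call__(self, x):
        return self.a * (x ** self.b)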
For an instance x of a user-defined class, dir(x) returns an alphabetized
list of the names of the instance’s attributes and methods, together with the
attributes and methods defined by its class.
Generally speaking, it can’t, because objects don’t really have names.
Essentially, assignment always binds a name to a value; the same is true of
def and class statements, but in that case the value is a
callable. Consider the following code:
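>>> class A:
...     pass
...
>>> B = A
>>> a = B()
>>> b = a
>>> print(b)
<__main__.A object at 0x16D07CC>
>>> print(a)
<__main__.A object at 0x16D07CC>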
Arguably the class has a name: even though it is bound to two names and invoked
through the name B, the created instance is still reported as an instance of
class A. However, it is impossible to say whether the instance’s name is a or
b, since both names are bound to the same value.
Generally speaking it should not be necessary for your code to “know the names”
of particular values. Unless you are deliberately writing introspective
programs, this is usually an indication that a change of approach might be
beneficial.
In comp.lang.python, Fredrik Lundh once gave an excellent analogy in answer to
this question:
The same way as you get the name of that cat you found on your porch: the cat
(object) itself cannot tell you its name, and it doesn’t really care – so
the only way to find out what it’s called is to ask all your neighbours
(namespaces) if it’s their cat (object)...
....and don’t be surprised if you’ll find that it’s known by many names, or
no name at all!
For versions previous to 2.5 the answer would be ‘No’.
In many cases you can mimic a ? b : c with a and b or c, but there’s a
flaw: if b is zero (or empty, or None – anything that tests false) then
c will be selected instead. In many cases you can prove by looking at the
code that this can’t happen (e.g. because b is a constant or has a type that
can never be false), but in general this can be a problem.
Tim Peters (who wishes it was Steve Majewski) suggested the following solution:
(a and [b] or [c])[0]. Because [b] is a singleton list it is never
false, so the wrong path is never taken; then applying [0] to the whole
thing gets the b or c that you really wanted. Ugly, but it gets you there
in the rare cases where it is really inconvenient to rewrite your code using
‘if’.
The best course is usually to write a simple if...else statement. Another
solution is to implement the ?: operator as a function:
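def q(a, b, c):
    if a:
        return b
    else:
        return c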
In most cases you’ll pass b and c directly: q(a, b, c). To avoid evaluating
b or c when they shouldn’t be, encapsulate them within a lambda function, e.g.:
q(a, lambda: b, lambda: c).
It has been asked why Python has no if-then-else expression. There are
several answers: many languages do just fine without one; it can easily lead to
less readable code; no sufficiently “Pythonic” syntax has been discovered; a
search of the standard library found remarkably few places where using an
if-then-else expression would make the code more understandable.
In 2002, PEP 308 was written proposing several possible syntaxes and the
community was asked to vote on the issue. The vote was inconclusive. Most
people liked one of the syntaxes, but also hated other syntaxes; many votes
implied that people preferred no ternary operator rather than having a syntax
they hated.
Yes. Usually this is done by nesting lambda within
lambda. See the following three examples, due to Ulf Bartelt:
from functools import reduce

# Primes < 1000
print(list(filter(None,map(lambda y:y*reduce(lambda x,y:x*y!=0,
map(lambda x,y=y:y%x,range(2,int(pow(y,0.5)+1))),1),range(2,1000)))))

# First 10 Fibonacci numbers
print(list(map(lambda x,f=lambda x,f:(f(x-1,f)+f(x-2,f)) if x>1 else 1:
f(x,f), range(10))))

# Mandelbrot set
print((lambda Ru,Ro,Iu,Io,IM,Sx,Sy:reduce(lambda x,y:x+y,map(lambda y,
Iu=Iu,Io=Io,Ru=Ru,Ro=Ro,Sy=Sy,L=lambda yc,Iu=Iu,Io=Io,Ru=Ru,Ro=Ro,i=IM,
Sx=Sx,Sy=Sy:reduce(lambda x,y:x+y,map(lambda x,xc=Ru,yc=yc,Ru=Ru,Ro=Ro,
i=i,Sx=Sx,F=lambda xc,yc,x,y,k,f=lambda xc,yc,x,y,k,f:(k<=0)or (x*x+y*y
>=4.0) or 1+f(xc,yc,x*x-y*y+xc,2.0*x*y+yc,k-1,f):f(xc,yc,x,y,k,f):chr(
64+F(Ru+x*(Ro-Ru)/Sx,yc,0,0,i)),range(Sx))):L(Iu+y*(Io-Iu)/Sy),range(Sy
))))(-2.1, 0.7, -1.2, 1.2, 30, 80, 24))
#    \___ ___/  \___ ___/  |   |   |__ lines on screen
#        V          V      |   |______ columns on screen
#        |          |      |__________ maximum of "iterations"
#        |          |_________________ range on y axis
#        |____________________________ range on x axis
To specify an octal digit, precede the octal value with a zero, and then a lower
or uppercase “o”. For example, to set the variable “a” to the octal value “10”
(8 in decimal), type:
>>> a = 0o10
>>> a
8
Hexadecimal is just as easy. Simply precede the hexadecimal number with a zero,
and then a lower or uppercase “x”. Hexadecimal digits can be specified in lower
or uppercase. For example, in the Python interpreter:
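>>> a = 0xa5
>>> a
165
>>> b = 0XB2
>>> b
178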
It’s primarily driven by the desire that i % j have the same sign as j.
If you want that, and also want:

i == (i // j) * j + (i % j)

then integer division has to return the floor. C also requires this identity to
hold, and then compilers that truncate i // j need to make i % j have
the same sign as i.
There are few real use cases for i % j when j is negative. When j
is positive, there are many, and in virtually all of them it’s more useful for
i % j to be >= 0. If the clock says 10 now, what did it say 200 hours
ago? -190 % 12 == 2 is useful; -190 % 12 == -10 is a bug waiting to
bite.
For integers, use the built-in int() type constructor, e.g.
int('144') == 144. Similarly, float() converts to floating-point,
e.g. float('144') == 144.0.

By default, these interpret the number as decimal, so that
int('0144') == 144 and int('0x144') raises ValueError.
int(string, base) takes the base to convert from as a second optional
argument, so int('0x144', 16) == 324. If the base is specified as 0, the
number is interpreted using Python’s rules: a leading ‘0’ indicates octal, and
‘0x’ indicates a hex number.
Do not use the built-in function eval() if all you need is to convert
strings to numbers. eval() will be significantly slower and it presents a
security risk: someone could pass you a Python expression that might have
unwanted side effects. For example, someone could pass
__import__('os').system("rm -rf $HOME") which would erase your home
directory.
eval() also has the effect of interpreting numbers as Python expressions,
so that e.g. eval('09') gives a syntax error because Python does not allow
leading ‘0’ in a decimal number (except ‘0’).
To convert, e.g., the number 144 to the string ‘144’, use the built-in type
constructor str(). If you want a hexadecimal or octal representation, use
the built-in functions hex() or oct(). For fancy formatting, see
the String Formatting section, e.g. "{:04d}".format(144) yields
'0144' and "{:.3f}".format(1/3) yields '0.333'.
The best is to use a dictionary that maps strings to functions. The primary
advantage of this technique is that the strings do not need to match the names
of the functions. This is also the primary technique used to emulate a case
construct:
def a():
    pass

def b():
    pass

dispatch = {'go': a, 'stop': b}  # Note lack of parens for funcs

dispatch[get_input()]()  # Note trailing parens to call function
Note: Using eval() is slow and dangerous. If you don’t have absolute
control over the contents of the string, someone could pass a string that
resulted in an arbitrary function being executed.
Starting with Python 2.2, you can use S.rstrip("\r\n") to remove all
occurrences of any line terminator from the end of the string S without
removing other trailing whitespace. If the string S represents more than
one line, with several empty lines at the end, the line terminators for all the
blank lines will be removed:
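>>> lines = ("line 1 \r\n"
...          "\r\n"
...          "\r\n")
>>> lines.rstrip("\n\r")
'line 1 '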
Since this is typically only desired when reading text one line at a time, using
S.rstrip() this way works well.
For older versions of Python, there are two partial substitutes:
If you want to remove all trailing whitespace, use the rstrip() method of
string objects. This removes all trailing whitespace, not just a single
newline.
Otherwise, if there is only one line in the string S, use
S.splitlines()[0].
For simple input parsing, the easiest approach is usually to split the line into
whitespace-delimited words using the split() method of string objects
and then convert decimal strings to numeric values using int() or
float(). split() supports an optional “sep” parameter which is useful
if the line uses something other than whitespace as a separator.
For more complicated input parsing, regular expressions are more powerful
than C’s sscanf() and better suited for the task.
The type constructor tuple(seq) converts any sequence (actually, any
iterable) into a tuple with the same items in the same order.
For example, tuple([1,2,3]) yields (1,2,3) and tuple('abc')
yields ('a','b','c'). If the argument is a tuple, it does not make a copy
but returns the same object, so it is cheap to call tuple() when you
aren’t sure that an object is already a tuple.
The type constructor list(seq) converts any sequence or iterable into a list
with the same items in the same order. For example, list((1,2,3)) yields
[1,2,3] and list('abc') yields ['a','b','c']. If the argument
is a list, it makes a copy just like seq[:] would.
Python sequences are indexed with positive numbers and negative numbers. For
positive numbers, 0 is the first index, 1 is the second index, and so forth. For
negative indices, -1 is the last index, -2 is the penultimate (next to last)
index, and so forth. Think of seq[-n] as the same as seq[len(seq)-n].
Using negative indices can be very convenient. For example S[:-1] is all of
the string except for its last character, which is useful for removing the
trailing newline from a string.
Lists are equivalent to C or Pascal arrays in their time complexity; the primary
difference is that a Python list can contain objects of many different types.
The array module also provides methods for creating arrays of fixed types
with compact representations, but they are slower to index than lists. Also
note that the Numeric extensions and others define array-like structures with
various characteristics as well.
To get Lisp-style linked lists, you can emulate cons cells using tuples:
lisp_list = ("like", ("this", ("example", None)))
If mutability is desired, you could use lists instead of tuples. Here the
analogue of lisp car is lisp_list[0] and the analogue of cdr is
lisp_list[1]. Only do this if you’re sure you really need to, because it’s
usually a lot slower than using Python lists.
You probably tried to make a multidimensional array like this:
A = [[None] * 2] * 3
This looks correct if you print it:
>>> A
[[None, None], [None, None], [None, None]]
But when you assign a value, it shows up in multiple places:
>>> A[0][0] = 5
>>> A
[[5, None], [5, None], [5, None]]
The reason is that replicating a list with * doesn’t create copies, it only
creates references to the existing objects. The *3 creates a list
containing 3 references to the same list of length two. Changes to one row will
show in all rows, which is almost certainly not what you want.
The suggested approach is to create a list of the desired length first and then
fill in each element with a newly created list:
A = [None] * 3
for i in range(3):
    A[i] = [None] * 2
This generates a list containing 3 different lists of length two. You can also
use a list comprehension:
w, h = 2, 3
A = [[None] * w for i in range(h)]
Or, you can use an extension that provides a matrix datatype; Numeric Python is the best known.
You can’t. Dictionaries store their keys in an unpredictable order, so the
display order of a dictionary’s elements will be similarly unpredictable.
This can be frustrating if you want to save a printable version to a file, make
some changes and then compare it with some other printed dictionary. In this
case, use the pprint module to pretty-print the dictionary; the items will
be presented in order sorted by the key.
A more complicated solution is to subclass dict to create a
SortedDict class that prints itself in a predictable order. Here’s one
simpleminded implementation of such a class:
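class SortedDict(dict):

    def __repr__(self):
        keys = sorted(self.keys())
        result = ("{!r}: {!r}".format(k, self[k]) for k in keys)
        return "{{{}}}".format(", ".join(result))

    __str__ = __repr__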
This will work for many common situations you might encounter, though it’s far
from a perfect solution. The largest flaw is that if some values in the
dictionary are also dictionaries, their values won’t be presented in any
particular order.
The technique, attributed to Randal Schwartz of the Perl community, sorts the
elements of a list by a metric which maps each element to its “sort value”. In
Python, just use the key argument for the sort() method:
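Isorted = L[:]
Isorted.sort(key=lambda s: int(s[10:15]))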
The key argument is new in Python 2.4; for older versions this kind of
sorting is quite simple to do with list comprehensions. To sort a list of
strings by their uppercase values:
tmp1 = [(x.upper(), x) for x in L] # Schwartzian transform
tmp1.sort()
Usorted = [x[1] for x in tmp1]
To sort by the integer value of a subfield extending from positions 10-15 in
each string:
tmp2 = [(int(s[10:15]), s) for s in L] # Schwartzian transform
tmp2.sort()
Isorted = [x[1] for x in tmp2]
For versions prior to 3.0, Isorted may also be computed by:
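def intfield(s):
    return int(s[10:15])

def Icmp(s1, s2):
    return cmp(intfield(s1), intfield(s2))   # cmp() exists only in Python 2

Isorted = L[:]
Isorted.sort(Icmp)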
Merge them into an iterator of tuples, sort the resulting list, and then pick
out the element you want.
>>> list1 = ["what", "I'm", "sorting", "by"]
>>> list2 = ["something", "else", "to", "sort"]
>>> pairs = zip(list1, list2)
>>> pairs = sorted(pairs)
>>> pairs
[("I'm", 'else'), ('by', 'sort'), ('sorting', 'to'), ('what', 'something')]
>>> result = [x[1] for x in pairs]
>>> result
['else', 'sort', 'to', 'something']
An alternative for the last step is:
>>> result = []
>>> for p in pairs: result.append(p[1])
If you find this more legible, you might prefer to use this instead of the final
list comprehension. However, it is almost twice as slow for long lists. Why?
First, the append() operation has to reallocate memory, and while it uses
some tricks to avoid doing that each time, it still has to do it occasionally,
and that costs quite a bit. Second, the expression “result.append” requires an
extra attribute lookup, and third, there’s a speed reduction from having to make
all those function calls.
A class is the particular object type created by executing a class statement.
Class objects are used as templates to create instance objects, which embody
both the data (attributes) and code (methods) specific to a datatype.
A class can be based on one or more other classes, called its base class(es). It
then inherits the attributes and methods of its base classes. This allows an
object model to be successively refined by inheritance. You might have a
generic Mailbox class that provides basic accessor methods for a mailbox,
and subclasses such as MboxMailbox, MaildirMailbox, OutlookMailbox
that handle various specific mailbox formats.
Self is merely a conventional name for the first argument of a method. A method
defined as meth(self, a, b, c) should be called as x.meth(a, b, c) for
some instance x of the class in which the definition occurs; the called
method will think it is called as meth(x, a, b, c).
Use the built-in function isinstance(obj, cls). You can check if an object
is an instance of any of a number of classes by providing a tuple instead of a
single class, e.g. isinstance(obj, (class1, class2, ...)), and can also
check whether an object is one of Python’s built-in types, e.g.
isinstance(obj, str) or isinstance(obj, (int, float, complex)).
Note that most programs do not use isinstance() on user-defined classes
very often. If you are developing the classes yourself, a more proper
object-oriented style is to define methods on the classes that encapsulate a
particular behaviour, instead of checking the object’s class and doing a
different thing based on what class it is. For example, if you have a function
that does something:
def search(obj):
    if isinstance(obj, Mailbox):
        ...  # code to search a mailbox
    elif isinstance(obj, Document):
        ...  # code to search a document
    elif ...
A better approach is to define a search() method on all the classes and just
call it:
class Mailbox:
    def search(self):
        ...  # code to search a mailbox

class Document:
    def search(self):
        ...  # code to search a document

obj.search()
Delegation is an object oriented technique (also called a design pattern).
Let’s say you have an object x and want to change the behaviour of just one
of its methods. You can create a new class that provides a new implementation
of the method you’re interested in changing and delegates all other methods to
the corresponding method of x.
Python programmers can easily implement delegation. For example, the following
class implements a class that behaves like a file but converts all written data
to uppercase:
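class UpperOut:

    def __init__(self, outfile):
        self.__outfile = outfile

    def write(self, s):
        self.__outfile.write(s.upper())

    def __getattr__(self, name):
        return getattr(self.__outfile, name)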
Here the UpperOut class redefines the write() method to convert the
argument string to uppercase before calling the underlying
self.__outfile.write() method. All other methods are delegated to the
underlying self.__outfile object. The delegation is accomplished via the
__getattr__ method; consult the language reference
for more information about controlling attribute access.
Note that for more general cases delegation can get trickier. When attributes
must be set as well as retrieved, the class must define a __setattr__()
method too, and it must do so carefully. The basic implementation of
__setattr__() is roughly equivalent to the following:
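class X:
    ...
    def __setattr__(self, name, value):
        self.__dict__[name] = value
    ...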
For versions prior to 3.0, you may be using classic classes: For a class
definition such as class Derived(Base): ... you can call method meth()
defined in Base (or one of Base‘s base classes) as
Base.meth(self, arguments...). Here, Base.meth is an unbound method, so you
need to provide the self argument.
You could define an alias for the base class, assign the real base class to it
before your class definition, and use the alias throughout your class. Then all
you have to change is the value assigned to the alias. Incidentally, this trick
is also handy if you want to decide dynamically (e.g. depending on availability
of resources) which base class to use. Example:
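A minimal sketch (the class names are illustrative):

class Base:
    def meth(self):
        print('Base.meth')

BaseAlias = Base          # assign the real base class to the alias

class Derived(BaseAlias):
    def meth(self):
        BaseAlias.meth(self)   # refer to the base class only via the alias
        print('Derived.meth')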
Both static data and static methods (in the sense of C++ or Java) are supported
in Python.
For static data, simply define a class attribute. To assign a new value to the
attribute, you have to explicitly use the class name in the assignment:
class C:
    count = 0   # number of times C.__init__ called

    def __init__(self):
        C.count = C.count + 1

    def getcount(self):
        return C.count  # or return self.count
c.count also refers to C.count for any c such that
isinstance(c, C) holds, unless overridden by c itself or by some class on
the base-class search path from c.__class__ back to C.
Caution: within a method of C, an assignment like self.count = 42 creates
a new and unrelated instance attribute named “count” in self‘s own dict.
Rebinding of a class-static data name must always specify the class whether
inside a method or not:

C.count = 314
Static methods are possible since Python 2.2:
class C:
    def static(arg1, arg2, arg3):
        # No 'self' parameter!
        ...
    static = staticmethod(static)
With Python 2.4’s decorators, this can also be written as
class C:
    @staticmethod
    def static(arg1, arg2, arg3):
        # No 'self' parameter!
        ...
However, a far more straightforward way to get the effect of a static method is
via a simple module-level function:
def getcount():
    return C.count
If your code is structured so as to define one class (or tightly related class
hierarchy) per module, this supplies the desired encapsulation.
Variable names with double leading underscores are “mangled” to provide a simple
but effective way to define class private variables. Any identifier of the form
__spam (at least two leading underscores, at most one trailing underscore)
is textually replaced with _classname__spam, where classname is the
current class name with any leading underscores stripped.
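For example, a brief sketch of the transformation:

class MyClass:
    def __method(self):       # stored under the name _MyClass__method
        pass

    def call_it(self):
        self.__method()       # the compiler rewrites this to
                              # self._MyClass__method()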
This doesn’t guarantee privacy: an outside user can still deliberately access
the “_classname__spam” attribute, and private values are visible in the object’s
__dict__. Many Python programmers never bother to use private variable
names at all.
The del statement does not necessarily call __del__() – it simply
decrements the object’s reference count, and if this reaches zero
__del__() is called.
If your data structures contain circular links (e.g. a tree where each child has
a parent reference and each parent has a list of children) the reference counts
will never go back to zero. Once in a while Python runs an algorithm to detect
such cycles, but the garbage collector might run some time after the last
reference to your data structure vanishes, so your __del__() method may be
called at an inconvenient and random time. This is inconvenient if you’re trying
to reproduce a problem. Worse, the order in which objects’ __del__()
methods are executed is arbitrary. You can run gc.collect() to force a
collection, but there are pathological cases where objects will never be
collected.
Despite the cycle collector, it’s still a good idea to define an explicit
close() method on objects to be called whenever you’re done with them. The
close() method can then remove attributes that refer to subobjects. Don’t
call __del__() directly – __del__() should call close() and
close() should make sure that it can be called more than once for the same
object.
Another way to avoid cyclical references is to use the weakref module,
which allows you to point to objects without incrementing their reference count.
Tree data structures, for instance, should use weak references for their parent
and sibling references (if they need them!).
Finally, if your __del__() method raises an exception, a warning message
is printed to sys.stderr.
Python does not keep track of all instances of a class (or of a built-in type).
You can program the class’s constructor to keep track of all instances by
keeping a list of weak references to each instance.
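A minimal sketch of this approach (the class name is illustrative):

import weakref

class MyClass:
    _instances = []   # weak references to every instance created

    def __init__(self):
        MyClass._instances.append(weakref.ref(self))

    @classmethod
    def instances(cls):
        # dereference the weak refs, dropping instances already collected
        return [r() for r in cls._instances if r() is not None]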
When a module is imported for the first time (or when the source is more recent
than the current compiled file) a .pyc file containing the compiled code
should be created in the same directory as the .py file.
One reason that a .pyc file may not be created is permissions problems with
the directory. This can happen, for example, if you develop as one user but run
as another, such as if you are testing with a web server. Creation of a .pyc
file is automatic if you’re importing a module and Python has the ability
(permissions, free space, etc...) to write the compiled module back to the
directory.
Running Python on a top level script is not considered an import and no .pyc
will be created. For example, if you have a top-level module abc.py that
imports another module xyz.py, when you run abc, xyz.pyc will be created
since xyz is imported, but no abc.pyc file will be created since abc.py
isn’t being imported.
If you need to create abc.pyc – that is, to create a .pyc file for a module
that is not imported – you can, using the py_compile and
compileall modules.
The py_compile module can manually compile any module. One way is to use
the compile() function in that module interactively:
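>>> import py_compile
>>> py_compile.compile('abc.py')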
This will write the .pyc to the same location as abc.py (or you can
override that with the optional parameter cfile).
You can also automatically compile all files in a directory or directories using
the compileall module. You can do it from the shell prompt by running
compileall.py and providing the path of a directory containing Python files
to compile:
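python -m compileall <directory>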
A module can find out its own module name by looking at the predefined global
variable __name__. If this has the value '__main__', the program is
running as a script. Many modules that are usually used by importing them also
provide a command-line interface or a self-test, and only execute this code
after checking __name__:
def main():
    print('Running test...')
    ...

if __name__ == '__main__':
    main()
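Suppose, for example, you have two modules that import names from each other
(a reconstruction of the usual setup; the module and variable names are
illustrative):

# foo.py
from bar import bar_var
foo_var = 1

# bar.py
from foo import foo_var
bar_var = 2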
The problem is that the interpreter will perform the following steps:
1. main imports foo
2. Empty globals for foo are created
3. foo is compiled and starts executing
4. foo imports bar
5. Empty globals for bar are created
6. bar is compiled and starts executing
7. bar imports foo (which is a no-op since there already is a module named foo)
8. bar.foo_var = foo.foo_var
The last step fails, because Python isn’t done with interpreting foo yet and
the global symbol dictionary for foo is still empty.
The same thing happens when you use import foo, and then try to access
foo.foo_var in global code.
There are (at least) three possible workarounds for this problem.
Guido van Rossum recommends avoiding all uses of from <module> import ...,
and placing all code inside functions. Initializations of global variables and
class variables should use constants or built-in functions only. This means
everything from an imported module is referenced as <module>.<name>.
Jim Roskind suggests performing steps in the following order in each module:

1. exports (globals, functions, and classes that don’t need imported base
   classes)
2. import statements
3. active code (including globals that are initialized from imported values).
Van Rossum doesn’t like this approach much because the imports appear in a
strange place, but it does work.
Matthias Urlichs recommends restructuring your code so that the recursive import
is not necessary in the first place.
For reasons of efficiency as well as consistency, Python only reads the module
file on the first time a module is imported. If it didn’t, in a program
consisting of many modules where each one imports the same basic module, the
basic module would be parsed and re-parsed many times. To force rereading of a
changed module, do this:
import imp
import modname
imp.reload(modname)
Warning: this technique is not 100% fool-proof. In particular, modules
containing statements like
from modname import some_objects
will continue to work with the old version of the imported objects. If the
module contains class definitions, existing class instances will not be
updated to use the new class definition. This can result in the following
paradoxical behaviour:
>>> import imp
>>> import cls
>>> c = cls.C() # Create an instance of C
>>> imp.reload(cls)
<module 'cls' from 'cls.py'>
>>> isinstance(c, cls.C) # isinstance is false?!?
False
The nature of the problem is made clear if you print out the “identity” of the
class objects:
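>>> hex(id(c.__class__))
'0x7352a0'
>>> hex(id(cls.C))
'0x4198d0'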
Why does Python use indentation for grouping of statements?
Guido van Rossum believes that using indentation for grouping is extremely
elegant and contributes a lot to the clarity of the average Python program.
Most people learn to love this feature after a while.
Since there are no begin/end brackets there cannot be a disagreement between
grouping perceived by the parser and the human reader. Occasionally C
programmers will encounter a fragment of code like this:
if (x <= y)
        x++;
        y--;
z++;
Only the x++ statement is executed if the condition is true, but the
indentation leads you to believe otherwise. Even experienced C programmers will
sometimes stare at it a long time wondering why y is being decremented even
for x > y.
Because there are no begin/end brackets, Python is much less prone to
coding-style conflicts. In C there are many different ways to place the braces.
If you’re used to reading and writing code that uses one style, you will feel at
least slightly uneasy when reading (or being required to write) another style.
Many coding styles place begin/end brackets on a line by themselves. This makes
programs considerably longer and wastes valuable screen space, making it harder
to get a good overview of a program. Ideally, a function should fit on one
screen (say, 20-30 lines). 20 lines of Python can do a lot more work than 20
lines of C. This is not solely due to the lack of begin/end brackets – the
lack of declarations and the high-level data types are also responsible – but
the indentation-based syntax certainly helps.
Why am I getting strange results with simple arithmetic operations?
See the next question.
Why are floating point calculations so inaccurate?
People are often very surprised by results like this:
>>> 1.2 - 1.0
0.199999999999999996
and think it is a bug in Python. It’s not. This has nothing to do with Python,
but with how the underlying C platform handles floating point numbers, and
ultimately with the inaccuracies introduced when writing down numbers as a
string of a fixed number of digits.
The internal representation of floating point numbers uses a fixed number of
binary digits to represent a decimal number. Some decimal numbers can’t be
represented exactly in binary, resulting in small roundoff errors.
In decimal math, there are many numbers that can’t be represented with a fixed
number of decimal digits, e.g. 1/3 = 0.3333333333.......
In base 2, 1/2 = 0.1, 1/4 = 0.01, 1/8 = 0.001, etc. 0.2 equals 2/10 equals
1/5, resulting in the binary fractional number 0.001100110011001...
Floating point numbers only have 32 or 64 bits of precision, so the digits are
cut off at some point, and the resulting number is 0.199999999999999996 in
decimal, not 0.2.
A floating point number’s repr() function prints as many digits as are
necessary to make eval(repr(f)) == f true for any float f. The str()
function prints fewer digits and this often results in the more sensible number
that was probably intended:
>>> 1.1 - 0.9
0.20000000000000007
>>> print(1.1 - 0.9)
0.2
One of the consequences of this is that it is error-prone to compare the result
of some computation to a float with ==. Tiny inaccuracies may mean that
== fails. Instead, you have to check that the difference between the two
numbers is less than a certain threshold:
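epsilon = 0.0000000000001   # tiny allowed error
expected_result = 0.4

if expected_result - epsilon <= computation() <= expected_result + epsilon:
    ...   # computation() stands for whatever expression is being tested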
One is performance: knowing that a string is immutable means we can allocate
space for it at creation time, and the storage requirements are fixed and
unchanging. This is also one of the reasons for the distinction between tuples
and lists.
Another advantage is that strings in Python are considered as “elemental” as
numbers. No amount of activity will change the value 8 to anything else, and in
Python, no amount of activity will change the string “eight” to anything else.
Why must ‘self’ be used explicitly in method definitions and calls?
The idea was borrowed from Modula-3. It turns out to be very useful, for a
variety of reasons.
First, it’s more obvious that you are using a method or instance attribute
instead of a local variable. Reading self.x or self.meth() makes it
absolutely clear that an instance variable or method is used even if you don’t
know the class definition by heart. In C++, you can sort of tell by the lack of
a local variable declaration (assuming globals are rare or easily recognizable)
– but in Python, there are no local variable declarations, so you’d have to
look up the class definition to be sure. Some C++ and Java coding standards
call for instance attributes to have an m_ prefix, so this explicitness is
still useful in those languages, too.
Second, it means that no special syntax is necessary if you want to explicitly
reference or call the method from a particular class. In C++, if you want to
use a method from a base class which is overridden in a derived class, you have
to use the :: operator – in Python you can write
baseclass.methodname(self, <argument list>). This is particularly useful
for __init__() methods, and in general in cases where a derived class
method wants to extend the base class method of the same name and thus has to
call the base class method somehow.
Finally, for instance variables it solves a syntactic problem with assignment:
since local variables in Python are (by definition!) those variables to which a
value is assigned in a function body (and that aren’t explicitly declared
global), there has to be some way to tell the interpreter that an assignment was
meant to assign to an instance variable instead of to a local variable, and it
should preferably be syntactic (for efficiency reasons). C++ does this through
declarations, but Python doesn’t have declarations and it would be a pity having
to introduce them just for this purpose. Using the explicit self.var solves
this nicely. Similarly, for using instance variables, having to write
self.var means that references to unqualified names inside a method don’t
have to search the instance’s directories. To put it another way, local
variables and instance variables live in two different namespaces, and you need
to tell Python which namespace to use.
Many people used to C or Perl complain that they want to use this C idiom:
while (line = readline(f)) {
    // do something with line
}
where in Python you’re forced to write this:
while True:
    line = f.readline()
    if not line:
        break
    ...  # do something with line
The reason for not allowing assignment in Python expressions is a common,
hard-to-find bug in those other languages, caused by this construct:
if (x = 0) {
    // error handling
}
else {
    // code that only works for nonzero x
}

The error is a simple typo: x = 0, which assigns 0 to the variable x,
was written while the comparison x == 0 is certainly what was intended.
Many alternatives have been proposed. Most are hacks that save some typing but
use arbitrary or cryptic syntax or keywords, and fail the simple criterion for
language change proposals: it should intuitively suggest the proper meaning to a
human reader who has not yet been introduced to the construct.
An interesting phenomenon is that most experienced Python programmers recognize
the while True idiom and don’t seem to be missing the assignment in
expression construct much; it’s only newcomers who express a strong desire to
add this to the language.
There’s an alternative way of spelling this that seems attractive but is
generally less robust than the “while True” solution:
line = f.readline()
while line:
    ...  # do something with line...
    line = f.readline()
The problem with this is that if you change your mind about exactly how you get
the next line (e.g. you want to change it into sys.stdin.readline()) you
have to remember to change two places in your program – the second occurrence
is hidden at the bottom of the loop.
The best approach is to use iterators, making it possible to loop through
objects using the for statement. For example, file objects support the iterator protocol, so you can write simply:
for line in f:
    ...  # do something with line...
Why does Python use methods for some functionality (e.g. list.index()) but functions for other (e.g. len(list))?
The major reason is history. Functions were used for those operations that were
generic for a group of types and which were intended to work even for objects
that didn’t have methods at all (e.g. tuples). It is also convenient to have a
function that can readily be applied to an amorphous collection of objects when
you use the functional features of Python (map(), apply() et al).
In fact, implementing len(), max(), min() as a built-in function is
actually less code than implementing them as methods for each type. One can
quibble about individual cases but it’s a part of Python, and it’s too late to
make such fundamental changes now. The functions have to remain to avoid massive
code breakage.
Note
For string operations, Python has moved from external functions (the
string module) to methods. However, len() is still a function.
Why is join() a string method instead of a list or tuple method?
Strings became much more like other standard types starting in Python 1.6, when
methods were added which give the same functionality that has always been
available using the functions of the string module. Most of these new methods
have been widely accepted, but the one which appears to make some programmers
feel uncomfortable is:
", ".join(['1', '2', '4', '8', '16'])
which gives the result:
"1, 2, 4, 8, 16"
There are two common arguments against this usage.
The first runs along the lines of: “It looks really ugly using a method of a
string literal (string constant)”, to which the answer is that it might, but a
string literal is just a fixed value. If the methods are to be allowed on names
bound to strings there is no logical reason to make them unavailable on
literals.
The second objection is typically cast as: “I am really telling a sequence to
join its members together with a string constant”. Sadly, you aren’t. For some
reason there seems to be much less difficulty with having split() as
a string method, since in that case it is easy to see that
"1, 2, 4, 8, 16".split(", ")
is an instruction to a string literal to return the substrings delimited by the
given separator (or, by default, arbitrary runs of white space).
join() is a string method because in using it you are telling the
separator string to iterate over a sequence of strings and insert itself between
adjacent elements. This method can be used with any argument which obeys the
rules for sequence objects, including any new classes you might define yourself.
Similar methods exist for bytes and bytearray objects.
A try/except block is extremely efficient. Actually catching an exception is
expensive. In versions of Python prior to 2.0 it was common to use this idiom:
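try:
    value = mydict[key]
except KeyError:
    mydict[key] = getvalue(key)
    value = mydict[key]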
For this specific case, you could also use
value = dict.setdefault(key, getvalue(key)), but only if the getvalue()
call is cheap enough because it is evaluated in all cases.
Why isn’t there a switch or case statement in Python?
You can do this easily enough with a sequence of if...elif...elif...else.
There have been some proposals for switch statement syntax, but there is no
consensus (yet) on whether and how to do range tests. See PEP 275 for
complete details and the current status.
For cases where you need to choose from a very large number of possibilities,
you can create a dictionary mapping case values to functions to call. For
example:
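A minimal sketch (the function names and the 'value' selector variable are
illustrative):

def function_1():
    ...   # handle case 'a'

def function_2():
    ...   # handle case 'b'

functions = {'a': function_1, 'b': function_2}

func = functions[value]   # 'value' holds the case selector
func()

For calling methods on objects, the getattr() built-in can look the handler
up by name:

class Visitor:
    def visit_a(self):
        ...   # handle case 'a'

    def dispatch(self, value):
        method_name = 'visit_' + str(value)   # e.g. 'visit_a'
        method = getattr(self, method_name)
        method()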
It’s suggested that you use a prefix for the method names, such as visit_ in
this example. Without such a prefix, if values are coming from an untrusted
source, an attacker would be able to call any method on your object.
Can’t you emulate threads in the interpreter instead of relying on an OS-specific thread implementation?
Answer 1: Unfortunately, the interpreter pushes at least one C stack frame for
each Python stack frame. Also, extensions can call back into Python at almost
random moments. Therefore, a complete threads implementation requires thread
support for C.
Answer 2: Fortunately, there is Stackless Python,
which has a completely redesigned interpreter loop that avoids the C stack.
It’s still experimental but looks very promising. Although it is binary
compatible with standard Python, it’s still unclear whether Stackless will make
it into the core – maybe it’s just too revolutionary.
Python lambda forms cannot contain statements because Python’s syntactic
framework can’t handle statements nested inside expressions. However, in
Python, this is not a serious problem. Unlike lambda forms in other languages,
where they add functionality, Python lambdas are only a shorthand notation if
you’re too lazy to define a function.
Functions are already first class objects in Python, and can be declared in a
local scope. Therefore the only advantage of using a lambda form instead of a
locally-defined function is that you don’t need to invent a name for the
function – but that’s just a local variable to which the function object (which
is exactly the same type of object that a lambda form yields) is assigned!
Can Python be compiled to machine code, C or some other language?
Not easily. Python’s high level data types, dynamic typing of objects and
run-time invocation of the interpreter (using eval() or exec())
together mean that a “compiled” Python program would probably consist mostly of
calls into the Python run-time system, even for seemingly simple operations like
x+1.
Several projects described in the Python newsgroup or at past Python
conferences have shown that this
approach is feasible, although the speedups reached so far are only modest
(e.g. 2x). Jython uses the same strategy for compiling to Java bytecode. (Jim
Hugunin has demonstrated that in combination with whole-program analysis,
speedups of 1000x are feasible for small demo programs. See the proceedings
from the 1997 Python conference for more information.)
Internally, Python source code is always translated into a bytecode
representation, and this bytecode is then executed by the Python virtual
machine. In order to avoid the overhead of repeatedly parsing and translating
modules that rarely change, this byte code is written into a file whose name
ends in ".pyc" whenever a module is parsed. When the corresponding .py file is
changed, it is parsed and translated again and the .pyc file is rewritten.
There is no performance difference once the .pyc file has been loaded, as the
bytecode read from the .pyc file is exactly the same as the bytecode created by
direct translation. The only difference is that loading code from a .pyc file
is faster than parsing and translating a .py file, so the presence of
precompiled .pyc files improves the start-up time of Python scripts. If
desired, the Lib/compileall.py module can be used to create valid .pyc files for
a given set of modules.
Note that the main script executed by Python, even if its filename ends in .py,
is not compiled to a .pyc file. It is compiled to bytecode, but the bytecode is
not saved to a file. Usually main scripts are quite short, so this doesn’t cost
much speed.
There are also several programs which make it easier to intermingle Python and C
code in various ways to increase performance. See, for example, Cython, Pyrex and Weave.
The details of Python memory management depend on the implementation. The
standard C implementation of Python uses reference counting to detect
inaccessible objects, and another mechanism to collect reference cycles,
periodically executing a cycle detection algorithm which looks for inaccessible
cycles and deletes the objects involved. The gc module provides functions
to perform a garbage collection, obtain debugging statistics, and tune the
collector’s parameters.
Jython relies on the Java runtime so the JVM’s garbage collector is used. This
difference can cause some subtle porting problems if your Python code depends on
the behavior of the reference counting implementation.
In the absence of circularities, Python programs do not need to manage memory
explicitly.
Why doesn’t Python use a more traditional garbage collection scheme? For one
thing, this is not a C standard feature and hence it’s not portable. (Yes, we
know about the Boehm GC library. It has bits of assembler code for most
common platforms, not for all of them, and although it is mostly transparent, it
isn’t completely transparent; patches are required to get Python to work with
it.)
Traditional GC also becomes a problem when Python is embedded into other
applications. While in a standalone Python it’s fine to replace the standard
malloc() and free() with versions provided by the GC library, an application
embedding Python may want to have its own substitute for malloc() and free(),
and may not want Python’s. Right now, Python works with anything that
implements malloc() and free() properly.
In Jython, the following code (which is fine in CPython) will probably run out
of file descriptors long before it runs out of memory:
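for file in very_long_list_of_files:
    f = open(file)
    c = f.read(1)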
Using the current reference counting and destructor scheme, each new assignment
to f closes the previous file. Using GC, this is not guaranteed. If you want
to write code that will work with any Python implementation, you should
explicitly close the file or use the with statement; this will work
regardless of GC:
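for file in very_long_list_of_files:
    with open(file) as f:
        c = f.read(1)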
Objects referenced from the global namespaces of Python modules are not always
deallocated when Python exits. This may happen if there are circular
references. There are also certain bits of memory that are allocated by the C
library that are impossible to free (e.g. a tool like Purify will complain about
these). Python is, however, aggressive about cleaning up memory on exit and
does try to destroy every single object.
If you want to force Python to delete certain things on deallocation use the
atexit module to run a function that will force those deletions.
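For example, a minimal sketch (the cache here is a hypothetical module-level object):
import atexit

cache = {'results': [1, 2, 3]}   # hypothetical module-level state

def cleanup():
    # force deletion of objects held in the global namespace
    cache.clear()

atexit.register(cleanup)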
Why are there separate tuple and list data types?¶
Lists and tuples, while similar in many respects, are generally used in
fundamentally different ways. Tuples can be thought of as being similar to
Pascal records or C structs; they’re small collections of related data which may
be of different types which are operated on as a group. For example, a
Cartesian coordinate is appropriately represented as a tuple of two or three
numbers.
Lists, on the other hand, are more like arrays in other languages. They tend to
hold a varying number of objects all of which have the same type and which are
operated on one-by-one. For example, os.listdir('.') returns a list of
strings representing the files in the current directory. Functions which
operate on this output would generally not break if you added another file or
two to the directory.
Tuples are immutable, meaning that once a tuple has been created, you can’t
replace any of its elements with a new value. Lists are mutable, meaning that
you can always change a list’s elements. Only immutable elements can be used as
dictionary keys, and hence only tuples and not lists can be used as keys.
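For example:
>>> d = {}
>>> d[(1, 2)] = 'tuple keys are fine'   # tuples are hashable
>>> d[[1, 2]] = 'lists are not'
Traceback (most recent call last):
  ...
TypeError: unhashable type: 'list'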
How are lists implemented?¶
Python’s lists are really variable-length arrays, not Lisp-style linked lists.
The implementation uses a contiguous array of references to other objects, and
keeps a pointer to this array and the array’s length in a list head structure.
This makes indexing a list a[i] an operation whose cost is independent of
the size of the list or the value of the index.
When items are appended or inserted, the array of references is resized. Some
cleverness is applied to improve the performance of appending items repeatedly;
when the array must be grown, some extra space is allocated so the next few
times don’t require an actual resize.
How are dictionaries implemented?¶
Python’s dictionaries are implemented as resizable hash tables. Compared to
B-trees, this gives better performance for lookup (the most common operation by
far) under most circumstances, and the implementation is simpler.
Dictionaries work by computing a hash code for each key stored in the dictionary
using the hash() built-in function. The hash code varies widely depending
on the key; for example, “Python” hashes to -539294296 while “python”, a string
that differs by a single bit, hashes to 1142331976. The hash code is then used
to calculate a location in an internal array where the value will be stored.
Assuming that you’re storing keys that all have different hash values, this
means that dictionaries take constant time – O(1), in computer science notation
– to retrieve a key. It also means that no sorted order of the keys is
maintained, and traversing the array as the .keys() and .items() methods do
will output the dictionary’s contents in some arbitrary jumbled order.
Why must dictionary keys be immutable?¶
The hash table implementation of dictionaries uses a hash value calculated from
the key value to find the key. If the key were a mutable object, its value
could change, and thus its hash could also change. But since whoever changes
the key object can’t tell that it was being used as a dictionary key, it can’t
move the entry around in the dictionary. Then, when you try to look up the same
object in the dictionary it won’t be found because its hash value is different.
If you tried to look up the old value it wouldn’t be found either, because the
value of the object found in that hash bin would be different.
If you want a dictionary indexed with a list, simply convert the list to a tuple
first; the function tuple(L) creates a tuple with the same entries as the
list L. Tuples are immutable and can therefore be used as dictionary keys.
Some unacceptable solutions that have been proposed:
Hash lists by their address (object ID). This doesn’t work because if you
construct a new list with the same value it won’t be found; e.g.:
mydict = {[1, 2]: '12'}
print(mydict[[1, 2]])
would raise a KeyError exception because the id of the [1,2] used in the
second line differs from that in the first line. In other words, dictionary
keys should be compared using ==, not using is.
Make a copy when using a list as a key. This doesn’t work because the list,
being a mutable object, could contain a reference to itself, and then the
copying code would run into an infinite loop.
Allow lists as keys but tell the user not to modify them. This would allow a
class of hard-to-track bugs in programs when you forgot or modified a list by
accident. It also invalidates an important invariant of dictionaries: every
value in d.keys() is usable as a key of the dictionary.
Mark lists as read-only once they are used as a dictionary key. The problem
is that it’s not just the top-level object that could change its value; you
could use a tuple containing a list as a key. Entering anything as a key into
a dictionary would require marking all objects reachable from there as
read-only – and again, self-referential objects could cause an infinite loop.
There is a trick to get around this if you need to, but use it at your own risk:
You can wrap a mutable structure inside a class instance which has both a
__eq__() and a __hash__() method. You must then make sure that the
hash value for all such wrapper objects that reside in a dictionary (or other
hash based structure), remain fixed while the object is in the dictionary (or
other structure).
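A sketch of such a wrapper (the ListWrapper referred to below):
class ListWrapper:
    def __init__(self, the_list):
        self.the_list = the_list

    def __eq__(self, other):
        return self.the_list == other.the_list

    def __hash__(self):
        l = self.the_list
        result = 98767 - len(l)*555
        for i, el in enumerate(l):
            try:
                result = result + (hash(el) % 9999999) * 1001 + i
            except Exception:
                result = (result % 7777777) + i * 333
        return result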
Note that the hash computation is complicated by the possibility that some
members of the list may be unhashable and also by the possibility of arithmetic
overflow.
Furthermore it must always be the case that if o1 == o2 (i.e. o1.__eq__(o2) is True) then hash(o1) == hash(o2) (i.e., o1.__hash__() == o2.__hash__()),
regardless of whether the object is in a dictionary or not. If you fail to meet
these restrictions dictionaries and other hash based structures will misbehave.
In the case of ListWrapper, whenever the wrapper object is in a dictionary the
wrapped list must not change to avoid anomalies. Don’t do this unless you are
prepared to think hard about the requirements and the consequences of not
meeting them correctly. Consider yourself warned.
Why doesn’t list.sort() return the sorted list?¶
In situations where performance matters, making a copy of the list just to sort
it would be wasteful. Therefore, list.sort() sorts the list in place. In
order to remind you of that fact, it does not return the sorted list. This way,
you won’t be fooled into accidentally overwriting a list when you need a sorted
copy but also need to keep the unsorted version around.
In Python 2.4 a new built-in function – sorted() – has been added.
This function creates a new list from a provided iterable, sorts it and returns
it. For example, here’s how to iterate over the keys of a dictionary in sorted
order:
for key in sorted(mydict):
    ...  # do whatever with mydict[key]
How do you specify and enforce an interface spec in Python?¶
An interface specification for a module as provided by languages such as C++ and
Java describes the prototypes for the methods and functions of the module. Many
feel that compile-time enforcement of interface specifications helps in the
construction of large programs.
Python 2.6 adds an abc module that lets you define Abstract Base Classes
(ABCs). You can then use isinstance() and issubclass() to check
whether an instance or a class implements a particular ABC. The
collections module defines a set of useful ABCs such as
Iterable, Container, and MutableMapping.
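For example, a minimal sketch (in later versions these ABCs live in collections.abc):
import collections

print(isinstance([1, 2, 3], collections.Iterable))     # True
print(issubclass(dict, collections.MutableMapping))    # True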
For Python, many of the advantages of interface specifications can be obtained
by an appropriate test discipline for components. There is also a tool,
PyChecker, which can be used to find problems due to subclassing.
A good test suite for a module can both provide a regression test and serve as a
module interface specification and a set of examples. Many Python modules can
be run as a script to provide a simple “self test.” Even modules which use
complex external interfaces can often be tested in isolation using trivial
“stub” emulations of the external interface. The doctest and
unittest modules or third-party test frameworks can be used to construct
exhaustive test suites that exercise every line of code in a module.
An appropriate testing discipline can help build large complex applications in
Python as well as having interface specifications would. In fact, it can be
better because an interface specification cannot test certain properties of a
program. For example, the append() method is expected to add new elements
to the end of some internal list; an interface specification cannot test that
your append() implementation will actually do this correctly, but it’s
trivial to check this property in a test suite.
Writing test suites is very helpful, and you might want to design your code with
an eye to making it easily tested. One increasingly popular technique,
test-driven development, calls for writing parts of the test suite first,
before you write any of the actual code. Of course Python allows you to be
sloppy and not write test cases at all.
Why are default values shared between objects?¶
This type of bug commonly bites neophyte programmers. Consider this function:
def foo(mydict={}):    # Danger: shared reference to one dict for all calls
    ... compute something ...
    mydict[key] = value
    return mydict
The first time you call this function, mydict contains a single item. The
second time, mydict contains two items because when foo() begins
executing, mydict starts out with an item already in it.
It is often expected that a function call creates new objects for default
values. This is not what happens. Default values are created exactly once, when
the function is defined. If that object is changed, like the dictionary in this
example, subsequent calls to the function will refer to this changed object.
By definition, immutable objects such as numbers, strings, tuples, and None
are safe from change. Changes to mutable objects such as dictionaries, lists,
and class instances can lead to confusion.
Because of this feature, it is good programming practice to not use mutable
objects as default values. Instead, use None as the default value and
inside the function, check if the parameter is None and create a new
list/dictionary/whatever if it is. For example, don’t write:
def foo(mydict={}):
    ...
but:
def foo(mydict=None):
    if mydict is None:
        mydict = {}    # create a new dict for local namespace
This feature can be useful. When you have a function that’s time-consuming to
compute, a common technique is to cache the parameters and the resulting value
of each call to the function, and return the cached value if the same value is
requested again. This is called “memoizing”, and can be implemented like this:
# Callers will never provide a third parameter for this function.
def expensive(arg1, arg2, _cache={}):
    if (arg1, arg2) in _cache:
        return _cache[(arg1, arg2)]

    # Calculate the value
    result = ... expensive computation ...
    _cache[(arg1, arg2)] = result    # Store result in the cache
    return result
You could use a global variable containing a dictionary instead of the default
value; it’s a matter of taste.
Why is there no goto?¶
You can use exceptions to provide a “structured goto” that even works across
function calls. Many feel that exceptions can conveniently emulate all
reasonable uses of the “go” or “goto” constructs of C, Fortran, and other
languages. For example:
class label(Exception): pass   # declare a label (must derive from Exception)

try:
    ...
    if condition: raise label()   # goto label
    ...
except label:                     # where to goto
    pass
...
This doesn’t allow you to jump into the middle of a loop, but that’s usually
considered an abuse of goto anyway. Use sparingly.
Why can’t raw strings (r-strings) end with a backslash?¶
More precisely, they can’t end with an odd number of backslashes: the unpaired
backslash at the end escapes the closing quote character, leaving an
unterminated string.
Raw strings were designed to ease creating input for processors (chiefly regular
expression engines) that want to do their own backslash escape processing. Such
processors consider an unmatched trailing backslash to be an error anyway, so
raw strings disallow that. In return, they allow you to pass on the string
quote character by escaping it with a backslash. These rules work well when
r-strings are used for their intended purpose.
If you’re trying to build Windows pathnames, note that all Windows system calls
accept forward slashes too:
f = open("/mydir/file.txt") # works fine!
If you’re trying to build a pathname for a DOS command, try e.g. one of
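dir = r"\this\is\my\dos\dir" "\\"
dir = r"\this\is\my\dos\dir\ "[:-1]
dir = "\\this\\is\\my\\dos\\dir\\"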
Why doesn’t Python have a “with” statement for attribute assignments?¶
Python has a ‘with’ statement that wraps the execution of a block, calling code
on the entrance and exit from the block. Some languages have a construct that
looks like this:
with obj:
    a = 1               # equivalent to obj.a = 1
    total = total + 1   # obj.total = obj.total + 1
In Python, such a construct would be ambiguous.
Other languages, such as Object Pascal, Delphi, and C++, use static types, so
it’s possible to know, in an unambiguous way, what member is being assigned
to. This is the main point of static typing – the compiler always knows the
scope of every variable at compile time.
Python uses dynamic types. It is impossible to know in advance which attribute
will be referenced at runtime. Member attributes may be added or removed from
objects on the fly. This makes it impossible to know, from a simple reading,
what attribute is being referenced: a local one, a global one, or a member
attribute?
For instance, take the following incomplete snippet:
def foo(a):
    with a:
        print(x)
The snippet assumes that “a” must have a member attribute called “x”. However,
there is nothing in Python that tells the interpreter this. What should happen
if “a” is, let us say, an integer? If there is a global variable named “x”,
will it be used inside the with block? As you see, the dynamic nature of Python
makes such choices much harder.
The primary benefit of “with” and similar language features (reduction of code
volume) can, however, easily be achieved in Python by assignment. Instead of:
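function(args).mydict[index][index].a = 21
function(args).mydict[index][index].b = 42
function(args).mydict[index][index].c = 63

write this:

ref = function(args).mydict[index][index]
ref.a = 21
ref.b = 42
ref.c = 63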
This also has the side-effect of increasing execution speed because name
bindings are resolved at run-time in Python, and the second version only needs
to perform the resolution once.
Why are colons required for the if/while/def/class statements?¶
The colon is required primarily to enhance readability (one of the results of
the experimental ABC language). Consider this:
if a == b
    print(a)
versus
if a == b:
    print(a)
Notice how the second one is slightly easier to read. Notice further how a
colon sets off the example in this FAQ answer; it’s a standard usage in English.
Another minor reason is that the colon makes it easier for editors with syntax
highlighting; they can look for colons to decide when indentation needs to be
increased instead of having to do a more elaborate parsing of the program text.
Why does Python allow commas at the end of lists and tuples?¶
Python lets you add a trailing comma at the end of lists, tuples, and
dictionaries:
[1, 2, 3,]
('a', 'b', 'c',)
d = {
    "A": [1, 5],
    "B": [6, 7],   # last trailing comma is optional but good style
}
There are several reasons to allow this.
When you have a literal value for a list, tuple, or dictionary spread across
multiple lines, it’s easier to add more elements because you don’t have to
remember to add a comma to the previous line. The lines can also be sorted in
your editor without creating a syntax error.
Accidentally omitting the comma can lead to errors that are hard to diagnose.
For example:
x=["fee","fie""foo","fum"]
This list looks like it has four elements, but it actually contains three:
“fee”, “fiefoo” and “fum”. Always adding the comma avoids this source of error.
Allowing the trailing comma may also make programmatic code generation easier.
How do I find a module or application to perform task X?¶
Check the Library Reference to see if there’s a relevant
standard library module. (Eventually you’ll learn what’s in the standard
library and will be able to skip this step.)
For third-party packages, search the Python Package Index or try Google or
another Web search engine. Searching for “Python” plus a keyword or two for
your topic of interest will usually find something helpful.
If you can’t find a source file for a module it may be a built-in or
dynamically loaded module implemented in C, C++ or other compiled language.
In this case you may not have the source file or it may be something like
mathmodule.c, somewhere in a C source directory (not on the Python Path).
There are (at least) three kinds of modules in Python:
modules written in Python (.py);
modules written in C and dynamically loaded (.dll, .pyd, .so, .sl, etc);
modules written in C and linked with the interpreter; to get a list of these,
type:
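import sys
print(sys.builtin_module_names)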
How do I make a Python script executable on Unix?¶
You need to do two things: the script file’s mode must be executable and the
first line must begin with #! followed by the path of the Python
interpreter.
The first is done by executing chmod +x scriptfile or perhaps chmod 755 scriptfile.
The second can be done in a number of ways. The most straightforward way is to
write
#!/usr/local/bin/python
as the very first line of your file, using the pathname for where the Python
interpreter is installed on your platform.
If you would like the script to be independent of where the Python interpreter
lives, you can use the “env” program. Almost all Unix variants support the
following, assuming the Python interpreter is in a directory on the user’s
$PATH:
#!/usr/bin/env python
Don’t do this for CGI scripts. The $PATH variable for CGI scripts is often
very minimal, so you need to use the actual absolute pathname of the
interpreter.
Occasionally, a user’s environment is so full that the /usr/bin/env program
fails; or there’s no env program at all. In that case, you can try the
following hack (due to Alex Rezinsky):
#! /bin/sh
""":"
exec python $0 ${1+"$@"}
"""
The minor disadvantage is that this defines the script’s __doc__ string.
However, you can fix that by adding
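__doc__ = """...Whatever..."""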
For Unix variants: The standard Python source distribution comes with a curses
module in the Modules/ subdirectory, though it’s not compiled by default
(note that this is not available in the Windows distribution – there is no
curses module for Windows).
The curses module supports basic curses features as well as many additional
functions from ncurses and SYSV curses such as colour, alternative character set
support, pads, and mouse support. This means the module isn’t compatible with
operating systems that only have BSD curses, but there don’t seem to be any
currently maintained OSes that fall into this category.
Python comes with two testing frameworks. The doctest module finds
examples in the docstrings for a module and runs them, comparing the output with
the expected output given in the docstring.
The unittest module is a fancier testing framework modelled on Java and
Smalltalk testing frameworks.
For testing, it helps to write the program so that it may be easily tested by
using good modular design. Your program should have almost all functionality
encapsulated in either functions or class methods – and this sometimes has the
surprising and delightful effect of making the program run faster (because local
variable accesses are faster than global accesses). Furthermore the program
should avoid depending on mutating global variables, since this makes testing
much more difficult to do.
The “global main logic” of your program may be as simple as
if__name__=="__main__":main_logic()
at the bottom of the main module of your program.
Once your program is organized as a tractable collection of functions and class
behaviours you should write test functions that exercise the behaviours. A test
suite can be associated with each module which automates a sequence of tests.
This sounds like a lot of work, but since Python is so terse and flexible it’s
surprisingly easy. You can make coding much more pleasant and fun by writing
your test functions in parallel with the “production code”, since this makes it
easy to find bugs and even design flaws earlier.
“Support modules” that are not intended to be the main module of a program may
include a self-test of the module.
if__name__=="__main__":self_test()
Even programs that interact with complex external interfaces may be tested when
the external interfaces are unavailable by using “fake” interfaces implemented
in Python.
The pydoc module can create HTML from the doc strings in your Python
source code. An alternative for creating API documentation purely from
docstrings is epydoc. Sphinx can also include docstring content.
Be sure to use the threading module and not the _thread module.
The threading module builds convenient abstractions on top of the
low-level primitives provided by the _thread module.
None of my threads seem to run: why?¶
As soon as the main thread exits, all threads are killed. Your main thread is
running too quickly, giving the threads no time to do any work.
A simple fix is to add a sleep to the end of the program that’s long enough for
all the threads to finish:
import threading, time

def thread_task(name, n):
    for i in range(n):
        print(name, i)

for i in range(10):
    T = threading.Thread(target=thread_task, args=(str(i), i))
    T.start()

time.sleep(10)  # <---------------------------!
But now (on many platforms) the threads don’t run in parallel, but appear to run
sequentially, one at a time! The reason is that the OS thread scheduler doesn’t
start a new thread until the previous thread is blocked.
A simple fix is to add a tiny sleep to the start of the run function:
def thread_task(name, n):
    time.sleep(0.001)  # <--------------------!
    for i in range(n):
        print(name, i)

for i in range(10):
    T = threading.Thread(target=thread_task, args=(str(i), i))
    T.start()

time.sleep(10)
Instead of trying to guess how long a time.sleep() delay will be enough,
it’s better to use some kind of semaphore mechanism. One idea is to use the
queue module to create a queue object, let each thread append a token to
the queue when it finishes, and let the main thread read as many tokens from the
queue as there are threads.
Or, if you want fine control over the dispatching algorithm, you can write
your own logic manually. Use the queue module to create a queue
containing a list of jobs. The Queue class maintains a
list of objects with .put(obj) to add an item to the queue and .get()
to return an item. The class will take care of the locking necessary to
ensure that each job is handed out exactly once.
Here’s a trivial example:
import threading, queue, time

# The worker thread gets jobs off the queue.  When the queue is empty, it
# assumes there will be no more work and exits.
# (Realistically workers will run until terminated.)
def worker():
    print('Running worker')
    time.sleep(0.1)
    while True:
        try:
            arg = q.get(block=False)
        except queue.Empty:
            print('Worker', threading.currentThread(), end=' ')
            print('queue empty')
            break
        else:
            print('Worker', threading.currentThread(), end=' ')
            print('running with argument', arg)
            time.sleep(0.5)

# Create queue
q = queue.Queue()

# Start a pool of 5 workers
for i in range(5):
    t = threading.Thread(target=worker, name='worker %i' % (i+1))
    t.start()

# Begin adding work to the queue
for i in range(50):
    q.put(i)

# Give threads time to run
print('Main thread sleeping')
time.sleep(5)
A global interpreter lock (GIL) is used internally to ensure that only one
thread runs in the Python VM at a time. In general, Python offers to switch
among threads only between bytecode instructions; how frequently it switches can
be set via sys.setswitchinterval(). Each bytecode instruction, and
therefore all the C implementation code reached from it, is
atomic from the point of view of a Python program.
In theory, this means an exact accounting requires an exact understanding of the
PVM bytecode implementation. In practice, it means that operations on shared
variables of built-in data types (ints, lists, dicts, etc) that “look atomic”
really are.
For example, the following operations are all atomic (L, L1, L2 are lists, D,
D1, D2 are dicts, x, y are objects, i, j are ints):
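L.append(x)
L1.extend(L2)
x = L[i]
x = L.pop()
L1[i:j] = L2
L.sort()
x = y
x.field = y
D[x] = y
D1.update(D2)
D.keys()

These aren’t:

i = i + 1
L.append(L[-1])
L[i] = L[j]
D[x] = D[x] + 1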
Operations that replace other objects may invoke those other objects’
__del__() method when their reference count reaches zero, and that can
affect things. This is especially true for the mass updates to dictionaries and
lists. When in doubt, use a mutex!
Can’t we get rid of the Global Interpreter Lock?¶
The global interpreter lock (GIL) is often seen as a hindrance to Python’s
deployment on high-end multiprocessor server machines, because a multi-threaded
Python program effectively only uses one CPU, due to the insistence that
(almost) all Python code can only run while the GIL is held.
Back in the days of Python 1.5, Greg Stein actually implemented a comprehensive
patch set (the “free threading” patches) that removed the GIL and replaced it
with fine-grained locking. Adam Olsen recently did a similar experiment
in his python-safethread
project. Unfortunately, both experiments exhibited a sharp drop in single-thread
performance (at least 30% slower), due to the amount of fine-grained locking
necessary to compensate for the removal of the GIL.
This doesn’t mean that you can’t make good use of Python on multi-CPU machines!
You just have to be creative with dividing the work up between multiple
processes rather than multiple threads. The
ProcessPoolExecutor class in the new
concurrent.futures module provides an easy way of doing so; the
multiprocessing module provides a lower-level API in case you want
more control over dispatching of tasks.
Judicious use of C extensions will also help; if you use a C extension to
perform a time-consuming task, the extension can release the GIL while the
thread of execution is in the C code and allow other threads to get some work
done. Some standard library modules such as zlib and hashlib
already do this.
It has been suggested that the GIL should be a per-interpreter-state lock rather
than truly global; interpreters then wouldn’t be able to share objects.
Unfortunately, this isn’t likely to happen either. It would be a tremendous
amount of work, because many object implementations currently have global state.
For example, small integers and short strings are cached; these caches would
have to be moved to the interpreter state. Other object types have their own
free list; these free lists would have to be moved to the interpreter state.
And so on.
And I doubt that it can even be done in finite time, because the same problem
exists for 3rd party extensions. It is likely that 3rd party extensions are
being written at a faster rate than you can convert them to store all their
global state in the interpreter state.
And finally, once you have multiple interpreters not sharing any state, what
have you gained over running each interpreter in a separate process?
How do I delete a file? (And other file questions...)¶
Use os.remove(filename) or os.unlink(filename); for documentation, see
the os module. The two functions are identical; unlink() is simply
the name of the Unix system call for this function.
To remove a directory, use os.rmdir(); use os.mkdir() to create one.
os.makedirs(path) will create any intermediate directories in path that
don’t exist. os.removedirs(path) will remove intermediate directories as
long as they’re empty; if you want to delete an entire directory tree and its
contents, use shutil.rmtree().
To rename a file, use os.rename(old_path, new_path).
To truncate a file, open it using f = open(filename, "rb+"), and use
f.truncate(offset); offset defaults to the current seek position. There’s
also os.ftruncate(fd, offset) for files opened with os.open(), where
fd is the file descriptor (a small integer).
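A small sketch tying these functions together (the path names are illustrative only):
import os, shutil

os.makedirs('reports/2011/q1')            # create intermediate directories
open('reports/draft.txt', 'w').close()    # make an empty file to work with
os.rename('reports/draft.txt', 'reports/final.txt')
os.remove('reports/final.txt')            # delete the file again
shutil.rmtree('reports')                  # remove the whole tree and its contents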
To read or write complex binary data formats, it’s best to use the struct
module. It allows you to take a string containing binary data (usually numbers)
and convert it to Python objects; and vice versa.
For example, the following code reads two 2-byte integers and one 4-byte integer
in big-endian format from a file:
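import struct

with open(filename, "rb") as f:
    s = f.read(8)
x, y, z = struct.unpack(">hhl", s)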
The ‘>’ in the format string forces big-endian data; the letter ‘h’ reads one
“short integer” (2 bytes), and ‘l’ reads one “long integer” (4 bytes) from the
string.
For data that is more regular (e.g. a homogeneous list of ints or floats),
you can also use the array module.
Note
To read and write binary data, it is mandatory to open the file in
binary mode (here, passing "rb" to open()). If you use
"r" instead (the default), the file will be open in text mode
and f.read() will return str objects rather than
bytes objects.
os.read() is a low-level function which takes a file descriptor, a small
integer representing the opened file. os.popen() creates a high-level
file object, the same type returned by the built-in open() function.
Thus, to read n bytes from a pipe p created with os.popen(), you need to
use p.read(n).
Python file objects are a high-level layer of
abstraction on low-level C file descriptors.
For most file objects you create in Python via the built-in open()
function, f.close() marks the Python file object as being closed from
Python’s point of view, and also arranges to close the underlying C file
descriptor. This also happens automatically in f's destructor, when
f becomes garbage.
But stdin, stdout and stderr are treated specially by Python, because of the
special status also given to them by C. Running sys.stdout.close() marks
the Python-level file object as being closed, but does not close the
associated C file descriptor.
To close the underlying C file descriptor for one of these three, you should
first be sure that’s what you really want to do (e.g., you may confuse
extension modules trying to do I/O). If it is, use os.close():
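import os, sys

os.close(sys.stdin.fileno())     # file descriptor 0
os.close(sys.stdout.fileno())    # file descriptor 1
os.close(sys.stderr.fileno())    # file descriptor 2

Or you can use the numeric constants 0, 1 and 2, respectively.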
I would like to retrieve web pages that are the result of POSTing a form. Is
there existing code that would let me do this easily?
Yes. Here’s a simple example that uses urllib.request:
#!/usr/local/bin/python

import urllib.request

### build the query string
qs = "First=Josephine&MI=Q&Last=Public"

### connect and send the server a path; data must be bytes in Python 3
req = urllib.request.urlopen('http://www.some-server.out-there'
                             '/cgi-bin/some-cgi-script', data=qs.encode('ascii'))
msg, hdrs = req.read(), req.info()
Note that in general for percent-encoded POST operations, query strings must be
quoted using urllib.parse.urlencode(). For example to send name=”Guy Steele,
Jr.”:
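>>> import urllib.parse
>>> urllib.parse.urlencode({'name': 'Guy Steele, Jr.'})
'name=Guy+Steele%2C+Jr.'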
HTMLgen is a class library of objects corresponding to all the HTML 3.2 markup
tags. It’s used when you are writing in Python and wish to synthesize HTML
pages for a web site or for CGI forms, etc.
DocumentTemplate and Zope Page Templates are two different systems that are
part of Zope.
Quixote’s PTL uses Python syntax to assemble strings of text.
How do I send mail from a Python script?¶
Use the standard library module smtplib. Here’s a very simple interactive mail
sender that uses it. This method will work on any host that supports an SMTP
listener.
import sys, smtplib

fromaddr = input("From: ")
toaddrs = input("To: ").split(',')
print("Enter message, end with ^D:")
msg = ''
while True:
    line = sys.stdin.readline()
    if not line:
        break
    msg += line

# The actual mail send
server = smtplib.SMTP('localhost')
server.sendmail(fromaddr, toaddrs, msg)
server.quit()
A Unix-only alternative uses sendmail. The location of the sendmail program
varies between systems; sometimes it is /usr/lib/sendmail, sometimes
/usr/sbin/sendmail. The sendmail manual page will help you out. Here’s
some sample code:
SENDMAIL = "/usr/sbin/sendmail" # sendmail location
import os
p = os.popen("%s -t -i" % SENDMAIL, "w")
p.write("To: receiver@example.com\n")
p.write("Subject: test\n")
p.write("\n") # blank line separating headers from body
p.write("Some text\n")
p.write("some more text\n")
sts = p.close()
if sts != 0:
print("Sendmail exit status", sts)
The select module is commonly used to help with asynchronous I/O on
sockets.
To prevent the TCP connect from blocking, you can set the socket to non-blocking
mode. Then when you do the connect(), you will either connect immediately
(unlikely) or get an exception that contains the error number as .errno.
errno.EINPROGRESS indicates that the connection is in progress, but hasn’t
finished yet. Different OSes will return different values, so you’re going to
have to check what’s returned on your system.
You can use the connect_ex() method to avoid creating an exception. It will
just return the errno value. To poll, you can call connect_ex() again later
– 0 or errno.EISCONN indicate that you’re connected – or you can pass this
socket to select to check if it’s writable.
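A sketch of the polling approach (the host and port are illustrative, and the exact errno values returned vary by OS):
import errno, select, socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setblocking(False)
err = s.connect_ex(('www.python.org', 80))  # returns an errno instead of raising
if err in (0, errno.EISCONN):
    print('connected immediately')
elif err == errno.EINPROGRESS:
    # the attempt is under way; select() reports the socket as
    # writable once the connection completes (or fails)
    select.select([], [s], [])
s.close()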
Note
The asyncore module presents a framework-like approach to the problem
of writing non-blocking networking code.
The third-party Twisted library is
a popular and feature-rich alternative.
Interfaces to disk-based hashes such as DBM and GDBM are also included with standard Python. There is also the
sqlite3 module, which provides a lightweight disk-based relational
database.
The pickle library module solves this in a very general way (though you
still can’t store things like open files, sockets or windows), and the
shelve library module uses pickle and (g)dbm to create persistent
mappings containing arbitrary Python objects.
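For example, a minimal shelve sketch (the filename is illustrative):
import shelve

db = shelve.open('appdata')     # creates one or more files on disk
db['session'] = {'user': 'guido', 'visits': [1, 2, 3]}   # any picklable object
db.close()

db = shelve.open('appdata')
print(db['session']['visits'])  # [1, 2, 3]
db.close()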
A more awkward way of doing things is to use pickle’s little sister, marshal.
The marshal module provides very fast ways to store noncircular basic
Python types to files and strings, and back again. Although marshal does not do
fancy things like store instances or handle shared references properly, it does
run extremely fast. For example loading a half megabyte of data may take less
than a third of a second. This often beats doing something more complex and
general such as using gdbm with pickle/shelve.
The bsddb module is now available as a standalone package pybsddb.
Databases opened for write access with the bsddb module (and often by the anydbm
module, since it will preferentially use bsddb) must explicitly be closed using
the .close() method of the database. The underlying library caches database
contents which need to be converted to on-disk form and written.
If you have initialized a new bsddb database but not written anything to it
before the program crashes, you will often wind up with a zero-length file and
encounter an exception the next time the file is opened.
Don’t panic! Your data is probably intact. The most frequent cause for the error
is that you tried to open an earlier Berkeley DB file with a later version of
the Berkeley DB library.
Many Linux systems now have all three versions of Berkeley DB available. If you
are migrating from version 1 to a newer version use db_dump185 to dump a plain
text version of the database. If you are migrating from version 2 to version 3
use db2_dump to create a plain text version of the database. In either case,
use db_load to create a new native database for the latest version installed on
your computer. If you have version 3 of Berkeley DB installed, you should be
able to use db2_load to create a native version 2 database.
You should move away from Berkeley DB version 1 files because the hash file code
contains known bugs that can corrupt your data.
Can I create my own functions in C?¶
Yes, you can create built-in modules containing functions, variables, exceptions
and even new types in C. This is explained in the document
Extending and Embedding the Python Interpreter.
Most intermediate or advanced Python books will also cover this topic.
Can I create my own functions in C++?¶
Yes, using the C compatibility features found in C++. Place extern "C" { ... }
around the Python include files and put extern "C" before each
function that is going to be called by the Python interpreter. Global or static
C++ objects with constructors are probably not a good idea.
There are a number of alternatives to writing your own C extensions, depending
on what you’re trying to do.
If you need more speed, Psyco generates x86
assembly code from Python bytecode. You can use Psyco to compile the most
time-critical functions in your code, and gain a significant improvement with
very little effort, as long as you’re running on a machine with an
x86-compatible processor.
Cython and its relative Pyrex are compilers
that accept a slightly modified form of Python and generate the corresponding
C code. Cython and Pyrex make it possible to write an extension without having
to learn Python’s C API.
If you need to interface to some C or C++ library for which no Python extension
currently exists, you can try wrapping the library’s data types and functions
with a tool such as SWIG. SIP, CXX, Boost, or Weave are also alternatives
for wrapping C++ libraries.
How can I execute arbitrary Python statements from C?¶
The highest-level function to do this is PyRun_SimpleString() which takes
a single string argument to be executed in the context of the module
__main__ and returns 0 for success and -1 when an exception occurred
(including SyntaxError). If you want more control, use
PyRun_String(); see the source for PyRun_SimpleString() in
Python/pythonrun.c.
How can I evaluate an arbitrary Python expression from C?¶
Call the function PyRun_String() from the previous question with the
start symbol Py_eval_input; it parses an expression, evaluates it and
returns its value.
How do I extract C values from a Python object?¶
That depends on the object’s type. If it’s a tuple, PyTuple_Size()
returns its length and PyTuple_GetItem() returns the item at a specified
index. Lists have similar functions, PyList_Size() and
PyList_GetItem().
For strings, PyString_Size() returns its length and
PyString_AsString() a pointer to its value. Note that Python strings may
contain null bytes so C’s strlen() should not be used.
To test the type of an object, first make sure it isn’t NULL, and then use
PyString_Check(), PyTuple_Check(), PyList_Check(), etc.
There is also a high-level API to Python objects which is provided by the
so-called ‘abstract’ interface – read Include/abstract.h for further
details. It allows interfacing with any kind of Python sequence using calls
like PySequence_Length(), PySequence_GetItem(), etc., as well as
many other useful protocols.
How do I use Py_BuildValue() to create a tuple of arbitrary length?¶
You can’t. Use t = PyTuple_New(n) instead, and fill it with objects using
PyTuple_SetItem(t, i, o) – note that this “eats” a reference count of
o, so you have to Py_INCREF() it. Lists have similar functions
PyList_New(n) and PyList_SetItem(l, i, o). Note that you must set all
the tuple items to some value before you pass the tuple to Python code –
PyTuple_New(n) initializes them to NULL, which isn’t a valid Python value.
The PyObject_CallMethod() function can be used to call an arbitrary
method of an object. The parameters are the object, the name of the method to
call, a format string like that used with Py_BuildValue(), and the
argument values:
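For example, to call a file object’s “seek” method with arguments 10, 0 (assuming the file object pointer is “f”):
res = PyObject_CallMethod(f, "seek", "(ii)", 10, 0);
if (res == NULL) {
        ... an exception occurred ...
}
else {
        Py_DECREF(res);
}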
Note that since PyObject_CallObject() always wants a tuple for the
argument list, to call a function without arguments, pass “()” for the format,
and to call a function with one argument, surround the argument in parentheses,
e.g. “(i)”.
In Python code, define an object that supports the write() method. Assign
this object to sys.stdout and sys.stderr. Call print_error, or
just allow the standard traceback mechanism to work. Then, the output will go
wherever your write() method sends it.
The easiest way to do this is to use the io.StringIO class in the standard library.
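A minimal sketch of the redirection just described:
import io, sys

sys.stdout = io.StringIO()      # anything with a write() method will do
print("diagnostic output")
captured = sys.stdout.getvalue()
sys.stdout = sys.__stdout__     # restore the real stdout
print("captured:", captured, end='')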
You can get a pointer to the module object as follows:
module = PyImport_ImportModule("<modulename>");
If the module hasn’t been imported yet (i.e. it is not yet present in
sys.modules), this initializes the module; otherwise it simply returns
the value of sys.modules["<modulename>"]. Note that it doesn’t enter the
module into any namespace – it only ensures it has been initialized and is
stored in sys.modules.
You can then access the module’s attributes (i.e. any name defined in the
module) as follows:
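attr = PyObject_GetAttrString(module, "<attrname>");

Calling PyObject_SetAttrString() to assign to variables in the module also works.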
Depending on your requirements, there are many approaches. To do this manually,
begin by reading the “Extending and Embedding” document. Realize that for the Python run-time system, there isn’t a
whole lot of difference between C and C++ – so the strategy of building a new
Python type around a C structure (pointer) type will also work for C++ objects.
Setup must end in a newline, if there is no newline there, the build process
fails. (Fixing this requires some ugly shell script hackery, and this bug is so
minor that it doesn’t seem worth the effort.)
When using GDB with dynamically loaded extensions, you can’t set a breakpoint in
your extension until your extension is loaded.
In your .gdbinit file (or interactively), add the command:
br _PyImport_LoadDynamicModule
Then, when you run GDB:
$ gdb /local/bin/python
(gdb) run myscript.py
(gdb) continue   # repeat until your extension is loaded
(gdb) finish     # so that your extension is loaded
(gdb) br myfunction.c:50
(gdb) continue
Most packaged versions of Python don’t include the
/usr/lib/python2.x/config/ directory, which contains various files
required for compiling Python extensions.
For Red Hat, install the python-devel RPM to get the necessary files.
Sometimes you want to emulate the Python interactive interpreter’s behavior,
where it gives you a continuation prompt when the input is incomplete (e.g. you
typed the start of an “if” statement or you didn’t close your parentheses or
triple string quotes), but it gives you a syntax error message immediately when
the input is invalid.
In Python you can use the codeop module, which approximates the parser’s
behavior sufficiently. IDLE uses this, for example.
The easiest way to do it in C is to call PyRun_InteractiveLoop() (perhaps
in a separate thread) and let the Python interpreter handle the input for
you. You can also set the PyOS_ReadlineFunctionPointer() to point at your
custom input function. See Modules/readline.c and Parser/myreadline.c
for more hints.
However sometimes you have to run the embedded Python interpreter in the same
thread as the rest of your application, and you can’t allow the
PyRun_InteractiveLoop() to stop while waiting for user input. One
solution is to call PyParser_ParseString() and test whether e.error
equals E_EOF, which means the input is incomplete. Here’s a sample code
fragment, untested, inspired by code from Alex Farber:
#include <Python.h>#include <node.h>#include <errcode.h>#include <grammar.h>#include <parsetok.h>#include <compile.h>inttestcomplete(char*code)/* code should end in \n *//* return -1 for error, 0 for incomplete, 1 for complete */{node*n;perrdetaile;n=PyParser_ParseString(code,&_PyParser_Grammar,Py_file_input,&e);if(n==NULL){if(e.error==E_EOF)return0;return-1;}PyNode_Free(n);return1;}
Another solution is trying to compile the received string with
Py_CompileString(). If it compiles without errors, try to execute the
returned code object by calling PyEval_EvalCode(). Otherwise save the
input for later. If the compilation fails, find out if it’s an error or just
more input is required - by extracting the message string from the exception
tuple and comparing it to the string “unexpected EOF while parsing”. Here is a
complete example using the GNU readline library (you may want to ignore
SIGINT while calling readline()):
#include <stdio.h>#include <readline.h>#include <Python.h>#include <object.h>#include <compile.h>#include <eval.h>intmain(intargc,char*argv[]){inti,j,done=0;/* lengths of line, code */charps1[]=">>> ";charps2[]="... ";char*prompt=ps1;char*msg,*line,*code=NULL;PyObject*src,*glb,*loc;PyObject*exc,*val,*trb,*obj,*dum;Py_Initialize();loc=PyDict_New();glb=PyDict_New();PyDict_SetItemString(glb,"__builtins__",PyEval_GetBuiltins());while(!done){line=readline(prompt);if(NULL==line)/* CTRL-D pressed */{done=1;}else{i=strlen(line);if(i>0)add_history(line);/* save non-empty lines */if(NULL==code)/* nothing in code yet */j=0;elsej=strlen(code);code=realloc(code,i+j+2);if(NULL==code)/* out of memory */exit(1);if(0==j)/* code was empty, so */code[0]='\0';/* keep strncat happy */strncat(code,line,i);/* append line to code */code[i+j]='\n';/* append '\n' to code */code[i+j+1]='\0';src=Py_CompileString(code,"<stdin>",Py_single_input);if(NULL!=src)/* compiled just fine - */{if(ps1==prompt||/* ">>> " or */'\n'==code[i+j-1])/* "... " and double '\n' */{/* so execute it */dum=PyEval_EvalCode(src,glb,loc);Py_XDECREF(dum);Py_XDECREF(src);free(code);code=NULL;if(PyErr_Occurred())PyErr_Print();prompt=ps1;}}/* syntax error or E_EOF? */elseif(PyErr_ExceptionMatches(PyExc_SyntaxError)){PyErr_Fetch(&exc,&val,&trb);/* clears exception! */if(PyArg_ParseTuple(val,"sO",&msg,&obj)&&!strcmp(msg,"unexpected EOF while parsing"))/* E_EOF */{Py_XDECREF(exc);Py_XDECREF(val);Py_XDECREF(trb);prompt=ps2;}else/* some other syntax error */{PyErr_Restore(exc,val,trb);PyErr_Print();free(code);code=NULL;prompt=ps1;}}else/* some non-syntax error */{PyErr_Print();free(code);code=NULL;prompt=ps1;}free(line);}}Py_XDECREF(glb);Py_XDECREF(loc);Py_Finalize();exit(0);}
To dynamically load g++ extension modules, you must recompile Python, relink it
using g++ (change LINKCC in the Python Modules Makefile), and link your
extension module using g++ (e.g., g++ -shared -o mymodule.so mymodule.o).
In Python 2.2, you can inherit from built-in classes such as int,
list, dict, etc.
The Boost Python Library (BPL, http://www.boost.org/libs/python/doc/index.html)
provides a way of doing this from C++ (i.e. you can inherit from an extension
class written in C++ using the BPL).
When importing module X, why do I get “undefined symbol: PyUnicodeUCS2*”?¶
You are using a version of Python that uses a 4-byte representation for Unicode
characters, but some C extension module you are importing was compiled using a
Python that uses a 2-byte representation for Unicode characters (the default).
If instead the name of the undefined symbol starts with PyUnicodeUCS4, the
problem is the reverse: Python was built using 2-byte Unicode characters, and
the extension module was compiled using a Python with 4-byte Unicode characters.
This can easily occur when using pre-built extension packages. RedHat Linux
7.x, in particular, provided a “python2” binary that is compiled with 4-byte
Unicode. This only causes the link failure if the extension uses any of the
PyUnicode_*() functions. It is also a problem if an extension uses any of
the Unicode-related format specifiers for Py_BuildValue() (or similar) or
parameter specifications for PyArg_ParseTuple().
You can check the size of the Unicode character a Python interpreter is using by
checking the value of sys.maxunicode:
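For example, on a 2-byte (UCS-2) build:

>>> import sys
>>> sys.maxunicode
65535

A 4-byte (UCS-4) build reports 1114111 instead.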
How do I run a Python program under Windows?¶
This is not necessarily a straightforward question. If you are already familiar
with running programs from the Windows command line then everything will seem
obvious; otherwise, you might need a little more guidance. There are also
differences between Windows 95, 98, NT, ME, 2000 and XP which can add to the
confusion.
Unless you use some sort of integrated development environment, you will end up
typing Windows commands into what is variously referred to as a “DOS window”
or “Command prompt window”. Usually you can create such a window from your
Start menu; under Windows 2000 the menu selection is Start ‣
Programs ‣ Accessories ‣ Command Prompt. You should be able to recognize
when you have started such a window because you will see a Windows “command
prompt”, which usually looks like this:
C:\>
The letter may be different, and there might be other things after it, so you
might just as easily see something like:
D:\Steve\Projects\Python>
depending on how your computer has been set up and what else you have recently
done with it. Once you have started such a window, you are well on the way to
running Python programs.
You need to realize that your Python scripts have to be processed by another
program called the Python interpreter. The interpreter reads your script,
compiles it into bytecodes, and then executes the bytecodes to run your
program. So, how do you arrange for the interpreter to handle your Python?
First, you need to make sure that your command window recognises the word
“python” as an instruction to start the interpreter. If you have opened a
command window, you should try entering the command python and hitting
return. You should then see something like:
Python 2.2 (#28, Dec 21 2001, 12:21:22) [MSC 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>
You have started the interpreter in “interactive mode”. That means you can enter
Python statements or expressions interactively and have them executed or
evaluated while you wait. This is one of Python’s strongest features. Check it
by entering a few expressions of your choice and seeing the results:
>>>print("Hello")Hello>>>"Hello"*3HelloHelloHello
Many people use the interactive mode as a convenient yet highly programmable
calculator. When you want to end your interactive Python session, hold the Ctrl
key down while you enter a Z, then hit the “Enter” key to get back to your
Windows command prompt.
You may also find that you have a Start-menu entry such as Start
‣ Programs ‣ Python 2.2 ‣ Python (command line) that results in you
seeing the >>> prompt in a new window. If so, the window will disappear
after you enter the Ctrl-Z character; Windows is running a single “python”
command in the window, and closes it when you terminate the interpreter.
If the python command, instead of displaying the interpreter prompt >>>,
gives you a message like:
'python' is not recognized as an internal or external command,
operable program or batch file.
or:
Bad command or filename
then you need to make sure that your computer knows where to find the Python
interpreter. To do this you will have to modify a setting called PATH, which is
a list of directories where Windows will look for programs.
You should arrange for Python’s installation directory to be added to the PATH
of every command window as it starts. If you installed Python fairly recently
then the command
dir C:\py*
will probably tell you where it is installed; the usual location is something
like C:\Python23. Otherwise you will be reduced to a search of your whole
disk ... use Tools ‣ Find or hit the Search
button and look for “python.exe”. Supposing you discover that Python is
installed in the C:\Python23 directory (the default at the time of writing),
you should make sure that entering the command
c:\Python23\python
starts up the interpreter as above (and don’t forget you’ll need a “CTRL-Z” and
an “Enter” to get out of it). Once you have verified the directory, you need to
add it to the start-up routines your computer goes through. For older versions
of Windows the easiest way to do this is to edit the C:\AUTOEXEC.BAT
file. You would want to add a line like the following to AUTOEXEC.BAT:
PATH C:\Python23;%PATH%
For Windows NT, 2000 and (I assume) XP, you will need to add a string such as
;C:\Python23
to the current setting for the PATH environment variable, which you will find in
the properties window of “My Computer” under the “Advanced” tab. Note that if
you have sufficient privilege you might get a choice of installing the settings
either for the Current User or for System. The latter is preferred if you want
everybody to be able to run Python on the machine.
If you aren’t confident doing any of these manipulations yourself, ask for help!
At this stage you may want to reboot your system to make absolutely sure the new
setting has taken effect. You probably won’t need to reboot for Windows NT, XP
or 2000. You can also avoid it in earlier versions by editing the file
C:\WINDOWS\COMMAND\CMDINIT.BAT instead of AUTOEXEC.BAT.
You should now be able to start a new command window, enter python at the
C:\> (or whatever) prompt, and see the >>> prompt that indicates the
Python interpreter is reading interactive commands.
Let’s suppose you have a program called pytest.py in directory
C:\Steve\Projects\Python. A session to run that program might look like
this:
C:\> cd \Steve\Projects\Python
C:\Steve\Projects\Python> python pytest.py
Because you added a file name to the command to start the interpreter, when it
starts up it reads the Python script in the named file, compiles it, executes
it, and terminates, so you see another C:\> prompt. You might also have
entered
C:\> python \Steve\Projects\Python\pytest.py
if you hadn’t wanted to change your current directory.
Under NT, 2000 and XP you may well find that the installation process has also
arranged that the command pytest.py (or, if the file isn’t in the current
directory, C:\Steve\Projects\Python\pytest.py) will automatically recognize
the ”.py” extension and run the Python interpreter on the named file. Using this
feature is fine, but some versions of Windows have bugs which mean that this
form isn’t exactly equivalent to using the interpreter explicitly, so be
careful.
The important things to remember are:
Start Python from the Start Menu, or make sure the PATH is set correctly so
Windows can find the Python interpreter.
python
should give you a ‘>>>’ prompt from the Python interpreter. Don’t forget the
CTRL-Z and ENTER to terminate the interpreter (and, if you started the window
from the Start Menu, make the window disappear).
Once this works, you run programs with commands:
python {program-file}
When you know the commands to use you can build Windows shortcuts to run the
Python interpreter on any of your scripts, naming particular working
directories, and adding them to your menus. Take a look at
python --help
if your needs are complex.
Interactive mode (where you see the >>> prompt) is best used for checking
that individual statements and expressions do what you think they will, and
for developing code by experiment.
How do I make Python scripts executable?¶
On Windows 2000, the standard Python installer already associates the .py
extension with a file type (Python.File) and gives that file type an open
command that runs the interpreter (D:\Program Files\Python\python.exe "%1" %*).
This is enough to make scripts executable from the command prompt as
‘foo.py’. If you’d rather be able to execute the script by simply typing ‘foo’
with no extension you need to add .py to the PATHEXT environment variable.
On Windows NT, the steps taken by the installer as described above allow you to
run a script with ‘foo.py’, but a longtime bug in the NT command processor
prevents you from redirecting the input or output of any script executed in this
way. This is often important.
The incantation for making a Python script executable under WinNT is to give the
file an extension of .cmd and add the following as the first line:
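@setlocal enableextensions & python -x %~f0 %* & goto :EOF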
Why does Python sometimes take so long to start?¶
Usually Python starts very quickly on Windows, but occasionally there are bug
reports that Python suddenly begins to take a long time to start up. This is
made even more puzzling because Python will work fine on other Windows systems
which appear to be configured identically.
The problem may be caused by a misconfiguration of virus checking software on
the problem machine. Some virus scanners have been known to introduce startup
overhead of two orders of magnitude when the scanner is configured to monitor
all reads from the filesystem. Try checking the configuration of virus scanning
software on your systems to ensure that they are indeed configured identically.
McAfee, when configured to scan all file system read activity, is a particular
offender.
How do I make an executable from a Python script?¶
“Freeze” is a program that allows you to ship a Python program as a single
stand-alone executable file. It is not a compiler; your programs don’t run
any faster, but they are more easily distributable, at least to platforms with
the same OS and CPU. Read the README file of the freeze program for more
disclaimers.
You can use freeze on Windows, but you must download the source tree (see
http://www.python.org/download/source). The freeze program is in the
Tools\freeze subdirectory of the source tree.
You need the Microsoft VC++ compiler, and you probably need to build Python.
The required project files are in the PCbuild directory.
Yes, .pyd files are dll’s, but there are a few differences. If you have a DLL
named foo.pyd, then it must have a function initfoo(). You can then
write Python “import foo”, and Python will search for foo.pyd (as well as
foo.py, foo.pyc) and if it finds it, will attempt to call initfoo() to
initialize it. You do not link your .exe with foo.lib, as that would cause
Windows to require the DLL to be present.
Note that the search path for foo.pyd is PYTHONPATH, not the same as the path
that Windows uses to search for foo.dll. Also, foo.pyd need not be present to
run your program, whereas if you linked your program with a dll, the dll is
required. Of course, foo.pyd is required if you want to say import foo. In
a DLL, linkage is declared in the source code with __declspec(dllexport).
In a .pyd, linkage is defined in a list of available functions.
Embedding the Python interpreter in a Windows app can be summarized as follows:
Do _not_ build Python into your .exe file directly. On Windows, Python must
be a DLL to handle importing modules that are themselves DLL’s. (This is the
first key undocumented fact.) Instead, link to pythonNN.dll; it is
typically installed in C:\Windows\System. NN is the Python version, a
number such as “23” for Python 2.3.
You can link to Python in two different ways. Load-time linking means
linking against pythonNN.lib, while run-time linking means linking
against pythonNN.dll. (General note: pythonNN.lib is the
so-called “import lib” corresponding to pythonNN.dll. It merely
defines symbols for the linker.)
Run-time linking greatly simplifies link options; everything happens at run
time. Your code must load pythonNN.dll using the Windows
LoadLibraryEx() routine. The code must also use access routines and data
in pythonNN.dll (that is, Python’s C API’s) using pointers obtained
by the Windows GetProcAddress() routine. Macros can make using these
pointers transparent to any C code that calls routines in Python’s C API.
Borland note: convert pythonNN.lib to OMF format using Coff2Omf.exe
first.
If you use SWIG, it is easy to create a Python “extension module” that will
make the app’s data and methods available to Python. SWIG will handle just
about all the grungy details for you. The result is C code that you link
into your .exe file (!) You do _not_ have to create a DLL file, and this
also simplifies linking.
SWIG will create an init function (a C function) whose name depends on the
name of the extension module. For example, if the name of the module is leo,
the init function will be called initleo(). If you use SWIG shadow classes,
as you should, the init function will be called initleoc(). This initializes
a mostly hidden helper class used by the shadow class.
The reason you can link the C code in step 2 into your .exe file is that
calling the initialization function is equivalent to importing the module
into Python! (This is the second key undocumented fact.)
In short, you can use the following code to initialize the Python interpreter
with your extension module.
#include "python.h"...Py_Initialize();// Initialize Python.initmyAppc();// Initialize (import) the helper class.PyRun_SimpleString("import myApp");// Import the shadow class.
There are two problems with Python’s C API which will become apparent if you
use a compiler other than MSVC, the compiler used to build pythonNN.dll.
Problem 1: The so-called “Very High Level” functions that take FILE *
arguments will not work in a multi-compiler environment because each
compiler’s notion of a struct FILE will be different. From an implementation
standpoint these are very _low_ level functions.
Problem 2: SWIG generates the following code when generating wrappers to void
functions:
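/* as reproduced in older versions of this FAQ */
Py_INCREF(Py_None);
_resultobj = Py_None;
return _resultobj;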
Alas, Py_None is a macro that expands to a reference to a complex data
structure called _Py_NoneStruct inside pythonNN.dll. Again, this code will
fail in a multi-compiler environment. Replace such code by:
return Py_BuildValue("");
It may be possible to use SWIG’s %typemap command to make the change
automatically, though I have not been able to get this to work (I’m a
complete SWIG newbie).
Using a Python shell script to put up a Python interpreter window from inside
your Windows app is not a good idea; the resulting window will be independent
of your app’s windowing system. Rather, you (or the wxPythonWindow class)
should create a “native” interpreter window. It is easy to connect that
window to the Python interpreter. You can redirect Python’s i/o to _any_
object that supports read and write, so all you need is a Python object
(defined in your extension module) that contains read() and write() methods.
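A minimal Python-side sketch of such an object (the class name WindowIO is
illustrative, not part of any real extension module):

import sys

class WindowIO:
    # A file-like object; a real app would route write() text to its
    # native interpreter window and supply keystrokes from read().
    def __init__(self):
        self.chunks = []
    def write(self, text):
        self.chunks.append(text)
    def read(self, size=-1):
        return ''

io = WindowIO()
sys.stdout = sys.stderr = io   # Python's output now goes to our object
print('hello')                 # lands in io.chunks, not the console
sys.stdout = sys.__stdout__    # restore the real stdout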
and enter the following line (making any specific changes that your system may
need):
.py :REG_SZ: c:\<path to python>\python.exe -u %s %s
This line will allow you to call your script with a simple reference like:
http://yourserver/scripts/yourscript.py provided “scripts” is an
“executable” directory for your server (which it usually is by default). The
-u flag specifies unbuffered and binary mode for stdin - needed when
working with binary data.
In addition, “.py” may not be a good choice for the file extension in this
context (you might want to reserve *.py for support modules and use *.cgi
or *.cgp for “main program” scripts).
In order to set up Internet Information Services 5 to use Python for CGI
processing, please see the following links:
The FAQ does not recommend using tabs, and the Python style guide, PEP 8,
recommends 4 spaces for distributed Python code; this is also the Emacs
python-mode default.
Under any editor, mixing tabs and spaces is a bad idea. MSVC is no different in
this respect, and is easily configured to use spaces: Take Tools
‣ Options ‣ Tabs, and for file type “Default” set “Tab size” and “Indent
size” to 4, and select the “Insert spaces” radio button.
If you suspect mixed tabs and spaces are causing problems in leading whitespace,
run Python with the -t switch or run Tools/Scripts/tabnanny.py to
check a directory tree in batch mode.
Use the msvcrt module. This is a standard Windows-specific extension module.
It defines a function kbhit() which checks whether a keyboard hit is
present, and getch() which gets one character without echoing it.
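For example, a simple polling loop (Windows-only; on Python 3, getch()
returns a bytes object):

import msvcrt

while True:
    if msvcrt.kbhit():        # a keypress is waiting
        ch = msvcrt.getch()   # read one character without echoing it
        print('you pressed', ch)
        break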
Prior to Python 2.7 and 3.2, to terminate a process, you can use ctypes:
import ctypes

def kill(pid):
    """kill function for Win32"""
    kernel32 = ctypes.windll.kernel32
    handle = kernel32.OpenProcess(1, 0, pid)
    return (0 != kernel32.TerminateProcess(handle, 0))
In 2.7 and 3.2, os.kill() is implemented similarly to the above function,
with the additional feature of being able to send CTRL+C and CTRL+BREAK
to console subprocesses which are designed to handle those signals. See
os.kill() for further details.
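A minimal sketch of the 2.7/3.2 approach (Windows-only; the subprocess must be
started in its own process group to receive CTRL+BREAK):

import os
import signal
import subprocess
import sys

proc = subprocess.Popen(
    [sys.executable, '-c', 'import time; time.sleep(30)'],
    creationflags=subprocess.CREATE_NEW_PROCESS_GROUP)
os.kill(proc.pid, signal.CTRL_BREAK_EVENT)   # interrupt the child
proc.wait()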
Be sure you have the latest python.exe, that you are using python.exe rather
than a GUI version of Python and that you have configured the server to execute
"...\python.exe -u ..."
for the CGI execution. The -u (unbuffered) option on NT and Win95
prevents the interpreter from altering newlines in the standard input and
output. Without it post/multipart requests will seem to have the wrong length
and binary (e.g. GIF) responses may get garbled (resulting in broken images, PDF
files, and other binary downloads failing).
The reason that os.popen() doesn’t work from within PythonWin is due to a bug in
Microsoft’s C Runtime Library (CRT). The CRT assumes you have a Win32 console
attached to the process.
You should use the win32pipe module’s popen() instead which doesn’t depend on
having an attached Win32 console.
Example:
import win32pipe
f = win32pipe.popen('dir /c c:\\')
print(f.readlines())
f.close()
There is a bug in Win9x that prevents os.popen/win32pipe.popen* from
working. The good news is there is a way to work around this problem. The
Microsoft Knowledge Base article that you need to lookup is: Q150956. You will
find links to the knowledge base at: http://support.microsoft.com/.
This is very sensitive to the compiler vendor, version and (perhaps) even
options. If the FILE* structure in your embedding program isn’t the same as is
assumed by the Python interpreter it won’t work.
The Python 1.5.* DLLs (python15.dll) are all compiled with MS VC++ 5.0 and
with multithreading-DLL options (/MD).
If you can’t change compilers or flags, try using PyRun_SimpleString().
A trick to get it to run an arbitrary file is to construct a call to
exec() and open() with the name of your file as argument.
Also note that you can not mix-and-match Debug and Release versions. If you
wish to use the Debug Multithreaded DLL, then your module must have _d
appended to the base name.
It could be that you haven’t installed Tcl/Tk, but if you did install Tcl/Tk,
and the Wish application works correctly, the problem may be that its installer
didn’t manage to edit the autoexec.bat file correctly. It tries to add a
statement that changes the PATH environment variable to include the Tcl/Tk ‘bin’
subdirectory, but sometimes this edit doesn’t quite work. Opening it with
notepad usually reveals what the problem is.
(One additional hint, noted by David Szafranski: you can’t use long filenames
here; e.g. use C:\PROGRA~1\Tcl\bin instead of C:\Program Files\Tcl\bin.)
Sometimes, when you download the documentation package to a Windows machine
using a web browser, the file extension of the saved file ends up being .EXE.
This is a mistake; the extension should be .TGZ.
Simply rename the downloaded file to have the .TGZ extension, and WinZip will be
able to handle it. (If your copy of WinZip doesn’t, get a newer one from
http://www.winzip.com.)
Sometimes, when using Tkinter on Windows, you get an error that cw3215mt.dll or
cw3215.dll is missing.
Cause: you have an old Tcl/Tk DLL built with cygwin in your path (probably
C:\Windows). You must use the Tcl/Tk DLLs from the standard Tcl/Tk
installation (Python 1.5.2 comes with one).
This is a Microsoft DLL, and a notorious source of problems. The message
means what it says: you have the wrong version of this DLL for your operating
system. The Python installation did not cause this – something else you
installed previous to this overwrote the DLL that came with your OS (probably
older shareware of some sort, but there’s no way to tell now). If you search
for “CTL3D32” using any search engine (AltaVista, for example), you’ll find
hundreds and hundreds of web pages complaining about the same problem with
all sorts of installation programs. They’ll point you to ways to get the
correct version reinstalled on your system (since Python doesn’t cause this,
we can’t fix it).
Depending on what platform(s) you are aiming at, there are several. Some
of them haven’t been ported to Python 3 yet. At least Tkinter and Qt
are known to be Python 3-compatible.
Standard builds of Python include an object-oriented interface to the Tcl/Tk
widget set, called tkinter. This is probably the easiest to
install (since it comes included with most
binary distributions of Python) and use.
For more info about Tk, including pointers to the source, see the
Tcl/Tk home page. Tcl/Tk is fully portable to the
MacOS, Windows, and Unix platforms.
wxWidgets (http://www.wxwidgets.org) is a free, portable GUI class
library written in C++ that provides a native look and feel on a
number of platforms, with Windows, MacOS X, GTK, X11, all listed as
current stable targets. Language bindings are available for a number
of languages including Python, Perl, Ruby, etc.
wxPython (http://www.wxpython.org) is the Python binding for
wxWidgets. While it often lags slightly behind the official wxWidgets
releases, it also offers a number of features via pure Python
extensions that are not available in other language bindings. There
is an active wxPython user and developer community.
Both wxWidgets and wxPython are free, open source, software with
permissive licences that allow their use in commercial products as
well as in freeware or shareware.
There are bindings available for the Qt toolkit (using either PyQt or PySide) and for KDE (PyKDE).
PyQt is currently more mature than PySide, but you must buy a PyQt license from
Riverbank Computing
if you want to write proprietary applications. PySide is free for all applications.
Qt 4.5 upwards is licensed under the LGPL license; also, commercial licenses
are available from Nokia.
The Mac port by Jack Jansen has a rich and
ever-growing set of modules that support the native Mac toolbox calls. The port
supports MacOS X’s Carbon libraries.
By installing the PyObjc Objective-C bridge, Python programs can use MacOS X’s
Cocoa libraries. See the documentation that comes with the Mac port.
Pythonwin by Mark Hammond includes an interface to the
Microsoft Foundation Classes and a Python programming environment
that’s written mostly in Python using the MFC classes.
Freeze is a tool to create stand-alone applications. When freezing Tkinter
applications, the applications will not be truly stand-alone, as the application
will still need the Tcl and Tk libraries.
One solution is to ship the application with the Tcl and Tk libraries, and point
to them at run-time using the TCL_LIBRARY and TK_LIBRARY
environment variables.
To get truly stand-alone applications, the Tcl scripts that form the library
have to be integrated into the application as well. One tool supporting that is
SAM (stand-alone modules), which is part of the Tix distribution
(http://tix.sourceforge.net/).
Build Tix with SAM enabled, perform the appropriate call to
Tclsam_init(), etc. inside Python’s
Modules/tkappinit.c, and link with libtclsam and libtksam (you
might include the Tix libraries as well).
Yes, and you don’t even need threads! But you’ll have to restructure your I/O
code a bit. Tk has the equivalent of Xt’s XtAddInput() call, which allows you
to register a callback function which will be called from the Tk mainloop when
I/O is possible on a file descriptor. Here’s what you need:
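The registration call (as it appears in older versions of this FAQ; an
assumption here, since the exact spelling has varied across versions):

tkinter.createfilehandler(file, mask, callback)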
The file may be a Python file or socket object (actually, anything with a
fileno() method), or an integer file descriptor. The mask is one of the
constants tkinter.READABLE or tkinter.WRITABLE. The callback is called as
follows:
callback(file, mask)
You must unregister the callback when you’re done, using
tkinter.deletefilehandler(file)
Note: since you don’t know how many bytes are available for reading, you can’t
use the Python file object’s read or readline methods, since these will insist
on reading a predefined number of bytes. For sockets, the recv() or
recvfrom() methods will work fine; for other files, use
os.read(file.fileno(), maxbytecount).
An often-heard complaint is that event handlers bound to events with the
bind() method don’t get handled even when the appropriate key is pressed.
The most common cause is that the widget to which the binding applies doesn’t
have “keyboard focus”. Check out the Tk documentation for the focus command.
Usually a widget is given the keyboard focus by clicking in it (but not for
labels; see the takefocus option).
Python is a programming language. It’s used for many different applications.
It’s used in some high schools and colleges as an introductory programming
language because Python is easy to learn, but it’s also used by professional
software developers at places such as Google, NASA, and Lucasfilm Ltd.
If you find Python installed on your system but don’t remember installing it,
there are several possible ways it could have gotten there.
Perhaps another user on the computer wanted to learn programming and installed
it; you’ll have to figure out who’s been using the machine and might have
installed it.
A third-party application installed on the machine might have been written in
Python and included a Python installation. There are many such applications,
from GUI programs to network servers and administrative scripts.
Some Windows machines also have Python installed. At this writing we’re aware
of computers from Hewlett-Packard and Compaq that include Python. Apparently
some of HP/Compaq’s administrative tools are written in Python.
Many Unix-compatible operating systems, such as Mac OS X and some Linux
distributions, have Python installed by default; it’s included in the base
installation.
If someone installed it deliberately, you can remove it without hurting
anything. On Windows, use the Add/Remove Programs icon in the Control Panel.
If Python was installed by a third-party application, you can also remove it,
but that application will no longer work. You should use that application’s
uninstaller rather than removing Python directly.
If Python came with your operating system, removing it is not recommended. If
you remove it, whatever tools were written in Python will no longer run, and
some of them might be important to you. Reinstalling the whole system would
then be required to fix things again.
>>>
The default Python prompt of the interactive shell. Often seen for code
examples which can be executed interactively in the interpreter.
...
The default Python prompt of the interactive shell when entering code for
an indented code block or within a pair of matching left and right
delimiters (parentheses, square brackets or curly braces).
2to3
A tool that tries to convert Python 2.x code to Python 3.x code by
handling most of the incompatibilities which can be detected by parsing the
source and traversing the parse tree.
abstract base class
Abstract base classes complement duck-typing by
providing a way to define interfaces when other techniques like
hasattr() would be clumsy or subtly wrong (for example with
magic methods). Python comes with many built-in ABCs for
data structures (in the collections module), numbers (in the
numbers module), streams (in the io module), import finders
and loaders (in the importlib.abc module). You can create your own
ABCs with the abc module.
argument
A value passed to a function or method, assigned to a named local
variable in the function body. A function or method may have both
positional arguments and keyword arguments in its definition.
Positional and keyword arguments may be variable-length: * accepts
or passes (if in the function definition or call) several positional
arguments in a list, while ** does the same for keyword arguments
in a dictionary.
Any expression may be used within the argument list, and the evaluated
value is passed to the local variable.
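A quick interactive sketch of positional, keyword, and variable-length
arguments:

>>> def f(a, b=0, *args, **kwargs):
...     return a, b, args, kwargs
>>> f(1, 2, 3, 4, key='value')
(1, 2, (3, 4), {'key': 'value'})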
attribute
A value associated with an object which is referenced by name using
dotted expressions. For example, if an object o has an attribute
a it would be referenced as o.a.
BDFL
Benevolent Dictator For Life, a.k.a. Guido van Rossum, Python’s creator.
bytecode
Python source code is compiled into bytecode, the internal representation
of a Python program in the CPython interpreter. The bytecode is also
cached in .pyc and .pyo files so that executing the same file is
faster the second time (recompilation from source to bytecode can be
avoided). This “intermediate language” is said to run on a
virtual machine that executes the machine code corresponding to
each bytecode. Do note that bytecodes are not expected to work between
different Python virtual machines, nor to be stable between Python
releases.
A list of bytecode instructions can be found in the documentation for
the dis module.
class
A template for creating user-defined objects. Class definitions
normally contain method definitions which operate on instances of the
class.
coercion
The implicit conversion of an instance of one type to another during an
operation which involves two arguments of the same type. For example,
int(3.15) converts the floating point number to the integer 3, but
in 3+4.5, each argument is of a different type (one int, one float),
and both must be converted to the same type before they can be added or it
will raise a TypeError. Without coercion, all arguments of even
compatible types would have to be normalized to the same value by the
programmer, e.g., float(3)+4.5 rather than just 3+4.5.
complex number
An extension of the familiar real number system in which all numbers are
expressed as a sum of a real part and an imaginary part. Imaginary
numbers are real multiples of the imaginary unit (the square root of
-1), often written i in mathematics or j in
engineering. Python has built-in support for complex numbers, which are
written with this latter notation; the imaginary part is written with a
j suffix, e.g., 3+1j. To get access to complex equivalents of the
math module, use cmath. Use of complex numbers is a fairly
advanced mathematical feature. If you’re not aware of a need for them,
it’s almost certain you can safely ignore them.
CPython
The canonical implementation of the Python programming language, as
distributed on python.org. The term “CPython”
is used when necessary to distinguish this implementation from others
such as Jython or IronPython.
decorator
A function returning another function, usually applied as a function
transformation using the @wrapper syntax. Common examples for
decorators are classmethod() and staticmethod().
The decorator syntax is merely syntactic sugar, the following two
function definitions are semantically equivalent:
def f(...):
...
f = staticmethod(f)
@staticmethod
def f(...):
...
The same concept exists for classes, but is less commonly used there. See
the documentation for function definitions and
class definitions for more about decorators.
descriptor
Any object which defines the methods __get__(), __set__(), or
__delete__(). When a class attribute is a descriptor, its special
binding behavior is triggered upon attribute lookup. Normally, using
a.b to get, set or delete an attribute looks up the object named b in
the class dictionary for a, but if b is a descriptor, the respective
descriptor method gets called. Understanding descriptors is a key to a
deep understanding of Python because they are the basis for many features
including functions, methods, properties, class methods, static methods,
and reference to super classes.
For more information about descriptors’ methods, see Implementing Descriptors.
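A minimal sketch (the class name Ten is illustrative):

>>> class Ten:
...     def __get__(self, obj, objtype=None):
...         return 10
>>> class A:
...     x = Ten()          # class attribute that is a descriptor
>>> A().x                  # attribute lookup triggers Ten.__get__()
10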
dictionary
An associative array, where arbitrary keys are mapped to values. The keys
can be any object with __hash__() and __eq__()
methods. Called a hash in Perl.
docstring
A string literal which appears as the first expression in a class,
function or module. While ignored when the suite is executed, it is
recognized by the compiler and put into the __doc__ attribute
of the enclosing class, function or module. Since it is available via
introspection, it is the canonical place for documentation of the
object.
duck-typing
A programming style which does not look at an object’s type to determine
if it has the right interface; instead, the method or attribute is simply
called or used (“If it looks like a duck and quacks like a duck, it
must be a duck.”) By emphasizing interfaces rather than specific types,
well-designed code improves its flexibility by allowing polymorphic
substitution. Duck-typing avoids tests using type() or
isinstance(). (Note, however, that duck-typing can be complemented
with abstract base classes.) Instead, it typically employs
hasattr() tests or EAFP programming.
EAFP
Easier to ask for forgiveness than permission. This common Python coding
style assumes the existence of valid keys or attributes and catches
exceptions if the assumption proves false. This clean and fast style is
characterized by the presence of many try and except
statements. The technique contrasts with the LBYL style
common to many other languages such as C.
expression
A piece of syntax which can be evaluated to some value. In other words,
an expression is an accumulation of expression elements like literals,
names, attribute access, operators or function calls which all return a
value. In contrast to many other languages, not all language constructs
are expressions. There are also statements which cannot be used
as expressions, such as if. Assignments are also statements,
not expressions.
extension module
A module written in C or C++, using Python’s C API to interact with the
core and with user code.
file object
An object exposing a file-oriented API (with methods such as
read() or write()) to an underlying resource. Depending
on the way it was created, a file object can mediate access to a real
on-disk file or to another type of storage or communication device
(for example standard input/output, in-memory buffers, sockets, pipes,
etc.). File objects are also called file-like objects or
streams.
There are actually three categories of file objects: raw binary files,
buffered binary files and text files. Their interfaces are defined in the
io module. The canonical way to create a file object is by using
the open() function.
floor division
Mathematical division that rounds down to nearest integer. The floor
division operator is //. For example, the expression 11//4
evaluates to 2 in contrast to the 2.75 returned by float true
division. Note that (-11)//4 is -3 because that is -2.75
rounded downward. See PEP 238.
function
A series of statements which returns some value to a caller. It can also
be passed zero or more arguments which may be used in the execution of
the body. See also argument and method.
__future__
A pseudo-module which programmers can use to enable new language features
which are not compatible with the current interpreter.
By importing the __future__ module and evaluating its variables,
you can see when a new feature was first added to the language and when it
becomes the default:
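>>> import __future__
>>> __future__.division
_Feature((2, 2, 0, 'alpha', 2), (3, 0, 0, 'alpha', 0), 8192)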
garbage collection
The process of freeing memory when it is not used anymore. Python
performs garbage collection via reference counting and a cyclic garbage
collector that is able to detect and break reference cycles.
generator
A function which returns an iterator. It looks like a normal function
except that it contains yield statements for producing a series
of values usable in a for-loop or that can be retrieved one at a time with
the next() function. Each yield temporarily suspends
processing, remembering the location execution state (including local
variables and pending try-statements). When the generator resumes, it
picks up where it left off (in contrast to functions which start fresh on
every invocation).
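For example:

>>> def countdown(n):
...     while n > 0:
...         yield n        # suspends here until the next value is requested
...         n -= 1
>>> list(countdown(3))
[3, 2, 1]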
generator expression
An expression that returns an iterator. It looks like a normal expression
followed by a for expression defining a loop variable, range,
and an optional if expression. The combined expression
generates values for an enclosing function:
>>> sum(i*i for i in range(10)) # sum of squares 0, 1, 4, ... 81
285
global interpreter lock
The mechanism used by the CPython interpreter to assure that
only one thread executes Python bytecode at a time.
This simplifies the CPython implementation by making the object model
(including critical built-in types such as dict) implicitly
safe against concurrent access. Locking the entire interpreter
makes it easier for the interpreter to be multi-threaded, at the
expense of much of the parallelism afforded by multi-processor
machines.
However, some extension modules, either standard or third-party,
are designed so as to release the GIL when doing computationally-intensive
tasks such as compression or hashing. Also, the GIL is always released
when doing I/O.
Past efforts to create a “free-threaded” interpreter (one which locks
shared data at a much finer granularity) have not been successful
because performance suffered in the common single-processor case. It
is believed that overcoming this performance issue would make the
implementation much more complicated and therefore costlier to maintain.
hashable
An object is hashable if it has a hash value which never changes during
its lifetime (it needs a __hash__() method), and can be compared to
other objects (it needs an __eq__() method). Hashable objects which
compare equal must have the same hash value.
Hashability makes an object usable as a dictionary key and a set member,
because these data structures use the hash value internally.
All of Python’s immutable built-in objects are hashable, while no mutable
containers (such as lists or dictionaries) are. Objects which are
instances of user-defined classes are hashable by default; they all
compare unequal, and their hash value is their id().
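For example:

>>> d = {(1, 2): 'point'}  # tuples are immutable, hence hashable
>>> d[[1, 2]] = 'nope'     # lists are mutable and unhashable
Traceback (most recent call last):
  ...
TypeError: unhashable type: 'list'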
IDLE
An Integrated Development Environment for Python. IDLE is a basic editor
and interpreter environment which ships with the standard distribution of
Python.
immutable
An object with a fixed value. Immutable objects include numbers, strings and
tuples. Such an object cannot be altered. A new object has to
be created if a different value has to be stored. They play an important
role in places where a constant hash value is needed, for example as a key
in a dictionary.
importer
An object that both finds and loads a module; both a
finder and loader object.
interactive
Python has an interactive interpreter which means you can enter
statements and expressions at the interpreter prompt, immediately
execute them and see their results. Just launch python with no
arguments (possibly by selecting it from your computer’s main
menu). It is a very powerful way to test out new ideas or inspect
modules and packages (remember help(x)).
interpreted
Python is an interpreted language, as opposed to a compiled one,
though the distinction can be blurry because of the presence of the
bytecode compiler. This means that source files can be run directly
without explicitly creating an executable which is then run.
Interpreted languages typically have a shorter development/debug cycle
than compiled ones, though their programs generally also run more
slowly. See also interactive.
iterable
An object capable of returning its members one at a
time. Examples of iterables include all sequence types (such as
list, str, and tuple) and some non-sequence
types like dict and file and objects of any classes you
define with an __iter__() or __getitem__() method. Iterables
can be used in a for loop and in many other places where a
sequence is needed (zip(), map(), ...). When an iterable
object is passed as an argument to the built-in function iter(), it
returns an iterator for the object. This iterator is good for one pass
over the set of values. When using iterables, it is usually not necessary
to call iter() or deal with iterator objects yourself. The for
statement does that automatically for you, creating a temporary unnamed
variable to hold the iterator for the duration of the loop. See also
iterator, sequence, and generator.
iterator
An object representing a stream of data. Repeated calls to the iterator’s
__next__() method (or passing it to the built-in function
next()) return successive items in the stream. When no more data
are available a StopIteration exception is raised instead. At this
point, the iterator object is exhausted and any further calls to its
__next__() method just raise StopIteration again. Iterators
are required to have an __iter__() method that returns the iterator
object itself so every iterator is also iterable and may be used in most
places where other iterables are accepted. One notable exception is code
which attempts multiple iteration passes. A container object (such as a
list) produces a fresh new iterator each time you pass it to the
iter() function or use it in a for loop. Attempting this
with an iterator will just return the same exhausted iterator object used
in the previous iteration pass, making it appear like an empty container.
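For example:

>>> it = iter([1, 2, 3])
>>> list(it)
[1, 2, 3]
>>> list(it)               # the same iterator is now exhausted
[]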
key function
A key function or collation function is a callable that returns a value
used for sorting or ordering. For example, locale.strxfrm() is
used to produce a sort key that is aware of locale specific sort
conventions.
There are several ways to create a key function. For example, the
str.lower() method can serve as a key function for case insensitive
sorts. Alternatively, an ad-hoc key function can be built from a
lambda expression such as lambda r: (r[0], r[2]). Also,
the operator module provides three key function constructors:
attrgetter(), itemgetter(), and
methodcaller(). See the Sorting HOW TO for examples of how to create and use key functions.
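For example:

>>> sorted(['banana', 'Apple', 'cherry'], key=str.lower)
['Apple', 'banana', 'cherry']
>>> from operator import itemgetter
>>> sorted([(1, 'b'), (0, 'a')], key=itemgetter(0))
[(0, 'a'), (1, 'b')]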
keyword argument
Arguments which are preceded with a variable_name= in the call.
The variable name designates the local name in the function to which the
value is assigned. ** is used to accept or pass a dictionary of
keyword arguments. See argument.
lambda
An anonymous inline function consisting of a single expression
which is evaluated when the function is called. The syntax to create
a lambda function is lambda [arguments]: expression
LBYL
Look before you leap. This coding style explicitly tests for
pre-conditions before making calls or lookups. This style contrasts with
the EAFP approach and is characterized by the presence of many
if statements.
In a multi-threaded environment, the LBYL approach can risk introducing a
race condition between “the looking” and “the leaping”. For example, the
code, if key in mapping: return mapping[key] can fail if another
thread removes key from mapping after the test, but before the lookup.
This issue can be solved with locks or by using the EAFP approach.
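A sketch contrasting the two styles:

>>> mapping = {}
>>> # LBYL: test first (racy if another thread mutates mapping in between)
>>> value = mapping['color'] if 'color' in mapping else 'default'
>>> # EAFP: attempt the lookup and catch the failure
>>> try:
...     value = mapping['color']
... except KeyError:
...     value = 'default'
>>> value
'default'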
list
A built-in Python sequence. Despite its name it is more akin
to an array in other languages than to a linked list since access to
elements is O(1).
list comprehension
A compact way to process all or part of the elements in a sequence and
return a list with the results. result = ['{:#04x}'.format(x) for x in range(256) if x % 2 == 0] generates a list of strings containing
even hex numbers (0x..) in the range from 0 to 255. The if
clause is optional. If omitted, all elements in range(256) are
processed.
metaclass
The class of a class. Class definitions create a class name, a class
dictionary, and a list of base classes. The metaclass is responsible for
taking those three arguments and creating the class. Most object oriented
programming languages provide a default implementation. What makes Python
special is that it is possible to create custom metaclasses. Most users
never need this tool, but when the need arises, metaclasses can provide
powerful, elegant solutions. They have been used for logging attribute
access, adding thread-safety, tracking object creation, implementing
singletons, and many other tasks.
method
A function which is defined inside a class body. If called as an attribute
of an instance of that class, the method will get the instance object as
its first argument (which is usually called self).
See function and nested scope.
mutable
Mutable objects can change their value but keep their id(). See
also immutable.
named tuple
Any tuple-like class whose indexable elements are also accessible using
named attributes (for example, time.localtime() returns a
tuple-like object where the year is accessible either with an
index such as t[0] or with a named attribute like t.tm_year).
A named tuple can be a built-in type such as time.struct_time,
or it can be created with a regular class definition. A full featured
named tuple can also be created with the factory function
collections.namedtuple(). The latter approach automatically
provides extra features such as a self-documenting representation like
Employee(name='jones',title='programmer').
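For example:

>>> from collections import namedtuple
>>> Employee = namedtuple('Employee', ['name', 'title'])
>>> e = Employee(name='jones', title='programmer')
>>> e[0], e.title          # index access and named access both work
('jones', 'programmer')
>>> e
Employee(name='jones', title='programmer')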
namespace
The place where a variable is stored. Namespaces are implemented as
dictionaries. There are the local, global and built-in namespaces as well
as nested namespaces in objects (in methods). Namespaces support
modularity by preventing naming conflicts. For instance, the functions
builtins.open() and os.open() are distinguished by their
namespaces. Namespaces also aid readability and maintainability by making
it clear which module implements a function. For instance, writing
random.seed() or itertools.islice() makes it clear that those
functions are implemented by the random and itertools
modules, respectively.
nested scope
The ability to refer to a variable in an enclosing definition. For
instance, a function defined inside another function can refer to
variables in the outer function. Note that nested scopes by default work
only for reference and not for assignment. Local variables both read and
write in the innermost scope. Likewise, global variables read and write
to the global namespace. The nonlocal allows writing to outer
scopes.
new-style class
Old name for the flavor of classes now used for all class objects. In
earlier Python versions, only new-style classes could use Python’s newer,
versatile features like __slots__, descriptors, properties,
__getattribute__(), class methods, and static methods.
object
Any data with state (attributes or value) and defined behavior
(methods). Also the ultimate base class of any new-style
class.
positional argument
The arguments assigned to local names inside a function or method,
determined by the order in which they were given in the call. * is
used to either accept multiple positional arguments (when in the
definition), or pass several arguments as a list to a function. See
argument.
Python 3000
Nickname for the Python 3.x release line (coined long ago when the release
of version 3 was something in the distant future.) This is also
abbreviated “Py3k”.
Pythonic
An idea or piece of code which closely follows the most common idioms
of the Python language, rather than implementing code using concepts
common to other languages. For example, a common idiom in Python is
to loop over all elements of an iterable using a for
statement. Many other languages don’t have this type of construct, so
people unfamiliar with Python sometimes use a numerical counter instead:
for i in range(len(food)): print(food[i])
As opposed to the cleaner, Pythonic method:
for piece in food: print(piece)
reference count
The number of references to an object. When the reference count of an
object drops to zero, it is deallocated. Reference counting is
generally not visible to Python code, but it is a key element of the
CPython implementation. The sys module defines a
getrefcount() function that programmers can call to return the
reference count for a particular object.
__slots__
A declaration inside a class that saves memory by pre-declaring space for
instance attributes and eliminating instance dictionaries. Though
popular, the technique is somewhat tricky to get right and is best
reserved for rare cases where there are large numbers of instances in a
memory-critical application.
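A minimal sketch (the class name Point is illustrative):

>>> class Point:
...     __slots__ = ('x', 'y')   # no per-instance __dict__ is created
>>> p = Point()
>>> p.x = 1                      # fine: 'x' is a declared slot
>>> p.z = 3                      # no such slot
Traceback (most recent call last):
  ...
AttributeError: 'Point' object has no attribute 'z'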
sequence
An iterable which supports efficient element access using integer
indices via the __getitem__() special method and defines a
__len__() method that returns the length of the sequence.
Some built-in sequence types are list, str,
tuple, and bytes. Note that dict also
supports __getitem__() and __len__(), but is considered a
mapping rather than a sequence because the lookups use arbitrary
immutable keys rather than integers.
slice
An object usually containing a portion of a sequence. A slice is
created using the subscript notation, [] with colons between numbers
when several are given, such as in variable_name[1:3:5]. The bracket
(subscript) notation uses slice objects internally.
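For example:

>>> letters = ['a', 'b', 'c', 'd', 'e']
>>> letters[1:4]
['b', 'c', 'd']
>>> letters[1:4:2]         # with a step of 2
['b', 'd']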
special method
A method that is called implicitly by Python to execute a certain
operation on a type, such as addition. Such methods have names starting
and ending with double underscores. Special methods are documented in
Special method names.
statement
A statement is part of a suite (a “block” of code). A statement is either
an expression or one of several constructs with a keyword, such
as if, while or for.
triple-quoted string
A string which is bound by three instances of either a quotation mark
(") or an apostrophe ('). While they don’t provide any functionality
not available with single-quoted strings, they are useful for a number
of reasons. They allow you to include unescaped single and double
quotes within a string and they can span multiple lines without the
use of the continuation character, making them especially useful when
writing docstrings.
type
The type of a Python object determines what kind of object it is; every
object has a type. An object’s type is accessible as its
__class__ attribute or can be retrieved with type(obj).
view
The objects returned from dict.keys(), dict.values(), and
dict.items() are called dictionary views. They are lazy sequences
that will see changes in the underlying dictionary. To force the
dictionary view to become a full list use list(dictview). See
Dictionary view objects.
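For example:

>>> d = {'a': 1}
>>> keys = d.keys()        # a view, not a copy
>>> d['b'] = 2
>>> sorted(keys)           # the view reflects the later insertion
['a', 'b']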
virtual machine
A computer defined entirely in software. Python’s virtual machine
executes the bytecode emitted by the bytecode compiler.
Zen of Python
Listing of Python design principles and philosophies that are helpful in
understanding and using the language. The listing can be found by typing
“import this” at the interactive prompt.
These documents are generated from reStructuredText sources by Sphinx, a
document processor specifically written for the Python documentation.
Development of the documentation and its toolchain takes place on the
docs@python.org mailing list. We’re always looking for volunteers wanting
to help with the docs, so feel free to send a mail there!
Many thanks go to:
Fred L. Drake, Jr., the creator of the original Python documentation toolset
and writer of much of the content;
the Docutils project for creating
reStructuredText and the Docutils suite;
This section lists people who have contributed in some way to the Python
documentation. It is probably not complete – if you feel that you or
anyone else should be on this list, please let us know (send email to
docs@python.org), and we’ll be glad to correct the problem.
Aahz
Michael Abbott
Steve Alexander
Jim Ahlstrom
Fred Allen
A. Amoroso
Pehr Anderson
Oliver Andrich
Heidi Annexstad
Jesús Cea Avión
Manuel Balsera
Daniel Barclay
Chris Barker
Don Bashford
Anthony Baxter
Alexander Belopolsky
Bennett Benson
Jonathan Black
Robin Boerdijk
Michal Bozon
Aaron Brancotti
Georg Brandl
Keith Briggs
Ian Bruntlett
Lee Busby
Lorenzo M. Catucci
Carl Cerecke
Mauro Cicognini
Gilles Civario
Mike Clarkson
Steve Clift
Dave Cole
Matthew Cowles
Jeremy Craven
Andrew Dalke
Ben Darnell
L. Peter Deutsch
Robert Donohue
Fred L. Drake, Jr.
Jacques Ducasse
Josip Dzolonga
Jeff Epler
Michael Ernst
Blame Andy Eskilsson
Carey Evans
Martijn Faassen
Carl Feynman
Dan Finnie
Hernán Martínez Foffani
Stefan Franke
Jim Fulton
Peter Funk
Lele Gaifax
Matthew Gallagher
Gabriel Genellina
Ben Gertzfield
Nadim Ghaznavi
Jonathan Giddy
Matt Giuca
Shelley Gooch
Nathaniel Gray
Grant Griffin
Thomas Guettler
Anders Hammarquist
Mark Hammond
Harald Hanche-Olsen
Manus Hand
Gerhard Häring
Travis B. Hartwell
Tim Hatch
Janko Hauser
Ben Hayden
Thomas Heller
Bernhard Herzog
Magnus L. Hetland
Konrad Hinsen
Stefan Hoffmeister
Albert Hofkamp
Gregor Hoffleit
Steve Holden
Thomas Holenstein
Gerrit Holl
Rob Hooft
Brian Hooper
Randall Hopper
Michael Hudson
Eric Huss
Jeremy Hylton
Roger Irwin
Jack Jansen
Philip H. Jensen
Pedro Diaz Jimenez
Kent Johnson
Lucas de Jonge
Andreas Jung
Robert Kern
Jim Kerr
Jan Kim
Kamil Kisiel
Greg Kochanski
Guido Kollerie
Peter A. Koren
Daniel Kozan
Andrew M. Kuchling
Dave Kuhlman
Erno Kuusela
Ross Lagerwall
Thomas Lamb
Detlef Lannert
Piers Lauder
Glyph Lefkowitz
Robert Lehmann
Marc-André Lemburg
Ross Light
Gediminas Liktaras
Ulf A. Lindgren
Everett Lipman
Mirko Liss
Martin von Löwis
Fredrik Lundh
Jeff MacDonald
John Machin
Andrew MacIntyre
Vladimir Marangozov
Vincent Marchetti
Westley Martínez
Laura Matson
Daniel May
Rebecca McCreary
Doug Mennella
Paolo Milani
Skip Montanaro
Paul Moore
Ross Moore
Sjoerd Mullender
Dale Nagata
Michal Nowikowski
Steffen Daode Nurpmeso
Ng Pheng Siong
Koray Oner
Tomas Oppelstrup
Denis S. Otkidach
Zooko O’Whielacronx
Shriphani Palakodety
William Park
Joonas Paalasmaa
Harri Pasanen
Bo Peng
Tim Peters
Benjamin Peterson
Christopher Petrilli
Justin D. Pettit
Chris Phoenix
François Pinard
Paul Prescod
Eric S. Raymond
Edward K. Ream
Terry J. Reedy
Sean Reifschneider
Bernhard Reiter
Armin Rigo
Wes Rishel
Armin Ronacher
Jim Roskind
Guido van Rossum
Donald Wallace Rouse II
Mark Russell
Nick Russo
Chris Ryland
Constantina S.
Hugh Sasse
Bob Savage
Scott Schram
Neil Schemenauer
Barry Scott
Joakim Sernbrant
Justin Sheehy
Charlie Shepherd
SilentGhost
Michael Simcich
Ionel Simionescu
Michael Sloan
Gregory P. Smith
Roy Smith
Clay Spence
Nicholas Spies
Tage Stabell-Kulo
Frank Stajano
Anthony Starks
Greg Stein
Peter Stoehr
Mark Summerfield
Reuben Sumner
Kalle Svensson
Jim Tittsler
David Turner
Sandro Tosi
Ville Vainio
Martijn Vries
Charles G. Waldman
Greg Ward
Barry Warsaw
Corran Webster
Glyn Webster
Bob Weiner
Eddy Welbourne
Jeff Wheeler
Mats Wichmann
Gerry Wiener
Timothy Wild
Paul Winkler
Collin Winter
Blake Winton
Dan Wolfe
Steven Work
Thomas Wouters
Ka-Ping Yee
Rory Yorke
Moshe Zadka
Milan Zamazal
Cheng Zhang
Trent Nelson
Michael Foord
It is only with the input and contributions of the Python community
that Python has such wonderful documentation – Thank You!
Python is a mature programming language which has established a reputation for
stability. In order to maintain this reputation, the developers would like to
know of any deficiencies you find in Python.
If you find a bug in this documentation or would like to propose an improvement,
please send an e-mail to docs@python.org describing the bug and where you found
it. If you have a suggestion how to fix it, include that as well.
docs@python.org is a mailing list run by volunteers; your request will be
noticed, even if it takes a while to be processed.
Of course, if you want a more persistent record of your issue, you can use the
issue tracker for documentation bugs as well.
Bug reports for Python itself should be submitted via the Python Bug Tracker
(http://bugs.python.org/). The bug tracker offers a Web form which allows
pertinent information to be entered and submitted to the developers.
The first step in filing a report is to determine whether the problem has
already been reported. The advantage in doing so, aside from saving the
developers time, is that you learn what has been done to fix it; it may be that
the problem has already been fixed for the next release, or additional
information is needed (in which case you are welcome to provide it if you can!).
To do this, search the bug database using the search box on the top of the page.
If the problem you’re reporting is not already in the bug tracker, go back to
the Python Bug Tracker and log in. If you don’t already have a tracker account,
select the “Register” link or, if you use OpenID, one of the OpenID provider
logos in the sidebar. It is not possible to submit a bug report anonymously.
Being now logged in, you can submit a bug. Select the “Create New” link in the
sidebar to open the bug reporting form.
The submission form has a number of fields. For the “Title” field, enter a
very short description of the problem; less than ten words is good. In the
“Type” field, select the type of your problem; also select the “Component” and
“Versions” to which the bug relates.
In the “Comment” field, describe the problem in detail, including what you
expected to happen and what did happen. Be sure to include whether any
extension modules were involved, and what hardware and software platform you
were using (including version information as appropriate).
Each bug report will be assigned to a developer who will determine what needs to
be done to correct the problem. You will receive an update each time action is
taken on the bug. See http://www.python.org/dev/workflow/ for a detailed
description of the issue workflow.
Python was created in the early 1990s by Guido van Rossum at Stichting
Mathematisch Centrum (CWI, see http://www.cwi.nl/) in the Netherlands as a
successor of a language called ABC. Guido remains Python’s principal author,
although it includes many contributions from others.
In 1995, Guido continued his work on Python at the Corporation for National
Research Initiatives (CNRI, see http://www.cnri.reston.va.us/) in Reston,
Virginia where he released several versions of the software.
In May 2000, Guido and the Python core development team moved to BeOpen.com to
form the BeOpen PythonLabs team. In October of the same year, the PythonLabs
team moved to Digital Creations (now Zope Corporation; see
http://www.zope.com/). In 2001, the Python Software Foundation (PSF, see
http://www.python.org/psf/) was formed, a non-profit organization created
specifically to own Python-related Intellectual Property. Zope Corporation is a
sponsoring member of the PSF.
All Python releases are Open Source (see http://www.opensource.org/ for the Open
Source Definition). Historically, most, but not all, Python releases have also
been GPL-compatible; the table below summarizes the various releases.
Release          Derived from   Year        Owner        GPL compatible?
0.9.0 thru 1.2   n/a            1991-1995   CWI          yes
1.3 thru 1.5.2   1.2            1995-1999   CNRI         yes
1.6              1.5.2          2000        CNRI         no
2.0              1.6            2000        BeOpen.com   no
1.6.1            1.6            2001        CNRI         no
2.1              2.0+1.6.1      2001        PSF          no
2.0.1            2.0+1.6.1      2001        PSF          yes
2.1.1            2.1+2.0.1      2001        PSF          yes
2.2              2.1.1          2001        PSF          yes
2.1.2            2.1.1          2002        PSF          yes
2.1.3            2.1.2          2002        PSF          yes
2.2.1            2.2            2002        PSF          yes
2.2.2            2.2.1          2002        PSF          yes
2.2.3            2.2.2          2002-2003   PSF          yes
2.3              2.2.2          2002-2003   PSF          yes
2.3.1            2.3            2002-2003   PSF          yes
2.3.2            2.3.1          2003        PSF          yes
2.3.3            2.3.2          2003        PSF          yes
2.3.4            2.3.3          2004        PSF          yes
2.3.5            2.3.4          2005        PSF          yes
2.4              2.3            2004        PSF          yes
2.4.1            2.4            2005        PSF          yes
2.4.2            2.4.1          2005        PSF          yes
2.4.3            2.4.2          2006        PSF          yes
2.4.4            2.4.3          2006        PSF          yes
2.5              2.4            2006        PSF          yes
2.5.1            2.5            2007        PSF          yes
2.6              2.5            2008        PSF          yes
2.6.1            2.6            2008        PSF          yes
2.6.2            2.6.1          2009        PSF          yes
2.6.3            2.6.2          2009        PSF          yes
2.6.4            2.6.3          2009        PSF          yes
3.0              2.6            2008        PSF          yes
3.0.1            3.0            2009        PSF          yes
3.1              3.0.1          2009        PSF          yes
3.1.1            3.1            2009        PSF          yes
3.1.2            3.1.1          2010        PSF          yes
3.1.3            3.1.2          2010        PSF          yes
3.1.4            3.1.3          2011        PSF          yes
3.2              3.1            2011        PSF          yes
3.2.1            3.2            2011        PSF          yes
3.2.2            3.2.1          2011        PSF          yes
Note
GPL-compatible doesn’t mean that we’re distributing Python under the GPL. All
Python licenses, unlike the GPL, let you distribute a modified version without
making your changes open source. The GPL-compatible licenses make it possible to
combine Python with other software that is released under the GPL; the others
don’t.
Thanks to the many outside volunteers who have worked under Guido’s direction to
make these releases possible.
Terms and conditions for accessing or otherwise using Python
PSF LICENSE AGREEMENT FOR PYTHON 3.2.2
This LICENSE AGREEMENT is between the Python Software Foundation (“PSF”), and
the Individual or Organization (“Licensee”) accessing and otherwise using Python
3.2.2 software in source or binary form and its associated documentation.
In the event Licensee prepares a derivative work that is based on or
incorporates Python 3.2.2 or any part thereof, and wants to make the
derivative work available to others as provided herein, then Licensee hereby
agrees to include in any such work a brief summary of the changes made to Python
3.2.2.
PSF is making Python 3.2.2 available to Licensee on an “AS IS” basis.
PSF MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR IMPLIED. BY WAY OF
EXAMPLE, BUT NOT LIMITATION, PSF MAKES NO AND DISCLAIMS ANY REPRESENTATION OR
WARRANTY OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE OR THAT THE
USE OF PYTHON 3.2.2 WILL NOT INFRINGE ANY THIRD PARTY RIGHTS.
PSF SHALL NOT BE LIABLE TO LICENSEE OR ANY OTHER USERS OF PYTHON 3.2.2
FOR ANY INCIDENTAL, SPECIAL, OR CONSEQUENTIAL DAMAGES OR LOSS AS A RESULT OF
MODIFYING, DISTRIBUTING, OR OTHERWISE USING PYTHON 3.2.2, OR ANY DERIVATIVE
THEREOF, EVEN IF ADVISED OF THE POSSIBILITY THEREOF.
This License Agreement will automatically terminate upon a material breach of
its terms and conditions.
Nothing in this License Agreement shall be deemed to create any relationship
of agency, partnership, or joint venture between PSF and Licensee. This License
Agreement does not grant permission to use PSF trademarks or trade name in a
trademark sense to endorse or promote products or services of Licensee, or any
third party.
By copying, installing or otherwise using Python 3.2.2, Licensee agrees
to be bound by the terms and conditions of this License Agreement.
BEOPEN.COM LICENSE AGREEMENT FOR PYTHON 2.0
BEOPEN PYTHON OPEN SOURCE LICENSE AGREEMENT VERSION 1
This LICENSE AGREEMENT is between BeOpen.com (“BeOpen”), having an office at
160 Saratoga Avenue, Santa Clara, CA 95051, and the Individual or Organization
(“Licensee”) accessing and otherwise using this software in source or binary
form and its associated documentation (“the Software”).
Subject to the terms and conditions of this BeOpen Python License Agreement,
BeOpen hereby grants Licensee a non-exclusive, royalty-free, world-wide license
to reproduce, analyze, test, perform and/or display publicly, prepare derivative
works, distribute, and otherwise use the Software alone or in any derivative
version, provided, however, that the BeOpen Python License is retained in the
Software, alone or in any derivative version prepared by Licensee.
BeOpen is making the Software available to Licensee on an “AS IS” basis.
BEOPEN MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR IMPLIED. BY WAY OF
EXAMPLE, BUT NOT LIMITATION, BEOPEN MAKES NO AND DISCLAIMS ANY REPRESENTATION OR
WARRANTY OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE OR THAT THE
USE OF THE SOFTWARE WILL NOT INFRINGE ANY THIRD PARTY RIGHTS.
BEOPEN SHALL NOT BE LIABLE TO LICENSEE OR ANY OTHER USERS OF THE SOFTWARE FOR
ANY INCIDENTAL, SPECIAL, OR CONSEQUENTIAL DAMAGES OR LOSS AS A RESULT OF USING,
MODIFYING OR DISTRIBUTING THE SOFTWARE, OR ANY DERIVATIVE THEREOF, EVEN IF
ADVISED OF THE POSSIBILITY THEREOF.
This License Agreement will automatically terminate upon a material breach of
its terms and conditions.
This License Agreement shall be governed by and interpreted in all respects
by the law of the State of California, excluding conflict of law provisions.
Nothing in this License Agreement shall be deemed to create any relationship of
agency, partnership, or joint venture between BeOpen and Licensee. This License
Agreement does not grant permission to use BeOpen trademarks or trade names in a
trademark sense to endorse or promote products or services of Licensee, or any
third party. As an exception, the “BeOpen Python” logos available at
http://www.pythonlabs.com/logos.html may be used according to the permissions
granted on that web page.
By copying, installing or otherwise using the software, Licensee agrees to be
bound by the terms and conditions of this License Agreement.
CNRI LICENSE AGREEMENT FOR PYTHON 1.6.1
This LICENSE AGREEMENT is between the Corporation for National Research
Initiatives, having an office at 1895 Preston White Drive, Reston, VA 20191
(“CNRI”), and the Individual or Organization (“Licensee”) accessing and
otherwise using Python 1.6.1 software in source or binary form and its
associated documentation.
In the event Licensee prepares a derivative work that is based on or
incorporates Python 1.6.1 or any part thereof, and wants to make the derivative
work available to others as provided herein, then Licensee hereby agrees to
include in any such work a brief summary of the changes made to Python 1.6.1.
CNRI is making Python 1.6.1 available to Licensee on an “AS IS” basis. CNRI
MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR IMPLIED. BY WAY OF EXAMPLE,
BUT NOT LIMITATION, CNRI MAKES NO AND DISCLAIMS ANY REPRESENTATION OR WARRANTY
OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF
PYTHON 1.6.1 WILL NOT INFRINGE ANY THIRD PARTY RIGHTS.
CNRI SHALL NOT BE LIABLE TO LICENSEE OR ANY OTHER USERS OF PYTHON 1.6.1 FOR
ANY INCIDENTAL, SPECIAL, OR CONSEQUENTIAL DAMAGES OR LOSS AS A RESULT OF
MODIFYING, DISTRIBUTING, OR OTHERWISE USING PYTHON 1.6.1, OR ANY DERIVATIVE
THEREOF, EVEN IF ADVISED OF THE POSSIBILITY THEREOF.
This License Agreement will automatically terminate upon a material breach of
its terms and conditions.
This License Agreement shall be governed by the federal intellectual property
law of the United States, including without limitation the federal copyright
law, and, to the extent such U.S. federal law does not apply, by the law of the
Commonwealth of Virginia, excluding Virginia’s conflict of law provisions.
Notwithstanding the foregoing, with regard to derivative works based on Python
1.6.1 that incorporate non-separable material that was previously distributed
under the GNU General Public License (GPL), the law of the Commonwealth of
Virginia shall govern this License Agreement only as to issues arising under or
with respect to Paragraphs 4, 5, and 7 of this License Agreement. Nothing in
this License Agreement shall be deemed to create any relationship of agency,
partnership, or joint venture between CNRI and Licensee. This License Agreement
does not grant permission to use CNRI trademarks or trade name in a trademark
sense to endorse or promote products or services of Licensee, or any third
party.
By clicking on the “ACCEPT” button where indicated, or by copying, installing
or otherwise using Python 1.6.1, Licensee agrees to be bound by the terms and
conditions of this License Agreement.
ACCEPT
CWI LICENSE AGREEMENT FOR PYTHON 0.9.0 THROUGH 1.2
Permission to use, copy, modify, and distribute this software and its
documentation for any purpose and without fee is hereby granted, provided that
the above copyright notice appear in all copies and that both that copyright
notice and this permission notice appear in supporting documentation, and that
the name of Stichting Mathematisch Centrum or CWI not be used in advertising or
publicity pertaining to distribution of the software without specific, written
prior permission.
STICHTING MATHEMATISCH CENTRUM DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS
SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO
EVENT SHALL STICHTING MATHEMATISCH CENTRUM BE LIABLE FOR ANY SPECIAL, INDIRECT
OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE,
DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS
ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS
SOFTWARE.
Licenses and Acknowledgements for Incorporated Software
This section is an incomplete but growing list of licenses and acknowledgements
for third-party software incorporated in the Python distribution.
The _random module includes code based on a download from
http://www.math.keio.ac.jp/matumoto/MT2002/emt19937ar.html. The following are
the verbatim comments from the original code:
A C-program for MT19937, with initialization improved 2002/1/26.
Coded by Takuji Nishimura and Makoto Matsumoto.
Before using, initialize the state by using init_genrand(seed)
or init_by_array(init_key, key_length).
Copyright (C) 1997 - 2002, Makoto Matsumoto and Takuji Nishimura,
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
3. The names of its contributors may not be used to endorse or promote
products derived from this software without specific prior written
permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Any feedback is very welcome.
http://www.math.keio.ac.jp/matumoto/emt.html
email: matumoto@math.keio.ac.jp
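The generator licensed above is what backs the random module. As a quick
illustration (not part of the original notice), seeding from Python maps onto
the init_by_array() routine mentioned above, so a given seed always reproduces
the same stream:

import random

# random.Random wraps the C _random module's MT19937 generator.
rng = random.Random(19937)
first = [rng.random() for _ in range(3)]
rng.seed(19937)            # reseeding reproduces the identical stream
assert [rng.random() for _ in range(3)] == first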
The socket module uses the functions getaddrinfo() and
getnameinfo(), which are coded in separate source files from the WIDE
Project, http://www.wide.ad.jp/.
Copyright (C) 1995, 1996, 1997, and 1998 WIDE Project.
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
3. Neither the name of the project nor the names of its contributors
may be used to endorse or promote products derived from this software
without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE PROJECT AND CONTRIBUTORS ``AS IS'' AND
GAI_ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE PROJECT OR CONTRIBUTORS BE LIABLE
FOR GAI_ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
HOWEVER CAUSED AND ON GAI_ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN GAI_ANY WAY
OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
SUCH DAMAGE.
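For orientation, here is a minimal sketch of the two functions as exposed by
the socket module; the hostname is illustrative and resolving it requires
network access:

import socket

# Resolve a host/port pair into a list of candidate socket addresses ...
for family, socktype, proto, canonname, sockaddr in socket.getaddrinfo(
        'www.python.org', 443, 0, socket.SOCK_STREAM):
    print(family, sockaddr)

# ... and map a numeric address back to host and service names.
print(socket.getnameinfo(('127.0.0.1', 80), 0))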
The source for the fpectl module includes the following notice:
---------------------------------------------------------------------
/ Copyright (c) 1996. \
| The Regents of the University of California. |
| All rights reserved. |
| |
| Permission to use, copy, modify, and distribute this software for |
| any purpose without fee is hereby granted, provided that this en- |
| tire notice is included in all copies of any software which is or |
| includes a copy or modification of this software and in all |
| copies of the supporting documentation for such software. |
| |
| This work was produced at the University of California, Lawrence |
| Livermore National Laboratory under contract no. W-7405-ENG-48 |
| between the U.S. Department of Energy and The Regents of the |
| University of California for the operation of UC LLNL. |
| |
| DISCLAIMER |
| |
| This software was prepared as an account of work sponsored by an |
| agency of the United States Government. Neither the United States |
| Government nor the University of California nor any of their em- |
| ployees, makes any warranty, express or implied, or assumes any |
| liability or responsibility for the accuracy, completeness, or |
| usefulness of any information, apparatus, product, or process |
| disclosed, or represents that its use would not infringe |
| privately-owned rights. Reference herein to any specific commer- |
| cial products, process, or service by trade name, trademark, |
| manufacturer, or otherwise, does not necessarily constitute or |
| imply its endorsement, recommendation, or favoring by the United |
| States Government or the University of California. The views and |
| opinions of authors expressed herein do not necessarily state or |
| reflect those of the United States Government or the University |
| of California, and shall not be used for advertising or product |
\ endorsement purposes. /
---------------------------------------------------------------------
The asynchat and asyncore modules contain the following notice:
Copyright 1996 by Sam Rushing
All Rights Reserved
Permission to use, copy, modify, and distribute this software and
its documentation for any purpose and without fee is hereby
granted, provided that the above copyright notice appear in all
copies and that both that copyright notice and this permission
notice appear in supporting documentation, and that the name of Sam
Rushing not be used in advertising or publicity pertaining to
distribution of the software without specific, written prior
permission.
SAM RUSHING DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE,
INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN
NO EVENT SHALL SAM RUSHING BE LIABLE FOR ANY SPECIAL, INDIRECT OR
CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS
OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT,
NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN
CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
The http.cookies module contains the following notice:
Copyright 2000 by Timothy O'Malley <timo@alum.mit.edu>
All Rights Reserved
Permission to use, copy, modify, and distribute this software
and its documentation for any purpose and without fee is hereby
granted, provided that the above copyright notice appear in all
copies and that both that copyright notice and this permission
notice appear in supporting documentation, and that the name of
Timothy O'Malley not be used in advertising or publicity
pertaining to distribution of the software without specific, written
prior permission.
Timothy O'Malley DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS
SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY
AND FITNESS, IN NO EVENT SHALL Timothy O'Malley BE LIABLE FOR
ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS,
WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS
ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR
PERFORMANCE OF THIS SOFTWARE.
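A minimal sketch of the cookie API this notice covers (the cookie names and
values are illustrative):

from http.cookies import SimpleCookie

# Build an outgoing cookie and render it as a Set-Cookie header line.
c = SimpleCookie()
c['session'] = '12345'
c['session']['path'] = '/'
print(c.output())                      # Set-Cookie: session=12345; Path=/

# Parse an incoming Cookie header string back into structured form.
incoming = SimpleCookie('session=12345; theme=dark')
print(incoming['theme'].value)         # dark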
The trace module contains the following notice:
portions copyright 2001, Autonomous Zones Industries, Inc., all rights...
err... reserved and offered to the public under the terms of the
Python 2.2 license.
Author: Zooko O'Whielacronx
http://zooko.com/
mailto:zooko@zooko.com
Copyright 2000, Mojam Media, Inc., all rights reserved.
Author: Skip Montanaro
Copyright 1999, Bioreason, Inc., all rights reserved.
Author: Andrew Dalke
Copyright 1995-1997, Automatrix, Inc., all rights reserved.
Author: Skip Montanaro
Copyright 1991-1995, Stichting Mathematisch Centrum, all rights reserved.
Permission to use, copy, modify, and distribute this Python software and
its associated documentation for any purpose without fee is hereby
granted, provided that the above copyright notice appears in all copies,
and that both that copyright notice and this permission notice appear in
supporting documentation, and that the name of neither Automatrix,
Bioreason or Mojam Media be used in advertising or publicity pertaining to
distribution of the software without specific, written prior permission.
The uu module contains the following notice:
Copyright 1994 by Lance Ellinghouse
Cathedral City, California Republic, United States of America.
All Rights Reserved
Permission to use, copy, modify, and distribute this software and its
documentation for any purpose and without fee is hereby granted,
provided that the above copyright notice appear in all copies and that
both that copyright notice and this permission notice appear in
supporting documentation, and that the name of Lance Ellinghouse
not be used in advertising or publicity pertaining to distribution
of the software without specific, written prior permission.
LANCE ELLINGHOUSE DISCLAIMS ALL WARRANTIES WITH REGARD TO
THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
FITNESS, IN NO EVENT SHALL LANCE ELLINGHOUSE CENTRUM BE LIABLE
FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT
OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
Modified by Jack Jansen, CWI, July 1995:
- Use binascii module to do the actual line-by-line conversion
between ascii and binary. This results in a 1000-fold speedup. The C
version is still 5 times faster, though.
- Arguments more compliant with Python standard
The xmlrpc.client module contains the following notice:
The XML-RPC client interface is
Copyright (c) 1999-2002 by Secret Labs AB
Copyright (c) 1999-2002 by Fredrik Lundh
By obtaining, using, and/or copying this software and/or its
associated documentation, you agree that you have read, understood,
and will comply with the following terms and conditions:
Permission to use, copy, modify, and distribute this software and
its associated documentation for any purpose and without fee is
hereby granted, provided that the above copyright notice appears in
all copies, and that both that copyright notice and this permission
notice appear in supporting documentation, and that the name of
Secret Labs AB or the author not be used in advertising or publicity
pertaining to distribution of the software without specific, written
prior permission.
SECRET LABS AB AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD
TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANT-
ABILITY AND FITNESS. IN NO EVENT SHALL SECRET LABS AB OR THE AUTHOR
BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY
DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS,
WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS
ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE
OF THIS SOFTWARE.
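A minimal sketch of the client interface covered by this notice; the endpoint
URL and the add() method are purely hypothetical and assume a matching XML-RPC
server is running:

import xmlrpc.client

# ServerProxy marshals ordinary method calls into XML-RPC requests.
proxy = xmlrpc.client.ServerProxy('http://localhost:8000/')
print(proxy.add(2, 3))   # succeeds only if the server exposes an add() method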
The test_epoll module contains the following notice:
Copyright (c) 2001-2006 Twisted Matrix Laboratories.
Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:
The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
The select module contains the following notice for the kqueue interface:
Copyright (c) 2000 Doug White, 2006 James Knight, 2007 Christian Heimes
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
SUCH DAMAGE.
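A minimal sketch of the kqueue interface this notice covers; select.kqueue()
exists only on BSD-derived systems such as FreeBSD and Mac OS X, so the code
below will not run elsewhere:

import select, socket

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(('127.0.0.1', 0))
srv.listen(1)

kq = select.kqueue()
ev = select.kevent(srv.fileno(), select.KQ_FILTER_READ, select.KQ_EV_ADD)
kq.control([ev], 0)                # register interest; return no events
print(kq.control(None, 1, 0.1))    # poll: at most one event, 100 ms timeout
kq.close()
srv.close()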
The file Python/dtoa.c, which supplies C functions dtoa and
strtod for conversion of C doubles to and from strings, is derived
from the file of the same name by David M. Gay, currently available
from http://www.netlib.org/fp/. The original file, as retrieved on
March 16, 2009, contains the following copyright and licensing
notice:
/****************************************************************
*
* The author of this software is David M. Gay.
*
* Copyright (c) 1991, 2000, 2001 by Lucent Technologies.
*
* Permission to use, copy, modify, and distribute this software for any
* purpose without fee is hereby granted, provided that this entire notice
* is included in all copies of any software which is or includes a copy
* or modification of this software and in all copies of the supporting
* documentation for such software.
*
* THIS SOFTWARE IS BEING PROVIDED "AS IS", WITHOUT ANY EXPRESS OR IMPLIED
* WARRANTY. IN PARTICULAR, NEITHER THE AUTHOR NOR LUCENT MAKES ANY
* REPRESENTATION OR WARRANTY OF ANY KIND CONCERNING THE MERCHANTABILITY
* OF THIS SOFTWARE OR ITS FITNESS FOR ANY PARTICULAR PURPOSE.
*
***************************************************************/
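The user-visible payoff of Gay's code is correctly rounded conversion in both
directions. Assuming a Python built with the bundled dtoa.c, the shortest
repr() of a float round-trips exactly:

x = 1.1
s = repr(x)                        # shortest string that converts back exactly
assert float(s) == x
print(format(2 ** 0.5, '.17g'))    # full 17 significant digits on request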
The modules hashlib, posix, ssl, and crypt use
the OpenSSL library for added performance if made available by the
operating system. Additionally, the Windows installers for Python
include a copy of the OpenSSL libraries, so we include a copy of the
OpenSSL license here:
LICENSE ISSUES
==============
The OpenSSL toolkit stays under a dual license, i.e. both the conditions of
the OpenSSL License and the original SSLeay license apply to the toolkit.
See below for the actual license texts. Actually both licenses are BSD-style
Open Source licenses. In case of any license issues related to OpenSSL
please contact openssl-core@openssl.org.
OpenSSL License
---------------
/* ====================================================================
* Copyright (c) 1998-2008 The OpenSSL Project. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
*
* 3. All advertising materials mentioning features or use of this
* software must display the following acknowledgment:
* "This product includes software developed by the OpenSSL Project
* for use in the OpenSSL Toolkit. (http://www.openssl.org/)"
*
* 4. The names "OpenSSL Toolkit" and "OpenSSL Project" must not be used to
* endorse or promote products derived from this software without
* prior written permission. For written permission, please contact
* openssl-core@openssl.org.
*
* 5. Products derived from this software may not be called "OpenSSL"
* nor may "OpenSSL" appear in their names without prior written
* permission of the OpenSSL Project.
*
* 6. Redistributions of any form whatsoever must retain the following
* acknowledgment:
* "This product includes software developed by the OpenSSL Project
* for use in the OpenSSL Toolkit (http://www.openssl.org/)"
*
* THIS SOFTWARE IS PROVIDED BY THE OpenSSL PROJECT ``AS IS'' AND ANY
* EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
* PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE OpenSSL PROJECT OR
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
* NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
* STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
* ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
* OF THE POSSIBILITY OF SUCH DAMAGE.
* ====================================================================
*
* This product includes cryptographic software written by Eric Young
* (eay@cryptsoft.com). This product includes software written by Tim
* Hudson (tjh@cryptsoft.com).
*
*/
Original SSLeay License
-----------------------
/* Copyright (C) 1995-1998 Eric Young (eay@cryptsoft.com)
* All rights reserved.
*
* This package is an SSL implementation written
* by Eric Young (eay@cryptsoft.com).
* The implementation was written so as to conform with Netscapes SSL.
*
* This library is free for commercial and non-commercial use as long as
* the following conditions are aheared to. The following conditions
* apply to all code found in this distribution, be it the RC4, RSA,
* lhash, DES, etc., code; not just the SSL code. The SSL documentation
* included with this distribution is covered by the same copyright terms
* except that the holder is Tim Hudson (tjh@cryptsoft.com).
*
* Copyright remains Eric Young's, and as such any Copyright notices in
* the code are not to be removed.
* If this package is used in a product, Eric Young should be given attribution
* as the author of the parts of the library used.
* This can be in the form of a textual message at program startup or
* in documentation (online or textual) provided with the package.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
* 3. All advertising materials mentioning features or use of this software
* must display the following acknowledgement:
* "This product includes cryptographic software written by
* Eric Young (eay@cryptsoft.com)"
* The word 'cryptographic' can be left out if the rouines from the library
* being used are not cryptographic related :-).
* 4. If you include any Windows specific code (or a derivative thereof) from
* the apps directory (application code) you must include an acknowledgement:
* "This product includes software written by Tim Hudson (tjh@cryptsoft.com)"
*
* THIS SOFTWARE IS PROVIDED BY ERIC YOUNG ``AS IS'' AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*
* The licence and distribution terms for any publically available version or
* derivative of this code cannot be changed. i.e. this code cannot simply be
* copied and put under another distribution licence
* [including the GNU Public Licence.]
*/
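As a quick check of the OpenSSL linkage described before the license texts
(assuming an OpenSSL-enabled build), hashlib computes digests and ssl reports
the linked library version:

import hashlib, ssl

print(hashlib.sha256(b'abc').hexdigest())   # OpenSSL-accelerated when available
print(ssl.OPENSSL_VERSION)                  # e.g. 'OpenSSL 1.0.0 ...'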
The pyexpat extension is built using an included copy of the expat
sources unless the build is configured --with-system-expat:
Copyright (c) 1998, 1999, 2000 Thai Open Source Software Center Ltd
and Clark Cooper
Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:
The above copyright notice and this permission notice shall be included
in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
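A minimal sketch of the expat-backed parser as surfaced through
xml.parsers.expat; the document string is illustrative:

import xml.parsers.expat

def start(name, attrs):            # called for every opening tag
    print('start', name, attrs)

p = xml.parsers.expat.ParserCreate()
p.StartElementHandler = start
p.Parse('<root a="1"><leaf/></root>', True)   # True marks the final chunk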
The _ctypes extension is built using an included copy of the libffi
sources unless the build is configured --with-system-libffi:
Copyright (c) 1996-2008 Red Hat, Inc and others.
Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
``Software''), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:
The above copyright notice and this permission notice shall be included
in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED ``AS IS'', WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
DEALINGS IN THE SOFTWARE.
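A minimal sketch of the libffi-powered foreign-function machinery, exercised
through the public ctypes module; this assumes a POSIX system where
find_library() can locate the C library:

import ctypes, ctypes.util

# Load the C library and call abs() through libffi's call machinery.
libc = ctypes.CDLL(ctypes.util.find_library('c'))
print(libc.abs(-5))    # -> 5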
The zlib extension is built using an included copy of the zlib
sources if the zlib version found on the system is too old to be
used for the build:
Copyright (C) 1995-2011 Jean-loup Gailly and Mark Adler
This software is provided 'as-is', without any express or implied
warranty. In no event will the authors be held liable for any damages
arising from the use of this software.
Permission is granted to anyone to use this software for any purpose,
including commercial applications, and to alter it and redistribute it
freely, subject to the following restrictions:
1. The origin of this software must not be misrepresented; you must not
claim that you wrote the original software. If you use this software
in a product, an acknowledgment in the product documentation would be
appreciated but is not required.
2. Altered source versions must be plainly marked as such, and must not be
misrepresented as being the original software.
3. This notice may not be removed or altered from any source distribution.
Jean-loup Gailly        Mark Adler
jloup@gzip.org          madler@alumni.caltech.edu
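A minimal round-trip through the zlib module, whichever copy of the library
the interpreter was built against:

import zlib

data = b'licenses ' * 1000
packed = zlib.compress(data, 9)          # level 9: best compression
assert zlib.decompress(packed) == data
print(len(data), '->', len(packed))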