The pickle module has an optimized cousin called the
cPickle module. As its name implies, cPickle is
written in C, so it can be up to 1000 times faster than
pickle. However it does not support subclassing of the
Pickler() and Unpickler() classes, because in
cPickle these are functions, not classes. Most applications
have no need for this functionality, and can benefit from the improved
performance of cPickle. Other than that, the interfaces of
the two modules are nearly identical; the common interface is
described in this manual and differences are pointed out where
necessary. In the following discussions, we use the term ``pickle''
to collectively describe the pickle and
cPickle modules.
The data streams the two modules produce are guaranteed to be
interchangeable.
Python has a more primitive serialization module called
marshal, but in general
pickle should always be the preferred way to serialize Python
objects. marshal exists primarily to support Python's
.pyc files.
The pickle module differs from marshal several
significant ways:
The pickle module keeps track of the objects it has
already serialized, so that later references to the same object
won't be serialized again. marshal doesn't do this.
This has implications both for recursive objects and object
sharing. Recursive objects are objects that contain references
to themselves. These are not handled by marshal, and in fact,
attempting to marshal recursive objects will crash your Python
interpreter. Object sharing happens when there are multiple
references to the same object in different places in the object
hierarchy being serialized. pickle stores such objects
only once, and ensures that all other references point to the
master copy. Shared objects remain shared, which can be very
important for mutable objects.
marshal cannot be used to serialize user-defined
classes and their instances. pickle can save and
restore class instances transparently, however the class
definition must be importable and live in the same module as
when the object was stored.
The marshal serialization format is not guaranteed to
be portable across Python versions. Because its primary job in
life is to support .pyc files, the Python implementers
reserve the right to change the serialization format in
non-backwards compatible ways should the need arise. The
pickle serialization format is guaranteed to be
backwards compatible across Python releases.
The pickle module doesn't handle code objects, which
the marshal module does. This avoids the possibility
of smuggling Trojan horses into a program through the
pickle module3.3.
Note that serialization is a more primitive notion than persistence;
although
pickle reads and writes file objects, it does not handle the
issue of naming persistent objects, nor the (even more complicated)
issue of concurrent access to persistent objects. The pickle
module can transform a complex object into a byte stream and it can
transform the byte stream into an object with the same internal
structure. Perhaps the most obvious thing to do with these byte
streams is to write them onto a file, but it is also conceivable to
send them across a network or store them in a database. The module
shelve provides a simple interface
to pickle and unpickle objects on DBM-style database files.
This doesn't necessarily imply
that pickle is inherently secure. See
section 3.14.6 for a more detailed discussion on
pickle module security. Besides, it's possible that
pickle will eventually support serializing code
objects.