Introducing Python Pickling

By Amit Kumar Saha

The motivation for this article came while I was working on my first major Python project: I wanted a way to write my class data to an on-disk file, just as I had done on numerous occasions in C by writing structure data to a file. So if you want to learn the Pythonic way of storing your class data persistently, this is for you. Let us start!

A. Pickle, Unpickle

A pickle is a Python object represented as a string of bytes, and the process of producing that representation is called pickling. Sounds utterly simple? Oh well, it is that simple! So we have successfully converted our object into some bytes; now how do we get it back? To unpickle means to reconstruct the Python object from the pickled string of bytes. Strictly speaking, the original object itself is not brought back: if we have pickled a list, L, then unpickling gives us a new list with the same contents as L.

The terms 'pickle' and 'unpickle' correspond to object serialization and de-serialization respectively, the language-neutral terms for the process that turns arbitrarily complex objects into textual or binary representations and back again.
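
As a minimal sketch of that round trip, the snippet below pickles a small list in memory with pickle.dumps and rebuilds it with pickle.loads. The variable names are only illustrative, and the code is written so that it should also run on Python 3, not just on the Python 2 interpreter used in the rest of this article.

```python
import pickle

L = [1, 'two', (3.0, 4)]

# Serialize: dumps() returns the pickled representation as a byte string
pickled = pickle.dumps(L)

# De-serialize: loads() builds a brand-new object from those bytes
restored = pickle.loads(pickled)

print(restored == L)   # True: the contents are the same
print(restored is L)   # False: it is a separate object
```

The second result is the point worth remembering: unpickling hands back a copy with equal contents, not the original object.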

A.1 The 'pickle' Module

The pickle module implements the functions to dump the class instance's data to a file and load the pickled data to make it usable.

Consider the Demo class below:

import pickle

class Demo:
	def __init__(self):
		self.a = 6
		self.l = ('hello','world')
		print self.a,self.l

Now, we will create an instance of Demo and pickle it.

>>> f=Demo()
6 ('hello', 'world')
>>> pickle.dumps(f)
"(i__main__\nDemo\np0\n(dp1\nS'a'\np2\nI6\nsS'l'\np3\n(S'hello'\np4\nS'world'\np5\ntp6\nsb."

The dumps function pickles the object and returns the result as a string, which the interactive interpreter then echoes to the screen. I am sure that this is not really comprehensible and doesn't look very useful, but if we dump the pickled object to an on-disk file, the utility increases manyfold. This is what we'll do next. Let's modify our code slightly to include the pickling code:

import pickle

class Demo:
	def __init__(self):
		self.a = 6
		self.l = ('hello','world')
		print self.a,self.l
		

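if __name__ == "__main__":
	f = Demo()
	# Open the file in binary mode and close it explicitly,
	# so that the pickle is flushed to disk
	out = open('Demo.pickle', 'wb')
	pickle.dump(f, out)
	out.close()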


Now, let us unpickle:

>>> f3=pickle.load(file("Demo.pickle"))
>>> f3.a
6
>>> f3.l
('hello', 'world')
>>> 

So far, so good.
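
For readers who want the whole round trip in one place, here is a self-contained sketch that dumps an instance to a file and loads it back. It is not the exact session shown above. It assumes a scratch directory created with tempfile (so no existing file is touched), declares Demo as a new-style class without the print in __init__, and is written so that it should run on Python 3 as well.

```python
import os
import pickle
import tempfile


class Demo(object):
    def __init__(self):
        self.a = 6
        self.l = ('hello', 'world')


# Assumed location: a fresh temporary directory, chosen only for this example
path = os.path.join(tempfile.mkdtemp(), 'Demo.pickle')

original = Demo()

# Write the pickle in binary mode; the with-block closes the file for us
with open(path, 'wb') as out:
    pickle.dump(original, out)

# Read it back; the Demo class must be defined (or importable) at this point
with open(path, 'rb') as src:
    restored = pickle.load(src)

print(restored.a)
print(restored.l)
```

Note that unpickling does not call __init__ by default: the instance's attributes are restored directly from the pickle. The pickle stores the instance data together with the name of the class, not the class code itself, which is why Demo has to be available when load is called.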

A.2 The 'cPickle' Module

cPickle is an extension module written in C that provides the same pickling facilities and can be up to 1000 times faster than the pickle module. The usage is the same as for pickle, and pickles produced by the two modules are compatible.

>>> import cPickle
>>> f3=cPickle.load(file("Demo.pickle"))
>>> f3.l
('hello', 'world')
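
Because the two modules share the same interface, a common idiom is to try cPickle first and fall back to pickle. The sketch below assumes nothing beyond the standard library. On Python 3, where cPickle no longer exists as a separate module, the fallback branch is taken and pickle itself uses its C implementation when one is available.

```python
try:
    import cPickle as pickle   # Python 2: the C implementation
except ImportError:
    import pickle              # Python 3: there is no separate cPickle module

data = ('hello', 'world', 6)

# The calling code is identical whichever module was imported
restored = pickle.loads(pickle.dumps(data))
print(restored == data)
```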

B. A Glimpse Behind the Scenes

The data format used by pickle is Python-specific, which rules out pickling as an option for persistent storage if you are looking for a language-neutral solution. By default, Python writes pickled objects in a human-readable, and thus easily debuggable, ASCII format. There are three different protocols which can be used for pickling:

  1. Protocol version 0 is the original ASCII protocol and is backward compatible with earlier versions of Python.
  2. Protocol version 1 is the old binary format which is also compatible with earlier versions of Python.
  3. Protocol version 2 was introduced in Python 2.3. It provides much more efficient pickling of new-style classes.
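
The protocol is chosen with the optional second argument of dump and dumps. The following sketch, using made-up sample data, pickles the same object with each of the three protocols listed above and prints the size of the result. The exact byte counts depend on the Python version, so none are quoted here.

```python
import pickle

data = tuple(range(100))

for protocol in (0, 1, 2):
    pickled = pickle.dumps(data, protocol)
    # Whatever the protocol, loading must give back an equal object
    assert pickle.loads(pickled) == data
    print("protocol %d: %d bytes" % (protocol, len(pickled)))

# Protocol 0 output for this data is plain ASCII, so it can be decoded and read
text = pickle.dumps(data, 0).decode('ascii')
print(text[:20])
```

For data like this, the binary protocols should produce a noticeably smaller pickle than protocol 0. Keep in mind that the default protocol differs between Python versions: it is the ASCII protocol 0 in Python 2, while Python 3 defaults to a newer binary protocol.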

C. Conclusion

The goal of this short tutorial was to give new Python programmers a hands-on introduction to pickling as a method of writing class data to persistent storage. I have intentionally left out issues related to working with larger, more complex classes, for which some good resources are listed below. More basic tasks, such as pickling simple lists and dictionaries, have also been omitted, but the answers do not require much looking around to find.

I hope that you are ready to use pickling in your projects. Happy coding!

References:

  1. Python persistence management
  2. pickle module
  3. cPickle module

The author is a freelance technical writer. He mainly writes on the Linux kernel, Network Security and XML.


Copyright © 2007, Amit Kumar Saha. Released under the Open Publication License unless otherwise noted in the body of the article. Linux Gazette is not produced, sponsored, or endorsed by its prior host, SSC, Inc.

Published in Issue 143 of Linux Gazette, October 2007
