Introduction to the Python pickle Library (Object Serialization and Deserialization)

1. pickle

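The pickle module implements serialization and deserialization of Python objects. Typically, pickle serializes a Python object into a binary byte stream, which can be kept in memory or written to a file.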
 
To serialize a Python object to a file and to read it back, use:


pickle.dump()

pickle.load()


To serialize a Python object to an in-memory byte string (a bytes object in Python 3) and back, use:

pickle.dumps()

pickle.loads()
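As a minimal sketch of the in-memory round trip, the snippet below pickles a small dictionary with pickle.dumps() and restores it with pickle.loads(). The variable names (record, payload, restored) are made up for illustration.

```python
import pickle

# An illustrative object built only from picklable built-in types.
record = {"name": "example", "values": [1, 2.0, 3 + 4j], "flag": None}

# dumps() returns the serialized form as a bytes object.
payload = pickle.dumps(record)
print(type(payload))

# loads() rebuilds an equal, but distinct, object from those bytes.
restored = pickle.loads(payload)
print(restored == record)
print(restored is record)
```

The restored object compares equal to the original but is a separate copy, so changing one does not affect the other.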


 
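The following types can be pickled:
* None, True, and False;
* integers, floating-point numbers, and complex numbers;
* strings, bytes, and bytearrays;
* tuples, lists, sets, and dictionaries that contain only picklable objects;
* functions defined at the top level of a module (with def, not lambda);
* built-in functions defined at the top level of a module;
* classes defined at the top level of a module;
* instances of classes whose __dict__, or the result of calling __getstate__(), is picklable.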
 

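Note: functions and classes are pickled by name, not by value. Only the module name and the object's name are stored, so the defining module must be importable in the environment where the data is unpickled.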
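The by-name behaviour can be observed with a small sketch. It uses math.sqrt as a stand-in for any function that lives at the top level of an importable module, and a lambda as an example of a function that has no importable name. The variable names are illustrative only.

```python
import math
import pickle

# Pickling a module-level function stores a reference (module + name),
# not the function's code.
payload = pickle.dumps(math.sqrt)
print(b"sqrt" in payload)

# Unpickling looks the name up again and returns the very same object.
restored = pickle.loads(payload)
print(restored is math.sqrt)

# A lambda has no name that can be looked up on its module,
# so pickle refuses to serialize it.
square = lambda x: x * x
try:
    pickle.dumps(square)
    lambda_pickled = True
except (pickle.PicklingError, AttributeError) as exc:
    lambda_pickled = False
    print("cannot pickle lambda:", type(exc).__name__)
```

Because only the name is stored, renaming or moving a function or class after data has been pickled can make old pickles unreadable.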

2. How pickle works

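In most cases no extra code is needed to make an object picklable. By default, pickle inspects the class and the attributes of the instance. When a class instance is unpickled, its __init__() method is normally not called. Instead, an uninitialized instance is created first, and the saved attributes are then restored onto it.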
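A short sketch makes the point about __init__() visible. The Tracker class and its init_calls counter are hypothetical names introduced only for this illustration; the class is assumed to be defined at the top level of the module being run.

```python
import pickle


class Tracker:
    """Counts how many times __init__() runs (illustrative only)."""

    init_calls = 0

    def __init__(self, name):
        Tracker.init_calls += 1
        self.name = name


original = Tracker("first")

# The round trip restores the attributes without running __init__() again.
restored = pickle.loads(pickle.dumps(original))

print(restored.name)
print(Tracker.init_calls)
```

The counter stays at 1 after the round trip, while the name attribute is still present on the restored instance.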
 

The default behaviour can, however, be changed by implementing the following methods:


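object.__getstate__(): By default the object's __dict__ is pickled. If you implement __getstate__(), the value it returns is pickled instead.

object.__setstate__(state): If the class implements this method, it is called on unpickling with the saved state and is responsible for restoring the object's attributes.

object.__getnewargs__(): Implement this method if constructing the instance (__new__()) requires arguments. The tuple it returns is passed to __new__() on unpickling.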


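Note: if __getstate__() returns None, __setstate__() is not called on unpickling. Older versions of the documentation state this for any false value, but other false values such as False are only reliably skipped by the old protocols 0 and 1.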
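The __getnewargs__() hook described above matters mostly for immutable types, whose contents must be supplied to __new__(). The following is a minimal sketch under that assumption; the Point class is hypothetical and is assumed to be defined at the top level of the module being run.

```python
import pickle


class Point(tuple):
    """An immutable 2-D point whose __new__() requires two arguments."""

    def __new__(cls, x, y):
        return super().__new__(cls, (x, y))

    def __getnewargs__(self):
        # These values are passed to Point.__new__() when unpickling.
        return (self[0], self[1])


original = Point(1, 2)
restored = pickle.loads(pickle.dumps(original))

print(restored == original)
print(type(restored).__name__)
```

Without the override, unpickling would call Point.__new__() with arguments that do not match its (x, y) signature.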

Sometimes, for efficiency or because the three methods above are not sufficient, you need to implement __reduce__() instead.
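One simple form of __reduce__() returns a callable and a tuple of arguments; on unpickling, pickle calls the callable with those arguments to rebuild the object. The sketch below assumes a hypothetical Temperature class, defined at the top level of the module being run, that stores only its constructor argument and recomputes a derived attribute.

```python
import pickle


class Temperature:
    """Rebuilt through its constructor on unpickling (illustrative only)."""

    init_calls = 0

    def __init__(self, celsius):
        Temperature.init_calls += 1
        self.celsius = celsius
        # Derived value; it is recomputed rather than stored in the pickle.
        self.fahrenheit = celsius * 9 / 5 + 32

    def __reduce__(self):
        # (callable, args): unpickling calls Temperature(self.celsius).
        return (Temperature, (self.celsius,))


original = Temperature(100)
restored = pickle.loads(pickle.dumps(original))

print(restored.celsius, restored.fahrenheit)
print(Temperature.init_calls)
```

Unlike the default mechanism, this form does run __init__() again on unpickling, which is why the counter ends at 2.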

3. Example


import pickle

# An arbitrary collection of objects supported by pickle.
data = {
    'a': [1, 2.0, 3, 4+6j],
    'b': ("character string", b"byte string"),
    'c': set([None, True, False])
}

with open('data.pickle', 'wb') as f:
    # Pickle the 'data' dictionary using the highest protocol available.
    pickle.dump(data, f, pickle.HIGHEST_PROTOCOL)

with open('data.pickle', 'rb') as f:
    # The protocol version used is detected automatically, so we do not
    # have to specify it.
    data = pickle.load(f)
    print(str(data))

4. Changing the default behaviour of picklable types


import pickle


class TextReader:
    """Print and number lines in a text file."""

    def __init__(self, filename):
        self.filename = filename
        self.file = open(filename)
        self.lineno = 0

    def readline(self):
        self.lineno += 1
        line = self.file.readline()
        if not line:
            return None
        if line.endswith('\n'):
            line = line[:-1]
        return "%i: %s" % (self.lineno, line)

    def __getstate__(self):
        # Copy the object's state from self.__dict__ which contains
        # all our instance attributes. Always use the dict.copy()
        # method to avoid modifying the original state.
        state = self.__dict__.copy()
        # Remove the unpicklable entries.
        del state['file']
        return state

    def __setstate__(self, state):
        # Restore instance attributes (i.e., filename and lineno).
        self.__dict__.update(state)
        # Restore the previously opened file's state. To do so, we need to
        # reopen it and read from it until the line count is restored.
        file = open(self.filename)
        for _ in range(self.lineno):
            file.readline()
        # Finally, save the file.
        self.file = file


reader = TextReader("hello.txt")
print(reader.readline())
print(reader.readline())
s = pickle.dumps(reader)
# print(s)
new_reader = pickle.loads(s)
print(new_reader.readline())

# The output, given a hello.txt that contains these three lines, is:
# 1: hello
# 2: how are you
# 3: goodbye