📜  Serializing Data Using the pickle and cPickle Modules

📅  Last modified: 2022-05-13 01:54:26.155000             🧑  Author: Mango

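Serialization is the process of converting an object into a stream of bytes or characters so that it can be transmitted over a network or stored on disk, and later recreated together with its state. The reverse process is called deserialization.

In Python, the pickle module provides a way to serialize and deserialize Python objects. pickle is a powerful library that can serialize complex and custom objects that many other libraries cannot handle. Python 2 also ships a cPickle module that exposes the same functions as pickle but is written in C. In cPickle, Pickler and Unpickler are implemented as C functions rather than as classes.

Differences between pickle and cPickle:

  • pickle is implemented in pure Python using classes, whereas cPickle is written as C functions. As a result, cPickle is many times faster than pickle.
  • pickle is available in both Python 2.x and Python 3.x, whereas cPickle exists only in Python 2.x. In Python 3.x the C implementation lives in the _pickle module, and pickle uses it automatically when it is available, so a plain import pickle is normally enough.
  • cPickle does not support subclassing Pickler and Unpickler, because they are functions rather than classes. If subclassing is not needed, cPickle is the better choice; otherwise use pickle.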
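As a quick check of the second point on Python 3, the sketch below asks whether pickle.dumps is the very function exported by _pickle. It assumes CPython, where _pickle is an implementation detail rather than a public API. The helper name uses_c_accelerator and the sample dictionary are made up for this illustration.

```python
import pickle

try:
    import _pickle  # CPython's C accelerator; an implementation detail
except ImportError:
    _pickle = None  # e.g. an interpreter that ships only the pure-Python code


def uses_c_accelerator():
    """Return True if pickle.dumps is the function exported by _pickle."""
    return _pickle is not None and pickle.dumps is _pickle.dumps


print("C accelerator in use:", uses_c_accelerator())

# Either way, the public interface behaves the same.
data = {"weights": [0.1, 0.2, 0.3]}
assert pickle.loads(pickle.dumps(data)) == data
```

Whatever the check reports, code that only calls dump(), load(), dumps(), and loads() does not need to change.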

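Because pickle and cPickle share the same interface, they are used in the same way. The following example pickles a custom object to a file and loads it back: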

Python3
try:
    # Python 2.x: cPickle is the faster C implementation
    import cPickle as pickle
except ImportError:
    # Python 3.x: cPickle does not exist; pickle uses its C accelerator automatically
    import pickle

import random

# A custom class to demonstrate pickling
class ModelTrainer:
    def __init__(self):
        self.weights = [0, 0, 0]

    def train(self):
        # Replace every weight with a random float in [0.0, 1.0)
        for i in range(len(self.weights)):
            self.weights[i] = random.random()

    def get_weights(self):
        return self.weights

# Create an object
model = ModelTrainer()

# Populate the data
model.train()

print('Weights before pickling', model.get_weights())

# Pickle the object into a file opened in binary write mode
with open('model.pkl', 'wb') as p_file:
    pickle.dump(model, p_file)

# Deserialize the object from the file
with open('model.pkl', 'rb') as p_file:
    new_model = pickle.load(p_file)

print('Weights after pickling', new_model.get_weights())


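Output:

The script prints the weights twice, before and after pickling. The values are random and differ from run to run, but the two lists are identical.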

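In the code above, we created a custom class ModelTrainer whose constructor initializes a list of zeros. The train() method fills the list with random values, and get_weights() returns them. Next, we created a model object and printed the generated weights. We then opened a new file in 'wb' (write bytes) mode, and dump() wrote the object to it as a byte stream. To verify the result, we loaded the file into a new object and printed its weights.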

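The pickle module is very powerful for Python objects. However, it saves only an object's data, not the structure of its class: the byte stream records the class by name, not its code. A custom class object therefore cannot be loaded unless the class definition is available. The following example, run as a separate script in which ModelTrainer is not defined, shows a deserialization failure: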

Python3

try:
    # Python 2.x: cPickle is the faster C implementation
    import cPickle as pickle
except ImportError:
    # Python 3.x: cPickle does not exist, fall back to pickle
    import pickle

# Deserialize the file; this fails because ModelTrainer
# is not defined anywhere in this script
with open('model.pkl', 'rb') as p_file:
    new_model = pickle.load(p_file)

print('Weights of model', new_model.get_weights())

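Output:

pickle.load() raises an AttributeError because it cannot find ModelTrainer in the current script. The error occurs because the script does not know the class of the pickled object. In other words, pickle stores only the data inside the object, not its methods or class structure.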
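One way to see what the byte stream does and does not contain is to search it for names. The sketch below is an illustration only: it redefines a trimmed-down ModelTrainer and inspects the raw bytes, which is not something you need to do in normal use.

```python
import pickle


class ModelTrainer:
    def __init__(self):
        self.weights = [0, 0, 0]

    def get_weights(self):
        return self.weights


payload = pickle.dumps(ModelTrainer())

# The class is recorded by name, and the instance attributes are stored...
print(b"ModelTrainer" in payload)   # True
print(b"weights" in payload)        # True

# ...but the methods are not part of the byte stream.
print(b"get_weights" in payload)    # False
```

Since only the class name travels with the data, the loading side has to supply a class with that name itself.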

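To fix the error, we must make the class definition available to the script. The following example shows how to load a custom object correctly: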

Python3

try:
    # Python 2.x: cPickle is the faster C implementation
    import cPickle as pickle
except ImportError:
    # Python 3.x: cPickle does not exist, fall back to pickle
    import pickle

import random

# If the class lives in another file, it can be imported
# with an import statement instead of being redefined here

# A custom class to demonstrate pickling
class ModelTrainer:
    def __init__(self):
        self.weights = [0, 0, 0]

    def train(self):
        # Replace every weight with a random float in [0.0, 1.0)
        for i in range(len(self.weights)):
            self.weights[i] = random.random()

    def get_weights(self):
        return self.weights

# Deserialize the file; the class is now known, so loading succeeds
with open('model.pkl', 'rb') as p_file:
    new_model = pickle.load(p_file)

print('Weights of model', new_model.get_weights())

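Output:

The script prints the weights that were stored in model.pkl.

We provided the definition of the ModelTrainer class. The script now recognizes the class, so pickle can create a new instance and restore the saved weights into it. Note that __init__() is usually not called again during unpickling. Instead of retyping the whole class, we could simply import it from the file that defines it.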
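To check what actually happens to the class during loading, the following sketch counts how often __init__() runs. The Tracked class and its init_calls counter are hypothetical names introduced just for this experiment. The counter stays at 1, because unpickling creates the instance without calling __init__() and then restores the saved attributes.

```python
import pickle


class Tracked:
    init_calls = 0  # class-level counter, shared by all instances

    def __init__(self):
        Tracked.init_calls += 1
        self.weights = [0.5, 0.25, 0.125]


original = Tracked()                             # __init__ runs once here
restored = pickle.loads(pickle.dumps(original))  # rebuilt from the saved state

print(Tracked.init_calls)    # 1
print(restored.weights)      # [0.5, 0.25, 0.125]
print(restored is original)  # False
```

This is why the class definition matters so much: the methods come from the class found at load time, while the attribute values come from the pickle.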

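Serializing to a byte string

We can also serialize an object to a string instead of a file. The pickle and cPickle modules provide the dumps() and loads() functions. dumps() takes an object as its argument and returns its serialized form (a bytes object in Python 3, a str in Python 2). loads() does the opposite: it takes the serialized bytes and returns a reconstructed copy of the original object. Below is code that serializes a custom object to a byte string.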

Python3

try:
    # Python 2.x: cPickle is the faster C implementation
    import cPickle as pickle
except ImportError:
    # Python 3.x: cPickle does not exist, fall back to pickle
    import pickle

import random

# A custom class to demonstrate pickling
class ModelTrainer:
    def __init__(self):
        self.weights = [0, 0, 0]

    def train(self):
        # Replace every weight with a random float in [0.0, 1.0)
        for i in range(len(self.weights)):
            self.weights[i] = random.random()

    def get_weights(self):
        return self.weights

# Create an object
model = ModelTrainer()

# Populate the data
model.train()

print('Weights before pickling', model.get_weights())

# Pickle the object into an in-memory byte string
byte_string = pickle.dumps(model)

print("The bytes of object are:", byte_string)

# Deserialize the object from the same byte string
new_model = pickle.loads(byte_string)

print('Weights after unpickling', new_model.get_weights())

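Output:

The script prints the original weights, the pickled byte string, and the same weights again after unpickling. The exact values and bytes vary from run to run.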
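Because dumps() returns plain bytes, the result can be handled like any other binary data. As a hedged sketch, the snippet below checks the return type on Python 3 and round-trips the bytes through an in-memory io.BytesIO buffer, which merely stands in for a file or socket here. The settings dictionary is invented for the example.

```python
import io
import pickle

settings = {"name": "demo", "weights": [0.1, 0.2, 0.3], "trained": True}

payload = pickle.dumps(settings)
print(type(payload).__name__)   # bytes on Python 3

# Write the bytes somewhere and read them back, as a file or socket would.
buffer = io.BytesIO()
buffer.write(payload)
buffer.seek(0)

restored = pickle.loads(buffer.read())
print(restored == settings)     # True: same contents
print(restored is settings)     # False: a new, independent object
```

The restored object is an equal but separate copy, so changing it does not affect the original.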