Serializing Data with the pickle and cPickle Modules

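Serialization is the process of converting an object into a stream of bytes or characters so that it can be sent over a network or saved to disk, and later used to re-create the object together with its state. The reverse process is called deserialization.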
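As a minimal illustration of this round trip, the sketch below uses the standard-library pickle module to turn a plain dictionary into bytes and then rebuild it. The variable names and data are illustrative only.

```python
import pickle

# An ordinary Python object whose state we want to keep
settings = {"name": "demo", "layers": [64, 32], "dropout": 0.1}

# Serialization: object -> bytes
data = pickle.dumps(settings)
print(type(data))  # <class 'bytes'>

# Deserialization: bytes -> a new object with the same state
restored = pickle.loads(data)
print(restored == settings)  # True
print(restored is settings)  # False: a new object, not the original one
```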

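In Python, the pickle module provides methods for serializing and deserializing Python objects. pickle is a powerful library: it can serialize complex and custom objects that many other libraries cannot handle. Alongside pickle there is a cPickle module that offers the same methods but is written in C; its functionality is implemented as C functions rather than as Python classes.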
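One way to see what "complex objects" means in practice: within a single dumps() call, pickle remembers every object it has already written, so shared references and even self-referencing structures come back with the same shape. This is a small sketch with made-up data.

```python
import pickle

# A dict whose two values are the SAME list object
shared = [1, 2, 3]
container = {"a": shared, "b": shared}

# A list that contains itself
loop = []
loop.append(loop)

restored_container = pickle.loads(pickle.dumps(container))
restored_loop = pickle.loads(pickle.dumps(loop))

# Shared references are preserved: both keys still point to one list
print(restored_container["a"] is restored_container["b"])  # True

# The self-reference survives as well
print(restored_loop[0] is restored_loop)  # True
```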

Differences between pickle and cPickle:

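pickle is implemented in Python using classes, whereas cPickle is written as C functions. As a result, cPickle is many times faster than pickle.
pickle is available in both Python 2.x and Python 3.x, while cPickle is available by default only in Python 2.x. In Python 3.x, cPickle no longer exists as a separate module: its C implementation lives on as _pickle, and the standard pickle module uses it automatically whenever it is available, so a plain "import pickle" is enough.
Python 2's cPickle does not allow its Pickler and Unpickler to be subclassed. If you don't need subclassing, cPickle is the better choice; otherwise, pickle is the one to use.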
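On Python 3 you can check which implementation is in use. The sketch below assumes CPython, where pickle.Pickler is the C class from _pickle and the pure-Python version survives as the private class pickle._Pickler; both names are implementation details, not documented public API. It also shows that, unlike Python 2's cPickle, the C-backed pickle.Pickler can be subclassed (MyPickler is a hypothetical name used only here).

```python
import io
import pickle

try:
    import _pickle  # CPython's C accelerator module (the successor of cPickle)
except ImportError:
    _pickle = None  # e.g. an implementation that ships without the C module

# On CPython, pickle re-exports the C classes when _pickle is available
if _pickle is not None:
    print(pickle.Pickler is _pickle.Pickler)  # True on CPython

data = {"weights": [0.1, 0.2, 0.3]}

# The pure-Python Pickler (private, CPython implementation detail)
buf = io.BytesIO()
pickle._Pickler(buf).dump(data)
print(pickle.loads(buf.getvalue()) == data)  # True


# Hypothetical subclass: Python 3's C-backed Pickler is a real, subclassable class
class MyPickler(pickle.Pickler):
    pass


buf2 = io.BytesIO()
MyPickler(buf2).dump(data)
print(pickle.loads(buf2.getvalue()) == data)  # True
```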

Because pickle and cPickle share the same interface, we can use them in the same way. Here is some sample code for reference:


from __future__ import print_function  # same print() behavior on Python 2 and 3

try:
    # In Python 2.x, cPickle is available by default
    import cPickle as pickle
except ImportError:
    # In Python 3.x, cPickle does not exist; the standard pickle module
    # uses its C implementation (_pickle) automatically when available
    import pickle

import random

# A custom class to demonstrate pickling
class ModelTrainer(object):
    def __init__(self):
        self.weights = [0, 0, 0]

    def train(self):
        # Fill the weight list with random values
        for i in range(len(self.weights)):
            self.weights[i] = random.random()

    def get_weights(self):
        return self.weights

# Create an object
model = ModelTrainer()

# Populate the data
model.train()

print('Weights before pickling', model.get_weights())

# Open a file in binary write mode and pickle the object into it
with open('model.pkl', 'wb') as p_file:
    pickle.dump(model, p_file)

# Deserialize the object from the file (binary read mode)
with open('model.pkl', 'rb') as p_file:
    new_model = pickle.load(p_file)

print('Weights after pickling', new_model.get_weights())

Output:

Weights before pickling [0.6089721131909885, 0.7891019431265203, 0.5653418337976294]

Weights after pickling [0.6089721131909885, 0.7891019431265203, 0.5653418337976294]


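In the code above, we created a custom class ModelTrainer that initializes a list of zeros. The train() method fills the list with random values, and the get_weights() method returns them. Next, we created a model object and printed the generated weights. We then opened a new file in 'wb' (write binary) mode, and the dump() method wrote the object to the file as a byte stream. Finally, we verified the result by opening the file in 'rb' (read binary) mode, loading it into a new object with load(), and printing its weights.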
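Because each dump() call writes one complete pickle at the current position of the open file, several objects can be stored one after another and read back with repeated load() calls, in the same order. A minimal sketch, assuming a hypothetical file name history.pkl; load() raises EOFError once the file is exhausted.

```python
import pickle

snapshots = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]]

# 'wb': write bytes; each dump() appends one complete pickle to the file
with open("history.pkl", "wb") as f:
    for weights in snapshots:
        pickle.dump(weights, f, protocol=pickle.HIGHEST_PROTOCOL)

# 'rb': read bytes; each load() reads exactly one pickle back
loaded = []
with open("history.pkl", "rb") as f:
    while True:
        try:
            loaded.append(pickle.load(f))
        except EOFError:
            break  # no more pickles in the file

print(loaded == snapshots)  # True
```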

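The pickle module is very powerful for Python objects, but it preserves only an object's data, not its class structure: a pickle records the instance's attributes plus the name of its class and the module it came from, not the class's code. Therefore, a custom class object cannot be loaded unless the class definition is available. Below is an example in which unpickling fails: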


from __future__ import print_function

try:
    # In Python 2.x, cPickle is available by default
    import cPickle as pickle
except ImportError:
    # In Python 3.x, the standard pickle module uses the C implementation automatically
    import pickle

# Deserialize the file WITHOUT defining or importing ModelTrainer
with open('model.pkl', 'rb') as file:
    new_model = pickle.load(file)

print('Weights of model', new_model.get_weights())

Output:

Traceback (most recent call last):
  File "des.py", line 12, in <module>
    new_model = pickle.load(file)
AttributeError: Can't get attribute 'ModelTrainer' on


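The error above occurs because the current script does not know the object's class: while loading, pickle looks up ModelTrainer by name and cannot find it. In other words, pickle saves only the data inside the object, not its methods or the class structure.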
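To see concretely what a pickle contains, the sketch below pickles a small stand-in ModelTrainer instance (a simplified, hypothetical version of the class above) and searches the resulting bytes. The class name and its module name are present, and so are the attribute names, but the method names are not. pickletools.dis from the standard library prints the opcodes for a closer look.

```python
import pickle
import pickletools


class ModelTrainer:
    def __init__(self):
        self.weights = [0.5, 0.25, 0.75]

    def get_weights(self):
        return self.weights


data = pickle.dumps(ModelTrainer())

# The pickle refers to the class by module and name...
print(ModelTrainer.__module__.encode() in data)  # True
print(b"ModelTrainer" in data)  # True
# ...and stores the instance's attributes (its __dict__)...
print(b"weights" in data)  # True
# ...but contains no method code
print(b"get_weights" in data)  # False

# Human-readable listing of the pickle's opcodes
pickletools.dis(data)
```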

To fix this error, we must make the class definition available to the script. Here is an example of loading the custom object correctly:


from __future__ import print_function

try:
    # In Python 2.x, cPickle is available by default
    import cPickle as pickle
except ImportError:
    # In Python 3.x, the standard pickle module uses the C implementation automatically
    import pickle

import random

# The class must be available under the same name it had when it was pickled.
# If it is defined in another file, it can be imported instead of redefined.
class ModelTrainer(object):
    def __init__(self):
        self.weights = [0, 0, 0]

    def train(self):
        for i in range(len(self.weights)):
            self.weights[i] = random.random()

    def get_weights(self):
        return self.weights

# Deserialize the file
with open('model.pkl', 'rb') as file:
    new_model = pickle.load(file)

print('Weights of model', new_model.get_weights())

Output:

Weights of model [0.6089721131909885, 0.7891019431265203, 0.5653418337976294]


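We provided the ModelTrainer class, so the script can now recognize it. During loading, pickle finds the class by its name, creates a new instance without calling the constructor (__init__), and restores the saved attribute values. Instead of retyping the whole class, we could also import it from the earlier file, as long as it ends up under the same name it had when the object was pickled (here, ModelTrainer in the __main__ module) and importing that file does not re-run its training code.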
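Whether unpickling runs __init__ again is easy to check by counting calls. In the sketch below, init_calls is a hypothetical class-level counter added only for this test; the count stays at 1 because pickle rebuilds the instance and restores its attributes without calling the constructor.

```python
import pickle


class ModelTrainer:
    init_calls = 0  # hypothetical counter, used only for this demonstration

    def __init__(self):
        ModelTrainer.init_calls += 1
        self.weights = [0, 0, 0]

    def get_weights(self):
        return self.weights


model = ModelTrainer()
model.weights = [0.9, 0.8, 0.7]
data = pickle.dumps(model)

restored = pickle.loads(data)

# __init__ ran once (for `model`) and was not run again while unpickling;
# the saved attributes were restored directly
print(ModelTrainer.init_calls)  # 1
print(restored.get_weights())  # [0.9, 0.8, 0.7]
```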

Serializing to a Byte String

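We can also serialize an object to a string instead of a file. The pickle and cPickle modules provide the dumps() and loads() methods for this. dumps() takes an object as its argument and returns the encoded string (a bytes object in Python 3, a str in Python 2). loads() does the opposite: it takes the encoded string and returns the original object. Below is code that serializes a custom object to a byte string.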


from __future__ import print_function

try:
    # In Python 2.x, cPickle is available by default
    import cPickle as pickle
except ImportError:
    # In Python 3.x, the standard pickle module uses the C implementation automatically
    import pickle

import random

# A custom class to demonstrate pickling
class ModelTrainer(object):
    def __init__(self):
        self.weights = [0, 0, 0]

    def train(self):
        # Fill the weight list with random values
        for i in range(len(self.weights)):
            self.weights[i] = random.random()

    def get_weights(self):
        return self.weights

# Create an object
model = ModelTrainer()

# Populate the data
model.train()

print('Weights before pickling', model.get_weights())

# Pickle the object into a byte string (bytes in Python 3, str in Python 2)
byte_string = pickle.dumps(model)

print('The bytes of object are:', byte_string)

# Deserialize the object from the same byte string
new_model = pickle.loads(byte_string)

print('Weights after depickling', new_model.get_weights())

Output:

Weights before pickling [0.923474126606742, 0.34909608824193983, 0.3761122243447367]

The bytes of object are: b'\x80\x03c__main__\nModelTrainer\nq\x00)\x81q\x01}q\x02X\x07\x00\x00\x00weightsq\x03]q\x04(G?\xed\x8d\x19\x9c\x8fL\xc3G?\xd6W\x97\x1e\x8aHHG?\xd8\x129\x01\xcb\xee\xf2esb.'

Weights after depickling [0.923474126606742, 0.34909608824193983, 0.3761122243447367]
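In the byte string above, the first two bytes, \x80\x03, are the PROTO opcode followed by the protocol number 3, which was the default protocol before Python 3.8. dumps() accepts a protocol argument to choose the format explicitly. The sketch below reuses the printed weights only as sample data: it checks that two-byte header for every binary protocol the running interpreter supports and prints the start of the older, text-based protocol 0 for comparison.

```python
import pickle

weights = [0.923474126606742, 0.34909608824193983, 0.3761122243447367]

for proto in range(2, pickle.HIGHEST_PROTOCOL + 1):
    data = pickle.dumps(weights, protocol=proto)
    # From protocol 2 on, the stream starts with PROTO (0x80) + the version byte
    print(proto, data[:2])
    assert data[0] == 0x80 and data[1] == proto
    assert pickle.loads(data) == weights

# Protocol 0 is the original ASCII-based format
print(pickle.dumps(weights, protocol=0)[:20])
print("default:", pickle.DEFAULT_PROTOCOL, "highest:", pickle.HIGHEST_PROTOCOL)
```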
