TensorFlow 中的 CIFAR-10 图像分类

先决条件：

图像分类
卷积神经网络包括基本池化、神经网络中具有归一化的卷积层和 dropout。
数据增强。
神经网络。
Numpy 数组。

在本文中，我们将讨论如何使用 TensorFlow 对图像进行分类。图像分类是一种将图像分类到它们各自的类别类中的方法。正如它所暗示的那样，CIFAR-10 数据集有 10 种不同类别的图像。共有 60000 张图像，分为 10 个不同的类别，分别命名为飞机、汽车、鸟、猫、鹿、狗、青蛙、马、船、卡车。所有图像的大小均为 32×32。总共有 50000 张训练图像和 10000 张测试图像。

为了构建图像分类器，我们使用 tensorflow 的 keras API 来构建我们的模型。为了构建模型，建议有 GPU 支持，或者您也可以使用 Google colab notebooks。

分步实施：

编写任何代码的第一步是导入所有必需的库和模块。这包括导入 tensorflow 和其他模块，如 numpy。如果模块不存在，那么您可以在命令提示符下使用pip install tensorflow下载它（对于 Windows），或者如果您使用的是 jupyter notebook，那么只需在单元格中输入!pip install tensorflow并运行它以下载模块.其他模块也可以类似导入。

Python3

import tensorflow as tf  
 
# Display the version
print(tf.__version__)    
 
# other imports
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.layers import Input, Conv2D, Dense, Flatten, Dropout
from tensorflow.keras.layers import GlobalMaxPooling2D, MaxPooling2D
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.models import Model

Python3

# Load in the data
cifar10 = tf.keras.datasets.cifar10
 
# Distribute it to train and test set
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
print(x_train.shape, y_train.shape, x_test.shape, y_test.shape)

Python3

# Reduce pixel values
x_train, x_test = x_train / 255.0, x_test / 255.0
 
# flatten the label values
y_train, y_test = y_train.flatten(), y_test.flatten()

Python3

# visualize data by plotting images
fig, ax = plt.subplots(5, 5)
k = 0
 
for i in range(5):
    for j in range(5):
        ax[i][j].imshow(x_train[k], aspect='auto')
        k += 1
 
plt.show()

Python3

# number of classes
K = len(set(y_train))
 
# calculate total number of classes
# for output layer
print("number of classes:", K)
 
# Build the model using the functional API
# input layer
i = Input(shape=x_train[0].shape)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(i)
x = BatchNormalization()(x)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
x = BatchNormalization()(x)
x = MaxPooling2D((2, 2))(x)
 
x = Conv2D(64, (3, 3), activation='relu', padding='same')(x)
x = BatchNormalization()(x)
x = Conv2D(64, (3, 3), activation='relu', padding='same')(x)
x = BatchNormalization()(x)
x = MaxPooling2D((2, 2))(x)
 
x = Conv2D(128, (3, 3), activation='relu', padding='same')(x)
x = BatchNormalization()(x)
x = Conv2D(128, (3, 3), activation='relu', padding='same')(x)
x = BatchNormalization()(x)
x = MaxPooling2D((2, 2))(x)
 
x = Flatten()(x)
x = Dropout(0.2)(x)
 
# Hidden layer
x = Dense(1024, activation='relu')(x)
x = Dropout(0.2)(x)
 
# last hidden layer i.e.. output layer
x = Dense(K, activation='softmax')(x)
 
model = Model(i, x)
 
# model description
model.summary()

Python3

# Compile
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

Python3

# Fit
r = model.fit(
  x_train, y_train, validation_data=(x_test, y_test), epochs=50)

Python3

# Fit with data augmentation
# Note: if you run this AFTER calling
# the previous model.fit()
# it will CONTINUE training where it left off
batch_size = 32
data_generator = tf.keras.preprocessing.image.ImageDataGenerator(
  width_shift_range=0.1, height_shift_range=0.1, horizontal_flip=True)
 
train_generator = data_generator.flow(x_train, y_train, batch_size)
steps_per_epoch = x_train.shape[0] // batch_size
 
r = model.fit(train_generator, validation_data=(x_test, y_test),
              steps_per_epoch=steps_per_epoch, epochs=50)

Python3

# Plot accuracy per iteration
plt.plot(r.history['accuracy'], label='acc', color='red')
plt.plot(r.history['val_accuracy'], label='val_acc', color='green')
plt.legend()

Python3

# label mapping
 
labels = '''airplane automobile bird cat deerdog frog horseship truck'''.split()
 
# select the image from our test dataset
image_number = 0
 
# display the image
plt.imshow(x_test[image_number])
 
# load the image in an array
n = np.array(x_test[image_number])
 
# reshape it
p = n.reshape(1, 32, 32, 3)
 
# pass in the network for prediction and
# save the predicted label
predicted_label = labels[model.predict(p).argmax()]
 
# load the original label
original_label = labels[y_test[image_number]]
 
# display the result
print("Original label is {} and predicted label is {}".format(
    original_label, predicted_label))

Python3

# save the model
model.save('geeksforgeeks.h5')

输出：

2.4.1

上述代码的输出应显示您使用的 tensorflow 版本，例如 2.4.1 或任何其他版本。

现在我们有了所需的模块支持，所以让我们加载我们的数据。 CIFAR-10 的数据集在 tensorflow keras API 上可用，我们可以使用tensorflow.keras.daatasets.cifar10将其下载到我们的本地机器上，然后使用 load_data()函数将其分发到训练和测试集。

蟒蛇3

# Load in the data
cifar10 = tf.keras.datasets.cifar10
 
# Distribute it to train and test set
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
print(x_train.shape, y_train.shape, x_test.shape, y_test.shape)

输出：

上面代码的输出将显示所有四个分区的形状，看起来像这样

在这里我们可以看到我们有 5000 张训练图像和 1000 张测试图像，所有图像都是 32 x 32 大小，有 3 个颜色通道，即图像是彩色图像。此外还可以看到，每个图像只分配了一个标签。

直到现在，我们都拥有我们的数据。但是，我们仍然无法将其直接发送到我们的神经网络。我们需要处理数据才能将其发送到网络。该过程中的第一件事是减少像素值。目前，所有图像像素都在 1-256 的范围内，我们需要将这些值减少到 0 到 1 之间的值。这使我们的模型能够轻松跟踪趋势和有效训练。我们可以通过将所有像素值除以 255.0 来简单地做到这一点。

我们想做的另一件事是使用 flatten()函数扁平化（简单地说，将它们以行的形式重新排列）标签值。

蟒蛇3

# Reduce pixel values
x_train, x_test = x_train / 255.0, x_test / 255.0
 
# flatten the label values
y_train, y_test = y_train.flatten(), y_test.flatten()

现在是查看我们数据集的一些图像的好时机。我们可以将其可视化为子图网格形式。由于图像大小仅为 32×32，因此不要对图像期望太多。这将是一个模糊的。我们可以使用 matplotlib 中的 subplot()函数并循环训练数据集部分的前 25 张图像来进行可视化。

蟒蛇3

# visualize data by plotting images
fig, ax = plt.subplots(5, 5)
k = 0
 
for i in range(5):
    for j in range(5):
        ax[i][j].imshow(x_train[k], aspect='auto')
        k += 1
 
plt.show()

输出：

虽然图像不清晰，但有足够的像素让我们指定这些图像中存在哪个对象。

完成所有步骤后，现在是构建我们模型的时候了。我们将使用卷积神经网络或 CNN 来训练我们的模型。它包括在其中使用卷积层，即 Conv2d 层以及池化和归一化方法。最后，我们将把它传递到一个密集层和最后一个密集层，即我们的输出层。我们正在使用“relu”激活函数。输出层使用“softmax”函数。

蟒蛇3

# number of classes
K = len(set(y_train))
 
# calculate total number of classes
# for output layer
print("number of classes:", K)
 
# Build the model using the functional API
# input layer
i = Input(shape=x_train[0].shape)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(i)
x = BatchNormalization()(x)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
x = BatchNormalization()(x)
x = MaxPooling2D((2, 2))(x)
 
x = Conv2D(64, (3, 3), activation='relu', padding='same')(x)
x = BatchNormalization()(x)
x = Conv2D(64, (3, 3), activation='relu', padding='same')(x)
x = BatchNormalization()(x)
x = MaxPooling2D((2, 2))(x)
 
x = Conv2D(128, (3, 3), activation='relu', padding='same')(x)
x = BatchNormalization()(x)
x = Conv2D(128, (3, 3), activation='relu', padding='same')(x)
x = BatchNormalization()(x)
x = MaxPooling2D((2, 2))(x)
 
x = Flatten()(x)
x = Dropout(0.2)(x)
 
# Hidden layer
x = Dense(1024, activation='relu')(x)
x = Dropout(0.2)(x)
 
# last hidden layer i.e.. output layer
x = Dense(K, activation='softmax')(x)
 
model = Model(i, x)
 
# model description
model.summary()

输出：

我们的模型现在已经准备好了，是时候编译它了。我们正在使用 model.compile()函数来编译我们的模型。对于参数，我们使用

亚当优化器
sparse_categorical_crossentropy 作为损失函数
指标=['准确度']

蟒蛇3

# Compile
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

现在让我们使用 model.fit() 将我们所有的数据传递给它来拟合我们的模型。我们将训练我们的模型直到 50 个 epochs，它给了我们一个公平的结果，尽管你可以根据需要调整它。

蟒蛇3

# Fit
r = model.fit(
  x_train, y_train, validation_data=(x_test, y_test), epochs=50)

输出：

模型将开始训练，它看起来像这样

在此之后，我们的模型被训练。虽然它会工作得很好，但为了使我们的模型更加准确，我们可以在我们的数据上添加数据增强，然后再次训练它。对增强数据再次调用 model.fit() 将在停止的地方继续训练。我们将在 32 的批量大小上激活我们的数据，我们将宽度和高度的范围移动 0.1 并水平翻转图像。然后再次调用 model.fit 50 个时期。

蟒蛇3

# Fit with data augmentation
# Note: if you run this AFTER calling
# the previous model.fit()
# it will CONTINUE training where it left off
batch_size = 32
data_generator = tf.keras.preprocessing.image.ImageDataGenerator(
  width_shift_range=0.1, height_shift_range=0.1, horizontal_flip=True)
 
train_generator = data_generator.flow(x_train, y_train, batch_size)
steps_per_epoch = x_train.shape[0] // batch_size
 
r = model.fit(train_generator, validation_data=(x_test, y_test),
              steps_per_epoch=steps_per_epoch, epochs=50)

输出：

该模型将开始训练 50 个 epoch。虽然它在 GPU 上运行，但至少需要 10 到 15 分钟。

现在我们已经训练了我们的模型，在对其进行任何预测之前，让我们将每次迭代的准确性可视化以进行更好的分析。尽管还有其他方法包括混淆矩阵，以便更好地分析模型。

蟒蛇3

# Plot accuracy per iteration
plt.plot(r.history['accuracy'], label='acc', color='red')
plt.plot(r.history['val_accuracy'], label='val_acc', color='green')
plt.legend()

输出：

让我们使用 model.predict()函数对模型中的图像进行预测。在将图像发送到我们的模型之前，我们需要再次减少 0 和 1 之间的像素值并将其形状更改为 (1,32,32,3)，因为我们的模型期望输入仅采用这种形式。为了让事情变得简单，让我们从数据集本身获取一张图像。它已经是减少像素格式，我们仍然必须使用 reshape()函数对其进行整形（1,32,32,3）。由于我们使用的是数据集中的数据，我们可以比较预测输出和原始输出。

蟒蛇3

# label mapping
 
labels = '''airplane automobile bird cat deerdog frog horseship truck'''.split()
 
# select the image from our test dataset
image_number = 0
 
# display the image
plt.imshow(x_test[image_number])
 
# load the image in an array
n = np.array(x_test[image_number])
 
# reshape it
p = n.reshape(1, 32, 32, 3)
 
# pass in the network for prediction and
# save the predicted label
predicted_label = labels[model.predict(p).argmax()]
 
# load the original label
original_label = labels[y_test[image_number]]
 
# display the result
print("Original label is {} and predicted label is {}".format(
    original_label, predicted_label))

输出：

现在我们的输出是原始标签是猫，预测标签也是猫。

让我们检查它是否有一些被我们的模型错误分类的标签，例如，对于图像编号 5722，我们收到如下内容：

最后，让我们使用 model.save()函数将模型保存为 h5 文件。如果您使用的是 Google collab，您可以从文件部分下载您的模型。

蟒蛇3

# save the model
model.save('geeksforgeeks.h5')

因此，通过这种方式，人们可以使用 Tensorflow 对图像进行分类。