Multi-Label Image Classification – Prediction of Image Labels

We can do many things with computer vision algorithms:

  • Object detection
  • Image segmentation
  • Image translation
  • Object tracking (in real time), and much more.

What is Multi-Label Image Classification?
Let us understand the concept of multi-label image classification through an intuitive example. If I show you an image of a ball, you will easily classify it as a ball in your mind. The next image I show you is of a terrace. Now we can divide these two images into two classes, i.e. ball or no ball.
When there are only two classes into which an image can be classified, it is known as a binary image classification problem.

  • When there are more than two classes into which the images can be classified, and
  • an image does not belong to more than one class.

If both of the above conditions are satisfied, it is referred to as a multi-class image classification problem. In multi-label classification, by contrast, a single image may carry several labels at once, e.g. one photo could be tagged both "ball" and "terrace".
Prerequisites:
Let us start with some prerequisites.
Here, we will be using the following language and editor:

  • Language/Interpreter: Python 3 (preferably Python 3.8) from python.org
  • Editor: Jupyter Notebook
  • Operating system: Windows 10 x64
  • Dataset: download any image dataset from Kaggle or elsewhere on the Internet.
  • Python requirements: this project needs the following libraries installed via pip: NumPy, pandas, Matplotlib, scikit-learn and scikit-image.

Steps to be followed:

[Figure: the label classification workflow]

Step 1: Importing the libraries we need.

python3
# system libraries
import os
import warnings
 
# ignoring all the warnings
warnings.simplefilter('ignore')
 
# importing data handling libraries
import numpy as np
import pandas as pd
 
# importing data visualisation libraries
import matplotlib.pyplot as plt
%matplotlib inline
 
# importing image processing functions
from skimage.io import imread, imshow
from skimage.transform import resize
from skimage.color import rgb2gray  # named rgb2grey in scikit-image < 0.19



Step 2: Reading the subject images into the project.
In this part of the article we will instruct Python to read the images one by one and insert each image's pixel data into an array we can work with. First, we create lists of the image file names through Python's os library.

python3

r = os.listdir(r"C:\Users\Garima Singh\Desktop\data\mugshots\r")
# This is the path to the image folder
 
v = os.listdir(r"C:\Users\Garima Singh\Desktop\data\mugshots\v")
d = os.listdir(r"C:\Users\Garima Singh\Desktop\data\mugshots\d")
 
print(r[0:10])
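
Note that os.listdir returns bare file names, not full paths, so the folder path has to be prepended again whenever an image is actually read. A small helper such as the hypothetical read_folder below (our own name, not from the original article) keeps that explicit:

python3
import os
from skimage.io import imread

def read_folder(folder, names, limit):
    # Read up to `limit` images from `folder`, given the bare
    # file names that os.listdir produced.
    return [imread(os.path.join(folder, name)) for name in names[:limit]]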

Step 3: Creating and importing the data from the images, and setting a limit.
Here we will use NumPy and scikit-image's imread function. Since we have downloaded the data, we can quickly count how many images there are per subject. For example, suppose each folder (r, v and d) holds 100 images; we can then set the variable limit to 100. The next step is to create empty arrays for this data and fill them: we make 3 arrays that can hold the data of a series of images, each initially filled with "None" values, using the following snippet:

python3

limit = 100
# Creating the list of blank spaces that can potentially hold data:
ra_images = [None]*limit
 
# Creating the loop counter:
j = 0
 
# This part of the code loops over all the images in the list "r"
# and reads each one into the jth element of the array we made
# above, stopping once `limit` images have been read.
for i in r:
    if j < limit:
        ra_images[j] = imread(os.path.join(
            r"C:\Users\Garima Singh\Desktop\data\mugshots\r", i))
        j += 1
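
The later steps operate on arrays named ra_grey, dh_grey and vi_grey, which this step's code does not show being built. Below is a minimal sketch, assuming the images are RGB and are converted to greyscale and resized to 28×28 (the dimensions that step 4 relies on); the same read-and-convert pass must be repeated for the lists "v" and "d":

python3
# Convert each image to greyscale and resize it to 28x28, so that
# every image flattens to 784 pixels in step 4. (Our sketch; the
# original article does not show this step.)
ra_grey = [resize(rgb2gray(img), (28, 28)) for img in ra_images]

# Repeat the reading loop above for the lists "v" and "d", then
# build vi_grey and dh_grey from those images the same way.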

Step 4: Assembling the dataset, and flattening and reshaping the arrays.
In this section we will use a pandas DataFrame to merge the three data arrays into one. Our image arrays are currently of size 28×28. We need to turn each of them into a 28² × 1 array, which basically means converting every image into a single row of data in our dataset.

python3

# Finding out the number of columns per image in our dataset.
# We use the shape attribute of any one image in our array
# and take the product of its dimensions as the number of
# columns per row.
number_of_columns = ra_grey[1].shape[0] * ra_grey[1].shape[1]
print(number_of_columns)
 
# This means we will be using this many columns
# per row in our dataset.
# Our dataset has 300 images, so it will be an array of
# dimensions 300 x 784 => 300 images of 784 pixels each.

Step 5: Flattening and reshaping the data.
This is the part of the code that first turns each 28×28 array into a column vector, i.e. a 784 × 1 matrix.

python3

print(ra_grey[0].shape)
for i in range(limit):
    ra_grey[i] = np.ndarray.flatten(ra_grey[i]).reshape(number_of_columns, 1)
print(ra_grey[0].shape)
 
# We will use NumPy's dstack and rollaxis to remove the extra
# axis (the trailing "1" in the output above).
 
ra_grey = np.dstack(ra_grey)
print(ra_grey.shape)
ra_grey = np.rollaxis(ra_grey, axis = 2, start = 0)
print(ra_grey.shape)
ra_grey = ra_grey.reshape(limit, number_of_columns)
print(ra_grey.shape)
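
The article shows this flattening only for ra_grey, but step 6 assumes dh_grey and vi_grey have been reshaped the same way. A small helper (the name flatten_images is ours, not from the original) avoids writing the block out twice more:

python3
def flatten_images(images, limit, number_of_columns):
    # Flatten each 28x28 image to a 784x1 column vector ...
    images = [np.ndarray.flatten(img).reshape(number_of_columns, 1)
              for img in images]
    # ... then stack, move the image axis to the front,
    # and drop the redundant trailing axis.
    stacked = np.rollaxis(np.dstack(images), axis = 2, start = 0)
    return stacked.reshape(limit, number_of_columns)

dh_grey = flatten_images(dh_grey, limit, number_of_columns)
vi_grey = flatten_images(vi_grey, limit, number_of_columns)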

Step 6: Converting the arrays into DataFrames.
As mentioned earlier, pandas gives us a spreadsheet-like environment for our tables. Let us convert the arrays into DataFrames:

python3

ra_data = pd.DataFrame(ra_grey)
dh_data = pd.DataFrame(dh_grey)
vi_data = pd.DataFrame(vi_grey)
 
ra_data
 
print(ra_data)

Step 7: Adding a name (label) to the images.
In this step we add a column containing each subject's name.
This is called labelling our images. The model will try to learn from these values and will output one of these labels as its prediction.

python3

ra_data["label"]="R"
dh_data["label"]="D"
vi_data["label"]="V"
 
vi_data
 
# Joining and mixing the data into one dataframe.
# We join all 3 dataframes made in step 6 into a single
# dataframe, using the concat function. Here we join the
# first two, then join the last one into that pair.
 
act = pd.concat([ra_data, dh_data])
actor = pd.concat([act, vi_data])
 
actor
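
Equivalently (our note, not from the original article), pd.concat accepts the whole list in a single call, and ignore_index=True gives the combined frame a fresh 0..299 row index:

python3
# One-call equivalent of the two concat calls above.
actor = pd.concat([ra_data, dh_data, vi_data], ignore_index = True)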

Step 8: Shuffling the data and printing the final dataset.
This is the last stage of dataset preparation. We shuffle the data so that the rows of the three subjects are well mixed.

python3

from sklearn.utils import shuffle
out = shuffle(actor).reset_index()
 
out
 
# Drop the helper column named "index" that reset_index created
# (reset_index(drop = True) would avoid creating it at all).
out = out.drop(['index'], axis = 1)
out

Step 9: Coding the machine learning algorithm + testing the accuracy.
In this section we will write the machine learning code and find out how accurate our algorithm is.

python3

# First, we will extract the x and y values of our dataset
 
x = out.values[:, :-1]
y = out.values[:, -1]
 
print(x[0:3])
print(y[0:3])
 
# From the above output, we can see that:
# x - stores the image data.
# y - stores the label data.
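
One caveat (our observation, not from the original article): because the label column holds strings, out.values is an object-dtype array, and x inherits that dtype. Casting the pixel features back to float before training keeps scikit-learn from converting them on the fly:

python3
# The string label column makes out.values an object array;
# cast the pixel features back to float64 for sklearn.
x = x.astype(np.float64)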

Step 10: Importing the ML libraries and coding the model.
We will import a few ML classes, all of them from sklearn.

python3

from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn import metrics
 
# Here we will use train_test_split to create our training and testing data.
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state = 0)
 
# Reduce the 784 pixel features to 150 principal components, then
# classify with an RBF-kernel support vector machine.
pca = PCA(n_components = 150, whiten = True, random_state = 0)
svc = SVC(kernel = 'rbf', class_weight = 'balanced')
model = make_pipeline(pca, svc)
 
# Candidate values for the SVM's C and gamma hyperparameters.
params = {'svc__C': list(range(1, 6)),
          'svc__gamma': [0.001, 0.005, 0.006, 0.01, 0.05, 0.06, 0.004, 0.04]}
 
grid = GridSearchCV(model, params)
%time grid.fit(x_train, y_train)
print(grid.best_params_)
 
model = grid.best_estimator_
ypred = model.predict(x_test)
 
ypred[0:3]

We use the PCA and SVC classes to create our model objects. make_pipeline chains them into a single model that GridSearchCV can tune and test.

Now that we have the model with the parameters that best fit our data, we use those parameters in the model and test its accuracy.
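
If you want to see how well the tuned model did during cross-validation before touching the test set, GridSearchCV exposes the mean cross-validated score of the winning parameter combination:

python3
# Mean cross-validated accuracy of the best parameter combination.
print(grid.best_score_)
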
Step 11: Plotting the results and getting the accuracy.
Let us look at a visualisation of the faces versus their predicted labels:

python3

# Each flattened row of x_test is a 28x28 image (see step 4).
imsize = (28, 28)
 
fig, ax = plt.subplots(4, 4, sharex = True,
                             sharey = True,
                        figsize = (10, 10))
 
for i, axi in enumerate(ax.flat):
    axi.imshow(x_test[i].reshape(imsize).astype(np.float64),
                   cmap = "gray", interpolation = "nearest")
 
    axi.set_title('Label : {}'.format(ypred[i]))
     
# Finally, we test our accuracy using the following code:
print(metrics.accuracy_score(y_test, ypred) * 100)
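
Beyond a single accuracy number, scikit-learn's per-class metrics show which of the three subjects the model confuses with one another; a quick look:

python3
# Per-class precision/recall and the confusion matrix.
print(metrics.classification_report(y_test, ypred))
print(metrics.confusion_matrix(y_test, ypred))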

Conclusion:
Labelling images to create training data for machine learning or AI is not a difficult task; you just need the right technique. This article has walked through the image labelling and classification process from start to finish.