📜  Implementation of K Nearest Neighbors (1)

📅  Last modified: 2023-12-03 15:17:11.259000             🧑  Author: Mango

Implementation of K Nearest Neighbors

K Nearest Neighbors (KNN) is a widely used classification method. Its basic idea is to compute the distance between a given point and every point in the training set, select the K points with the smallest distances, and assign the given point to whichever class holds the majority among those K neighbors.

Implementation steps

KNN classification proceeds in the following steps:

  1. Compute the distance between the test sample and every sample in the training set;
  2. Sort the distances in ascending order;
  3. Take the K points with the smallest distances;
  4. Count how many of these K points belong to each class;
  5. Compute each class's share of the votes; the class with the highest share is the predicted label.
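
For comparison, the five steps above can also be sketched in vectorized form with NumPy. This is an illustrative sketch only, not the implementation used in the rest of this article; `knn_predict` and its arguments are hypothetical names:

```python
import numpy as np

def knn_predict(train_X, train_y, x, k=3):
    # Step 1: distance from x to every training sample
    dists = np.sqrt(((train_X - x) ** 2).sum(axis=1))
    # Steps 2-3: indices of the k smallest distances
    nearest = np.argsort(dists)[:k]
    # Steps 4-5: majority vote among the k neighbor labels
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# a few points drawn from the two clusters used later in this article
train_X = np.array([[2.78, 2.55], [1.47, 2.36], [3.40, 4.40],
                    [7.63, 2.76], [5.33, 2.09], [6.92, 1.77]])
train_y = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(train_X, train_y, np.array([1.1, 3.1])))  # prints 0
```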

Code implementation

Below is a simple KNN classifier implementation, built from three main functions:

  • euclidean_distance(): computes the Euclidean distance between two points;
  • get_neighbors(): finds the K nearest neighbors;
  • predict_classification(): predicts the class by majority vote among those neighbors.

```python
from math import sqrt

# Euclidean distance between two rows; the last column of each
# row holds the class label, so it is excluded from the sum.
def euclidean_distance(row1, row2):
    distance = 0.0
    for i in range(len(row1) - 1):
        distance += (row1[i] - row2[i]) ** 2
    return sqrt(distance)

# Locate the num_neighbors training rows closest to test_row.
def get_neighbors(train, test_row, num_neighbors):
    distances = []
    for train_row in train:
        dist = euclidean_distance(test_row, train_row)
        distances.append((train_row, dist))
    distances.sort(key=lambda tup: tup[1])
    neighbors = []
    for i in range(num_neighbors):
        neighbors.append(distances[i][0])
    return neighbors

# Predict the class of test_row by majority vote among its neighbors.
def predict_classification(train, test_row, num_neighbors):
    neighbors = get_neighbors(train, test_row, num_neighbors)
    output_values = [row[-1] for row in neighbors]
    prediction = max(set(output_values), key=output_values.count)
    return prediction
```
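
One subtlety in predict_classification(): `max(set(output_values), key=output_values.count)` picks an arbitrary winner when two classes tie among the K neighbors, which is why an odd K is the usual choice for binary problems. A quick standalone illustration of the voting expression:

```python
# majority vote, exactly as written in predict_classification()
votes = [0, 0, 1]  # labels of the 3 nearest neighbors
prediction = max(set(votes), key=votes.count)
print(prediction)  # prints 0: class 0 holds two of the three votes

# with an even K, e.g. votes = [0, 0, 1, 1], the counts tie and
# max() simply returns whichever class it encounters first
```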

Usage

The classifier above can be used as follows:

  1. Load the training and test data (CSV-like rows of numbers);
  2. Call predict_classification() on each test sample;
  3. Print the predictions.

```python
# load data: each row is [x1, x2, class_label]
train = [[2.7810836, 2.550537003, 0],
         [1.465489372, 2.362125076, 0],
         [3.396561688, 4.400293529, 0],
         [1.38807019, 1.850220317, 0],
         [3.06407232, 3.005305973, 0],
         [7.627531214, 2.759262235, 1],
         [5.332441248, 2.088626775, 1],
         [6.922596716, 1.77106367, 1],
         [8.675418651, -0.242068655, 1],
         [7.673756466, 3.508563011, 1]]

# test rows carry their expected label in the last column as well;
# without it, euclidean_distance() would silently drop a feature
# and row[-1] below would print a coordinate instead of a label
test = [[1.1, 3.1, 0],
        [4.1, 1.1, 0],
        [2.0, 8.0, 0],
        [5.1, 5.6, 0]]

# generate predictions
for row in test:
    prediction = predict_classification(train, row, num_neighbors=3)
    print(f"Expected {row[-1]}, Got {prediction}")
```

The output is:

Expected 0, Got 0
Expected 0, Got 0
Expected 0, Got 0
Expected 0, Got 0

All four test points are classified correctly, so the KNN classifier reaches 100% accuracy on this small test set.
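
Rather than reading accuracy off the printed lines, it can be computed directly. A minimal sketch; `accuracy` is a hypothetical helper, not part of the listings above:

```python
def accuracy(expected, predicted):
    # fraction of predictions that match the expected labels
    correct = sum(e == p for e, p in zip(expected, predicted))
    return correct / len(expected)

print(accuracy([0, 0, 1, 1], [0, 0, 1, 0]))  # prints 0.75
```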