Difference Between Classification and Prediction Methods in Data Mining

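Classification and prediction are two main methods used to mine data. Both techniques analyze the data we already have in order to draw conclusions about unknown data.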

Classification:

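Classification is the process of finding a good model that describes data classes or concepts. Its purpose is to predict the class of objects whose class label is unknown. Put simply, classification assigns incoming new data to categories, based on assumptions we have made now or in the past and on the data we already have.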
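
As a rough illustration of assigning a class label to a new object, here is a minimal sketch that uses a nearest-neighbor rule. The rule itself, the `classify` function, and the patient-style records are all assumptions made for this example; the article does not prescribe a particular classifier.

```python
import math

# Hypothetical training set: (age, systolic blood pressure) -> risk class.
# The values are invented for illustration only.
training_set = [
    ((25, 118), "low"),
    ((32, 121), "low"),
    ((58, 150), "high"),
    ((63, 162), "high"),
]

def classify(point, records):
    """Return the class label of the training record closest to `point`."""
    _, nearest_label = min(
        records, key=lambda record: math.dist(record[0], point)
    )
    return nearest_label

print(classify((60, 155), training_set))  # high
print(classify((28, 119), training_set))  # low
```

The output is always one of the labels seen in the training set, which is what makes this a classification task rather than a prediction task.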

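Prediction:

Prediction can be thought of as estimating something that may happen in the future. Using the data we have collected previously, together with assumptions about the future, we identify or estimate the missing or unavailable value of a new observation. In prediction, the output is a continuous value.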
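
To make the "continuous output" point concrete, the sketch below fits a straight line by ordinary least squares and uses it to estimate a numeric value for a new observation. Least squares is only one possible predictor, chosen here as an assumption; `fit_line`, `predict`, and the experience/salary figures are hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training data: years of experience -> salary (in thousands).
years = [1, 2, 3, 4, 5]
salary = [35.0, 40.0, 45.0, 50.0, 55.0]

slope, intercept = fit_line(years, salary)

def predict(x):
    """Estimate the continuous target value for a new observation x."""
    return slope * x + intercept

print(predict(7))  # 65.0
```

Unlike a classifier, the predictor can return a value that never appeared in the training data.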

Sr.No.	Prediction	Classification
1.	Prediction is about estimating a missing or unknown element (a continuous value) of a dataset.	Classification is about determining a (categorical) class (or label) for an element in a dataset.
2.	E.g., predicting the correct treatment for a particular disease for an individual person can be thought of as prediction.	E.g., grouping patients based on their medical records can be considered classification.
3.	The model used to predict the unknown value is called a predictor.	The model used to classify the unknown value is called a classifier.
4.	The predictor is constructed from a training set, and its accuracy refers to how well it can estimate the value of new data.	A classifier is also constructed from a training set, composed of database records and their corresponding class names.
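
Row 4 ties the quality of each model to how it behaves on new data. One common way to measure that, sketched here as an assumption rather than as the article's own method, is the fraction of correct labels for a classifier and the mean absolute error for a predictor. The function names and the held-out values below are hypothetical.

```python
def classification_accuracy(true_labels, predicted_labels):
    """Fraction of held-out records whose predicted class matches the true class."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)

def mean_absolute_error(true_values, predicted_values):
    """Average absolute gap between estimated and true continuous values."""
    total = sum(abs(t - p) for t, p in zip(true_values, predicted_values))
    return total / len(true_values)

# Hypothetical held-out results for a classifier.
true_labels = ["high", "low", "low", "high"]
predicted_labels = ["high", "low", "high", "high"]
print(classification_accuracy(true_labels, predicted_labels))  # 0.75

# Hypothetical held-out results for a predictor.
true_values = [50.0, 62.0, 47.0]
predicted_values = [48.0, 65.0, 47.0]
print(round(mean_absolute_error(true_values, predicted_values), 2))  # 1.67
```

Higher is better for accuracy, while lower is better for mean absolute error, so the two numbers are not directly comparable.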

The following are a few criteria we will use to compare classification and prediction methods:

Before applying a classification or prediction method, we must perform two main operations on the data:

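Data cleaning: In layman's terms, data cleaning means preprocessing the data: removing noise and repairing missing or unknown values.
Relevance analysis: After cleaning the data, we analyze it to find the data that is relevant to the problem. For example, correlation analysis can be used to compare the classes in a classification method. After cleaning and analyzing the data, we may need to normalize the result, because normalized data can give higher accuracy when estimating unknown values, particularly for methods that are sensitive to the scale of the attributes. Normalization can be done by scaling all values in the dataset to the range 0 to 1.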
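
The preparation steps above can be sketched as three small functions. Several choices here are assumptions for illustration: missing values are represented as `None` and filled with the mean, relevance is measured with the Pearson correlation coefficient, and the `ages` and `target` columns are invented.

```python
from statistics import mean

def fill_missing(values):
    """Data cleaning: replace missing entries (None) with the mean of the known ones."""
    known = [v for v in values if v is not None]
    fill = mean(known)
    return [fill if v is None else v for v in values]

def pearson(xs, ys):
    """Relevance analysis: Pearson correlation between an attribute and the target."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def min_max_scale(values):
    """Normalization: rescale values linearly to the range 0 to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map it to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical attribute with one missing value, and a hypothetical target.
ages = [20, None, 40, 60]
target = [1.0, 2.0, 2.0, 3.0]

cleaned = fill_missing(ages)
print(cleaned)                   # [20, 40, 40, 60]
print(pearson(cleaned, target))  # close to 1, so the attribute looks relevant
print(min_max_scale(cleaned))    # [0.0, 0.5, 0.5, 1.0]
```

An attribute whose correlation with the target is close to 0 would be a candidate for removal before training, although the right threshold depends on the problem.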