📜  Normalization and Standardization

📅  Last modified: 2021-09-12 11:30:02             🧑  Author: Mango

Feature scaling is one of the most important data preprocessing steps in machine learning. If the data is not scaled, algorithms that compute distances between samples become biased toward the features with numerically larger values.
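As a quick illustration of that bias, the sketch below compares the Euclidean distance between two made-up samples with an age and an income feature, before and after scaling; the values and the min/max ranges are purely hypothetical.

```python
import numpy as np

# Each row is [age in years, income in dollars]; the numbers are made up.
a = np.array([25, 50_000])
b = np.array([45, 52_000])

# The raw Euclidean distance is dominated almost entirely by the income difference.
print(np.linalg.norm(a - b))                 # ~2000.1

# After Min-Max scaling both features to [0, 1] (hypothetical per-feature min/max),
# the age difference contributes meaningfully again.
mins = np.array([18, 20_000])
maxs = np.array([70, 200_000])
a_scaled = (a - mins) / (maxs - mins)
b_scaled = (b - mins) / (maxs - mins)
print(np.linalg.norm(a_scaled - b_scaled))   # ~0.38, mostly from the age gap
```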

Tree-based algorithms, on the other hand, are fairly insensitive to the scale of the features. In addition, feature scaling helps machine learning and deep learning algorithms train and converge faster.

Among the feature scaling techniques, Normalization and Standardization are the most popular and, at the same time, the most easily confused.

Let's resolve that confusion.

Normalization, or Min-Max scaling, is used to transform features onto a similar scale. The new value is computed as:

X_new = (X - X_min)/(X_max - X_min)

This scales the range to [0, 1] or sometimes [-1, 1]. Geometrically, the transformation squishes the n-dimensional data into an n-dimensional unit hypercube. Normalization is useful when there are no outliers, because it cannot cope with them: a single extreme value stretches the range and compresses all other points into a narrow band. Typically, we would normalize a feature like age rather than income, because only a few people have very high incomes while ages are spread fairly uniformly.
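Here is a minimal sketch of the formula above on a small made-up feature column; it also shows Scikit-Learn's MinMaxScaler, mentioned in the comparison table below, producing the same result.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[1.0], [5.0], [10.0], [20.0]])   # made-up feature column

# Direct application of X_new = (X - X_min) / (X_max - X_min)
X_manual = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Equivalent result with the Scikit-Learn transformer (default range [0, 1]).
scaler = MinMaxScaler()            # MinMaxScaler(feature_range=(-1, 1)) gives [-1, 1]
X_sklearn = scaler.fit_transform(X)

print(np.allclose(X_manual, X_sklearn))  # True
```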

Standardization, or Z-score normalization, transforms a feature by subtracting the mean and dividing by the standard deviation. The result is often called a Z-score.

X_new = (X - mean)/Std

Standardization can be helpful when the data follows a Gaussian distribution, although that is not strictly required. Geometrically, it translates the data so that the mean vector of the original data moves to the origin, and then squishes or expands the points so that the standard deviation becomes 1. In other words, we are only changing the mean and the standard deviation: a normal distribution stays normal (it becomes the standard normal distribution), and the shape of the distribution is not affected.

Standardization is much less affected by outliers, because there is no predefined range for the transformed feature.
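A minimal sketch of the Z-score formula above, again on a made-up feature column, alongside Scikit-Learn's StandardScaler for comparison.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0], [5.0], [10.0], [20.0]])   # made-up feature column

# Direct application of X_new = (X - mean) / std
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)

scaler = StandardScaler()          # subtracts the mean, divides by the standard deviation
X_sklearn = scaler.fit_transform(X)

print(np.allclose(X_manual, X_sklearn))    # True
print(X_sklearn.mean(), X_sklearn.std())   # ~0.0 and ~1.0
```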

Difference between Normalization and Standardization

| S.No. | Normalization | Standardization |
|---|---|---|
| 1. | The minimum and maximum values of a feature are used for scaling. | The mean and standard deviation are used for scaling. |
| 2. | It is used when features are on different scales. | It is used when we want zero mean and unit standard deviation. |
| 3. | It scales values to [0, 1] or [-1, 1]. | It is not bounded to a certain range. |
| 4. | It is strongly affected by outliers. | It is much less affected by outliers. |
| 5. | Scikit-Learn provides the MinMaxScaler transformer for normalization. | Scikit-Learn provides the StandardScaler transformer for standardization. |
| 6. | It squishes the n-dimensional data into an n-dimensional unit hypercube. | It translates the data so that the mean of the original data moves to the origin, and squishes or expands the points. |
| 7. | It is useful when the feature distribution is unknown. | It is useful when the feature distribution is normal (Gaussian). |
| 8. | It is often called Min-Max scaling. | It is often called Z-score normalization. |
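To make the outlier rows of the table concrete, the sketch below applies both transformers to a made-up column containing one extreme value: the Min-Max output gets squeezed toward 0, while the standardized values keep their relative spread.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])  # 100 is an outlier (made-up data)

# The outlier pins the maximum, so the other points collapse near 0.
print(MinMaxScaler().fit_transform(X).ravel())
# [0.     0.0101 0.0202 0.0303 1.    ]

# Standardization has no fixed range, so the inliers remain more spread out.
print(StandardScaler().fit_transform(X).ravel())
# approximately [-0.54 -0.51 -0.49 -0.46  2.00]
```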