📅  Last modified: 2023-12-03 15:03:28.439000    🧑  Author: Mango

Pandas - Normalize Dataframe in Python

Pandas is a powerful data manipulation tool for Python. It provides a wide range of functions that can be used to transform, filter, and aggregate data. One common task in data preprocessing is to normalize the data, which means scaling data to a common range. In this tutorial, we will demonstrate how to normalize a Pandas dataframe in Python.

Normalization Methods

Normalization rescales the values of each column so that features measured in different units become comparable. Depending on the method, the result either lies in a fixed range, such as 0 to 1 or -1 to 1, or has a fixed mean and spread. Two common methods are Min-Max normalization and Z-score normalization.

  • Min-Max normalization: This method scales the data to a range of 0 to 1. The formula to calculate Min-Max normalization is:

    X' = (X - Xmin) / (Xmax - Xmin)
    
  • Z-score normalization: This method scales the data to have a mean of 0 and a standard deviation of 1. The formula to calculate Z-score normalization is:

    X' = (X - mean(X)) / std(X)
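
As a quick check of the two formulas above, the following sketch applies them to a single column of values. The Series x and its values are illustrative assumptions, not data from this tutorial. Note that std() in pandas uses the sample standard deviation (ddof=1) by default.

```python
import pandas as pd

# Illustrative values (an assumption, chosen so the results are easy to verify)
x = pd.Series([2.0, 4.0, 6.0, 8.0, 10.0])

# Min-Max: subtract the minimum, then divide by the range (max - min)
x_minmax = (x - x.min()) / (x.max() - x.min())

# Z-score: subtract the mean, then divide by the standard deviation
# (Series.std() defaults to the sample standard deviation, ddof=1)
x_zscore = (x - x.mean()) / x.std()

print(x_minmax.tolist())  # [0.0, 0.25, 0.5, 0.75, 1.0]
print(x_zscore.round(4).tolist())
```

The Min-Max result is bounded by 0 and 1, while the Z-score result is centered on 0 and has no fixed bounds.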
    
Normalizing Pandas Dataframe

In Pandas, the DataFrame.apply method calls a function on each column by default (axis=0), so we can use it to normalize every column independently. Let's start by creating a sample dataframe:

import pandas as pd

# sample dataframe
df = pd.DataFrame({
    'A': [10, 20, 30, 40],
    'B': [0, 5, 10, 15],
    'C': [23, 26, 29, 32]
})

print(df)

Output:

    A   B   C
0  10   0  23
1  20   5  26
2  30  10  29
3  40  15  32

We can now create a normalization function for our dataframe. Let's use the Min-Max normalization method:

# min-max normalization function
def min_max_normalize(x):
    return (x - x.min()) / (x.max() - x.min())

# apply min-max normalization to dataframe
df_normalized = df.apply(min_max_normalize)
print(df_normalized)

Output:

          A         B         C
0  0.000000  0.000000  0.000000
1  0.333333  0.333333  0.333333
2  0.666667  0.666667  0.666667
3  1.000000  1.000000  1.000000

We can see that the values in the dataframe are now scaled to a range of 0 to 1.
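
The same scaling can also be written with whole-dataframe arithmetic instead of apply, which makes it easy to target other ranges such as -1 to 1. The sketch below is one possible approach, not the only one: the helper name min_max_scale, its low and high parameters, and the constant column D are hypothetical additions for illustration. Column D shows why a guard is useful: a constant column has a range of 0, which would otherwise produce NaN from a 0/0 division.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [10, 20, 30, 40],
    'B': [0, 5, 10, 15],
    'C': [23, 26, 29, 32],
    'D': [7, 7, 7, 7],  # hypothetical constant column, added for illustration
})

def min_max_scale(frame, low=0.0, high=1.0):
    """Scale each column of `frame` linearly into [low, high]."""
    col_min = frame.min()
    # A constant column has range 0; use 1 instead so it maps to `low`, not NaN
    col_range = (frame.max() - col_min).replace(0, 1)
    unit = (frame - col_min) / col_range  # values in [0, 1]
    return unit * (high - low) + low

print(min_max_scale(df))             # each column in [0, 1]
print(min_max_scale(df, -1.0, 1.0))  # each column in [-1, 1]
```

Mapping a constant column to the lower bound is a design choice made here for simplicity; dropping such columns or leaving them as NaN are equally reasonable alternatives.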

Similarly, we can create a normalization function for the Z-score normalization method:

# z-score normalization function
def z_score_normalize(x):
    return (x - x.mean()) / x.std()

# apply z-score normalization to dataframe
df_normalized = df.apply(z_score_normalize)
print(df_normalized)

Output:

          A         B         C
0 -1.161895 -1.161895 -1.161895
1 -0.387298 -0.387298 -0.387298
2  0.387298  0.387298  0.387298
3  1.161895  1.161895  1.161895

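We can see that each column is now scaled to have a mean of 0 and a standard deviation of 1. Note that pandas computes the sample standard deviation (ddof=1) by default; pass ddof=0 to std() to use the population standard deviation instead.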
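
The choice of standard deviation changes the resulting Z-scores, which matters when comparing results against other tools. The sketch below, which reuses the sample dataframe from this tutorial, contrasts the pandas default (sample standard deviation, ddof=1) with the population standard deviation (ddof=0), the default in NumPy's np.std. The variable names z_sample and z_population are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [10, 20, 30, 40],
    'B': [0, 5, 10, 15],
    'C': [23, 26, 29, 32],
})

# Z-scores using the sample standard deviation (pandas default, ddof=1)
z_sample = (df - df.mean()) / df.std()

# Z-scores using the population standard deviation (ddof=0)
z_population = (df - df.mean()) / df.std(ddof=0)

# Each result has unit standard deviation under its own convention
print(z_sample.std().round(6).tolist())            # [1.0, 1.0, 1.0]
print(z_population.std(ddof=0).round(6).tolist())  # [1.0, 1.0, 1.0]

# The scores themselves differ between the two conventions
print(z_sample['A'].round(6).tolist())
print(z_population['A'].round(6).tolist())
```

With only four rows the difference is noticeable; it shrinks as the number of rows grows.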

Conclusion

In this tutorial, we demonstrated how to normalize a Pandas dataframe in Python by passing a normalization function to apply. We covered two common methods: Min-Max normalization, which rescales each column to the range 0 to 1, and Z-score normalization, which gives each column a mean of 0 and a standard deviation of 1. Normalization is an important preprocessing step because it puts columns measured on different scales on a comparable footing for analysis.