📜  Checking for NaN in a Pandas DataFrame

📅  Last modified: 2022-05-13 01:55:38.989000             🧑  Author: Mango


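NaN stands for Not a Number and is one of the common ways to represent a missing value in data. It is a special floating-point value and cannot be converted to an integer type, which is why a numeric column that contains NaN is stored as float. NaN values are one of the main problems in data analysis, and they have to be handled to get the desired results.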
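
The floating-point nature of NaN can be checked directly. The following is a minimal sketch; the variable names are illustrative and do not come from the original article.

```python
import numpy as np
import pandas as pd

# np.nan is an ordinary Python float.
nan_type = type(np.nan)

# NaN never compares equal to anything, including itself,
# so an equality test cannot be used to find missing values.
equals_itself = (np.nan == np.nan)

# A column of integers that contains NaN is stored as float64.
column_dtype = pd.Series([10, 15, np.nan]).dtype

# Converting NaN to an integer raises a ValueError.
try:
    int(np.nan)
    int_conversion_failed = False
except ValueError:
    int_conversion_failed = True

print(nan_type, equals_itself, column_dtype, int_conversion_failed)
```

Because NaN is not equal to itself, the methods in this article rely on isnull() rather than on a comparison with np.nan.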


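The ways to check for NaN in a Pandas DataFrame are as follows:

  • Check for NaN in a single DataFrame column
  • Count the NaN values in a single DataFrame column
  • Check for NaN in the entire DataFrame
  • Count the NaN values in the entire DataFrame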

Method 1: Using the isnull().values.any() method on a single column
Example:

Python3
# importing libraries
import pandas as pd
import numpy as np
  
  
num = {'Integers': [10, 15, 30, 40, 55, np.nan,
                    75, np.nan, 90, 150, np.nan]}
  
# Create the dataframe
df = pd.DataFrame(num, columns=['Integers'])
  
# Applying the method
check_nan = df['Integers'].isnull().values.any()
  
# printing the result
print(check_nan)


Output:

True
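
As a side note, the same check can be written without going through .values. The sketch below assumes a reasonably recent pandas version, in which isna() is an alias of isnull() and Series exposes a hasnans property; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Integers': [10, 15, 30, 40, 55, np.nan,
                                75, np.nan, 90, 150, np.nan]})

# any() can be called on the boolean Series itself.
has_nan_any = bool(df['Integers'].isna().any())

# hasnans reports whether the Series contains any NaN.
has_nan_prop = bool(df['Integers'].hasnans)

print(has_nan_any, has_nan_prop)
```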

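The exact positions of the NaN values can also be obtained by dropping .values.any() from isnull().values.any(). What remains is a boolean Series that is True wherever a value is missing.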

Python3

check_nan = df['Integers'].isnull()
print(check_nan)

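Output:

A boolean Series in which the entries at index 5, 7, and 10 are True (NaN) and all other entries are False.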
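
The boolean mask returned by isnull() can also be turned into a plain list of index labels, or used to select the affected rows. This is a small sketch under the same example data; mask, nan_positions, and rows_with_nan are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Integers': [10, 15, 30, 40, 55, np.nan,
                                75, np.nan, 90, 150, np.nan]})

# True where the value is missing
mask = df['Integers'].isnull()

# Index labels of the missing entries
nan_positions = df.index[mask].tolist()

# The rows that hold a NaN in this column
rows_with_nan = df[mask]

print(nan_positions)
print(len(rows_with_nan))
```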

Method 2: Using the isnull().sum() method
Example:

Python3

# importing libraries
import pandas as pd
import numpy as np
  
  
num = {'Integers': [10, 15, 30, 40, 55, np.nan,
                    75, np.nan, 90, 150, np.nan]}
  
# Create the dataframe
df = pd.DataFrame(num, columns=['Integers'])
  
# applying the method
count_nan = df['Integers'].isnull().sum()
  
# printing the number of values present
# in the column
print('Number of NaN values present: ' + str(count_nan))

Output:

Number of NaN values present: 3
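
The count works because True is treated as 1 when the boolean mask is summed. A hedged sketch of two related calculations follows: the same count derived from count(), which skips NaN, and the fraction of missing values via mean(). The variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Integers': [10, 15, 30, 40, 55, np.nan,
                                75, np.nan, 90, 150, np.nan]})
col = df['Integers']

# Summing the boolean mask counts the True entries.
count_by_mask = int(col.isnull().sum())

# count() returns the number of non-missing values,
# so the difference from size is the number of NaN values.
count_by_size = int(col.size - col.count())

# The mean of the mask is the fraction of missing values.
nan_fraction = float(col.isnull().mean())

print(count_by_mask, count_by_size, round(nan_fraction, 3))
```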

Method 3: Using the isnull().values.any() method on the entire DataFrame

Example:

Python3

# importing libraries
import pandas as pd
import numpy as np
  
nums = {'Integers_1': [10, 15, 30, 40, 55, np.nan, 75,
                       np.nan, 90, 150, np.nan],
        'Integers_2': [np.nan, 21, 22, 23, np.nan, 24, 25,
                       np.nan, 26, np.nan, np.nan]}
  
# Create the dataframe
df = pd.DataFrame(nums, columns=['Integers_1', 'Integers_2'])
  
# applying the method
nan_in_df = df.isnull().values.any()
  
# Print the result
print(nan_in_df)

Output:

True

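To get the exact positions of the NaN values, drop .values.any() from isnull().values.any(). On its own, df.isnull() returns a boolean DataFrame that is True wherever a value is missing.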
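
One possible way to turn that boolean DataFrame into concrete locations is sketched below. It lists the rows that contain any NaN and the (row label, column name) pair of every missing cell. The use of np.where and the names mask, rows_with_nan, and nan_cells are choices made for this illustration, not part of the original article.

```python
import numpy as np
import pandas as pd

nums = {'Integers_1': [10, 15, 30, 40, 55, np.nan, 75,
                       np.nan, 90, 150, np.nan],
        'Integers_2': [np.nan, 21, 22, 23, np.nan, 24, 25,
                       np.nan, 26, np.nan, np.nan]}
df = pd.DataFrame(nums)

# Boolean DataFrame: True where a value is missing
mask = df.isnull()

# Index labels of the rows with at least one NaN
rows_with_nan = df.index[mask.any(axis=1)].tolist()

# (row label, column name) for every missing cell, in row-major order
row_idx, col_idx = np.where(mask.to_numpy())
nan_cells = [(int(df.index[r]), df.columns[c])
             for r, c in zip(row_idx, col_idx)]

print(rows_with_nan)
print(nan_cells)
```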

Method 4: Using the isnull().sum().sum() method
Example:

Python3

# importing libraries
import pandas as pd
import numpy as np
  
nums = {'Integers_1': [10, 15, 30, 40, 55, np.nan, 75,
                       np.nan, 90, 150, np.nan],
        'Integers_2': [np.nan, 21, 22, 23, np.nan, 24, 25,
                       np.nan, 26, np.nan, np.nan]}
  
# Create the dataframe
df = pd.DataFrame(nums, columns=['Integers_1', 'Integers_2'])
  
# applying the method
nan_in_df = df.isnull().sum().sum()
  
# printing the number of values present in
# the whole dataframe
print('Number of NaN values present: ' + str(nan_in_df))

Output:

Number of NaN values present: 8
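
The first sum() in isnull().sum().sum() produces one count per column, and the second adds those counts together. The sketch below keeps the intermediate result, which is often useful on its own, and also counts per row with axis=1. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

nums = {'Integers_1': [10, 15, 30, 40, 55, np.nan, 75,
                       np.nan, 90, 150, np.nan],
        'Integers_2': [np.nan, 21, 22, 23, np.nan, 24, 25,
                       np.nan, 26, np.nan, np.nan]}
df = pd.DataFrame(nums)

# NaN count for each column
per_column = df.isnull().sum()

# NaN count for each row
per_row = df.isnull().sum(axis=1)

# Total across the whole DataFrame
total = int(per_column.sum())

print(per_column.to_dict())
print(total)
```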