📌  相关文章
📜  从 Pandas 数据框中删除列中缺少值或 NaN 的行

📅  最后修改于: 2022-05-13 01:54:55.527000             🧑  作者: Mango

从 Pandas 数据框中删除列中缺少值或 NaN 的行

Pandas 提供了各种数据结构和操作来操作数值数据和时间序列。但是,在某些情况下,某些数据可能会丢失。在 Pandas 中,缺失数据由两个值表示:

  • None: None 是一个Python单例对象,通常用于Python代码中的缺失数据。
  • NaN: NaN(Not a Number 的首字母缩写词),是所有使用标准 IEEE 浮点表示的系统都可以识别的特殊浮点值

Pandas 将NoneNaN视为本质上可以互换以指示缺失值或空值。为了从数据框中删除空值,我们使用dropna()函数,该函数以不同的方式删除具有空值的数据集的行/列。

代码 #1:删除至少有 1 个空值的行。

# importing pandas as pd
import pandas as pd
  
# importing numpy as np
import numpy as np
  
# dictionary of lists
dict = {'First Score':[100, 90, np.nan, 95],
        'Second Score': [30, np.nan, 45, 56],
        'Third Score':[52, 40, 80, 98],
        'Fourth Score':[np.nan, np.nan, np.nan, 65]}
  
# creating a dataframe from dictionary
df = pd.DataFrame(dict)
    
df


现在我们删除具有至少一个 Nan 值(Null 值)的行

# importing pandas as pd
import pandas as pd
  
# importing numpy as np
import numpy as np
  
# dictionary of lists
dict = {'First Score':[100, 90, np.nan, 95],
        'Second Score': [30, np.nan, 45, 56],
        'Third Score':[52, 40, 80, 98],
        'Fourth Score':[np.nan, np.nan, np.nan, 65]}
  
# creating a dataframe from dictionary
df = pd.DataFrame(dict)
  
# using dropna() function  
df.dropna()

输出:

代码 #2:如果该行中的所有值都丢失,则删除行。

# importing pandas as pd
import pandas as pd
  
# importing numpy as np
import numpy as np
  
# dictionary of lists
dict = {'First Score':[100, np.nan, np.nan, 95],
        'Second Score': [30, np.nan, 45, 56],
        'Third Score':[52, np.nan, 80, 98],
        'Fourth Score':[np.nan, np.nan, np.nan, 65]}
  
# creating a dataframe from dictionary
df = pd.DataFrame(dict)
    
df


现在我们删除所有数据丢失或包含空值(NaN)的行

# importing pandas as pd
import pandas as pd
  
# importing numpy as np
import numpy as np
  
# dictionary of lists
dict = {'First Score':[100, np.nan, np.nan, 95],
        'Second Score': [30, np.nan, 45, 56],
        'Third Score':[52, np.nan, 80, 98],
        'Fourth Score':[np.nan, np.nan, np.nan, 65]}
  
df = pd.DataFrame(dict)
  
# using dropna() function    
df.dropna(how = 'all')

输出:

代码 #3:删除至少有 1 个空值的列。

# importing pandas as pd
import pandas as pd
   
# importing numpy as np
import numpy as np
   
# dictionary of lists
dict = {'First Score':[100, np.nan, np.nan, 95],
        'Second Score': [30, np.nan, 45, 56],
        'Third Score':[52, np.nan, 80, 98],
        'Fourth Score':[60, 67, 68, 65]}
  
# creating a dataframe from dictionary 
df = pd.DataFrame(dict)
     
df


现在我们删除至少有 1 个缺失值的列

# importing pandas as pd
import pandas as pd
   
# importing numpy as np
import numpy as np
   
# dictionary of lists
dict = {'First Score':[100, np.nan, np.nan, 95],
        'Second Score': [30, np.nan, 45, 56],
        'Third Score':[52, np.nan, 80, 98],
        'Fourth Score':[60, 67, 68, 65]}
  
# creating a dataframe from dictionary  
df = pd.DataFrame(dict)
  
# using dropna() function     
df.dropna(axis = 1)

输出 :

代码 #4:删除 CSV 文件中至少有 1 个空值的行。

注意:在此,我们使用的是 CSV 文件,要下载使用的 CSV 文件,请单击此处。

# importing pandas module 
import pandas as pd 
    
# making data frame from csv file 
data = pd.read_csv("employees.csv") 
    
# making new data frame with dropped NA values 
new_data = data.dropna(axis = 0, how ='any') 
    
new_data

输出:

现在我们比较数据帧的大小,以便我们可以知道有多少行至少有 1 个 Null 值

print("Old data frame length:", len(data))
print("New data frame length:", len(new_data)) 
print("Number of rows with at least 1 NA value: ",
      (len(data)-len(new_data)))

输出 :

Old data frame length: 1000
New data frame length: 764
Number of rows with at least 1 NA value:  236

由于差异为 236,因此有 236 行在任何列中至少有 1 个 Null 值。