Pandas: Filter a DataFrame by the Sum of Rows or Columns

Last modified: 2022-05-13 01:54:26.818000             Author: Mango

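In this article, we will see how to filter a Pandas DataFrame by the sum of its rows or columns. Consider a case where this is useful: you have a DataFrame of customers and the fruits they purchased. Each row is a customer, and each column is a type of fruit. You want to filter the DataFrame based on what was purchased. To learn more about filtering a Pandas DataFrame by column values and rows based on conditions, see the linked article. The Pandas DataFrame.sum() function is used here to compute the sums.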

Steps needed:

  1. Create or import the DataFrame
  2. Sum the rows: call .sum() with the argument axis=1
  3. Sum the columns: call .sum() with the argument axis=0
  4. Filter on the desired condition
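
The steps above can be sketched on a tiny frame. The data below is a made-up illustration (not the article's dataset), and the variable names row_totals, col_totals, and filtered are assumptions chosen for readability:

```python
import pandas as pd

# Step 1: hypothetical toy data; rows are customers, columns are fruits
df = pd.DataFrame({'Apple': [1, 0, 2],
                   'Kiwi': [0, 0, 0],
                   'Mango': [3, 0, 1]})

# Step 2: one total per row (summing across the columns)
row_totals = df.sum(axis=1)

# Step 3: one total per column (summing down the rows)
col_totals = df.sum(axis=0)

# Step 4: boolean masks built from the totals select what to keep
filtered = df.loc[row_totals > 0, col_totals > 0]
print(filtered)
```

Note that axis=1 yields a Series indexed like the rows, while axis=0 yields a Series indexed by column name, which is why each can serve as a mask for its own axis.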

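Filtering on the sum of rows and columns

Suppose you want to remove customers who did not buy any fruit, or any fruit that no customer bought. In this case, we need to filter on the values of the row or column sums. Below is a code implementation of this approach.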

Python3
# importing pandas library
import pandas as pd
  
# creating dataframe
df = pd.DataFrame({'Apple': [1, 1, 0, 0, 0, 0],
                   'Orange': [0, 1, 1, 0, 0, 1],
                   'Grapes': [1, 1, 0, 0, 1, 1],
                   'Peach': [1, 1, 0, 0, 1, 1],
                   'Watermelon': [0, 0, 0, 0, 0, 0],
                   'Guava': [1, 0, 0, 0, 0, 0],
                   'Mango': [1, 0, 1, 0, 1, 0],
                   'Kiwi': [0, 0, 0, 0, 0, 0]})
  
print("Dataframe before filtering\n")
print(df)
  
# filtering on the basis of rows
df = df[df.sum(axis=1) > 0]
  
# filtering on the basis of columns
df = df.loc[:, df.sum(axis=0) > 0]
  
print("\nDataframe after filtering\n")
print(df)



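Output: in the first example, the row at index 3 (a customer who bought nothing) and the Watermelon and Kiwi columns (fruits nobody bought) are removed, leaving 5 rows and 6 columns.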
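
As a variation on the two-step filter, both masks can be computed up front and applied in a single .loc call. This is only a sketch, and it assumes purchase counts are never negative, so that dropping an all-zero row cannot change any column total. The names row_mask, col_mask, and result are illustrative:

```python
import pandas as pd

# Same data as the article's example
df = pd.DataFrame({'Apple': [1, 1, 0, 0, 0, 0],
                   'Orange': [0, 1, 1, 0, 0, 1],
                   'Grapes': [1, 1, 0, 0, 1, 1],
                   'Peach': [1, 1, 0, 0, 1, 1],
                   'Watermelon': [0, 0, 0, 0, 0, 0],
                   'Guava': [1, 0, 0, 0, 0, 0],
                   'Mango': [1, 0, 1, 0, 1, 0],
                   'Kiwi': [0, 0, 0, 0, 0, 0]})

# True for customers who bought at least one fruit
row_mask = df.sum(axis=1) > 0

# True for fruits bought by at least one customer
col_mask = df.sum(axis=0) > 0

# Apply both masks at once; the original df is left unchanged
result = df.loc[row_mask, col_mask]
print(result)
```

Keeping the result in a new variable rather than reassigning df makes it easier to compare the frame before and after filtering.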

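Filtering rows on the sum of a few columns

Now suppose we want to filter out customers who bought none of the fruits in a restricted list. For example, customers who bought no grapes, guava, or peaches should be removed from the DataFrame. Here we filter rows based on certain columns (in this case Grapes, Peach, and Guava).

Summing these three columns for every row, we find that the rows at index 2 and 3 have a sum of zero.

Python3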

# importing pandas library
import pandas as pd
  
# creating dataframe
df = pd.DataFrame({'Apple': [1, 1, 0, 0, 0, 0],
                   'Orange': [0, 1, 1, 0, 0, 1],
                   'Grapes': [1, 1, 0, 0, 1, 1],
                   'Peach': [1, 1, 0, 0, 1, 1],
                   'Watermelon': [0, 0, 0, 0, 0, 0],
                   'Guava': [1, 0, 0, 0, 0, 0],
                   'Mango': [1, 0, 1, 0, 1, 0],
                   'Kiwi': [0, 0, 0, 0, 0, 0]})
  
print("Dataframe before filtering\n")
print(df)
  
# list of columns to be considered
columns = ['Grapes', 'Guava', 'Peach']
  
# filtering rows on basis of certain columns
df = df[df[columns].sum(axis=1) > 0]
  
print("\nDataframe after filtering\n")
print(df)

Output: the rows at index 2 and 3 are removed, and all eight columns are kept.
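
To check which rows the filter drops, the intermediate row sums over the three chosen columns can be inspected directly. This is a small sketch. The names subset_totals and zero_rows are illustrative and do not appear in the original code:

```python
import pandas as pd

# Same data as the article's example
df = pd.DataFrame({'Apple': [1, 1, 0, 0, 0, 0],
                   'Orange': [0, 1, 1, 0, 0, 1],
                   'Grapes': [1, 1, 0, 0, 1, 1],
                   'Peach': [1, 1, 0, 0, 1, 1],
                   'Watermelon': [0, 0, 0, 0, 0, 0],
                   'Guava': [1, 0, 0, 0, 0, 0],
                   'Mango': [1, 0, 1, 0, 1, 0],
                   'Kiwi': [0, 0, 0, 0, 0, 0]})

columns = ['Grapes', 'Guava', 'Peach']

# Row totals computed over the selected columns only
subset_totals = df[columns].sum(axis=1)
print(subset_totals)

# Index labels of the rows whose total is zero
zero_rows = subset_totals[subset_totals == 0].index.tolist()
print(zero_rows)
```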



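Filtering a few columns out of the whole dataset by their sum

Suppose you want to drop any column in a given list of columns whose sum is zero. We sum only those columns and apply the condition to them.

Python3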

# importing pandas library
import pandas as pd
  
# creating dataframe
df = pd.DataFrame({'Apple': [1, 1, 0, 0, 0, 0],
                   'Orange': [0, 1, 1, 0, 0, 1],
                   'Grapes': [1, 1, 0, 0, 1, 1],
                   'Peach': [1, 1, 0, 0, 1, 1],
                   'Watermelon': [0, 0, 0, 0, 0, 0],
                   'Guava': [1, 0, 0, 0, 0, 0],
                   'Mango': [1, 0, 1, 0, 1, 0],
                   'Kiwi': [0, 0, 0, 0, 0, 0]})
  
print("Dataframe before filtering\n")
print(df)
  
# list of columns to be considered
columns = ['Apple', 'Mango', 'Guava', 'Watermelon']
  
# iterating through the columns and dropping
# columns whose sum is less than or equal to 0
for column in columns:
    if (df[column].sum() <= 0):
        df.drop(column, inplace=True, axis=1)
  
print("\nDataframe after filtering\n")
print(df)

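Output: only the Watermelon column is removed. Kiwi also sums to zero, but it is kept because it is not in the list of columns considered.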
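
The loop above can also be written without iteration or inplace=True. The following is one possible alternative, not the article's original method. It sums the candidate columns once and passes the ones to remove to DataFrame.drop. The names totals, to_drop, and result are illustrative:

```python
import pandas as pd

# Same data as the article's example
df = pd.DataFrame({'Apple': [1, 1, 0, 0, 0, 0],
                   'Orange': [0, 1, 1, 0, 0, 1],
                   'Grapes': [1, 1, 0, 0, 1, 1],
                   'Peach': [1, 1, 0, 0, 1, 1],
                   'Watermelon': [0, 0, 0, 0, 0, 0],
                   'Guava': [1, 0, 0, 0, 0, 0],
                   'Mango': [1, 0, 1, 0, 1, 0],
                   'Kiwi': [0, 0, 0, 0, 0, 0]})

# Only these columns are candidates for removal
columns = ['Apple', 'Mango', 'Guava', 'Watermelon']

# Column totals for the candidates only
totals = df[columns].sum(axis=0)

# Names of the candidate columns whose total is zero or less
to_drop = totals[totals <= 0].index

# drop() returns a new frame; df itself is not modified
result = df.drop(columns=to_drop)
print(result)
```

Because drop returns a new DataFrame here, the original df stays available for further comparisons.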

In this way, we can modify a Pandas DataFrame by applying conditions to the sums of its rows and columns.