Filtering rows based on column values in a PySpark DataFrame

Last modified: 2022-05-13 01:54:23.773000    Author: Mango


In this article, we are going to filter the rows of a PySpark DataFrame based on column values.

Creating a DataFrame for demonstration:

Python3
# importing module
import pyspark

# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession

# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# list of student data: ID, name and college
# (IDs are stored as strings, so a comparison such as ID > '2' compares text)
data = [["1", "sravan", "vignan"],
        ["2", "ojaswi", "vignan"],
        ["3", "rohith", "vvit"],
        ["4", "sridevi", "vignan"],
        ["1", "sravan", "vignan"],
        ["4", "sridevi", "vignan"]]

# specify column names; the filter examples use the 'college' column
columns = ['ID', 'NAME', 'college']

# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)

dataframe.show()



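Method 1: Using the where() function

This function checks a condition and returns only the rows for which the condition is true.

We filter rows on column values by passing a condition built from the DataFrame's columns. Syntax: dataframe.where(condition), for example dataframe.where(dataframe.ID == '1').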

Example 1: Filter the rows where ID = 1

Python3

# get the data where ID=1
dataframe.where(dataframe.ID=='1').show()


Example 2: Filter the rows where NAME is not 'sravan'

Python3

# get the data where name not 'sravan'
dataframe.where(dataframe.NAME != 'sravan').show()


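Example 3: Filtering on multiple column values with where()

Python program to filter the rows where ID is greater than 2 and college is 'vvit'

Python3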

# filter rows where ID greater than 2
# and college is vvit
dataframe.where((dataframe.ID>'2') & (dataframe.college=='vvit')).show()


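Method 2: Using the filter() function

This function also checks a condition and returns only the rows for which it is true. In PySpark, where() is an alias for filter(), so both methods give the same result.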



Example 1: Python code to get the rows where college = 'vvit'

Python3

# get the data where college is  'vvit'
dataframe.filter(dataframe.college=='vvit').show()


Example 2: Filter the rows where ID > 3

Python3

# get the data where id > 3
dataframe.filter(dataframe.ID>'3').show()


Example 3: Filtering on multiple column values

Python program to filter the rows where ID is greater than 2 and college is 'vignan'

Python3

# filter rows where ID greater
# than 2 and college is vignan
dataframe.filter((dataframe.ID>'2') &
                 (dataframe.college=='vignan')).show()
