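Count rows based on a condition in a PySpark DataFrame
In this article, we discuss how to count the rows of a PySpark DataFrame that satisfy a condition.
We will use two methods:
- Using the where() function.
- Using the filter() function.
Create the DataFrame used for demonstration: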
Python3
# importing module
import pyspark
# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession
# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()
# list of students data
data =[["1","sravan","vignan"],
["2","ojaswi","vvit"],
["3","rohith","vvit"],
["4","sridevi","vignan"],
["1","sravan","vignan"],
["5","gnanesh","iit"]]
# specify column names
columns = ['ID','NAME','college']
# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data,columns)
print('Actual data in dataframe')
dataframe.show()
Note: To get the total number of rows, call count() on the DataFrame.
Syntax: dataframe.count()
where dataframe is the input PySpark DataFrame.
Example: Python program to get the total number of rows
Python3
print('Total rows in dataframe')
print(dataframe.count())
Output:
Total rows in dataframe
6
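Method 1: Using where()
where(): returns a new DataFrame containing only the rows that satisfy the condition; chaining count() then gives the number of matching rows.
Syntax: dataframe.where(condition)
where condition is a boolean Column expression or a SQL expression string.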
Example 1: Count the rows where ID = 1
Python3
# count the rows in the dataframe where ID = 1
print('Total rows in dataframe where ID = 1 with where clause')
print(dataframe.where(dataframe.ID == '1').count())
# display the matching rows
print('They are ')
dataframe.where(dataframe.ID == '1').show()
Example 2: Count the rows for several different single conditions.
Python3
# count the rows where ID is not equal to 1
print('Total rows in dataframe where ID except 1 with where clause')
print(dataframe.where(dataframe.ID != '1').count())
# count the rows where college is equal to vignan
print('Total rows in dataframe where college is vignan with where clause')
print(dataframe.where(dataframe.college == 'vignan').count())
# count the rows where ID is greater than 2
# (ID is a string column; Spark casts the values to numbers for this comparison)
print('Total rows in dataframe where ID greater than 2 with where clause')
print(dataframe.where(dataframe.ID > 2).count())
Output:
Total rows in dataframe where ID except 1 with where clause
4
Total rows in dataframe where college is vignan with where clause
3
Total rows in dataframe where ID greater than 2 with where clause
3
Example 3: Python program to count rows matching multiple conditions. Combine conditions with & (and) and | (or), and wrap each condition in parentheses, because in Python these operators bind more tightly than comparisons such as ==.
Python3
# count the rows where ID is not equal to 1 and name is sridevi
print('Total rows in dataframe where ID not equal to 1 and name is sridevi')
print(dataframe.where((dataframe.ID != '1') &
                      (dataframe.NAME == 'sridevi')).count())
# count the rows where college is equal to vignan or iit
print('Total rows in dataframe where college is vignan or iit with where clause')
print(dataframe.where((dataframe.college == 'vignan') |
                      (dataframe.college == 'iit')).count())
Output:
Total rows in dataframe where ID not equal to 1 and name is sridevi
1
Total rows in dataframe where college is vignan or iit with where clause
4
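Method 2: Using filter()
filter(): returns the rows that satisfy the condition. It behaves the same as where(); in PySpark, where() is an alias for filter().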
Syntax: dataframe.filter(condition)
Example 1: Python program to count the rows where ID = 1
Python3
# count the rows in the dataframe where ID = 1
print('Total rows in dataframe where ID = 1 with filter clause')
print(dataframe.filter(dataframe.ID == '1').count())
# display the matching rows
print('They are ')
dataframe.filter(dataframe.ID == '1').show()
Example 2: Python program to count rows matching multiple conditions
Python3
# count the rows where ID is not equal to 1 and name is sridevi
print('Total rows in dataframe where ID not equal to 1 and name is sridevi')
print(dataframe.filter((dataframe.ID != '1') &
                       (dataframe.NAME == 'sridevi')).count())
# count the rows where college is equal to vignan or iit
print('Total rows in dataframe where college is vignan or iit with filter clause')
print(dataframe.filter((dataframe.college == 'vignan') |
                       (dataframe.college == 'iit')).count())
Output:
Total rows in dataframe where ID not equal to 1 and name is sridevi
1
Total rows in dataframe where college is vignan or iit with filter clause
4