Count rows based on a condition in a PySpark DataFrame

Last modified: 2022-05-13 01:55:36.099000    Author: Mango

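In this article, we discuss how to count the rows of a PySpark DataFrame that satisfy a condition.

We will use two methods:

  • Using the where() function.
  • Using the filter() function.

Creating a DataFrame for demonstration: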

Python3
# importing the SparkSession class from the pyspark.sql module
from pyspark.sql import SparkSession

# creating a SparkSession and giving it an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# list of student data (note that the row for ID 1 appears twice)
data = [["1", "sravan", "vignan"],
        ["2", "ojaswi", "vvit"],
        ["3", "rohith", "vvit"],
        ["4", "sridevi", "vignan"],
        ["1", "sravan", "vignan"],
        ["5", "gnanesh", "iit"]]

# specify column names
columns = ['ID', 'NAME', 'college']

# creating a dataframe from the list of data
dataframe = spark.createDataFrame(data, columns)

print('Actual data in dataframe')
dataframe.show()


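Output:

Actual data in dataframe
+---+-------+-------+
| ID|   NAME|college|
+---+-------+-------+
|  1| sravan| vignan|
|  2| ojaswi|   vvit|
|  3| rohith|   vvit|
|  4|sridevi| vignan|
|  1| sravan| vignan|
|  5|gnanesh|    iit|
+---+-------+-------+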

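Note: To get the total number of rows, call count() on the DataFrame without any condition.

Example: Python program to get the total row count

Python3

print('Total rows in dataframe')
print(dataframe.count())

Output:

Total rows in dataframe
6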
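
Since the demonstration data is just a small Python list, the counts that the Spark examples should report can be cross-checked in plain Python first. The sketch below does not use Spark at all; the helper name count_rows and the variable names are hypothetical and only serve this illustration.

```python
# Plain-Python cross-check of the expected counts (no Spark required).
# Same demonstration data as in the article: [ID, NAME, college].
data = [["1", "sravan", "vignan"],
        ["2", "ojaswi", "vvit"],
        ["3", "rohith", "vvit"],
        ["4", "sridevi", "vignan"],
        ["1", "sravan", "vignan"],
        ["5", "gnanesh", "iit"]]


def count_rows(rows, predicate):
    """Hypothetical helper: count the rows for which predicate(row) is true."""
    return sum(1 for row in rows if predicate(row))


total = len(data)
id_is_1 = count_rows(data, lambda r: r[0] == "1")
id_not_1 = count_rows(data, lambda r: r[0] != "1")
college_vignan = count_rows(data, lambda r: r[2] == "vignan")
# IDs are stored as strings, so convert before a numeric comparison
id_gt_2 = count_rows(data, lambda r: int(r[0]) > 2)
not_1_and_sridevi = count_rows(data, lambda r: r[0] != "1" and r[1] == "sridevi")
vignan_or_iit = count_rows(data, lambda r: r[2] in ("vignan", "iit"))

print(total, id_is_1, id_not_1, college_vignan,
      id_gt_2, not_1_and_sridevi, vignan_or_iit)
```

The Spark examples that follow compute the same counts with where() and filter(); on real data the DataFrame would normally be too large for this kind of local check.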

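Method 1: Using where()

where(): This method checks a condition against each row and returns a new DataFrame containing only the rows that satisfy it. Calling count() on the result gives the number of matching rows.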



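Example 1: Count the rows where ID = 1

Python3

# count the rows in the dataframe where ID = 1
print('Total rows in dataframe where ID = 1 with where clause')
print(dataframe.where(dataframe.ID == '1').count())

# display the matching rows
print('They are')
dataframe.where(dataframe.ID == '1').show()

Output:

Total rows in dataframe where ID = 1 with where clause
2
They are
+---+------+-------+
| ID|  NAME|college|
+---+------+-------+
|  1|sravan| vignan|
|  1|sravan| vignan|
+---+------+-------+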

Example 2: Count rows using other single conditions (not equal, equal, greater than)

Python3

# count the rows where ID is not equal to 1
print('Total rows in dataframe where ID except 1 with where clause')
print(dataframe.where(dataframe.ID != '1').count())

# count the rows where college is equal to vignan
print('Total rows in dataframe where college is vignan with where clause')
print(dataframe.where(dataframe.college == 'vignan').count())

# count the rows where ID is greater than 2
# (ID is stored as a string, so cast it to int before the numeric comparison)
print('Total rows in dataframe where ID greater than 2 with where clause')
print(dataframe.where(dataframe.ID.cast('int') > 2).count())

Output:

Total rows in dataframe where ID except 1 with where clause
4
Total rows in dataframe where college is vignan with where clause
3
Total rows in dataframe where ID greater than 2 with where clause
3

Example 3: Python program for multiple conditions

Python3

# combine conditions with & (and) and | (or);
# each condition must be wrapped in parentheses

# count the rows where ID is not equal to 1 and name is sridevi
print('Total rows in dataframe where ID not equal to 1 and name is sridevi')
print(dataframe.where((dataframe.ID != '1') &
                      (dataframe.NAME == 'sridevi')).count())

# count the rows where college is equal to vignan or iit
print('Total rows in dataframe where college is vignan or iit with where clause')
print(dataframe.where((dataframe.college == 'vignan') |
                      (dataframe.college == 'iit')).count())

Output:

Total rows in dataframe where ID not equal to 1 and name is sridevi
1
Total rows in dataframe where college is vignan or iit with where clause
4

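Method 2: Using filter()

filter(): This method also checks a condition and returns the rows that satisfy it. It behaves the same as where(); in PySpark, where() is an alias for filter().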

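Example 1: Python program to count the rows where ID = 1

Python3

# count the rows in the dataframe where ID = 1
print('Total rows in dataframe where ID = 1 with filter clause')
print(dataframe.filter(dataframe.ID == '1').count())

# display the matching rows
print('They are')
dataframe.filter(dataframe.ID == '1').show()

Output:

Total rows in dataframe where ID = 1 with filter clause
2
They are
+---+------+-------+
| ID|  NAME|college|
+---+------+-------+
|  1|sravan| vignan|
|  1|sravan| vignan|
+---+------+-------+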

Example 2: Python program for multiple conditions

Python3

# count the rows where ID is not equal to 1 and name is sridevi
print('Total rows in dataframe where ID not equal to 1 and name is sridevi')
print(dataframe.filter((dataframe.ID != '1') &
                       (dataframe.NAME == 'sridevi')).count())

# count the rows where college is equal to vignan or iit
print('Total rows in dataframe where college is vignan or iit with filter clause')
print(dataframe.filter((dataframe.college == 'vignan') |
                       (dataframe.college == 'iit')).count())

Output:

Total rows in dataframe where ID not equal to 1 and name is sridevi
1
Total rows in dataframe where college is vignan or iit with filter clause
4