Find the Minimum, Maximum, and Average Value of a PySpark DataFrame Column

Last modified: 2022-05-13 01:55:32.656000    Author: Mango

In this article, we will find the maximum, minimum, and average of a particular column in a PySpark DataFrame.

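To do this, we use the agg() function, which computes aggregates and returns the result as a DataFrame. It accepts a dictionary that maps each column name to the name of an aggregate function, such as 'avg', 'min', or 'max'.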
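
Because the argument to agg() is an ordinary Python dictionary, each column name can appear only once as a key: a later entry for the same column silently replaces an earlier one. The plain-Python sketch below (no Spark needed) shows this, which is why each example in this article requests one aggregate per column. To get several aggregates of the same column in one call, you can pass column expressions such as `F.min('subject 1')` and `F.max('subject 1')` (with `from pyspark.sql import functions as F`) instead of a dictionary.

```python
# A dict literal with a repeated key keeps only the last value for that key
aggregations = {'subject 1': 'min', 'subject 1': 'max'}

# Only one aggregate survives, so agg() would compute just the maximum
print(aggregations)       # {'subject 1': 'max'}
print(len(aggregations))  # 1
```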

Creating a DataFrame for demonstration:



Python3
# importing module
import pyspark
  
# importing sparksession from pyspark.sql 
# module
from pyspark.sql import SparkSession
  
# creating sparksession and giving an app
# name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()
  
# list of student data
data = [["1", "sravan", "vignan", 67, 89],
        ["2", "ojaswi", "vvit", 78, 89],
        ["3", "rohith", "vvit", 100, 80],
        ["4", "sridevi", "vignan", 78, 80],
        ["1", "sravan", "vignan", 89, 98],
        ["5", "gnanesh", "iit", 94, 98]]
  
# specify column names
columns = ['student ID', 'student NAME',
           'college', 'subject 1', 'subject 2']
  
# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)
  
# display dataframe
dataframe.show()


Output:

Finding the Average

Example 1: Python program to find the average of a DataFrame column

Python3

# find the average of the 'subject 1' column
dataframe.agg({'subject 1': 'avg'}).show()

Output:
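
To know what value to expect, the same average can be computed by hand in plain Python. The marks below are copied from the `data` list used to build the DataFrame so that the sketch runs on its own; it is a cross-check, not Spark code.

```python
# 'subject 1' marks, copied by hand from the data list used to build the DataFrame
subject_1 = [67, 78, 100, 78, 89, 94]

# The average is the sum divided by the number of rows
average = sum(subject_1) / len(subject_1)

# Rounded for display; Spark's avg() shows the full double value
print(round(average, 2))  # 84.33
```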

Example 2: Get the average of multiple columns



Python3

# find the average of multiple columns
dataframe.agg({'subject 1': 'avg',
               'student ID': 'avg',
               'subject 2': 'avg'}).show()

Output:
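
Note that `student ID` holds strings ("1", "2", ...), not numbers. Spark implicitly casts such numeric strings to double before averaging (an assumption worth verifying against your Spark version and ANSI settings), so the result is the numeric mean of the IDs. The duplicated row for student 1 is counted twice, because `avg` works over rows, not distinct students. A plain-Python cross-check, with the values copied by hand from `data`:

```python
# Columns copied from the data list; 'student ID' values are strings there
student_ids = ["1", "2", "3", "4", "1", "5"]
subject_1 = [67, 78, 100, 78, 89, 94]
subject_2 = [89, 89, 80, 80, 98, 98]


def mean(values):
    """Arithmetic mean over all rows, duplicate rows included."""
    return sum(values) / len(values)


# Convert the ID strings to float first, mirroring the implicit cast to double
averages = {
    'student ID': mean([float(v) for v in student_ids]),
    'subject 1': mean(subject_1),
    'subject 2': mean(subject_2),
}

for name, value in averages.items():
    print(name, round(value, 2))
```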

Finding the Minimum

Example 1: Python program to find the minimum value in a DataFrame column

Python3

# minimum value from student ID column
dataframe.agg({'student ID': 'min'}).show()

Output:
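
Because `student ID` is a string column, `min` compares its values as strings, character by character, not as numbers. For the single-digit IDs in this dataset the result ("1") is the same either way, but the two orders diverge once IDs have different lengths. The plain-Python sketch below illustrates this; the IDs "9" and "10" are hypothetical values that are not in the article's data.

```python
# IDs copied from the article's data (all single-digit strings)
ids_in_article = ["1", "2", "3", "4", "1", "5"]
# Hypothetical IDs, added only to show where string order differs
ids_with_two_digits = ["9", "10"]

print(min(ids_in_article))                # 1
print(min(ids_with_two_digits))           # 10  (string order: "1" < "9")
print(min(ids_with_two_digits, key=int))  # 9   (numeric order)
```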

Example 2: Get the minimum value from multiple columns

Python3

# minimum value from multiple columns
dataframe.agg({'college': 'min',
               'student NAME': 'min',
               'student ID':'min'}).show()

Output:
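
All three columns in this example are strings, so each minimum is the alphabetically first value in its column. Each column is aggregated independently, so the three minimums can come from different rows. A plain-Python cross-check over values copied by hand from `data`:

```python
# Column values copied from the data list used to build the DataFrame
columns = {
    'college': ["vignan", "vvit", "vvit", "vignan", "vignan", "iit"],
    'student NAME': ["sravan", "ojaswi", "rohith", "sridevi", "sravan", "gnanesh"],
    'student ID': ["1", "2", "3", "4", "1", "5"],
}

# Each column's minimum is found on its own, as agg() does per column
minimums = {name: min(values) for name, values in columns.items()}
print(minimums)  # {'college': 'iit', 'student NAME': 'gnanesh', 'student ID': '1'}
```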



Finding the Maximum

Example 1: Python program to find the maximum value in a DataFrame column

Python3

# maximum value from student ID column
dataframe.agg({'student ID': 'max'}).show()

Output:
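
If a string column such as `student ID` should be compared numerically, one option is to cast it before aggregating, for example `dataframe.agg(F.max(F.col('student ID').cast('int')))` with `from pyspark.sql import functions as F`. The plain-Python sketch below shows the difference such a cast makes; the IDs "9" and "10" are hypothetical values that are not in the article's data.

```python
# IDs copied from the article's data
ids_in_article = ["1", "2", "3", "4", "1", "5"]
# Hypothetical IDs that expose string ordering
ids_with_two_digits = ["9", "10"]

print(max(ids_in_article))  # 5
# String order puts "9" after "10", so "9" wins as the maximum
print(max(ids_with_two_digits))  # 9
# Converting to int first, like casting the column, gives the numeric maximum
print(max(int(v) for v in ids_with_two_digits))  # 10
```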

Example 2: Get the maximum value from multiple columns

Python3

# maximum value from multiple columns
dataframe.agg({'college': 'max',
               'student NAME': 'max',
               'student ID':'max'}).show()

Output:
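
The string maximums follow the same alphabetical rule. Two comparisons are worth tracing: "vvit" ranks above "vignan" because their second characters compare as "v" > "i", and "sridevi" ranks above "sravan" because their third characters compare as "i" > "a". A plain-Python cross-check over values copied by hand from `data`:

```python
# Column values copied from the data list used to build the DataFrame
columns = {
    'college': ["vignan", "vvit", "vvit", "vignan", "vignan", "iit"],
    'student NAME': ["sravan", "ojaswi", "rohith", "sridevi", "sravan", "gnanesh"],
    'student ID': ["1", "2", "3", "4", "1", "5"],
}

# Each column's maximum is found independently, one value per column
maximums = {name: max(values) for name, values in columns.items()}
print(maximums)  # {'college': 'vvit', 'student NAME': 'sridevi', 'student ID': '5'}
```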