Sorting PySpark DataFrame Columns in Ascending or Descending Order

Last modified: 2022-05-13 01:55:51.910000 | Author: Mango


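In this article, we will sort the columns of a DataFrame in PySpark. To do this, we use the sort() or orderBy() function, each of which can sort in ascending or descending order.

Let's create a sample DataFrame.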

Python3
# importing module
import pyspark
  
# importing sparksession from 
# pyspark.sql module
from pyspark.sql import SparkSession
  
# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()
  
# list  of employee data
data = [["1", "sravan", "company 1"],
        ["2", "ojaswi", "company 1"],
        ["3", "rohith", "company 2"],
        ["4", "sridevi", "company 1"],
        ["1", "sravan", "company 1"],
        ["4", "sridevi", "company 1"]]
  
# specify column names
columns = ['Employee_ID', 'Employee NAME', 'Company']
  
# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)
  
# display data in the dataframe
dataframe.show()



Output:

+-----------+-------------+---------+
|Employee_ID|Employee NAME|  Company|
+-----------+-------------+---------+
|          1|       sravan|company 1|
|          2|       ojaswi|company 1|
|          3|       rohith|company 2|
|          4|      sridevi|company 1|
|          1|       sravan|company 1|
|          4|      sridevi|company 1|
+-----------+-------------+---------+

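Using the sort() function

The sort() function sorts a DataFrame by one or more columns.

Example 1: Sorting one column with sort()

Sort the data in ascending order by employee name:

Python3

# sort the dataframe based on
# employee name column in ascending order
dataframe.sort(['Employee NAME'],
               ascending = True).show()

Output: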

+-----------+-------------+---------+
|Employee_ID|Employee NAME|  Company|
+-----------+-------------+---------+
|          2|       ojaswi|company 1|
|          3|       rohith|company 2|
|          1|       sravan|company 1|
|          1|       sravan|company 1|
|          4|      sridevi|company 1|
|          4|      sridevi|company 1|
+-----------+-------------+---------+

Sort the data in descending order by employee name:

Python3

# sort the dataframe based on
# employee name column in descending order
dataframe.sort(['Employee NAME'],
               ascending = False).show()

Output:

+-----------+-------------+---------+
|Employee_ID|Employee NAME|  Company|
+-----------+-------------+---------+
|          4|      sridevi|company 1|
|          4|      sridevi|company 1|
|          1|       sravan|company 1|
|          1|       sravan|company 1|
|          3|       rohith|company 2|
|          2|       ojaswi|company 1|
+-----------+-------------+---------+

Example 2: Sorting multiple columns with sort()

We will sort the DataFrame in ascending order by employee ID and then by employee name.

Python3

# sort the dataframe based on employee ID
# and employee Name columns in ascending order
dataframe.sort(['Employee_ID','Employee NAME'],
               ascending = True).show()

Output:

+-----------+-------------+---------+
|Employee_ID|Employee NAME|  Company|
+-----------+-------------+---------+
|          1|       sravan|company 1|
|          1|       sravan|company 1|
|          2|       ojaswi|company 1|
|          3|       rohith|company 2|
|          4|      sridevi|company 1|
|          4|      sridevi|company 1|
+-----------+-------------+---------+

We will sort the DataFrame in descending order by employee ID, employee name, and company.

Python3

# sort the dataframe based on employee ID,
# employee Name and company columns in descending order
dataframe.sort(['Employee_ID','Employee NAME',
                'Company'], ascending = False).show()

Output:

+-----------+-------------+---------+
|Employee_ID|Employee NAME|  Company|
+-----------+-------------+---------+
|          4|      sridevi|company 1|
|          4|      sridevi|company 1|
|          3|       rohith|company 2|
|          2|       ojaswi|company 1|
|          1|       sravan|company 1|
|          1|       sravan|company 1|
+-----------+-------------+---------+

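Example 3: Sorting with the asc() method

The asc() method of the Column class returns a sort expression that orders the given column in ascending order.

Python3

dataframe.sort(dataframe.Employee_ID.asc()).show()

Output: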

+-----------+-------------+---------+
|Employee_ID|Employee NAME|  Company|
+-----------+-------------+---------+
|          1|       sravan|company 1|
|          1|       sravan|company 1|
|          2|       ojaswi|company 1|
|          3|       rohith|company 2|
|          4|      sridevi|company 1|
|          4|      sridevi|company 1|
+-----------+-------------+---------+

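Example 4: Sorting with the desc() method

The desc() method of the Column class returns a sort expression that orders the given column in descending order.

Python3

dataframe.sort(dataframe.Employee_ID.desc()).show()

Output: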

+-----------+-------------+---------+
|Employee_ID|Employee NAME|  Company|
+-----------+-------------+---------+
|          4|      sridevi|company 1|
|          4|      sridevi|company 1|
|          3|       rohith|company 2|
|          2|       ojaswi|company 1|
|          1|       sravan|company 1|
|          1|       sravan|company 1|
+-----------+-------------+---------+

Using the orderBy() function

The orderBy() function sorts a DataFrame by one or more columns. By default, it sorts in ascending order.

Example 1: Sorting one column with orderBy()

Python program to sort the DataFrame in ascending order by employee ID:

Python3

# sort the dataframe based on
# Employee ID in ascending order
dataframe.orderBy(['Employee_ID'],
                  ascending = True).show()

Output:

+-----------+-------------+---------+
|Employee_ID|Employee NAME|  Company|
+-----------+-------------+---------+
|          1|       sravan|company 1|
|          1|       sravan|company 1|
|          2|       ojaswi|company 1|
|          3|       rohith|company 2|
|          4|      sridevi|company 1|
|          4|      sridevi|company 1|
+-----------+-------------+---------+

Python program to sort the DataFrame in descending order by employee ID:

Python3

# sort the dataframe based on
# Employee ID in descending order
dataframe.orderBy(['Employee_ID'],
                  ascending = False).show()

Output:

+-----------+-------------+---------+
|Employee_ID|Employee NAME|  Company|
+-----------+-------------+---------+
|          4|      sridevi|company 1|
|          4|      sridevi|company 1|
|          3|       rohith|company 2|
|          2|       ojaswi|company 1|
|          1|       sravan|company 1|
|          1|       sravan|company 1|
+-----------+-------------+---------+

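Example 2: Sorting multiple columns with orderBy()

Use orderBy() to sort the DataFrame in descending order by the employee ID and employee name columns.

Python3

# sort the dataframe based on employee ID
# and employee Name columns in descending order
dataframe.orderBy(['Employee_ID','Employee NAME'],
                  ascending = False).show()

Output: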

+-----------+-------------+---------+
|Employee_ID|Employee NAME|  Company|
+-----------+-------------+---------+
|          4|      sridevi|company 1|
|          4|      sridevi|company 1|
|          3|       rohith|company 2|
|          2|       ojaswi|company 1|
|          1|       sravan|company 1|
|          1|       sravan|company 1|
+-----------+-------------+---------+

Sort the DataFrame in ascending order by the employee ID and employee name columns:

Python3

# sort the dataframe based on employee ID
# and employee Name columns in ascending order
dataframe.orderBy(['Employee_ID','Employee NAME'],
                  ascending = True).show()

Output:

+-----------+-------------+---------+
|Employee_ID|Employee NAME|  Company|
+-----------+-------------+---------+
|          1|       sravan|company 1|
|          1|       sravan|company 1|
|          2|       ojaswi|company 1|
|          3|       rohith|company 2|
|          4|      sridevi|company 1|
|          4|      sridevi|company 1|
+-----------+-------------+---------+