How to Order a PySpark DataFrame by a List of Columns?

Last modified: 2022-05-13 01:55:46.134000    Author: Mango

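In this article, we apply orderBy() with multiple columns to a PySpark DataFrame in Python. Sorting rows means arranging them in ascending or descending order of the values in the chosen columns.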

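Method 1: Using orderBy()

The orderBy() function returns a new DataFrame whose rows are sorted by one or more columns. It accepts column names (or a list of them) and an ascending parameter, which is either a single boolean or a list of booleans with one entry per column. The default is ascending order.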

Let's create a sample DataFrame.

Python3
# importing module
import pyspark
  
# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession
  
# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()
  
# list  of students  data
data = [["1", "sravan", "vignan"], ["2", "ojaswi", "vvit"],
        ["3", "rohith", "vvit"], ["4", "sridevi", "vignan"],
        ["1", "sravan", "vignan"], ["5", "gnanesh", "iit"]]
  
# specify column names
columns = ['student ID', 'student NAME', 'college']
  
# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)
  
print("Actual data in dataframe")
# show dataframe
dataframe.show()


Output: show() prints the six rows of the sample DataFrame, including the duplicate row for student ID 1.

Applying orderBy() to multiple columns

Python3

# show dataframe by sorting the dataframe 
# based on two columns in ascending
# order using orderby() function
dataframe.orderBy(['student ID', 'student NAME'],
                  ascending=True).show()

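Output: the rows sorted by student ID and then by student NAME, both ascending (student IDs 1, 1, 2, 3, 4, 5).

Python3
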
# show dataframe by sorting the dataframe
# based on two columns in descending
# order using orderby() function
dataframe.orderBy(['student ID', 'student NAME'],
                  ascending=False).show()

Output: the same rows sorted in descending order (student IDs 5, 4, 3, 2, 1, 1).

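Method 2: Using sort()

sort() sorts the DataFrame by the given columns in the same way as orderBy(). Its ascending parameter takes a boolean (or a list of booleans, one per column) that selects ascending or descending order.

Python3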

# show dataframe by sorting the dataframe
# based on two columns in descending order
dataframe.sort(['college', 'student NAME'], ascending=False).show()

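Output: the rows sorted by college and then by student NAME, both descending: vvit (rohith, ojaswi), vignan (sridevi, sravan, sravan), iit (gnanesh).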