How to Change DataFrame Column Names in PySpark?

Last modified: 2022-05-13 01:54:26.073000             Author: Mango

This article shows several ways to rename columns in a PySpark DataFrame.

First, create a DataFrame to use in the examples:

Python3
# Importing necessary libraries
from pyspark.sql import SparkSession
 
# Create a spark session
spark = SparkSession.builder.appName('pyspark - example join').getOrCreate()
 
# Create data in dataframe
data = [('Ram', '1991-04-01', 'M', 3000),
        ('Mike', '2000-05-19', 'M', 4000),
        ('Rohini', '1978-09-05', 'M', 4000),
        ('Maria', '1967-12-01', 'F', 4000),
        ('Jenis', '1980-02-17', 'F', 1200)]
 
# Column names in dataframe
columns = ["Name", "DOB", "Gender", "salary"]
 
# Create the spark dataframe
df = spark.createDataFrame(data=data,
                           schema=columns)
 
# Print the dataframe
df.show()





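Method 1: Using withColumnRenamed()

The withColumnRenamed() method takes the existing column name and the new name and returns a new DataFrame; the original DataFrame is left unchanged.

Example 1: Renaming a single column

Here, the column "DOB" is renamed to "DateOfBirth".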

Python3

# Rename the column DOB to DateOfBirth
# and print the resulting dataframe
df.withColumnRenamed("DOB", "DateOfBirth").show()



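Example 2: Renaming multiple columns

Because each call returns a new DataFrame, several withColumnRenamed() calls can be chained.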

Python3

# Rename the column 'Gender' to 'Sex', then,
# on the returned dataframe, rename 'salary' to 'Amount'.
# The parentheses let the chained call span two lines.
(df.withColumnRenamed("Gender", "Sex")
   .withColumnRenamed("salary", "Amount")
   .show())
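When many columns need renaming, writing the chain out by hand gets tedious. The sketch below folds a mapping of old names to new names over withColumnRenamed(). The helper name rename_columns and its mapping argument are illustrative assumptions, not part of the PySpark API.

```python
from functools import reduce


def rename_columns(df, mapping):
    """Apply withColumnRenamed() once per (old, new) pair in `mapping`.

    `rename_columns` is a hypothetical helper, not a PySpark function.
    `df` is expected to be a DataFrame such as the one created earlier.
    """
    return reduce(
        lambda acc, pair: acc.withColumnRenamed(pair[0], pair[1]),
        mapping.items(),
        df,
    )
```

With the DataFrame created at the start of the article, rename_columns(df, {"Gender": "Sex", "salary": "Amount"}).show() should give the same result as chaining the two calls by hand. PySpark 3.4 and later also provide withColumnsRenamed(), which accepts such a mapping directly.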

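Method 2: Using selectExpr()

selectExpr() accepts SQL expressions, so a column can be renamed with "as" while it is being selected.

Here, the column "Name" is renamed to "name".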

Python3

# Select 'Name' as 'name'
# Select the remaining columns with their original names
data = df.selectExpr("Name as name", "DOB", "Gender", "salary")

# Print the dataframe
data.show()
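Listing every column by hand in selectExpr() does not scale to wide DataFrames. As a minimal sketch, the expression strings can be generated from the existing column list and a mapping of renames. The helper build_rename_exprs is a hypothetical name introduced here, and it assumes column names contain no backtick characters.

```python
def build_rename_exprs(columns, mapping):
    """Return one SQL expression string per column.

    Hypothetical helper. Columns listed in `mapping` become
    "`old` as `new`"; all others are passed through unchanged.
    Names are backtick-quoted so that names containing spaces are
    parsed as identifiers (names containing backticks are not handled).
    """
    exprs = []
    for name in columns:
        if name in mapping:
            exprs.append(f"`{name}` as `{mapping[name]}`")
        else:
            exprs.append(f"`{name}`")
    return exprs
```

Assuming the DataFrame from the start of the article, the result can be unpacked into the call: df.selectExpr(*build_rename_exprs(df.columns, {"Name": "name"})).show().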

Method 3: Using select()

Here, the column "salary" is renamed to "Amount" by calling alias() on the selected column.

Python3

# Import col method from pyspark.sql.functions
from pyspark.sql.functions import col

# Select 'salary' as 'Amount' using aliasing
# Select the remaining columns with their original names
data = df.select(col("Name"), col("DOB"),
                 col("Gender"),
                 col("salary").alias('Amount'))

# Print the dataframe
data.show()
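The same select() and alias() pattern can be driven by a mapping so that unchanged columns do not have to be spelled out. This is a sketch under assumptions: select_with_aliases is a hypothetical helper, and it uses df[name] to obtain each column, which for a PySpark DataFrame returns a Column just as col(name) does.

```python
def select_with_aliases(df, mapping):
    """Select every column of `df`, aliasing those named in `mapping`.

    Hypothetical helper, not a PySpark function. Columns absent from
    `mapping` are aliased to their own name, i.e. left unchanged.
    """
    columns = [df[name].alias(mapping.get(name, name)) for name in df.columns]
    return df.select(*columns)
```

With the DataFrame from the start of the article, select_with_aliases(df, {"salary": "Amount"}).show() should match the output of the hand-written select() above.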



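Method 4: Using toDF()

toDF() returns a new DataFrame with the specified column names.

In this example, we build an ordered list of new column names and pass it to toDF(). The list must contain exactly one name for each existing column, in the same order.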

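Python3

# New column names, in the same order as the existing columns
Data_list = ["Emp Name", "Date of Birth",
             "Gender-m/f", "Paid salary"]

# Unpack the list so each name is passed as a separate argument
new_df = df.toDF(*Data_list)
new_df.show()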
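Because toDF() replaces all names at once, it suits cases where the new names are computed from the old ones. The following sketch normalizes names to snake_case; the helpers to_snake_case and snake_case_names are hypothetical and use only the Python standard library. Note that two different names could normalize to the same string, which this sketch does not guard against.

```python
import re


def to_snake_case(name):
    """Lower-case `name` and replace each run of non-alphanumeric
    characters with a single underscore (hypothetical helper)."""
    return re.sub(r"[^0-9a-zA-Z]+", "_", name.strip()).strip("_").lower()


def snake_case_names(columns):
    """Return snake_case versions of `columns`, preserving their order."""
    return [to_snake_case(name) for name in columns]
```

Assuming a DataFrame named df, the new names can then be applied with df.toDF(*snake_case_names(df.columns)).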