📜  Converting Row into a list RDD in PySpark

📅  Last modified: 2022-05-13 01:55:02.236000             🧑  Author: Mango


In this article, we are going to convert Row into a list RDD in PySpark.

Creating an RDD from Row for demonstration:

Python3
# import Row and SparkSession
from pyspark.sql import SparkSession, Row
  
# create sparksession
spark = SparkSession.builder.appName('SparkByExamples.com').getOrCreate()
  
# create student data with Row function
data = [Row(name="sravan kumar",
            subjects=["Java", "python", "C++"],
            state="AP"),
  
        Row(name="Ojaswi",
            lang=["Spark", "Java", "C++"],
            state="Telangana"),
  
        Row(name="rohith",
            subjects=["DS", "PHP", ".net"],
            state="AP"),
  
        Row(name="bobby",
            lang=["Python", "C", "sql"],
            state="Delhi"),
  
        Row(name="rohith",
            lang=["CSharp", "VB"],
            state="Telangana")]
rdd = spark.sparkContext.parallelize(data)
  
# display actual rdd
rdd.collect()



Output:

[Row(name='sravan kumar', subjects=['Java', 'python', 'C++'], state='AP'),
Row(name='Ojaswi', lang=['Spark', 'Java', 'C++'], state='Telangana'),
Row(name='rohith', subjects=['DS', 'PHP', '.net'], state='AP'),
Row(name='bobby', lang=['Python', 'C', 'sql'], state='Delhi'),
Row(name='rohith', lang=['CSharp', 'VB'], state='Telangana')]

Using the map() function, we can convert the Row RDD into a list RDD. Finally, by using the collect method, we can display the data in the list RDD.

Python3

# convert rdd to list by using map() method
b = rdd.map(list)
  
# display the data in b with collect method
for i in b.collect():
    print(i)

Output:

['sravan kumar', ['Java', 'python', 'C++'], 'AP']
['Ojaswi', ['Spark', 'Java', 'C++'], 'Telangana']
['rohith', ['DS', 'PHP', '.net'], 'AP']
['bobby', ['Python', 'C', 'sql'], 'Delhi']
['rohith', ['CSharp', 'VB'], 'Telangana']
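
Besides converting each Row into a positional list, a Row can also be turned into a dictionary keyed by its field names using Row's asDict() method. The snippet below is a supplementary sketch, not part of the original example; it assumes the same rdd built above is still available.

Python3
# convert each Row to a dictionary with asDict()
# (reuses the rdd created earlier in this article)
d = rdd.map(lambda row: row.asDict())

# display the data in d with collect method
for i in d.collect():
    print(i)

Each collected item is then a plain Python dict such as {'name': ..., 'subjects': [...], 'state': ...}, which can be more convenient than a positional list when the field names matter.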