Data is not inserted into a PySpark DataFrame

oxiaedzo  posted on 2023-04-19 in Spark

I am trying to create a PySpark DataFrame manually, but the data does not appear to be inserted into it. The code is as follows:

from pyspark import SparkContext
from pyspark.sql import SparkSession
sc = SparkContext.getOrCreate()
spark = SparkSession.builder.appName('PySpark DataFrame From RDD').getOrCreate()

column = ["language","users_count"]
data = [("Java", "20000"), ("Python", "100000"), ("Scala", "3000")]
rdd = sc.parallelize(data)
print(type(rdd))
sparkDF = spark.createDataFrame(data, schema=column)
print(sparkDF)

Output: DataFrame[language: string, users_count: string]
I expected the DataFrame to contain the data.

9vw9lbht  #1

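You are not printing the contents of the DataFrame you created. print(sparkDF) only displays the column names and types; call sparkDF.show() to display the rows. You can also drop the rdd, since your code never uses it.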

>>> column = ["language","users_count"]
>>> data = [("Java", "20000"), ("Python", "100000"), ("Scala", "3000")]
>>> sparkDF = spark.createDataFrame(data, schema=column)
>>> sparkDF.show()
+--------+-----------+
|language|users_count|
+--------+-----------+
|    Java|      20000|
|  Python|     100000|
|   Scala|       3000|
+--------+-----------+


l7wslrjt  #2

The problem is with the imports. We need to import the following:

import os
import sys
from pyspark.sql import SparkSession
