在pyspark抛出错误中从restapi调用json数据

cqoc49vn  于 2021-05-29  发布在  Spark
关注(0)|答案(0)|浏览(402)

我正在尝试使用pyspark查询restapi以获取数据到dataframe。但这是一个错误

File "C:/Users/QueryRestapi.py", line 30, in <module>
    df = parse_dataframe(json_data)
  File "C:/Users/QueryRestapi.py", line 22, in parse_dataframe
    rdd = SparkContext.parallelize(mylist)

TypeError: unbound method parallelize() must be called with SparkContext instance as first argument (got list instance instead)

代码:

from pyspark import SparkConf,SparkContext
from pyspark.sql import SparkSession
from urllib import urlopen
import json
spark = SparkSession \
     .builder \
     .appName("DataCleansing") \
     .getOrCreate()

def convert_single_object_per_line(json_list):
    json_string = ""
    for line in json_list:
        json_string += json.dumps(line) + "\n"
    return json_string

def parse_dataframe(json_data):
    r = convert_single_object_per_line(json_data)
    mylist = []
    for line in r.splitlines():
        mylist.append(line)
    rdd = SparkContext.parallelize(mylist)
    df = sqlContext.jsonRDD(rdd)
    return df

url = "https://"mylink"
response = urlopen(url)
data = str(response.read())
json_data = json.loads(data)
df = parse_dataframe(json_data)

如果我遗漏了什么,请帮帮我……。非常感谢

暂无答案!

目前还没有任何答案,快来回答吧!

相关问题