我正在尝试使用pyspark查询restapi以获取数据到dataframe。但这是一个错误
File "C:/Users/QueryRestapi.py", line 30, in <module>
df = parse_dataframe(json_data)
File "C:/Users/QueryRestapi.py", line 22, in parse_dataframe
rdd = SparkContext.parallelize(mylist)
TypeError: unbound method parallelize() must be called with SparkContext instance as first argument (got list instance instead)
代码:
from pyspark import SparkConf,SparkContext
from pyspark.sql import SparkSession
from urllib import urlopen
import json
spark = SparkSession \
.builder \
.appName("DataCleansing") \
.getOrCreate()
def convert_single_object_per_line(json_list):
json_string = ""
for line in json_list:
json_string += json.dumps(line) + "\n"
return json_string
def parse_dataframe(json_data):
r = convert_single_object_per_line(json_data)
mylist = []
for line in r.splitlines():
mylist.append(line)
rdd = SparkContext.parallelize(mylist)
df = sqlContext.jsonRDD(rdd)
return df
url = "https://"mylink"
response = urlopen(url)
data = str(response.read())
json_data = json.loads(data)
df = parse_dataframe(json_data)
如果我遗漏了什么,请帮帮我……。非常感谢
暂无答案!
目前还没有任何答案,快来回答吧!