Error creating a DataFrame from a REST API in PySpark

yb3bgrhw · posted 2021-05-27 · in Spark

I have the code below. I'm reading JSON data from a REST API and trying to load it with PySpark, but I can't get the data into a Spark DataFrame. Can anyone help me?

```python
import urllib.request

from pyspark.sql.types import StructType, StructField, StringType

schema = StructType([StructField('dropoff_latitude', StringType(), True),
                     StructField('dropoff_longitude', StringType(), True),
                     StructField('extra', StringType(), True),
                     StructField('fare_amount', StringType(), True),
                     StructField('improvement_surcharge', StringType(), True),
                     StructField('lpep_dropoff_datetime', StringType(), True),
                     StructField('mta_tax', StringType(), True),
                     StructField('passenger_count', StringType(), True),
                     StructField('payment_type', StringType(), True),
                     StructField('pickup_latitude', StringType(), True),
                     StructField('ratecodeid', StringType(), True),
                     StructField('tip_amount', StringType(), True),
                     StructField('tolls_amount', StringType(), True),
                     StructField('total_amount', StringType(), True),
                     StructField('trip_distance', StringType(), True),
                     StructField('trip_type', StringType(), True),
                     StructField('vendorid', StringType(), True)
                    ])
url = 'https://data.cityofnewyork.us/resource/pqfs-mqru.json'
data = urllib.request.urlopen(url).read().decode('utf-8')

rdd = sc.parallelize(data)
df = spark.createDataFrame(rdd, schema)
df.show()
```

**The error message is: `TypeError: StructType can not accept object '[' in type <class 'str'>`**
**I was able to do this with a Dataset in Scala, but I can't understand why it isn't possible in Python:**

```scala
import spark.implicits._

// Load the 2016 Green Taxi trip data from the NYC taxi data REST API
val url = "https://data.cityofnewyork.us/resource/pqfs-mqru.json"
val result = scala.io.Source.fromURL(url).mkString

// Create a DataFrame from the JSON data
val taxiDF = spark.read.json(Seq(result).toDS)

// Display the DataFrame containing the trip data
taxiDF.show()
```
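The error can be reproduced in plain Python, without Spark: `sc.parallelize(data)` treats the string as a sequence, so each RDD record is a single character, and the first character of the JSON body is `'['` — exactly the object named in the TypeError. A minimal sketch (the `data` literal is a stand-in for the downloaded JSON text):

```python
# sc.parallelize(data) on a string distributes it element by element,
# i.e. character by character -- plain Python shows what those records are:
data = '[{"fare_amount": "7.5"}]'  # stand-in for the downloaded JSON text
records = list(data)               # how the string splits into RDD elements
print(records[0])                  # '[' -- the object named in the TypeError
```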


waxmsbnn (answer 1)

Just for anyone else who hits this: here is the code that worked for me. `r.json()` parses the response into a list (of dicts), which `createDataFrame` accepts.

```python
import requests
from pyspark.sql.types import StructType, StructField, StringType

schema = StructType([StructField('dropoff_latitude', StringType(), True),
                     StructField('dropoff_longitude', StringType(), True),
                     StructField('extra', StringType(), True),
                     StructField('fare_amount', StringType(), True),
                     StructField('improvement_surcharge', StringType(), True),
                     StructField('lpep_dropoff_datetime', StringType(), True),
                     StructField('mta_tax', StringType(), True),
                     StructField('passenger_count', StringType(), True),
                     StructField('payment_type', StringType(), True),
                     StructField('pickup_latitude', StringType(), True),
                     StructField('ratecodeid', StringType(), True),
                     StructField('tip_amount', StringType(), True),
                     StructField('tolls_amount', StringType(), True),
                     StructField('total_amount', StringType(), True),
                     StructField('trip_distance', StringType(), True),
                     StructField('trip_type', StringType(), True),
                     StructField('vendorid', StringType(), True)
                    ])
url = 'https://data.cityofnewyork.us/resource/pqfs-mqru.json'
r = requests.get(url)
data_json = r.json()  # parses the response body into a list of dicts
df = spark.createDataFrame(data_json, schema)
display(df)  # Databricks display; use df.show() outside Databricks
```
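To sketch why this works where the original code did not, without hitting the live API (the two-field sample record below is hypothetical): `r.json()` returns a parsed list of dicts, and `spark.createDataFrame` accepts a list of dicts as rows, whereas a raw JSON string cannot serve as rows. An alternative that mirrors the Scala version would be `spark.read.json(sc.parallelize([data]))`, where wrapping the string in a one-element list keeps it as a single record.

```python
import json

# Hypothetical two-field sample shaped like one record of the API response
payload = '[{"fare_amount": "7.5", "tip_amount": "1.45"}]'

records = json.loads(payload)   # same shape as r.json(): a list of dicts
print(type(records).__name__)   # list
print(records[0]['fare_amount'])

# spark.createDataFrame(records, schema) accepts this list of dicts as rows;
# passing the raw `payload` string instead reproduces the original TypeError.
```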
