I am trying to create a Spark DataFrame using SQL. I think the SQL part works, because when I check the type of result with type(result) it is a Spark DataFrame. The problem is that when I run result.write.csv("./result") I get the following error. How can I fix it?
(error message attached as a screenshot)
import pandas as pd
import numpy as np
from pyspark import SparkConf, SparkContext
import pyspark.pandas as ps
from pyspark.sql import SparkSession, Row
# spark session
spark = SparkSession.builder.appName("Pyspark Read Parquet").getOrCreate()
# Read parquet
path = "./data/fhvhv_tripdata_2022-01.parquet"
parquet_df = spark.read.parquet(path)
# name table
parquet_df.createOrReplaceTempView("ParquetTable")
# query for spend_time related to trip_miles
query = """SELECT on_scene_datetime, request_datetime, (on_scene_datetime - request_datetime) as spend_time, trip_miles, sales_tax
FROM ParquetTable
WHERE on_scene_datetime IS NOT NULL AND
request_datetime IS NOT NULL AND
INT(on_scene_datetime - request_datetime) > 0
ORDER BY 4 DESC"""
result = spark.sql(query)
result.show(3, truncate=False)
result.write.csv("./result")
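For reference, the type check mentioned above, plus a schema print, looks roughly like this. This is just a sketch assuming the same spark session and result DataFrame as in the snippet; the printed class name can differ slightly between PySpark deployments.

# Confirm that spark.sql() returned a Spark DataFrame
print(type(result))   # typically <class 'pyspark.sql.dataframe.DataFrame'>
# Inspect the column types that the CSV writer would have to serialize
result.printSchema()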
1 Answer
Is there anything else in the error message? The main thing that comes to mind is to try writing to ./result/ instead.
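A minimal sketch of that suggestion, reusing the result DataFrame from the question. The overwrite mode and header option are assumptions added for convenience, not part of the original suggestion.

# Write the query result as CSV files into the ./result/ directory
result.write.mode("overwrite").option("header", True).csv("./result/")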