PySpark: how do I convert SQL output into a DataFrame?

wmvff8tz · published 2023-01-16 in Spark

I have a DataFrame from which I create a temporary view so that I can run SQL queries against it. After a few SQL queries, I want to turn the output of the SQL query back into a new DataFrame. The reason I want the data back in a DataFrame is so that I can save it to blob storage.
So the question is: what is the correct way to convert SQL query output into a DataFrame?
Here is the code I currently have:

%scala
// read data from Azure blob
...
var df = spark.read.parquet(some_path)

// create temp view
df.createOrReplaceTempView("data_sample")

%sql
-- I have some SQL queries; the one below is just an example
SELECT
   date,
   count(*) as cnt
FROM
   data_sample
GROUP BY
   date

-- Now I want a DataFrame that holds the output of the SQL above. How do I do that?
Preferably the answer would be in Python or Scala.

nimxete2 · answer 1#

Scala:

var df = spark.sql("""
SELECT
   date,
   count(*) as cnt
FROM
   data_sample
GROUP BY
   date
""")

PySpark:

df = spark.sql('''
SELECT
   date,
   count(*) as cnt
FROM
   data_sample
GROUP BY
   date
''')

7fyelxc5 · answer 2#

You can create a temporary view inside the %sql cell and then reference it from PySpark or Scala code, like this:

%sql
create temporary view sql_result as
SELECT ...

%scala
var df = spark.sql("SELECT * FROM sql_result")
