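I have a DataFrame from which I create a temporary view so that I can run SQL queries against it. After a couple of SQL queries, I'd like to convert the output of a SQL query into a new DataFrame. The reason I want the data back in a DataFrame is so that I can save it to blob storage.
So, the question is: what is the proper way to convert SQL query output to a DataFrame?
Here's the code I have so far: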
%scala
//read data from Azure blob
...
var df = spark.read.parquet(some_path)
// create temp view
df.createOrReplaceTempView("data_sample")
%sql
-- a few SQL queries; the one below is just an example
SELECT
date,
count(*) as cnt
FROM
data_sample
GROUP BY
date
-- Now I want a DataFrame that holds the above SQL output. How do I do that?
Preferably the code would be in python or scala.
2 Answers
nimxete2 #1
Scala:
PySpark:
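Both variants come down to the same call: pass the query string to `spark.sql`, which returns a DataFrame instead of just rendering a result table. A minimal Scala sketch, assuming the `data_sample` temp view from the question is already registered in the current session (the name `result` is only illustrative); in PySpark the call is likewise `spark.sql(...)` with the same query string:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// In a notebook `spark` normally already exists; getOrCreate() returns that same session.
val spark: SparkSession = SparkSession.builder().getOrCreate()

// Assumption: the temp view `data_sample` from the question is registered in this session.
// spark.sql(...) returns the query output as a DataFrame.
val result: DataFrame = spark.sql(
  """
    |SELECT date, count(*) AS cnt
    |FROM data_sample
    |GROUP BY date
    |""".stripMargin)

// Inspect the first rows of the aggregated output.
result.show()
```

From there `result` behaves like any other DataFrame, so it can be written out with `result.write.parquet(...)` to whatever blob storage path you use.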
7fyelxc5 #2
You can create a temporary view in the %sql cell and then reference it from PySpark or Scala code, like this:
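A rough sketch of that idea, kept in Scala throughout. The `CREATE OR REPLACE TEMPORARY VIEW` statement is what would normally sit in the %sql cell; it is issued through `spark.sql` here only so the example stays in one language. The view name `sql_result` is hypothetical, and this assumes the SQL and Scala cells share one Spark session, since temp views are session-scoped:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// In a notebook `spark` normally already exists; getOrCreate() returns that same session.
val spark: SparkSession = SparkSession.builder().getOrCreate()

// Step 1: the statement that would go in the %sql cell.
// `sql_result` is a hypothetical view name; `data_sample` is the view from the question.
spark.sql(
  """
    |CREATE OR REPLACE TEMPORARY VIEW sql_result AS
    |SELECT date, count(*) AS cnt
    |FROM data_sample
    |GROUP BY date
    |""".stripMargin)

// Step 2: in a later Scala cell, read the view back as a DataFrame.
val df: DataFrame = spark.table("sql_result")
df.show()
```

The upside of this route is that the query itself stays in SQL cells, and only the last step, picking the view up as a DataFrame, happens in Scala or PySpark.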