pyspark 通过模块调用时查询返回空 Dataframe [已关闭]

gupuwyp2 于 2023-02-21 发布在 Spark

关注(0)|答案(1)|浏览(185)

已关闭。此问题需要details or clarity。当前不接受答案。
**想要改进此问题？**添加详细信息并通过editing this post阐明问题。

5天前关闭。
Improve this question
我有以下设置。一个模块“ExampleModule”通过蛋文件安装，其中包含以下代码。

def fetch_data(sql, sql_query_text):
  data = sql(sql_query_text).toPandas()
  print(data) # this gives me an EmptyDataframe with 0 rows and 28 columns

在我的jupyter笔记本上运行着pyspark内核，我有下面的代码：

from pyspark.sql import SQLContext
sqlContext = SQLContext(spark)
sql = sqlContext.sql
from ExampleModule import *
sql_text = "<THE SELECT QUERY>" 
fetch_data(sql, sql_text)

这给了我一个空的 Dataframe 。但是，如果我定义了一个本地函数“fetch_data_local”，它运行良好，并给了我预期的43k行。

def fetch_data_local(sql, sql_text):
  data = sql(sql_text).toPandas()
  print(data.size)

fetch_data_local(sql, sql_text)

上面的函数工作正常，给我43k行。

pyspark

来源：https://stackoverflow.com/questions/75454602/query-returning-empty-dataframe-when-called-through-a-module

1条答案

按热度按时间

gwbalxhn1#

我已经用Databricks社区版试过了，对我很有效

spark.sparkContext.addPyFile("dbfs:/FileStore/shared_uploads/********@gmail.com/CustomModule.py")
from CustomModule import *

df = [{"Category": 'A', "date": '01/01/2022', "Indictor": 1},
        {"Category": 'A', "date": '02/01/2022', "Indictor": 0},
        {"Category": 'A', "date": '03/01/2022', "Indictor": 1},
        {"Category": 'A', "date": '04/01/2022', "Indictor": 1},
        {"Category": 'A', "date": '05/01/2022', "Indictor": 1},
        {"Category": 'B', "date": '01/01/2022', "Indictor": 0},
        {"Category": 'B', "date": '02/01/2022', "Indictor": 1},
        {"Category": 'B', "date": '03/01/2022', "Indictor": 1},
        {"Category": 'B', "date": '04/01/2022', "Indictor": 0},
        {"Category": 'B', "date": '05/01/2022', "Indictor": 0},
        {"Category": 'B', "date": '06/01/2022', "Indictor": 1}]

df = spark.createDataFrame(df)
df.write.mode("overwrite").saveAsTable("sample")

from pyspark.sql import SQLContext
sqlContext = SQLContext(spark)
sql = sqlContext.sql
sql_text = "select * from sample" 
fetch_data(sql, sql_text)

产出

df:pyspark.sql.dataframe.DataFrame = [Category: string, Indictor: long ... 1 more field]
/databricks/spark/python/pyspark/sql/context.py:117: FutureWarning: Deprecated in 3.0.0. Use SparkSession.builder.getOrCreate() instead.
  warnings.warn(
   Category  Indictor        date
0         A         1  03/01/2022
1         A         1  04/01/2022
2         B         1  02/01/2022
3         B         1  03/01/2022
4         B         0  05/01/2022
5         B         1  06/01/2022
6         A         1  01/01/2022
7         A         0  02/01/2022
8         A         1  05/01/2022
9         B         0  01/01/2022
10        B         0  04/01/2022

赞(0）回复(0）举报 2023-02-21

我来回答

pyspark 通过模块调用时查询返回空 Dataframe [已关闭]

1条答案

相关问题

热门标签

最新问答