我是spark的初学者,我被困在如何使用dataframe发出sql请求。
我有以下两个Dataframe。
dataframe_1
+-----------------+-----------------+----------------------+---------------------+
|id |geometry_tyme |geometry |rayon |
+-----------------+-----------------+----------------------+---------------------+
|50 |Polygon |[00 00 00 00 01 0...] |200 |
|54 |Point |[00 00 00 00 01 0.. ] |320179 |
+-----------------+-----------------+----------------------+---------------------+
dataframe_2
+-----------------+-----------------+----------------------+
|id2 |long |lat |
+-----------------+-----------------+----------------------+
|[70,50,600,] | -9.6198783 |44.5942549 |
|[20,140,39,] |-6.6198783 |44.5942549 |
+-----------------+-----------------+----------------------+
我想执行以下请求。
"SELECT dataframe_1.* FROM dataframe_1 WHERE dataframe_1.id IN ("
+ id2
+ ") AND ((dataframe_1.geometry_tyme='Polygon' AND (ST_WITHIN(ST_GeomFromText(CONCAT('POINT(',"
+ long
+ ",' ',"
+ lat
+ ",')'),4326),dataframe_1.geometry))) OR ( (dataframe_1.geometry_tyme='LineString' OR dataframe_1.geomType='Point') AND ST_Intersects(ST_buffer(dataframe_1.geom,(dataframe_1.rayon/100000)),ST_GeomFromText(CONCAT('POINT(',"
+ long
+ ",' ',"
+ lat
+ ",')'),4326)))) "
我真的被卡住了,我应该加入两个Dataframe还是什么?我尝试用id和idzone连接两个Dataframe,如下所示:
dataframe_2.select(explode(col("id2").as ("id2"))).join(dataframe_1,col("id2").equalTo(dataframe_1.col("id")));
但在我看来,加入并不是正确的选择。
我需要你的帮助。
谢谢您
1条答案
按热度按时间omtl5h9j1#
1.从Dataframe创建临时视图。
2.运行sql作为
final_df = spark.sql("your SQL here")