I was trying to filter out the crimes that happened in downtown San Francisco on Sundays. After successfully labeling them as 1 (downtown) and 0 (not downtown), I tried to count the crimes on each Sunday in downtown SF, but I got a "table or view not found" error. I don't understand why, since I had just created the table I was trying to access (in the same cell).
So I tried createOrReplaceTempView() to turn the DataFrame into a table. After using it, I got a new error instead of the old one. Can someone tell me whether this is the right way to fix the "table not found" error? Also, I'm confused about how to count the rows labeled 1. This is what I have now, and I don't understand why it doesn't work. Here is what the two tables look like:
The error I get is:
from pyspark.sql.functions import when
import pyspark.sql.functions as F
# First, pick out the crime cases that happen on Sunday
q3_sunday = spark.sql("SELECT * FROM sf_crime WHERE DayOfWeek='Sunday'")
# Then, add a new column to identify whether the crime is in DT (downtown)
q3_final = q3_sunday.withColumn("isDT",F.when(((q3_sunday.X.between(-122.4313,-122.4213))&
(q3_sunday.Y.between(37.7540,37.7740))),1).otherwise(0))
# Last but not least, count the crimes that happen each Sunday in SF downtown,
# using the newly added isDT column with True (1) and False (0) values
q3_final.createOrReplaceTempView("q3final_tbl")
sunday_dt = spark.sql("SELECT isDT, COUNT(*) AS Count FROM q3final_tbl WHERE isDT='1' GROUP BY DayofWeek ORDER BY Count DESC")
1 Answer
You need to correct the GROUP BY clause in your SQL. Every non-aggregated column in the SELECT list (here, isDT) must appear in the GROUP BY clause, so the query should group by isDT rather than DayofWeek.