Problem accessing a table I just created in Spark

Asked by ql3eal8s on 2021-05-27 in Spark

I am trying to filter out crimes that happened on Sundays in the downtown San Francisco area. After successfully labeling the rows as 1 (downtown) and 0 (not downtown), I tried to count the crimes for each Sunday in downtown San Francisco, but Spark says the table or view cannot be found. I don't understand why, because I created the table I am trying to access in the very same cell.

So I tried using `createOrReplaceTempView()` to turn it into a table. After using it, I get a new error instead of the old one. Can someone tell me whether this is the right way to fix the "table not found" error? Also, I am confused about how to count the rows labeled 1. This is what I have now, and I don't understand why it doesn't work. Here is what the two tables look like:


The error I get is (screenshot not preserved). My code:

from pyspark.sql.functions import when
import pyspark.sql.functions as F

# First, pick out the crime cases that happen on Sunday

q3_sunday = spark.sql("SELECT * FROM sf_crime WHERE DayOfWeek='Sunday'")

# Then, add a new column to identify whether the crime happened in downtown (DT)

q3_final = q3_sunday.withColumn("isDT", F.when((q3_sunday.X.between(-122.4313, -122.4213)) &
                                               (q3_sunday.Y.between(37.7540, 37.7740)), 1).otherwise(0))

# Last but not least, count the crimes that happen each Sunday in SF downtown using the newly
# added column as well as the True(1)/False(0) flag
q3_final.createOrReplaceTempView("q3final_tbl")
sunday_dt = spark.sql("SELECT isDT, COUNT(*) AS Count FROM q3final_tbl WHERE isDT='1' GROUP BY DayofWeek ORDER BY Count DESC")
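For reference, the bounding-box flag that the `withColumn` line computes can be sketched in plain Python (an illustrative analogue only; in Spark this runs as a column expression over the whole DataFrame, and the coordinates are the ones from the post):

```python
def is_downtown(x, y):
    """Return 1 if (x, y) falls inside the downtown bounding box, else 0."""
    in_x = -122.4313 <= x <= -122.4213   # same bounds as X.between(-122.4313, -122.4213)
    in_y = 37.7540 <= y <= 37.7740       # same bounds as Y.between(37.7540, 37.7740)
    return 1 if (in_x and in_y) else 0

# Hypothetical sample coordinates:
print(is_downtown(-122.4250, 37.7600))  # inside the box  -> 1
print(is_downtown(-122.5000, 37.7600))  # outside the box -> 0
```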

Answer 1, by 3yhwsihp:

You need to correct the GROUP BY clause in your SQL. It should be:

SELECT isDT, COUNT(*) AS Count FROM q3final_tbl WHERE isDT='1' GROUP BY isDT ORDER BY Count DESC
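The original query groups by `DayOfWeek`, a column the query never selects, while selecting `isDT`; the fix groups by the same column that appears in the SELECT list. What the corrected query computes is just a row count per `isDT` value after filtering. A minimal plain-Python analogue (with hypothetical sample rows standing in for `q3final_tbl`) is:

```python
from collections import Counter

# Hypothetical rows, each dict mimicking one row of q3final_tbl
rows = [
    {"DayOfWeek": "Sunday", "isDT": 1},
    {"DayOfWeek": "Sunday", "isDT": 1},
    {"DayOfWeek": "Sunday", "isDT": 0},
]

# WHERE isDT = 1 ... GROUP BY isDT ... COUNT(*)
counts = Counter(r["isDT"] for r in rows if r["isDT"] == 1)
print(counts)  # Counter({1: 2})
```

Equivalently, since the WHERE clause leaves only one group, the DataFrame API call `q3_final.filter(F.col("isDT") == 1).count()` would give the same number without SQL.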
