I was trying to filter out the crimes that happened in downtown San Francisco on Sundays. After successfully labeling them as 1 (downtown) and 0 (not downtown), I tried to count the crimes on each Sunday in downtown SF, but I got a "table or view not found" error. I don't understand why, since I had just created the table I was trying to access (in the same cell).
So I tried createOrReplaceTempView() to turn the DataFrame into a table. After using it, I got a new error instead of the old one. Can someone tell me whether this is the right way to fix the "table not found" error? Also, I'm confused about how to count the rows labeled 1. This is what I have now, and I don't understand why it doesn't work. Here is what the two tables look like:
The error I get is:
from pyspark.sql.functions import when
import pyspark.sql.functions as F
# First, pick out the crime cases that happen on Sunday
q3_sunday = spark.sql("SELECT * FROM sf_crime WHERE DayOfWeek='Sunday'")
# Then, add a new column to identify whether the crime is in DT (downtown)
q3_final = q3_sunday.withColumn("isDT",F.when(((q3_sunday.X.between(-122.4313,-122.4213))&
(q3_sunday.Y.between(37.7540,37.7740))),1).otherwise(0))
# Last but not least, count the crimes that happen each Sunday in SF downtown,
# using the newly added isDT column with True (1) and False (0) values
q3_final.createOrReplaceTempView("q3final_tbl")
sunday_dt = spark.sql("SELECT isDT, COUNT(*) AS Count FROM q3final_tbl WHERE isDT='1' GROUP BY DayofWeek ORDER BY Count DESC")
1 Answer
You need to correct the GROUP BY clause in your SQL. Every non-aggregated column in the SELECT list (here, isDT) must appear in the GROUP BY clause, so the query should group by isDT rather than DayofWeek.