How to align year and week number for dates

4xrmg8kj · published 2021-05-27 in Spark

When trying to get the year and week number for a range of dates spanning multiple years, I am running into some issues around the start/end of a year.
I understand the logic of weeknumber and year when they run separately. However, when they are combined, in some cases they do not produce consistent results, and I am wondering what the best approach in Spark is to make sure these scenarios are handled with a consistent year for a given week number.
For example, running:

```
spark.sql("select year('2017-01-01') as year, weekofyear('2017-01-01') as weeknumber").show(false)
```

outputs:

```
+----+----------+
|year|weeknumber|
+----+----------+
|2017|52        |
+----+----------+
```

but the desired result is:

```
+----+----------+
|year|weeknumber|
+----+----------+
|2016|52        |
+----+----------+
```

Running:

```
spark.sql("select year('2018-12-31') as year, weekofyear('2018-12-31') as weeknumber").show(false)
```

produces:

```
+----+----------+
|year|weeknumber|
+----+----------+
|2018|1         |
+----+----------+
```

but the expected result is:

```
+----+----------+
|year|weeknumber|
+----+----------+
|2019|1         |
+----+----------+
```

The code is running on Spark 2.4.2.
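
Both boundary cases can be reproduced in a single query with an inline table (a minimal sketch, assuming the same Spark 2.4 session as above; `t` and `d` are arbitrary names):

```
// Reproduces the two inconsistent cases from the question in one statement:
// 2017-01-01 -> (2017, 52) and 2018-12-31 -> (2018, 1).
spark.sql("""
  select d, year(d) as year, weekofyear(d) as weeknumber
  from values ('2017-01-01'), ('2018-12-31') as t(d)
""").show(false)
```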


Answer 1 · zpgglvta

This behaviour of Spark is consistent with the ISO 8601 definition; you cannot change it. However, I can think of a workaround:
determine the dayOfWeek first; if it is smaller than 4, increase the year by one; if it is equal to 4, keep the year as it is; otherwise, decrease the year by one.
Example `2017-01-01`:

```
sql("select case when date_format('2017-01-01', 'u') < 4 then year('2017-01-01')+1 when date_format('2017-01-01', 'u') = 4 then year('2017-01-01') else year('2017-01-01')-1 end as year, weekofyear('2017-01-01') as weeknumber, date_format('2017-01-01', 'u') as dayOfWeek").show(false)

+----+----------+---------+
|year|weeknumber|dayOfWeek|
+----+----------+---------+
|2016|52        |7        |
+----+----------+---------+
```

Example `2018-12-31`:

```
sql("select case when date_format('2018-12-31', 'u') < 4 then year('2018-12-31')+1 when date_format('2018-12-31', 'u') = 4 then year('2018-12-31') else year('2018-12-31')-1 end as year, weekofyear('2018-12-31') as weeknumber, date_format('2018-12-31', 'u') as dayOfWeek").show(false)

+----+----------+---------+
|year|weeknumber|dayOfWeek|
+----+----------+---------+
|2019|1         |1        |
+----+----------+---------+
```
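
Note that this fixed ±1 rule only matches the ISO week-based year for dates whose week actually straddles a year boundary; applied to an arbitrary date (for example a mid-year Friday such as 2018-06-15) it would shift the year incorrectly. A more general day-of-week variant, sketched below for Spark 2.4 (where `date_format` still accepts the SimpleDateFormat `u` pattern, ISO day of week with 1 = Monday), moves every date to the Thursday of its ISO week and takes that day's year:

```
// The Thursday of a date's ISO week always falls inside the ISO week-based
// year, so its calendar year is the week-based year, for any date:
// 2017-01-01 -> (52, 2016), 2018-06-15 -> (24, 2018), 2018-12-31 -> (1, 2019).
spark.sql("""
  select d,
         weekofyear(d) as weeknumber,
         year(date_add(d, 4 - cast(date_format(d, 'u') as int))) as week_year
  from values ('2017-01-01'), ('2018-06-15'), ('2018-12-31') as t(d)
""").show(false)
```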

Answer 2 · 1cklez4t

```
val df = Seq(("2017-01-01"), ("2018-12-31")).toDF("dateval")

+----------+
|   dateval|
+----------+
|2017-01-01|
|2018-12-31|
+----------+

df.createOrReplaceTempView("date_tab")

// Week 1 falling in December belongs to the next ISO year; week 52 (or 53)
// falling in January belongs to the previous ISO year.
val newDF = spark.sql("""select dateval,
                    case when weekofyear(dateval) = 1 and month(dateval) = 12 then struct((year(dateval)+1) as yr, weekofyear(dateval) as wk)
                         when weekofyear(dateval) in (52, 53) and month(dateval) = 1 then struct((year(dateval)-1) as yr, weekofyear(dateval) as wk)
                         else struct(year(dateval) as yr, weekofyear(dateval) as wk) end as week_struct
             from date_tab""")

newDF.select($"dateval", $"week_struct.yr", $"week_struct.wk").show()

+----------+----+---+
|   dateval|  yr| wk|
+----------+----+---+
|2017-01-01|2016| 52|
|2018-12-31|2019|  1|
+----------+----+---+
```
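
The same CASE logic can also be written with the DataFrame API instead of a temp view; a minimal sketch, assuming the same spark-shell session and the `df` defined above (the condition also covers week 53, which `weekofyear` returns for some January dates):

```
import org.apache.spark.sql.functions._

// Same adjustment rule as the SQL above: week 1 in December belongs to the
// next year, week 52/53 in January to the previous year, else the calendar year.
val adjusted = df
  .withColumn("wk", weekofyear($"dateval"))
  .withColumn("yr",
    when($"wk" === 1 && month($"dateval") === 12, year($"dateval") + 1)
      .when($"wk" >= 52 && month($"dateval") === 1, year($"dateval") - 1)
      .otherwise(year($"dateval")))

adjusted.select($"dateval", $"yr", $"wk").show()
```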
