PySpark: week of the month from a date

rqcrx0a6 posted on 2021-05-16 in Spark

I have a DataFrame like this, where the date column is in yyyy-MM-dd format:

+--------+----------+---------+----------+-----------+--------------------+
|order_id|product_id|seller_id|      date|pieces_sold|       bill_raw_text|
+--------+----------+---------+----------+-----------+--------------------+
|     668|    886059|     3205|2015-01-14|         91|pbdbzvpqzqvtzxone...|
|    6608|    541277|     1917|2012-09-02|         44|cjucgejlqnmfpfcmg...|
|   12962|    613131|     2407|2016-08-26|         90|cgqhggsjmrgkrfevc...|
|   14223|    774215|     1196|2010-03-04|         46|btujmkfntccaewurg...|
|   15131|    769255|     1546|2018-11-28|         13|mrfsamfuhpgyfjgki...|
+--------+----------+---------+----------+-----------+--------------------+

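I want to create and append a column holding the week of the month, i.e. which week within its month each date falls in, computed for all of my dates.
Here is what I tried: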

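from pyspark.sql import functions as F

# 'W' = week of month in the legacy (pre-3.0) datetime pattern syntax
sales_table.select(
    '*',
    F.date_format("date", "W").alias('week_month')
).show(5)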

The error is:

An error occurred while calling o140.showString.
: org.apache.spark.SparkUpgradeException: You may get a different result due to the upgrading of Spark 3.0: Fail to recognize 'W' pattern in the DateTimeFormatter. 1) You can set spark.sql.legacy.timeParserPolicy to LEGACY to restore the behavior before Spark 3.0. 2) You can form a valid datetime pattern with the guide from https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html
    at org.apache.spark.sql.catalyst.util.DateTimeFormatterHelper$$anonfun$checkLegacyFormatter$1.applyOrElse(DateTimeFormatterHelper.scala:176)
    at org.apache.spark.sql.catalyst.util.DateTimeFormatterHelper$$anonfun$checkLegacyFormatter$1.applyOrElse(DateTimeFormatterHelper.scala:165)
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38)
    at org.apache.spark.sql.catalyst.util.Iso8601TimestampFormatter.validatePatternString(TimestampFormatter.scala:110)
    at org.apache.spark.sql.catalyst.util.TimestampFormatter$.getFormatter(TimestampFormatter.scala:279)
    at org.apache.spark.sql.catalyst.util.TimestampFormatter$.apply(TimestampFormatter.scala:313)
    at org.apache.spark.sql.catalyst.expressions.DateFormatClass.$anonfun$formatter$1(datetimeExpressions.scala:646)
    at scala.Option.map(Option.scala:230)
    at org.apache.spark.sql.catalyst.expressions.DateFormatClass.formatter$lzycompute(datetimeExpressions.scala:641)
    at org.apache.spark.sql.catalyst.expressions.DateFormatClass.formatter(datetimeExpressions.scala:639)
    at org.apache.spark.sql.catalyst.expressions.DateFormatClass.doGenCode(datetimeExpressions.scala:665)
    at org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$genCode$3(Expression.scala:146)
    at scala.Option.getOrElse(Option.scala:189)
..
..
..

How can I get the week of the month from a date?
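To pin down what "week of the month" means here: the legacy W pattern numbers the rows of a month's calendar grid, with the partial week containing the 1st counted as week 1. The minimal sketch below reproduces that rule in plain Python, not Spark. It assumes Sunday is the first day of the week, as in a US locale; a locale with a different first day of week would give different numbers. The function name week_of_month is hypothetical.

```python
import calendar
from datetime import date


def week_of_month(d: date) -> int:
    """Hypothetical helper: 1-based row of d in a Sunday-first month grid.

    Assumption: weeks start on Sunday (US-style calendar), and the partial
    week that contains the 1st of the month counts as week 1.
    """
    cal = calendar.Calendar(firstweekday=calendar.SUNDAY)
    # monthdayscalendar returns one list per week; days outside the month are 0
    for index, week in enumerate(cal.monthdayscalendar(d.year, d.month), start=1):
        if d.day in week:
            return index
    raise ValueError(f"{d} not found in its own month")  # unreachable for valid dates


print(week_of_month(date(2015, 1, 14)))  # 3
```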

w51jfk4q (answer #1)

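Add the line

spark.sql.legacy.timeParserPolicy LEGACY

to $SPARK_HOME/conf/spark-defaults.conf.
Unfortunately, starting with Spark 3.0 the datetime formatter no longer supports the 'W' pattern, but you can still restore the legacy behavior with the setting above.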

tzdcorbm (answer #2)

As the error message suggests, set spark.sql.legacy.timeParserPolicy to LEGACY in the Spark session. Example:
```
spark.sql("set spark.sql.legacy.timeParserPolicy=LEGACY")

sales_table.show()
+----------+
|      date|
+----------+
|2015-01-14|
+----------+

sales_table.select('*', F.date_format("date", "W").alias('week_month')).show(5)
+----------+----------+
|      date|week_month|
+----------+----------+
|2015-01-14|         3|
+----------+----------+
```
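If you would rather not switch the whole session to the legacy parser, one possible workaround (not taken from either answer) is to compute the number arithmetically. Spark's dayofweek returns 1 for Sunday through 7 for Saturday, and trunc(date, 'month') gives the first day of the month. So a column along the lines of F.ceil((F.dayofmonth("date") + F.dayofweek(F.trunc("date", "month")) - 1) / 7) should reproduce the legacy W value for a Sunday-first calendar. The sketch below checks that arithmetic in plain Python rather than in Spark. The helper name week_of_month is hypothetical, and the Sunday-first convention is an assumption.

```python
import math
from datetime import date


def week_of_month(d: date) -> int:
    """Hypothetical helper mirroring the Spark expression
    ceil((dayofmonth(d) + dayofweek(trunc(d, 'month')) - 1) / 7).

    Assumption: weeks start on Sunday, matching Spark's dayofweek numbering.
    """
    first = d.replace(day=1)
    # Python: Monday=0 .. Sunday=6; convert to Spark-style Sunday=1 .. Saturday=7
    first_dow = (first.weekday() + 1) % 7 + 1
    return math.ceil((d.day + first_dow - 1) / 7)


# January 2015 starts on a Thursday, so Jan 4 (a Sunday) begins week 2
print([week_of_month(date(2015, 1, n)) for n in (1, 3, 4, 14, 31)])  # [1, 1, 2, 3, 5]
```

For 2015-01-14 this gives 3, the same value the legacy formatter prints above.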
