I have a DataFrame like the one below, where the `date` column is formatted as yyyy-MM-dd:
```
+--------+----------+---------+----------+-----------+--------------------+
|order_id|product_id|seller_id|      date|pieces_sold|       bill_raw_text|
+--------+----------+---------+----------+-----------+--------------------+
|     668|    886059|     3205|2015-01-14|         91|pbdbzvpqzqvtzxone...|
|    6608|    541277|     1917|2012-09-02|         44|cjucgejlqnmfpfcmg...|
|   12962|    613131|     2407|2016-08-26|         90|cgqhggsjmrgkrfevc...|
|   14223|    774215|     1196|2010-03-04|         46|btujmkfntccaewurg...|
|   15131|    769255|     1546|2018-11-28|         13|mrfsamfuhpgyfjgki...|
+--------+----------+---------+----------+-----------+--------------------+
```
I want to create and append a column holding the week number within the month; that is, for every date in the table I want to compute which week of its month it falls in.
Here is what I did:
```
sales_table.select(
    '*',
    F.date_format("date", "W").alias('week_month')
).show(5)
```
The error is:
```
An error occurred while calling o140.showString.
: org.apache.spark.SparkUpgradeException: You may get a different result due to the upgrading of Spark 3.0: Fail to recognize 'W' pattern in the DateTimeFormatter. 1) You can set spark.sql.legacy.timeParserPolicy to LEGACY to restore the behavior before Spark 3.0. 2) You can form a valid datetime pattern with the guide from https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html
at org.apache.spark.sql.catalyst.util.DateTimeFormatterHelper$$anonfun$checkLegacyFormatter$1.applyOrElse(DateTimeFormatterHelper.scala:176)
at org.apache.spark.sql.catalyst.util.DateTimeFormatterHelper$$anonfun$checkLegacyFormatter$1.applyOrElse(DateTimeFormatterHelper.scala:165)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38)
at org.apache.spark.sql.catalyst.util.Iso8601TimestampFormatter.validatePatternString(TimestampFormatter.scala:110)
at org.apache.spark.sql.catalyst.util.TimestampFormatter$.getFormatter(TimestampFormatter.scala:279)
at org.apache.spark.sql.catalyst.util.TimestampFormatter$.apply(TimestampFormatter.scala:313)
at org.apache.spark.sql.catalyst.expressions.DateFormatClass.$anonfun$formatter$1(datetimeExpressions.scala:646)
at scala.Option.map(Option.scala:230)
at org.apache.spark.sql.catalyst.expressions.DateFormatClass.formatter$lzycompute(datetimeExpressions.scala:641)
at org.apache.spark.sql.catalyst.expressions.DateFormatClass.formatter(datetimeExpressions.scala:639)
at org.apache.spark.sql.catalyst.expressions.DateFormatClass.doGenCode(datetimeExpressions.scala:665)
at org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$genCode$3(Expression.scala:146)
at scala.Option.getOrElse(Option.scala:189)
...
```
How do I get the week of the month from a date?
2 Answers
1# w51jfk4q
Add the line `spark.sql.legacy.timeParserPolicy LEGACY` (the property and value named in the error message) to `$SPARK_HOME/conf/spark-defaults.conf`. Unfortunately, the datetime pattern in recent versions of Spark no longer supports 'W', but you can still restore the legacy behavior with the setting above.
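If editing `spark-defaults.conf` is not an option, the same property can also be set programmatically when the session is created. A minimal sketch (the application name is only a placeholder, not from the original post):

```
from pyspark.sql import SparkSession

# Restore the pre-3.0 datetime pattern behavior for this session.
spark = (
    SparkSession.builder
    .appName("week-of-month")  # placeholder name
    .config("spark.sql.legacy.timeParserPolicy", "LEGACY")
    .getOrCreate()
)
```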
2# tzdcorbm
As suggested in the error log, you can set `spark.sql.legacy.timeParserPolicy` to LEGACY in the Spark session. Example:
```
spark.sql("set spark.sql.legacy.timeParserPolicy=LEGACY")
sales_table.show()
+----------+
| date|
+----------+
|2015-01-14|
+----------+
sales_table.select('*',F.date_format("date", "W").alias('week_month')).show(5)
+----------+----------+
| date|week_month|
+----------+----------+
|2015-01-14| 3|
+----------+----------+