pyspark选择列等于当前行中参数值的行

7gcisfzg  于 2021-05-27  发布在  Spark
关注(0)|答案(2)|浏览(416)

我有一个数据框,它有一个当前的日期值,我在数据框中创建一个新的列,一个月前这样计算,

spark_df = spark_df.withColumn("oneMonthAgo", expr("add_months(calendarday, -1)"))

我想返回并找到一行,该行的asofdate与onemonthago列匹配,并将其值作为名为1monthagovalue的新列包含。最初,我试图使用窗口函数返回,但由于没有一个monthHago列与日期匹配,因此它将在1monthagovalue列中返回自己的值。

Beginning Data Frame
 +----------+-----------+-----------+

 |  AsofDate|oneMonthAgo|      value|

 +----------+-----------+-----------+

 |2019-02-23| 2019-02-20|          2|

 |2019-03-20| 2019-02-20|          7|

 |2019-03-21| 2019-02-21|         12|

 |2019-03-22| 2019-02-22|         27|

 |2019-03-23| 2019-02-23|         91|

 +----------+-----------+-----------+

 Data Frame to end up with
 +----------+-----------+-----------+-----------------------+

 |  AsofDate|oneMonthAgo|      value|         1MonthAgoValue|

 +----------+-----------+-----------+-----------------------+

 |2019-02-23| 2019-02-20|          2|                  null |

 |2019-03-20| 2019-02-20|          7|                  null |

 |2019-03-21| 2019-02-21|         12|                  null |

 |2019-03-22| 2019-02-22|         27|                  null |

 |2019-03-23| 2019-02-23|         91|                     2 |  (oneMonthAgo matches first rows AsofDate include first rows value column in current row as 1MonthAgoValue )

 +----------+-----------+-----------+-----------------------+

 +---------------+-----------------------+-----------+--------------+-----+-------+----+------------+--------------+------------------+---------------+---------------+-----------------+---------------+-------------------+-------------------------+---------------+
  |accountname|             calendarday|     market|returnposition| year| month| day|yearUnique|last_monday|firstDayOfMonth|oneMonthAgo|twoMonthAgo| threeMonthAgo| sixMonthAgo|twelveMonthAgo|                       indexCP|inceptionDate| 
 +--------------+-----------------------+-----------+--------------+-----+-------+----+------------+--------------+------------------+---------------+----------------+-----------------+---------------+-------------------+-------------------------+---------------+
  |          Giants|2015-01-02 00:00:00|           null|               null| 2015|       01|   02|2015-01-02| 2014-12-29|        2015-01-01|   2014-12-02|      2014-11-02|      2014-10-02|    2014-07-02|         2014-01-02|                           100.0|    2015-01-02|
  |          Giants|2015-01-05 00:00:00|110086.25|         0.0105| 2015|       01|   05|2015-01-05| 2015-01-05|        2015-01-01|   2014-12-05|      2014-11-05|      2014-10-05|    2014-07-05|         2014-01-05|                         101.05|    2015-01-02|
  |          Giants|2015-01-06 00:00:00|  201251.5|         2.0E-4| 2015|       01|   06|2015-01-06| 2015-01-05|        2015-01-01|   2014-12-06|      2014-11-06|      2014-10-06|    2014-07-06|         2014-01-06|  101.07020999999999|   2015-01-02|
  |          Giants|2015-01-07 00:00:00|  216786.5|         -0.006| 2015|       01|   07|2015-01-07| 2015-01-05|        2015-01-01|   2014-12-07|      2014-11-07|      2014-10-07|    2014-07-07|         2014-01-07|              100.46378874|    2015-01-02|
  |          Giants|2015-01-08 00:00:00|  215464.5|       -0.0063| 2015|       01|   08|2015-01-08| 2015-01-05|        2015-01-01|   2014-12-08|      2014-11-08|      2014-10-08|    2014-07-08|         2014-01-08|        99.830866870938|    2015-01-02|
  |          Giants|2015-01-09 00:00:00|214103.25|        0.0052| 2015|       01|   09|2015-01-09| 2015-01-05|        2015-01-01|   2014-12-09|      2014-11-09|      2014-10-09|    2014-07-09|         2014-01-09|  100.34998737866687|    2015-01-02|
  |          Giants|2015-01-12 00:00:00|  215218.5|       -4.0E-4| 2015|       01|   12|2015-01-12| 2015-01-12|        2015-01-01|   2014-12-12|      2014-11-12|      2014-10-12|    2014-07-12|         2014-01-12|  100.30984738371541|    2015-01-02|
  |          Giants|2015-01-13 00:00:00|215125.25|        0.0036| 2015|       01|   13|2015-01-13| 2015-01-12|        2015-01-01|   2014-12-13|      2014-11-13|      2014-10-13|    2014-07-13|         2014-01-13|  100.67096283429677|    2015-01-02|
  |          Giants|2015-01-14 00:00:00|  215919.5|        8.0E-4| 2015|       01|   14|2015-01-14| 2015-01-12|        2015-01-01|   2014-12-14|      2014-11-14|      2014-10-14|    2014-07-14|         2014-01-14|    100.7514996045642|    2015-01-02|
  |          Giants|2015-01-15 00:00:00|216103.75|        4.0E-4| 2015|       01|   15|2015-01-15| 2015-01-12|        2015-01-01|   2014-12-15|      2014-11-15|      2014-10-15|    2014-07-15|         2014-01-15|  100.79180020440602|    2015-01-02|
  |          Giants|2015-01-16 00:00:00|  216205.5|       0.0052| 2015|       01|   16|2015-01-16| 2015-01-12|        2015-01-01|   2014-12-16|      2014-11-16|      2014-10-16|    2014-07-16|         2014-01-16|  101.31591756546894|    2015-01-02|
  |          Giants|2015-01-19 00:00:00|  347334.0|      -0.0045| 2015|       01|   19|2015-01-19| 2015-01-19|        2015-01-01|   2014-12-19|      2014-11-19|      2014-10-19|    2014-07-19|         2014-01-19|  100.85999593642434|    2015-01-02|
  |          Giants|2015-01-20 00:00:00|  345767.0|       0.0015| 2015|       01|   20|2015-01-20| 2015-01-19|        2015-01-01|   2014-12-20|      2014-11-20|      2014-10-20|    2014-07-20|         2014-01-20|  101.01128593032898|    2015-01-02|
  |          Giants|2015-01-21 00:00:00|  346314.5|       2.0E-4| 2015|       01|   21|2015-01-21| 2015-01-19|        2015-01-01|   2014-12-21|      2014-11-21|      2014-10-21|    2014-07-21|         2014-01-21|  101.03148818751504|    2015-01-02|
  |          Giants|2015-01-22 00:00:00|346399.75|       0.0029| 2015|       01|   22|2015-01-22| 2015-01-19|        2015-01-01|   2014-12-22|      2014-11-22|      2014-10-22|    2014-07-22|         2014-01-22| 101.32447950325883|    2015-01-02|
  |          Giants|2015-01-23 00:00:00|347412.75|      -6.0E-4| 2015|       01|   23|2015-01-23| 2015-01-19|        2015-01-01|   2014-12-23|      2014-11-23|      2014-10-23|    2014-07-23|         2014-01-23| 101.26368481555686|    2015-01-02|
  |          Giants|2015-01-26 00:00:00|348303.75|      -6.0E-4| 2015|       01|   26|2015-01-26| 2015-01-26|        2015-01-01|   2014-12-26|      2014-11-26|      2014-10-26|    2014-07-26|         2014-01-26| 101.20292660466752|    2015-01-02|
  |          Giants|2015-01-27 00:00:00|  348541.0|      -0.0044| 2015|       01|   27|2015-01-27| 2015-01-26|        2015-01-01|   2014-12-27|      2014-11-27|      2014-10-27|    2014-07-27|         2014-01-27| 100.75763372760697|    2015-01-02|
  |          Giants|2015-01-28 00:00:00|347579.25|       0.0015| 2015|       01|   28|2015-01-28| 2015-01-26|        2015-01-01|   2014-12-28|      2014-11-28|      2014-10-28|    2014-07-28|         2014-01-28|   100.9087701781984|    2015-01-02|
  |          Giants|2015-01-29 00:00:00|348431.75|       2.0E-4| 2015|       01|   29|2015-01-29| 2015-01-26|        2015-01-01|   2014-12-29|      2014-11-29|      2014-10-29|    2014-07-29|         2014-01-29|  100.92895193223403|    2015-01-02|
  |        Yankees|2015-01-02 00:00:00|           null|               null| 2015|       01|   02|2015-01-02| 2014-12-29|        2015-01-01|   2014-12-02|      2014-11-02|      2014-10-02|    2014-07-02|         2014-01-02|                           100.0|    2015-01-02|
  |        Yankees|2015-01-05 00:00:00|110086.25|         0.0105| 2015|       01|   05|2015-01-05| 2015-01-05|        2015-01-01|   2014-12-05|      2014-11-05|      2014-10-05|    2014-07-05|         2014-01-05|                         101.05|    2015-01-02|
  |        Yankees|2015-01-06 00:00:00|  201251.5|         2.0E-4| 2015|       01|   06|2015-01-06| 2015-01-05|        2015-01-01|   2014-12-06|      2014-11-06|      2014-10-06|    2014-07-06|         2014-01-06|  101.07020999999999|   2015-01-02|
  |        Yankees|2015-01-07 00:00:00|  216786.5|         -0.006| 2015|       01|   07|2015-01-07| 2015-01-05|        2015-01-01|   2014-12-07|      2014-11-07|      2014-10-07|    2014-07-07|         2014-01-07|              100.46378874|    2015-01-02|
  |        Yankees|2015-01-08 00:00:00|  215464.5|       -0.0063| 2015|       01|   08|2015-01-08| 2015-01-05|        2015-01-01|   2014-12-08|      2014-11-08|      2014-10-08|    2014-07-08|         2014-01-08|        99.830866870938|    2015-01-02|
  |        Yankees|2015-01-09 00:00:00|214103.25|        0.0052| 2015|       01|   09|2015-01-09| 2015-01-05|        2015-01-01|   2014-12-09|      2014-11-09|      2014-10-09|    2014-07-09|         2014-01-09|  100.34998737866687|    2015-01-02|
  |        Yankees|2015-01-12 00:00:00|  215218.5|       -4.0E-4| 2015|       01|   12|2015-01-12| 2015-01-12|        2015-01-01|   2014-12-12|      2014-11-12|      2014-10-12|    2014-07-12|         2014-01-12|  100.30984738371541|    2015-01-02|
  |        Yankees|2015-01-13 00:00:00|215125.25|        0.0036| 2015|       01|   13|2015-01-13| 2015-01-12|        2015-01-01|   2014-12-13|      2014-11-13|      2014-10-13|    2014-07-13|         2014-01-13|  100.67096283429677|    2015-01-02|
  |        Yankees|2015-01-14 00:00:00|  215919.5|        8.0E-4| 2015|       01|   14|2015-01-14| 2015-01-12|        2015-01-01|   2014-12-14|      2014-11-14|      2014-10-14|    2014-07-14|         2014-01-14|    100.7514996045642|    2015-01-02|
  |        Yankees|2015-01-15 00:00:00|216103.75|        4.0E-4| 2015|       01|   15|2015-01-15| 2015-01-12|        2015-01-01|   2014-12-15|      2014-11-15|      2014-10-15|    2014-07-15|         2014-01-15|  100.79180020440602|    2015-01-02|
  |        Yankees|2015-01-16 00:00:00|  216205.5|       0.0052| 2015|       01|   16|2015-01-16| 2015-01-12|        2015-01-01|   2014-12-16|      2014-11-16|      2014-10-16|    2014-07-16|         2014-01-16|  101.31591756546894|    2015-01-02|
  |        Yankees|2015-01-19 00:00:00|  347334.0|      -0.0045| 2015|       01|   19|2015-01-19| 2015-01-19|        2015-01-01|   2014-12-19|      2014-11-19|      2014-10-19|    2014-07-19|         2014-01-19|  100.85999593642434|    2015-01-02|
  |        Yankees|2015-01-20 00:00:00|  345767.0|       0.0015| 2015|       01|   20|2015-01-20| 2015-01-19|        2015-01-01|   2014-12-20|      2014-11-20|      2014-10-20|    2014-07-20|         2014-01-20|  101.01128593032898|    2015-01-02|
  |        Yankees|2015-01-21 00:00:00|  346314.5|       2.0E-4| 2015|       01|   21|2015-01-21| 2015-01-19|        2015-01-01|   2014-12-21|      2014-11-21|      2014-10-21|    2014-07-21|         2014-01-21|  101.03148818751504|    2015-01-02|
  |        Yankees|2015-01-22 00:00:00|346399.75|       0.0029| 2015|       01|   22|2015-01-22| 2015-01-19|        2015-01-01|   2014-12-22|      2014-11-22|      2014-10-22|    2014-07-22|         2014-01-22| 101.32447950325883|    2015-01-02|
  |        Yankees|2015-01-23 00:00:00|347412.75|      -6.0E-4| 2015|       01|   23|2015-01-23| 2015-01-19|        2015-01-01|   2014-12-23|      2014-11-23|      2014-10-23|    2014-07-23|         2014-01-23| 101.26368481555686|    2015-01-02|
  |        Yankees|2015-01-26 00:00:00|348303.75|      -6.0E-4| 2015|       01|   26|2015-01-26| 2015-01-26|        2015-01-01|   2014-12-26|      2014-11-26|      2014-10-26|    2014-07-26|         2014-01-26| 101.20292660466752|    2015-01-02|
  |        Yankees|2015-01-27 00:00:00|  348541.0|      -0.0044| 2015|       01|   27|2015-01-27| 2015-01-26|        2015-01-01|   2014-12-27|      2014-11-27|      2014-10-27|    2014-07-27|         2014-01-27| 100.75763372760697|    2015-01-02|
  |        Yankees|2015-01-28 00:00:00|347579.25|       0.0015| 2015|       01|   28|2015-01-28| 2015-01-26|        2015-01-01|   2014-12-28|      2014-11-28|      2014-10-28|    2014-07-28|         2014-01-28|   100.9087701781984|    2015-01-02|
  |        Yankees|2015-01-29 00:00:00|348431.75|       2.0E-4| 2015|       01|   29|2015-01-29| 2015-01-26|        2015-01-01|   2014-12-29|      2014-11-29|      2014-10-29|    2014-07-29|         2014-01-29|  100.92895193223403|    2015-01-02|
  |        Yankees|2015-02-28 00:00:00|348431.75|       2.0E-4| 2015|       02|   28|2015-02-28| 2015-01-26|        2015-01-01|   2014-12-28|      2014-11-28|      2014-10-28|    2014-07-28|         2014-01-28|  100.92895193223403|    2015-01-02|
 +--------------+-----------------------+-----------+--------------+-----+-------+----+------------+--------------+------------------+---------------+----------------+-----------------+---------------+-------------------+-------------------------+---------------+
oyxsuwqo

oyxsuwqo1#

你可以试试看 left_outer self join 这样地。

from pyspark.sql import function as F
df.join(df.select(F.col("AsofDate").alias("oneMonthAgo"),\
                  F.col("value").alias("1MonthAgoValue")),['oneMonthAgo'],'left_outer')\
  .orderBy("AsofDate")\
  .show()

# +-----------+----------+-----+--------------+

# |oneMonthAgo|  AsofDate|value|1MonthAgoValue|

# +-----------+----------+-----+--------------+

# | 2019-02-20|2019-02-23|    2|          null|

# | 2019-02-20|2019-03-20|    7|          null|

# | 2019-02-21|2019-03-21|   12|          null|

# | 2019-02-22|2019-03-22|   27|          null|

# | 2019-02-23|2019-03-23|   91|             2|

# +-----------+----------+-----+--------------+

``` `UPDATE:` 试试这个:

from pyspark.sql import functions as F
from pyspark.sql.window import Window

w=Window().partitionBy(F.dayofmonth("AsofDate"))
.orderBy(F.to_timestamp("AsofDate").cast("long"))
.rangeBetween(86400*-30,0)

first=F.first("value").over(w)

df.withColumn("1MonthAgoValue", F.when(first!=F.col("value"), first)
.otherwise(F.lit(None))).show()

lfapxunr

lfapxunr2#

这是可行的,一个窗口函数是正确的方法,但是你不需要asofdate列

from pyspark.sql import function as F
from pyspark.sql import Window

w = Window.partitionBy(F.dayofmonth('AsofDate'))
w = w.orderBy(F.Year('AsofDate'),F.month('AsofDate'))

df.withColumn('1MonthAgoValue', F.lag('value').over(w))

在实际的数据集中,您可能希望使其更为唯一,并按id/为每个id执行此操作,如果是这种情况,请将id列也添加到partitionby中。
ps:如果lag()得到错误的结果,请使用lead(),我总是忘记顺序,最后不得不同时尝试这两种方法。

相关问题