Otherwise values as option in PySpark

Posted by 0mkxixxg on 2021-07-13 in Spark

I have the following code:

session = (spark.table(f'nn_team5_{country}.fact_table')
              .filter(f.col('date_key').between(start,end))
              .filter(f.col('is_client_plus')==1)
              .filter(f.col('source')=='session')
              .filter(f.col('subtype')=='events')
              .groupby('customer_id')
              .agg(f.countDistinct('ga_session_id').alias('total_sessions'))
              .withColumn('session_count',
                         f.when(f.col('total_sessions')>=3,'+3').otherwise('total_sessions'))
             )

display(session)

It produces the following output:

customer_id  sessions  session_count
484635       2         total_sessions
483635       40        +3
484005       1         total_sessions
484688       3         +3
184635       4         +3

My desired output is:

customer_id  sessions  session_count
484635       2         2
483635       40        +3
484005       1         1
484688       3         +3

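Does anyone know how to get the actual count in the otherwise branch? I passed the name of the new column created by the alias, but it is treated as text rather than as the value in each row.
Thanks!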

python DataFrame apache-spark pyspark apache-spark-sql

Source: https://stackoverflow.com/questions/66231159/otherwise-values-as-option-in-pyspark

1 Answer

fjaof16o #1

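Just wrap the name in f.col to specify that you want the column, not a string literal. A bare Python string passed to otherwise() is treated as a literal value, which is why every non-matching row showed the text total_sessions.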

from pyspark.sql import functions as f  # assumed to be the alias behind f in the question

session = (spark.table(f'nn_team5_{country}.fact_table')
              .filter(f.col('date_key').between(start,end))
              .filter(f.col('is_client_plus')==1)
              .filter(f.col('source')=='session')
              .filter(f.col('subtype')=='events')
              .groupby('customer_id')
              .agg(f.countDistinct('ga_session_id').alias('total_sessions'))
              .withColumn('session_count',
                         # f.col() makes otherwise() return the column value instead of the literal string
                         f.when(f.col('total_sessions')>=3,'+3').otherwise(f.col('total_sessions')))
             )

Posted 2021-07-13
