Filter PySpark DataFrame content on an array value column

Posted by wvt8vs2t on 2021-05-18 in Spark


My schema is as follows:

DataFrame[record_id: string, months: array<decimal(2,0)>, max_amount: decimal(12,2)]

The data looks like this:

+----------------+------------------+-------------------+
|       record_id|            months|         max_amount|
+----------------+------------------+-------------------+
|3535345345345343|   [4, 5, 6, 7, 9]|              17.33|
|3535345345345344|         [7, 8, 9]|               9.57|
|3535345345345345|               [4]|               1.00|
|3535345345345346|[4, 5, 6, 7, 8, 9]|              15.08|
|3535345345345347|[4, 5, 6, 7, 8, 9]|              17.11|
|3535345345345348|      [4, 5, 7, 9]|              12.99|
|3535345345345349|[4, 5, 6, 7, 8, 9]|              16.95|
|3535345345345340|   [4, 5, 6, 7, 8]|              12.99|
|3535345345345311|[4, 5, 6, 7, 8, 9]|              12.99|
|3535345345345542|[4, 5, 6, 7, 8, 9]|              12.99|
+----------------+------------------+-------------------+

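I want to filter the rows by a value present in the array in the months column (for example: get all rows whose list contains the month value 6). I tried the following, which works fine for string values: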

import pyspark.sql.functions as sf

my_df.filter(sf.array_contains(my_df['months'], 6)).show()

But in the case of an int array, I get the following error:

org.apache.spark.sql.AnalysisException: cannot resolve 'array_contains(`months`, 6)' due to data type mismatch: Input to function array_contains should have been array followed by a value with same element type, but it's [array<decimal(2,0)>, int].

I also tried isin(), but it did not work. Do I have to modify the integer value passed as the second argument to array_contains() to make it work? Please advise.

DataFrame apache-spark pyspark apache-spark-sql

Source: https://stackoverflow.com/questions/64582761/filter-pyspark-dataframe-content-on-array-value-column
