How do I use an operand contained in a PySpark DataFrame within a calculation?

Posted by zwghvu4y on 2023-04-11 in Spark


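I have a PySpark DataFrame that contains operands (specifically comparison operators such as greater than, less than, and so on). I want to use these operands to compute a result from other values in the same row, and store that result in a new column. For example: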

from pyspark.sql import Row
from pyspark.sql.functions import expr, when

df = spark.createDataFrame([
    Row(id=1, value=3.0, operand='>', threshold=2. ),
    Row(id=2, value=2.3, operand='>=', threshold=3. ),
    Row(id=3, value=0.0, operand='==', threshold=0.0 )
])

df = df.withColumn('result', when(expr("value " + df.operand + " threshold"), True).otherwise(False))

df.show()

I would like to get the following result:

|id|value|operand|threshold|result|
|--|-----|-------|---------|------|
|1 |  3.0|      >|      2.0|  true|
|2 |  2.3|     >=|      3.0| false|
|3 |  0.0|     ==|      0.0|  true|

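I have tried different mechanisms for extracting the operand value (e.g. col("operand")), but none of them worked.
Note: I appreciate that using == to test doubles for equality is not always reliable, but the use case allows it.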

pyspark

Source: https://stackoverflow.com/questions/75956066/how-do-i-use-an-operand-contained-in-a-pyspark-dataframe-within-a-calculation

1 Answer


#1 2izufjch

Here is my dynamic solution:

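from pyspark.sql.functions import udf
from pyspark.sql.types import BooleanType

# eval() runs arbitrary code, so this is only safe when the operand column is trusted.
# BooleanType makes the result a boolean; udf() returns StringType by default.
dynamic_condition_udf = udf(lambda value, operand, threshold: eval(f"{value} {operand} {threshold}"), BooleanType())

df.withColumn("result", dynamic_condition_udf("value", "operand", "threshold")).show()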

Output

+---+-----+-------+---------+------+
| id|value|operand|threshold|result|
+---+-----+-------+---------+------+
|  1|  3.0|      >|      2.0|  true|
|  2|  2.3|     >=|      3.0| false|
|  3|  0.0|     ==|      0.0|  true|
+---+-----+-------+---------+------+
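
If the `operand` column might contain unexpected strings, the `eval` call in the UDF can be swapped for a lookup in a fixed table of comparison functions. The following is a minimal sketch, not part of the original answer; the names `COMPARATORS` and `compare` are hypothetical, and the set of supported operators is an assumption that should be adjusted to the data.

```python
import operator

# Whitelist of comparison operators the `operand` column may contain.
# The exact set is an assumption; extend or shrink it to match your data.
COMPARATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
}


def compare(value, operand, threshold):
    """Apply the comparison named by `operand` to (value, threshold).

    Returns None when the operator is not in the whitelist or when
    either number is missing, so such rows end up as null in Spark.
    """
    func = COMPARATORS.get(operand)
    if func is None or value is None or threshold is None:
        return None
    return func(value, threshold)
```

To use it on the DataFrame, wrap it with `udf(compare, BooleanType())` (importing `BooleanType` from `pyspark.sql.types`) and call the result on the `value`, `operand` and `threshold` columns in the same way as `dynamic_condition_udf`.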

2023-04-11
