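I want to process each row of a DataFrame. The column `feat` contains many elements of the form `idx:value`. I want to keep only the elements whose `idx` I am interested in. For example, I want to keep `idx=1` or `idx=5`.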
```
df = spark.createDataFrame([("u1","1:a 2:k 5:c 6:i"),("u2","2:k 4:p 5:b 6:k")],["id","feat"])
```

Input:

```
+---+---------------+
| id|           feat|
+---+---------------+
| u1|1:a 2:k 5:c 6:i|
| u2|2:k 4:p 5:b 6:k|
+---+---------------+
```

Expected:

```
+---+-------+
| id|   feat|
+---+-------+
| u1|1:a 5:c|
| u2|    5:b|
+---+-------+
```
1 Answer
Here is my attempt, using a few functions.
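
One possible way to get the expected output is sketched below. It assumes a plain Python UDF is acceptable for this data size, and it is not necessarily the approach the answer originally used. The names `keep_idx` and `filter_feat` are hypothetical helpers introduced only for this illustration, and the default `wanted=("1", "5")` mirrors the example in the question.

```python
def keep_idx(feat, wanted=("1", "5")):
    """Keep only the `idx:value` tokens whose idx is in `wanted` (hypothetical helper)."""
    if feat is None:
        # Pass nulls through unchanged so the UDF does not fail on missing rows.
        return None
    # Split on whitespace, then compare the part before the first ':' with the wanted ids.
    kept = [tok for tok in feat.split() if tok.split(":", 1)[0] in wanted]
    return " ".join(kept)


def filter_feat(df, wanted=("1", "5")):
    """Return `df` with the `feat` column reduced to the wanted idx entries (hypothetical helper)."""
    # Imported here so that keep_idx can be tried out without a Spark installation.
    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    keep_udf = F.udf(lambda s: keep_idx(s, wanted), StringType())
    return df.withColumn("feat", keep_udf(F.col("feat")))
```

With the `df` from the question, `filter_feat(df).show()` should print something like the expected table. A Python UDF is usually slower than Spark's built-in column functions, so on large data it may be worth expressing the same split-and-filter logic with built-ins instead.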