How do I get an element of a vector-type column in Spark?

9ceoxa92  posted on 2023-03-19  in Apache
+-----------------+--------------------+--------------------+----------+
|               id|       rawPrediction|         probability|prediction|
+-----------------+--------------------+--------------------+----------+
|1C3LC45K68N224432|[7.22879886627197...|[0.99927513417787...|       0.0|
|1D7HU18D14S572618|[8.62613201141357...|[0.99982067510427...|       0.0|
|1FTEW1EP1JFB92236|[5.51067543029785...|[0.99597290763631...|       0.0|
|1G1RA6S57JU118890|[6.31579494476318...|[0.99819573306012...|       0.0|
|1GMDU03L36D140830|[6.60290288925170...|[0.99864541261922...|       0.0|
|2C3CDZFJ3HH605972|[6.98962211608886...|[0.99907945352606...|       0.0|
|2C4RDGBGXER222234|[4.78376197814941...|[0.99170491099357...|       0.0|
|2GCEK19R7W1131527|[8.05116367340087...|[0.99968137074029...|       0.0|
|2HGFA1E4XAH013202|[6.45138216018676...|[0.99842414807062...|       0.0|
|2HGFB2F41DH041346|[4.87959384918212...|[0.99245722545310...|       0.0|
|2T1BR32EX7C734489|[7.98803615570068...|[0.99966061508166...|       0.0|
|2T1BU4EE8BC625148|[5.24141168594360...|[0.99473508633673...|       0.0|
|3GTEK14X96G191256|[5.94854307174682...|[0.99739715270698...|       0.0|
|3KPC24A30KE056134|[5.82482624053955...|[0.99705537920817...|       0.0|
|5N1AT2MV0FC788987|[4.29053592681884...|[0.98648750595748...|       0.0|
|5NPEB4AC5CH487882|[6.25585126876831...|[0.99808448471594...|       0.0|
|5TBBT44103S355433|[8.68789100646972...|[0.99983141316624...|       0.0|
|5TDBK3EH6CS162428|[4.95779943466186...|[0.99302067607641...|       0.0|
|JTDBBRBE0LJ006511|[5.03314828872680...|[0.99352395581081...|       0.0|
|KM8NU13C09U092234|[6.17661666870117...|[0.99792686221189...|       0.0|
+-----------------+--------------------+--------------------+----------+

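I ran inference with xgboost4j and got the DataFrame above. How can I get the second element of the probability column in Spark Scala? Is there a UDF that does this concisely? The schema is: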

root
 |-- id: string (nullable = true)
 |-- rawPrediction: vector (nullable = true)
 |-- probability: vector (nullable = true)
 |-- prediction: double (nullable = false)
7kqas0il1#

Using vector_to_array:

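// Needs: import org.apache.spark.ml.functions.vector_to_array (Spark 3.0+) and org.apache.spark.sql.functions.col
.withColumn("probLabel00",
        vector_to_array(col("probability")).getItem(0)) // index 0 is the first element; getItem(1) gives the second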

But I would like a more efficient way to do this.
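
Since the question asks about a UDF, here is a minimal sketch of one that reads a single element straight from the vector instead of converting the whole vector to an array first. The object name SecondProbability, the helper elementAt, the output column probLabel1, and the two-row sample DataFrame are all made up for illustration; they stand in for the xgboost4j output. Whether this is actually faster than vector_to_array is not established here and would need to be measured on the real data. The UDF also assumes the probability column contains no nulls.

```scala
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

object SecondProbability {
  // Hypothetical helper: Vector.apply(i) returns the i-th element (0-based).
  def elementAt(v: Vector, i: Int): Double = v(i)

  // UDF that returns the second element (index 1) of an ML vector.
  // Assumption: the input vector is non-null and has at least two elements.
  val secondElement = udf((v: Vector) => elementAt(v, 1))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("vector-element")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample rows standing in for the xgboost4j predictions.
    val df = Seq(
      ("a", Vectors.dense(0.75, 0.25)),
      ("b", Vectors.dense(0.5, 0.5))
    ).toDF("id", "probability")

    // Adds a double column holding the second element of each vector.
    df.withColumn("probLabel1", secondElement(col("probability"))).show(false)

    spark.stop()
  }
}
```

On the real DataFrame the call would be the same withColumn line, applied to the existing probability column.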
