+-----------------+--------------------+--------------------+----------+
|               id|       rawPrediction|         probability|prediction|
+-----------------+--------------------+--------------------+----------+
|1C3LC45K68N224432|[7.22879886627197...|[0.99927513417787...|       0.0|
|1D7HU18D14S572618|[8.62613201141357...|[0.99982067510427...|       0.0|
|1FTEW1EP1JFB92236|[5.51067543029785...|[0.99597290763631...|       0.0|
|1G1RA6S57JU118890|[6.31579494476318...|[0.99819573306012...|       0.0|
|1GMDU03L36D140830|[6.60290288925170...|[0.99864541261922...|       0.0|
|2C3CDZFJ3HH605972|[6.98962211608886...|[0.99907945352606...|       0.0|
|2C4RDGBGXER222234|[4.78376197814941...|[0.99170491099357...|       0.0|
|2GCEK19R7W1131527|[8.05116367340087...|[0.99968137074029...|       0.0|
|2HGFA1E4XAH013202|[6.45138216018676...|[0.99842414807062...|       0.0|
|2HGFB2F41DH041346|[4.87959384918212...|[0.99245722545310...|       0.0|
|2T1BR32EX7C734489|[7.98803615570068...|[0.99966061508166...|       0.0|
|2T1BU4EE8BC625148|[5.24141168594360...|[0.99473508633673...|       0.0|
|3GTEK14X96G191256|[5.94854307174682...|[0.99739715270698...|       0.0|
|3KPC24A30KE056134|[5.82482624053955...|[0.99705537920817...|       0.0|
|5N1AT2MV0FC788987|[4.29053592681884...|[0.98648750595748...|       0.0|
|5NPEB4AC5CH487882|[6.25585126876831...|[0.99808448471594...|       0.0|
|5TBBT44103S355433|[8.68789100646972...|[0.99983141316624...|       0.0|
|5TDBK3EH6CS162428|[4.95779943466186...|[0.99302067607641...|       0.0|
|JTDBBRBE0LJ006511|[5.03314828872680...|[0.99352395581081...|       0.0|
|KM8NU13C09U092234|[6.17661666870117...|[0.99792686221189...|       0.0|
+-----------------+--------------------+--------------------+----------+
I am using xgboost4j for inference and got the DataFrame above. How can I get the second element of the probability column in Spark Scala? Is there a UDF that does this concisely?
root
|-- id: string (nullable = true)
|-- rawPrediction: vector (nullable = true)
|-- probability: vector (nullable = true)
|-- prediction: double (nullable = false)
1 answer
Convert the vector to an array and take the element at index 1.
That said, there may be a more efficient way.
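A minimal sketch of two ways to do this, assuming Spark 3.0 or later (where `org.apache.spark.ml.functions.vector_to_array` is available) and that `probability` is an `org.apache.spark.ml.linalg.Vector` column, as the schema above indicates. The object name `SecondProbability`, the output column `p1`, and the two sample rows are hypothetical stand-ins for the real xgboost4j output.

```scala
import org.apache.spark.ml.functions.vector_to_array
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, udf}

object SecondProbability {
  // Option 1: convert the ML vector to array<double>, then index into it.
  def viaArray(df: DataFrame): DataFrame =
    df.withColumn("p1", vector_to_array(col("probability")).getItem(1))

  // Option 2: a small UDF that reads index 1 directly from the vector.
  private val secondElement = udf((v: Vector) => v(1))

  def viaUdf(df: DataFrame): DataFrame =
    df.withColumn("p1", secondElement(col("probability")))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("second-probability")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical rows standing in for the model output shown above.
    val df = Seq(
      ("vin-a", Vectors.dense(0.9, 0.1)),
      ("vin-b", Vectors.dense(0.25, 0.75))
    ).toDF("id", "probability")

    viaArray(df).show(false)
    viaUdf(df).show(false)

    spark.stop()
  }
}
```

Both variants add a `double` column holding the second element (index 1) of each probability vector. Which one is faster is not established here, so it is worth measuring on your own data before choosing.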