Spark: StringIndexer vs. OneHotEncoderEstimator

Posted by ctehm74n on 2021-05-19 in Spark

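I am learning Spark and came across the code below in a tutorial. I understand that the DataFrame is being one-hot encoded here, but I don't understand why StringIndexer is used. Does StringIndexer have to be combined with OneHotEncoderEstimator?

import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.{OneHotEncoderEstimator, StringIndexer}

val si = new StringIndexer()
  .setHandleInvalid("keep")
  .setInputCol(ProctTypeCol)
  .setOutputCol(ProctTypeSIOutCol)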

val ohe = new OneHotEncoderEstimator()
  .setHandleInvalid("keep")
  .setInputCols(Array(si.getOutputCol))
  .setOutputCols(Array(ProductTypeOHEOutCol))

val pipeline = new Pipeline()
  .setStages(Array(si, ohe))
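
To make the two stages concrete, here is a minimal plain-Scala sketch of the idea behind the pipeline. It does not use Spark at all: `fitIndex` and `oneHot` are hypothetical helpers that only mimic the roles of StringIndexer (string label to numeric index, most frequent label first) and of the one-hot encoder (numeric index to 0/1 vector). The sample values are made up, ties are broken alphabetically as an assumption for determinism, and Spark details such as `handleInvalid` and `dropLast` are deliberately left out.

```scala
object IndexThenEncodeSketch {
  // Stage 1 (the role StringIndexer plays): assign each distinct label an
  // integer index, most frequent label first. Ties are broken alphabetically
  // here; that tie-break is an assumption of this sketch.
  def fitIndex(labels: Seq[String]): Map[String, Int] =
    labels.groupBy(identity).toSeq
      .map { case (label, occurrences) => (label, occurrences.size) }
      .sortBy { case (label, count) => (-count, label) }
      .map { case (label, _) => label }
      .zipWithIndex
      .toMap

  // Stage 2 (the role the one-hot encoder plays): turn an index into a
  // vector with a single 1.0 at that position. It needs a number as input,
  // which is why the indexing stage has to run first on string columns.
  def oneHot(index: Int, size: Int): Vector[Double] =
    Vector.tabulate(size)(i => if (i == index) 1.0 else 0.0)

  def main(args: Array[String]): Unit = {
    // Hypothetical product types, not taken from the tutorial.
    val productTypes = Seq("book", "toy", "book", "food")
    val index = fitIndex(productTypes)
    productTypes.foreach { p =>
      println(s"$p -> ${index(p)} -> ${oneHot(index(p), index.size)}")
    }
  }
}
```

In this sketch "book" gets index 0 because it occurs most often, and the encoder only ever sees that number, never the string itself.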

Thanks.

scala apache-spark

Source: https://stackoverflow.com/questions/64433009/spark-stringindexer-vs-onehotencoderestimator