How to pass a column name as a variable when using StringIndexer in PySpark

Posted by 06odsfpq on 2021-07-13 in Spark


```python
from pyspark.ml.feature import StringIndexer

simpleDF.columns
# output: ['color', 'lab', 'value1', 'value2']

# select() returns a one-column DataFrame, not the column name
indexer = simpleDF.select('lab')

# Create a StringIndexer that maps the label strings to numeric indices
lblindexer = StringIndexer().setInputCol(indexer).setOutputCol("LabelIndexed")
idxRes = lblindexer.fit(simpleDF).transform(simpleDF)

idxRes.show(5)
```

It works when I hard-code the column name, as in the line below, but I want it to be more general:

```python
lblindexer = StringIndexer().setInputCol('lab').setOutputCol("LabelIndexed")
```

With the variable version I get this error:

TypeError: Invalid param value given for param "inputCol". Could not convert <class 'pyspark.sql.dataframe.DataFrame'> to string type

apache-spark pyspark apache-spark-ml

Source: https://stackoverflow.com/questions/66076849/how-to-place-column-name-as-variable-when-using-stringindexer-in-pyspark

1 answer


dl5txlt91

Pass the column name (a string) as the input column, not a DataFrame:

```python
lblindexer = StringIndexer().setInputCol('lab').setOutputCol("LabelIndexed")
```

If you want to use a variable, assign the column name to it:

```python
indexer = 'lab'
lblindexer = StringIndexer().setInputCol(indexer).setOutputCol("LabelIndexed")
```

Answered 2021-07-13
