Add an interaction term in ml

Posted by pepwfjgg on 2021-05-18 in Spark


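I am trying to add an interaction term between two variables when training a Spark ML model, but the result includes every combination of levels, the base combination among them. I expected the final estimate to have mn-1 coefficients rather than mn, where m and n are the numbers of category levels in the two variables.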
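The coding the question expects can be sketched in plain Python (this is not Spark code): treat each (a, b) combination as one categorical cell and drop a single reference cell, which leaves mn-1 indicator columns. The helper `cell_dummies`, the level names, and the choice of reference cell are hypothetical, chosen only for illustration.

```python
from itertools import product

def cell_dummies(a_levels, b_levels, reference):
    """Dummy-code every (a, b) combination as one categorical cell,
    dropping the `reference` cell so it is absorbed by the intercept."""
    cells = [c for c in product(a_levels, b_levels) if c != reference]

    def encode(a, b):
        # One indicator per non-reference cell; the reference cell is all zeros.
        return [1.0 if (a, b) == cell else 0.0 for cell in cells]

    return cells, encode

levels = ["v1", "v2", "v3", "v4", "v5"]  # hypothetical level names
cells, encode = cell_dummies(levels, levels, reference=("v1", "v1"))
print(len(cells))                # 24 columns = 5 * 5 - 1
print(sum(encode("v1", "v1")))   # 0.0: the reference cell has no column of its own
```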
Here is the code I used:

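from pyspark.ml.feature import StringIndexer, OneHotEncoder, Interaction

# `data` is an existing DataFrame with categorical string columns 'a' and 'b'
# Map each category to an index; frequencyAsc gives the rarest level index 0
stringIndexer = StringIndexer(inputCols=['a','b'], outputCols=['aIndex1','bIndex1'], stringOrderType='frequencyAsc')
data_index = stringIndexer.fit(data).transform(data)

# One-hot encode the indices (dropLast defaults to True)
encoder = OneHotEncoder(inputCols=['aIndex1','bIndex1'], outputCols=['aVec1','bVec1'])
data_encoder = encoder.fit(data_index).transform(data_index)

# Interaction multiplies every entry of aVec1 with every entry of bVec1
interaction = Interaction(inputCols=['aVec1','bVec1'], outputCol="interactedCol")
data_interacted = interaction.transform(data_encoder)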
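What the encoder and Interaction steps do to the feature count can be mimicked in plain Python. This is an illustration, not Spark's implementation: Spark documents Interaction as producing the products of all combinations of values from its input vectors, and `OneHotEncoder` drops the last category by default (`dropLast=True`). The helpers `one_hot` and `interact` are hypothetical, and the ordering of entries in Spark's actual output vector is not reproduced here.

```python
def one_hot(index, num_levels, drop_last=True):
    """Mimic one-hot encoding of a category index; with drop_last=True the
    last category is encoded as an all-zero vector, like OneHotEncoder's default."""
    size = num_levels - 1 if drop_last else num_levels
    return [1.0 if i == index else 0.0 for i in range(size)]

def interact(u, v):
    """Flattened outer product: every pairwise product u[i] * v[j]."""
    return [x * y for x in u for y in v]

a_vec = one_hot(2, 5)  # level index 2 of a, 5 levels
b_vec = one_hot(0, 5)  # level index 0 of b, 5 levels
print(len(interact(a_vec, b_vec)))                                # 16 = 4 * 4
print(len(interact(one_hot(2, 5, False), one_hot(0, 5, False))))  # 25 = 5 * 5
```

Under these assumptions the interacted vector has (m-1)(n-1) entries with the default encoder and m·n entries only when `dropLast=False`, so checking the size of `interactedCol` is a quick way to see which case applies.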

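Both a and b have 5 levels.
When I run a logistic regression model on the final dataset data_interacted, using 'interactedCol' as the feature, I get 25 variables in the end. I expected 24 variable estimates, since one combination should be treated as the base.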
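For reference, the column counts involved can be written out for m = n = 5. This is plain arithmetic under the assumptions of treatment (reference-level) coding and `OneHotEncoder`'s default `dropLast=True`; it does not describe what the logistic regression above actually fitted.

```latex
\begin{aligned}
\text{all } (a,b) \text{ cells:}\quad & mn = 5 \cdot 5 = 25\\
\text{cells with one reference cell dropped:}\quad & mn - 1 = 24\\
\text{Interaction of two dropLast vectors:}\quad & (m-1)(n-1) = 4 \cdot 4 = 16\\
\text{intercept + main effects + interaction:}\quad & 1 + (m-1) + (n-1) + (m-1)(n-1) = mn = 25
\end{aligned}
```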
Any idea what I am doing wrong? Thanks in advance for any suggestions.

apache-spark pyspark apache-spark-sql apache-spark-mllib apache-spark-ml

Source: https://stackoverflow.com/questions/64602060/add-interaction-term-to-ml

No answers yet.