pandas: XGBoost plot_importance does not show feature names

Posted by w7t8yxp5 on 2023-01-11 in Other


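I'm using XGBoost with Python and have successfully trained a model using the XGBoost train() function on DMatrix data. The matrix was created from a Pandas DataFrame whose columns have feature names.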

Xtrain, Xval, ytrain, yval = train_test_split(df[feature_names], y, \
                                    test_size=0.2, random_state=42)
dtrain = xgb.DMatrix(Xtrain, label=ytrain)

model = xgb.train(xgb_params, dtrain, num_boost_round=60, \
                  early_stopping_rounds=50, maximize=False, verbose_eval=10)

fig, ax = plt.subplots(1,1,figsize=(10,10))
xgb.plot_importance(model, max_num_features=5, ax=ax)

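Now I want to see the feature importance using the xgboost.plot_importance() function, but the resulting plot doesn't show the feature names. Instead, the features are listed as f1, f2, f3, etc.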

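I think the problem is that I converted my original Pandas DataFrame into a DMatrix. How can I associate the feature names properly so that the feature importance plot shows them?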

pandas

Source: https://stackoverflow.com/questions/46943314/xgboost-plot-importance-doesnt-show-feature-names

9 Answers


31moq8wy1#

If you're using the scikit-learn wrapper, you need to access the underlying XGBoost Booster and set the feature names on it, rather than on the scikit-learn model, like so:

model = joblib.load("your_saved.model")
model.get_booster().feature_names = ["your", "feature", "name", "list"]
xgboost.plot_importance(model.get_booster())


ldxq2e6h2#

You want to use the feature_names parameter when creating your xgb.DMatrix:

dtrain = xgb.DMatrix(Xtrain, label=ytrain, feature_names=feature_names)


yptwkmov3#

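train_test_split converts the DataFrame to a numpy array, which no longer carries the column information.
You can either follow @piRSquared's suggestion and pass the feature names as a parameter to the DMatrix constructor, or convert the numpy arrays returned by train_test_split back to DataFrames and then use your code as is.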

Xtrain, Xval, ytrain, yval = train_test_split(df[feature_names], y, \
                                    test_size=0.2, random_state=42)

# Rebuild DataFrames so the column names are restored
Xtrain = pd.DataFrame(data=Xtrain, columns=feature_names)
Xval = pd.DataFrame(data=Xval, columns=feature_names)

dtrain = xgb.DMatrix(Xtrain, label=ytrain)


enyaitl34#

With the scikit-learn wrapper interface "XGBClassifier", plot_importance returns a matplotlib Axes object, so we can call set_yticklabels on the returned axes.


55ooxyrt5#

While experimenting with feature_names, I found another way. I wrote the following, which works on XGBoost v0.80, the version I'm currently running.

## Saving the model to disk
model.save_model('foo.model')
with open('foo_fnames.txt', 'w') as f:
    f.write('\n'.join(model.feature_names))

## Later, when you want to retrieve the model...
model2 = xgb.Booster({"nthread": nThreads})
model2.load_model("foo.model")

with open("foo_fnames.txt", "r") as f:
    feature_names2 = f.read().split("\n")

model2.feature_names = feature_names2
model2.feature_types = None
fig, ax = plt.subplots(1,1,figsize=(10,10))
xgb.plot_importance(model2, max_num_features = 5, ax=ax)

This saves feature_names separately and adds it back after loading. For some reason, feature_types also needs to be initialized, even if the value is None.
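The sidecar-file part of this approach can be isolated into two small helpers. This is only a sketch using the standard library: the helper names save_feature_names and load_feature_names are hypothetical, and it assumes no feature name contains a newline. It does not touch the model file itself.

```python
import tempfile
from pathlib import Path


def save_feature_names(path, feature_names):
    """Write one feature name per line (assumes names contain no newlines)."""
    Path(path).write_text("\n".join(feature_names), encoding="utf-8")


def load_feature_names(path):
    """Read the names back; an empty file yields an empty list."""
    text = Path(path).read_text(encoding="utf-8")
    return text.split("\n") if text else []


# Round trip through a temporary directory to show the names survive intact.
with tempfile.TemporaryDirectory() as tmp_dir:
    names_path = Path(tmp_dir) / "foo_fnames.txt"
    save_feature_names(names_path, ["age", "income", "height"])
    restored = load_feature_names(names_path)

print(restored)
```

The restored list can then be assigned to the loaded Booster's feature_names attribute, as in the answer's code.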


ajsxfq5m6#

You should specify feature_names when instantiating the XGBoost classifier:

clf = xgb.XGBClassifier(feature_names=feature_names)  # avoid rebinding the xgb module name

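Be careful: if you wrap the XGBoost classifier in a sklearn pipeline that performs any selection on the columns (e.g. VarianceThreshold), the classifier will fail when trying to fit or transform.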


c9qzyr3d7#

If the model is trained like this:

model = XGBClassifier(
    max_depth = 8, 
    learning_rate = 0.25, 
    n_estimators = 50, 
    objective = "binary:logistic",
    n_jobs = 4
)

# x, y are pandas DataFrame
model.fit(train_data_x, train_data_y)

you can call model.get_booster().get_fscore() to get the feature names and their importance as a Python dict.
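Once the scores are in a dict, picking the top features is plain Python. The sketch below assumes a dict shaped like the one get_fscore() returns (feature name mapped to a numeric score). The helper name top_features and the sample values are made up for illustration.

```python
def top_features(scores, n=5):
    """Return the n highest-scoring (name, score) pairs, best first."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:n]


# Hypothetical scores, standing in for model.get_booster().get_fscore()
example_scores = {"age": 12, "income": 30, "height": 7}

print(top_features(example_scores, n=2))  # [('income', 30), ('age', 12)]
```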


hkmswyz68#

You can also simplify the code by not using DMatrix at all. The column names are then used as labels:

from xgboost import XGBClassifier, plot_importance
model = XGBClassifier()
model.fit(Xtrain, ytrain)
plot_importance(model)


dpiehjr49#

Rename the y-tick labels by passing feature_names as a list of strings to matplotlib.axes.Axes.set_yticklabels:

fig, ax = plt.subplots(1,1,figsize=(10,10))
xgb.plot_importance(model, max_num_features=5, ax=ax)
ax.set_yticklabels(feature_names)
plt.show()

