pandas - Using ordinal variables as categories in XGBoost Python

Posted by wn9m85ua on 2022-11-20 in Python


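I'm trying to train a multi-class classifier with XGBoost. The data contains 4 independent variables that are ordinal in nature, and I want to use them as they are encoded. The data looks like this:

| Column | Values |
|---|---|
| target | ['High', 'Medium', 'Low'] |
| feature_1 | integers from 1 to 5 |
| feature_2 | integers from 1 to 5 |
| feature_3 | integers from 1 to 5 |
| feature_4 | integers from 1 to 5 |

My code currently looks like this: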

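```python
import numpy as np
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.preprocessing import LabelEncoder
from xgboost import XGBClassifier

random_state = 42  # any fixed seed
# `data` is the pandas DataFrame described above (loading it is omitted)

# Newer XGBoost versions expect class labels encoded as 0..n_classes-1
y = LabelEncoder().fit_transform(data['target'])
X = data.drop(['target'], axis=1)

X = X.fillna(0)
X = X.astype('int').astype('category')

x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=random_state, stratify=y)

# Create instance of model
xgb_model = XGBClassifier()

# Create the random grid
xgb_grid = {'n_estimators': [int(x) for x in np.linspace(start=100, stop=500, num=5)],
            'max_depth': [3, 5, 8, 10],
            'learning_rate': [0.01, 0.05, 0.1, 0.2, 0.3]}

# 'roc_auc' only supports binary targets; 'roc_auc_ovr' is the multi-class variant
xgb_model_tuned = RandomizedSearchCV(estimator=xgb_model, param_distributions=xgb_grid,
                                     n_iter=50, cv=5, scoring='roc_auc_ovr', verbose=2,
                                     random_state=random_state, n_jobs=-1)

# Pass training data into model
xgb_model_tuned.fit(x_train, y_train)
```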

When I run this, I get:

```
ValueError: DataFrame.dtypes for data must be int, float, bool or categorical.  When
                categorical type is supplied, DMatrix parameter
                `enable_categorical` must be set to `True`.feature_1, feature_2, 
                feature_3, feature_4
```

All the variables have dtype category. This works fine with a RandomForest classifier, but not with XGBoost. If I can't use the category dtype, how can I pass ordinal variables as categories?


Source: https://stackoverflow.com/questions/74478807/using-ordinal-variables-as-categories-in-xgboost-python

2 answers


Answer #1 (rkue9o1l)

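You're almost there!
According to the XGBoost documentation, you need to set `enable_categorical=True`, and the tree methods that support categorical data are `gpu_hist`, `approx`, and `hist`.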

```python
from xgboost import XGBClassifier

# Create instance of model (gpu_hist requires a CUDA-capable GPU)
xgb_model = XGBClassifier(tree_method="gpu_hist", enable_categorical=True)
```

Also, make sure your XGBoost version is 1.5 or later.

Answered 2022-11-20

Answer #2 (u4vypkhs)

If you want them treated as ordinal, just set the column type to int: XGBoost will split on them as if they were continuous, which preserves their ordering.

Answered 2022-11-20
