numpy XGBoost: Check failed: valid: Input data contains `inf` or `nan`

fd3cxomn  posted on 2023-02-19 in Other
Follows (0) | Answers (3) | Views (519)

I am trying to run XGBoost on Windows 10. The relevant part of the code looks like this:

import numpy as np
from xgboost import XGBClassifier

model = XGBClassifier()
print(x_train.shape)
print(y_train.shape)

print(np.isnan(x_train).any())
print(np.isnan(y_train).any())
print(np.isinf(x_train).any())
print(np.isinf(y_train).any())
print(np.isfinite(x_train).all())
print(np.isfinite(y_train).all())

model.fit(x_train, y_train)

It produces the following output:

(4116, 37)  
(4116,)
False  
False
False
False
True
True

The use of label encoder in XGBClassifier is deprecated and will be removed in a future release. To remove this warning, do the following: 1) Pass option use_label_encoder=False when constructing XGBClassifier object; and 2) Encode your labels (y) as integers starting with 0, i.e. 0, 1, 2, ..., [num_class - 1].
Traceback (most recent call last):
  [...]
    model.fit(x_train, y_train)
  File "D:\Programs\Anaconda\lib\site-packages\xgboost\core.py", line 436, in inner_f
    return f(**kwargs)
  File "D:\Programs\Anaconda\lib\site-packages\xgboost\sklearn.py", line 1173, in fit
    label_transform=label_transform,
  File "D:\Programs\Anaconda\lib\site-packages\xgboost\sklearn.py", line 244, in _wrap_evaluation_matrices
    missing=missing,
  File "D:\Programs\Anaconda\lib\site-packages\xgboost\sklearn.py", line 1172, in <lambda>
    create_dmatrix=lambda **kwargs: DMatrix(nthread=self.n_jobs, **kwargs),
  File "D:\Programs\Anaconda\lib\site-packages\xgboost\core.py", line 436, in inner_f
    return f(**kwargs)
  File "D:\Programs\Anaconda\lib\site-packages\xgboost\core.py", line 547, in __init__
    enable_categorical=enable_categorical,
  File "D:\Programs\Anaconda\lib\site-packages\xgboost\data.py", line 565, in dispatch_data_backend
    feature_types)
  File "D:\Programs\Anaconda\lib\site-packages\xgboost\data.py", line 169, in _from_numpy_array
    ctypes.c_int(nthread)))
  File "D:\Programs\Anaconda\lib\site-packages\xgboost\core.py", line 210, in _check_call
    raise XGBoostError(py_str(_LIB.XGBGetLastError()))
xgboost.core.XGBoostError: [14:21:29] C:/Users/Administrator/workspace/xgboost-win64_release_1.4.0/src/data/data.cc:945: Check failed: valid: Input data contains `inf` or `nan`

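According to the checks above, my data does not contain any `inf` or `nan` values. Any ideas on how to proceed from here would be much appreciated.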

iyzzxitl1#

Using scikit-learn's StandardScaler solved the problem for me. Thanks to Antoine Messager's answer, I ended up doing the following:

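from sklearn.preprocessing import StandardScaler
from xgboost import XGBClassifier

model = XGBClassifier()
scaler = StandardScaler()
# Standardize each feature to zero mean and unit variance before fitting
x_trainScaled = scaler.fit_transform(x_train)
model.fit(x_trainScaled, y_train)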

wooyq4lh2#

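I just ran into the same error. It seems to be caused by the presence of very large floating-point numbers (1e300). I fixed it with a log transform.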
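A likely explanation for why the checks in the question pass while XGBoost still complains: XGBoost stores feature values as 32-bit floats, so a float64 value such as 1e300 is finite for NumPy but overflows to `inf` once it is narrowed to float32. The sketch below uses a small made-up array (not the asker's data) to show how to flag such columns and how a log transform brings them back into range. It assumes the values are non-negative, since `np.log1p` is undefined below -1.

```python
import numpy as np

# Made-up data: every value is finite in float64, but 1e300 exceeds the float32 range
x = np.array([[1.0, 2.0], [3.0, 1e300]], dtype=np.float64)

# The check from the question passes on the float64 array
finite_in_f64 = np.isfinite(x).all()

# Narrowing to float32 turns the large value into inf
with np.errstate(over="ignore"):
    x32 = x.astype(np.float32)
inf_in_f32 = np.isinf(x32).any()

# Flag columns whose magnitude exceeds the largest float32 value
f32_max = np.finfo(np.float32).max
too_large = (np.abs(x) > f32_max).any(axis=0)

# Log transform (assumes non-negative values) compresses the range
x_log = np.log1p(x)
finite_after_log = np.isfinite(x_log.astype(np.float32)).all()

print(finite_in_f64, inf_in_f32, too_large, finite_after_log)
```

Comparing against `np.finfo(np.float32).max` is a stricter pre-flight check than `np.isfinite` alone, and it points directly at the columns that need rescaling or transforming.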

2uluyalo3#

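Use xgboost.DMatrix to check each column: try converting the data to an xgboost.DMatrix one column at a time, and if a conversion fails, look at the value_counts of that column to find the abnormal values in it.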
