I am trying to run XGBoost on Windows 10. The relevant part of the code is shown below:
model = XGBClassifier()
print(x_train.shape)
print(y_train.shape)
print(np.isnan(x_train).any())
print(np.isnan(y_train).any())
print(np.isinf(x_train).any())
print(np.isinf(y_train).any())
print(np.isfinite(x_train).all())
print(np.isfinite(y_train).all())
model.fit(x_train, y_train)
It produces the following output:
(4116, 37)
(4116,)
False
False
False
False
True
True
The use of label encoder in XGBClassifier is deprecated and will be removed in a future release. To remove this warning, do the following: 1) Pass option use_label_encoder=False when constructing XGBClassifier object; and 2) Encode your labels (y) as integers starting with 0, i.e. 0, 1, 2, ..., [num_class - 1].
Traceback (most recent call last):
  [...]
    model.fit(x_train, y_train)
  File "D:\Programs\Anaconda\lib\site-packages\xgboost\core.py", line 436, in inner_f
    return f(**kwargs)
  File "D:\Programs\Anaconda\lib\site-packages\xgboost\sklearn.py", line 1173, in fit
    label_transform=label_transform,
  File "D:\Programs\Anaconda\lib\site-packages\xgboost\sklearn.py", line 244, in _wrap_evaluation_matrices
    missing=missing,
  File "D:\Programs\Anaconda\lib\site-packages\xgboost\sklearn.py", line 1172, in <lambda>
    create_dmatrix=lambda **kwargs: DMatrix(nthread=self.n_jobs, **kwargs),
  File "D:\Programs\Anaconda\lib\site-packages\xgboost\core.py", line 436, in inner_f
    return f(**kwargs)
  File "D:\Programs\Anaconda\lib\site-packages\xgboost\core.py", line 547, in __init__
    enable_categorical=enable_categorical,
  File "D:\Programs\Anaconda\lib\site-packages\xgboost\data.py", line 565, in dispatch_data_backend
    feature_types)
  File "D:\Programs\Anaconda\lib\site-packages\xgboost\data.py", line 169, in _from_numpy_array
    ctypes.c_int(nthread)))
  File "D:\Programs\Anaconda\lib\site-packages\xgboost\core.py", line 210, in _check_call
    raise XGBoostError(py_str(_LIB.XGBGetLastError()))
xgboost.core.XGBoostError: [14:21:29] C:/Users/Administrator/workspace/xgboost-win64_release_1.4.0/src/data/data.cc:945: Check failed: valid: Input data contains `inf` or `nan`
My data clearly does not contain any `inf` or `nan` values. Any ideas on how to proceed from here would be greatly appreciated.
3 Answers
iyzzxitl1#
Using scikit-learn's StandardScaler solved my problem. Thanks to Antoine Messager's answer, I ended up doing the following:
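The snippet from this answer is not included above. As a minimal sketch of what scaling with StandardScaler might look like, assume the failure comes from values that are finite as 64-bit floats but too large for the 32-bit floats XGBoost stores internally. The synthetic x_train and the 1e39 scale factor are illustrative assumptions, not the poster's data:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for x_train (assumption): one column is scaled far
# beyond the float32 maximum of roughly 3.4e38.
rng = np.random.default_rng(0)
x_train = rng.normal(size=(100, 3))
x_train[:, 2] *= 1e39

# The same kind of check as in the question passes, because the data is float64.
finite_as_float64 = bool(np.isfinite(x_train).all())

# Casting to float32 overflows the large column to inf.
with np.errstate(over="ignore"):
    finite_as_float32 = bool(np.isfinite(x_train.astype(np.float32)).all())

# Standardize each column to zero mean and unit variance.
scaler = StandardScaler()
x_train_scaled = scaler.fit_transform(x_train)

# After scaling, the values fit comfortably in float32.
finite_after_scaling = bool(np.isfinite(x_train_scaled.astype(np.float32)).all())

print(finite_as_float64, finite_as_float32, finite_after_scaling)
```

The scaled array would then be passed to model.fit in place of x_train, and the same fitted scaler applied to any test data with scaler.transform. Scaling like this may not help for values as extreme as 1e300, where the variance itself overflows float64.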
wooyq4lh2#
I just ran into the same error, and it appears to have been caused by very large floating-point values (1e300). I fixed it with a log transform.
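To illustrate why a value such as 1e300 can pass the NumPy checks in the question and still trip the error, here is a small sketch. It assumes the cause is the cast to 32-bit floats, and it uses np.log1p as one possible log transform, since the answer does not say which one was used. The col array is made up for illustration:

```python
import numpy as np

# Hypothetical feature column: finite in float64, far outside the float32 range.
col = np.array([1.0, 10.0, 1e300])

# float64 check, like the ones in the question
finite_as_float64 = bool(np.isfinite(col).all())

# The cast to float32 turns 1e300 into inf.
with np.errstate(over="ignore"):
    finite_as_float32 = bool(np.isfinite(col.astype(np.float32)).all())

# log1p compresses the range; it is only valid here because every value is > -1.
col_log = np.log1p(col)
finite_after_log = bool(np.isfinite(col_log.astype(np.float32)).all())

print(finite_as_float64, finite_as_float32, finite_after_log)
```

A log transform changes the meaning of the feature and needs non-negative (or shifted) inputs, so whether it is appropriate depends on the data.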
2uluyalo3#
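Use xgboost.DMatrix to check each column: try converting the data to an xgboost.DMatrix one column at a time, and if a conversion fails, call value_counts on that column to find the abnormal values in it.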
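The column-by-column check described above could be sketched roughly as follows. The helper name find_bad_columns and its build parameter are hypothetical and not part of XGBoost. By default the helper calls xgboost.DMatrix, and it assumes the data is held in a pandas DataFrame:

```python
import numpy as np
import pandas as pd


def find_bad_columns(df, build=None):
    """Try to build a DMatrix from each column separately.

    Returns a dict mapping each failing column name to its value_counts().
    `build` is a hypothetical hook for swapping in another constructor.
    """
    if build is None:
        import xgboost as xgb  # imported lazily, only when the default is used
        build = xgb.DMatrix

    bad = {}
    for name in df.columns:
        try:
            # Keep the column two-dimensional: shape (n_rows, 1).
            build(df[[name]].to_numpy())
        except Exception:
            # Conversion failed: record the value distribution for inspection.
            bad[name] = df[name].value_counts()
    return bad
```

For a NumPy array like the one in the question, a call might look like find_bad_columns(pd.DataFrame(x_train)). Each returned value_counts series can then be sorted or scanned for extreme entries.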