Python - Classifier - XGBoost: showing CV with parameters, duration, and scores in GridSearchCV

Posted by sdnqo3pr on 2023-01-01 in Python

I'm having trouble displaying cross-validation information with xgboost. When I use GridSearchCV with another estimator, I get log messages such as:

[CV 1/2] END mlp__activation=relu, mlp__alpha=0.01, mlp__hidden_layer_sizes=(5, 2), mlp__max_iter=300, mlp__random_state=20, mlp__solver=lbfgs;, score=0.812 total time=   5.3s
[CV 2/2] END mlp__activation=relu, mlp__alpha=0.01, mlp__hidden_layer_sizes=(5, 2), mlp__max_iter=300, mlp__random_state=20, mlp__solver=lbfgs;, score=0.824 total time=   6.3s
[CV 2/2] END mlp__activation=tanh, mlp__alpha=0.01, mlp__hidden_layer_sizes=(5, 2), mlp__max_iter=300, mlp__random_state=1, mlp__solver=lbfgs;, score=0.844 total time=   7.7s
[CV 2/2] END mlp__activation=tanh, mlp__alpha=0.01, mlp__hidden_layer_sizes=(5, 2), mlp__max_iter=300, mlp__random_state=20, mlp__solver=lbfgs;, score=0.843 total time=   7.6s
[CV 1/2] END mlp__activation=tanh, mlp__alpha=0.01, mlp__hidden_layer_sizes=(5, 2), mlp__max_iter=300, mlp__random_state=20, mlp__solver=lbfgs;, score=0.833 total time=   9.3s
[CV 1/2] END mlp__activation=relu, mlp__alpha=0.01, mlp__hidden_layer_sizes=(5, 2), mlp__max_iter=300, mlp__random_state=1, mlp__solver=lbfgs;, score=0.832 total time=   9.7s
[CV 1/2] END mlp__activation=tanh, mlp__alpha=0.01, mlp__hidden_layer_sizes=(5, 2), mlp__max_iter=300, mlp__random_state=1, mlp__solver=lbfgs;, score=0.833 total time=  13.0s
[CV 2/2] END mlp__activation=relu, mlp__alpha=0.01, mlp__hidden_layer_sizes=(5, 2), mlp__max_iter=300, mlp__random_state=1, mlp__solver=lbfgs;, score=0.844 total time=  12.8s

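So... I get the parameters + score + time + CV (the fold number).
Now, when I try the same thing with xgboost and verbose=3, I don't get this output.
This is what I do: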

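import sys

from joblib import parallel_backend
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from xgboost import XGBClassifier

# n_jobs, custom_cv, scoring, X_train and y_train are defined elsewhere.

params_str_dict = {'n_estimators': [10,30,50,100], 'max_depth': [50,100,300],  'learning_rate': [0.5, 1], 'objective': ['binary:logistic'], 'verbosity': [3]}

model = XGBClassifier()
step_name = "xgb"
step_param_name = 'xgb__'

# Prefix each key with the step name so the Pipeline routes it to the xgb step.
param_grid = {step_param_name + key: values for key, values in params_str_dict.items()}

pipe = Pipeline(steps=[
                        # (scale_name, scale),
                        (step_name, model)
                        ])

model_GS = GridSearchCV(estimator=pipe,
                    param_grid=param_grid,
                    n_jobs=n_jobs,
                    cv=custom_cv,
                    scoring=scoring,
                    verbose=4)

# Send everything printed during the search to cv.log.
old_stdout = sys.stdout
log_file = open("cv.log","w")
sys.stdout = log_file
try:
    with parallel_backend('multiprocessing'):
        model_GS.fit(X_train, y_train)
        model_scoring_gs_train = model_GS.score(X_train, y_train)
finally:
    # Restore stdout and close the log even if fitting fails.
    sys.stdout = old_stdout
    log_file.close()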

Is there anything I can do about this?
How can I change my code/warnings/verbosity (only 1 - 3) so that it shows time + score + CV fold + parameters?

5m1hhzi4 (answer #1)

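The problem is that verbosity is set in two places. This line controls XGBoost's own verbosity, which probably prints information that is not relevant to the task: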

params_str_dict = {
  # ...
  'verbosity': [3]
}

If you remove this setting and pass verbose=3 to the GridSearchCV object, the output should show the time + score + CV fold + the relevant parameters:

Fitting 5 folds for each of 24 candidates, totalling 120 fits
[CV 2/5] END xgb__learning_rate=0.5, xgb__max_depth=50, xgb__n_estimators=10, xgb__objective=binary:logistic;, score=0.969 total time=   0.7s
[CV 1/5] END xgb__learning_rate=0.5, xgb__max_depth=50, xgb__n_estimators=10, xgb__objective=binary:logistic;, score=0.968 total time=   0.8s
[CV 3/5] END xgb__learning_rate=0.5, xgb__max_depth=50, xgb__n_estimators=10, xgb__objective=binary:logistic;, score=0.969 total time=   0.8s
[CV 4/5] END xgb__learning_rate=0.5, xgb__max_depth=50, xgb__n_estimators=10, xgb__objective=binary:logistic;, score=0.969 total time=   0.9s
[CV 5/5] END xgb__learning_rate=0.5, xgb__max_depth=50, xgb__n_estimators=10, xgb__objective=binary:logistic;, score=0.974 total time=   1.1s
[CV 3/5] END xgb__learning_rate=0.5, xgb__max_depth=50, xgb__n_estimators=30, xgb__objective=binary:logistic;, score=0.968 total time=   1.8s
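
If these lines should end up in a log file, as in the question, one possible approach is to capture standard output while the search runs. The sketch below is not the asker's setup: it assumes the search runs in the current process (n_jobs=None), because output printed by worker processes may not pass through a redirected sys.stdout, and it uses scikit-learn's GradientBoostingClassifier as a stand-in for xgb.XGBClassifier() so that it depends only on scikit-learn. The data set and the small grid are made up for illustration.

```python
import contextlib
import io

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Small synthetic data set so the sketch runs quickly (illustrative only).
X_train, y_train = make_classification(n_samples=300, random_state=0)

# Stand-in estimator (assumption); xgb.XGBClassifier() could be used in this step instead.
pipe = Pipeline(steps=[("xgb", GradientBoostingClassifier(random_state=0))])
param_grid = {"xgb__n_estimators": [10, 30]}

# n_jobs=None keeps every fit in this process, so the messages printed
# by GridSearchCV go through the redirected standard output.
search = GridSearchCV(pipe, param_grid, n_jobs=None, cv=3, verbose=3)

buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    search.fit(X_train, y_train)

# Keep the captured messages in memory and also write them to cv.log.
cv_log = buffer.getvalue()
with open("cv.log", "w") as log_file:
    log_file.write(cv_log)
```

Compared with assigning sys.stdout by hand, the context manager restores the original stream automatically, even if fit raises an exception.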

Minimal reproducible example

import xgboost as xgb
from sklearn.pipeline import Pipeline
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
X_train, y_train = make_classification(n_samples=10_000)

params_str_dict = {"xgb__n_estimators": [10, 30, 50, 100], "xgb__max_depth": [50, 100, 300], "xgb__learning_rate": [0.5, 1], "xgb__objective": ["binary:logistic"]}

pipe = Pipeline(steps=[("xgb", xgb.XGBClassifier())])
model_GS = GridSearchCV(
    estimator=pipe,
    param_grid=params_str_dict,
    n_jobs=-1,
    cv=5,
    verbose=3,
).fit(X_train, y_train)
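
The same information (parameters, per-fold scores, timing) also remains available after fitting, through the cv_results_ attribute of the fitted search object, so it does not have to be parsed from the log. The following is a minimal sketch of reading it back. As an assumption made only to keep the example free of the xgboost dependency, GradientBoostingClassifier stands in for xgb.XGBClassifier(); the grid and data are illustrative. Note that cv_results_ reports fit and score times as a mean over the folds, not one time per fold.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Small synthetic data set so the sketch runs quickly (illustrative only).
X_train, y_train = make_classification(n_samples=300, random_state=0)

# Stand-in estimator (assumption); xgb.XGBClassifier() could be used in this step instead.
pipe = Pipeline(steps=[("xgb", GradientBoostingClassifier(random_state=0))])
param_grid = {"xgb__n_estimators": [10, 30], "xgb__max_depth": [2, 3]}
n_splits = 3

search = GridSearchCV(pipe, param_grid, cv=n_splits).fit(X_train, y_train)
results = search.cv_results_

# One row per parameter combination: parameters, score of each fold,
# and mean fit + score time across the folds (in seconds).
rows = []
for i, params in enumerate(results["params"]):
    fold_scores = [
        float(results[f"split{fold}_test_score"][i]) for fold in range(n_splits)
    ]
    mean_time = float(results["mean_fit_time"][i] + results["mean_score_time"][i])
    rows.append({"params": params, "fold_scores": fold_scores, "mean_time": mean_time})

for row in rows:
    scores = ", ".join(f"{score:.3f}" for score in row["fold_scores"])
    print(f"{row['params']}: scores=[{scores}] mean time={row['mean_time']:.2f}s")
```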
