I am using the code below to rank features and run some cross-validation.
import pandas as pd
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeRegressor

dtr = DecisionTreeRegressor(random_state=42)
# Train model
model = dtr.fit(X_airbnb, y_airbnb)
feat_importances = pd.DataFrame(model.feature_importances_, index=X_airbnb.columns, columns=["Importance"])
feat_importances.sort_values(by="Importance", ascending=False, inplace=True)
output = cross_validate(dtr, X_airbnb, y_airbnb, cv=2, scoring='accuracy', return_estimator=True)
for idx, estimator in enumerate(output['estimator']):
    print("Features sorted by their score for estimator {}:".format(idx))
    feature_importances = pd.DataFrame(estimator.feature_importances_,
                                       index=X_airbnb.columns,
                                       columns=['importance']).sort_values('importance', ascending=False)
    print(feature_importances)
The output shows the feature ranking, but it also reports "ValueError: continuous is not supported". The full error message is:
d:\ITU\CalculusandStatistics\KDS_Statistics_GroupProject\venv\lib\site-packages\sklearn\model_selection\_validation.py:776: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details:
Traceback (most recent call last):
File "d:\ITU\CalculusandStatistics\KDS_Statistics_GroupProject\venv\lib\site-packages\sklearn\model_selection\_validation.py", line 767, in _score
scores = scorer(estimator, X_test, y_test)
File "d:\ITU\CalculusandStatistics\KDS_Statistics_GroupProject\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 220, in __call__
return self._score(
File "d:\ITU\CalculusandStatistics\KDS_Statistics_GroupProject\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 268, in _score
return self._sign * self._score_func(y_true, y_pred, **self._kwargs)
File "d:\ITU\CalculusandStatistics\KDS_Statistics_GroupProject\venv\lib\site-packages\sklearn\utils\_param_validation.py", line 192, in wrapper
return func(*args, **kwargs)
File "d:\ITU\CalculusandStatistics\KDS_Statistics_GroupProject\venv\lib\site-packages\sklearn\metrics\_classification.py", line 221, in accuracy_score
y_type, y_true, y_pred = _check_targets(y_true, y_pred)
File "d:\ITU\CalculusandStatistics\KDS_Statistics_GroupProject\venv\lib\site-packages\sklearn\metrics\_classification.py", line 106, in _check_targets
raise ValueError("{0} is not supported".format(y_type))
ValueError: continuous is not supported
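This is followed by the desired output, and then the same error is repeated.
The main problem is that I cannot see what is wrong with the result. I want to save feature_importances to a DataFrame, but it will not let me.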
1 Answer
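Some further Googling showed me how to fix this error. The first problem was that I should pass X.values rather than X itself. The second was that the scoring I asked for was accuracy, which is meaningful for classification tasks but not valid for regression tasks. It can be replaced with any of the regression metrics listed at https://scikit-learn.org/stable/modules/model_evaluation.html#regression-metrics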
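For illustration, here is a minimal sketch of what the corrected call might look like. The data is a synthetic stand-in for X_airbnb and y_airbnb (the column names f0 to f3 and the target formula are made up), and "neg_mean_squared_error" is just one of the regression scorers from the page linked above; "r2" or "neg_mean_absolute_error" would work the same way. It also collects the per-fold importances into a single DataFrame, which is what the question was trying to save.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeRegressor

# Hypothetical stand-in for X_airbnb / y_airbnb: 200 rows, 4 numeric features.
rng = np.random.default_rng(42)
X = pd.DataFrame(rng.normal(size=(200, 4)), columns=["f0", "f1", "f2", "f3"])
y = 3.0 * X["f0"] - 2.0 * X["f1"] + rng.normal(scale=0.1, size=200)

dtr = DecisionTreeRegressor(random_state=42)

# Use a regression scorer instead of 'accuracy', and pass plain arrays.
output = cross_validate(
    dtr,
    X.values,
    y.values,
    cv=2,
    scoring="neg_mean_squared_error",
    return_estimator=True,
)

# One column of importances per fitted fold estimator, indexed by feature name.
importances = pd.DataFrame(
    {
        "fold_{}".format(idx): est.feature_importances_
        for idx, est in enumerate(output["estimator"])
    },
    index=X.columns,
)
importances["mean"] = importances.mean(axis=1)
importances = importances.sort_values("mean", ascending=False)

print(output["test_score"])  # negated MSE per fold, so values closer to 0 are better
print(importances)
```

Because the column names are taken from the DataFrame before .values is passed to cross_validate, the importances can still be labelled by feature afterwards.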