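I have the same problem that was described here years ago: https://github.com/scikit-learn/scikit-learn/issues/10197
That issue still appears to be unresolved, so I am looking for a workaround. The example given there no longer works, so here is one I wrote based on https://scikit-learn.org/stable/auto_examples/inspection/plot_partial_dependence.html: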
from sklearn.datasets import fetch_openml
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OrdinalEncoder
from time import time
from sklearn.pipeline import make_pipeline
from sklearn.ensemble import HistGradientBoostingRegressor
bikes = fetch_openml("Bike_Sharing_Demand", version=2, as_frame=True, parser="pandas")
# Make an explicit copy to avoid "SettingWithCopyWarning" from pandas
X, y = bikes.data.copy(), bikes.target
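# Assign the result back instead of using inplace=True on a column selection,
# which does not reliably modify X in recent pandas versions
X["weather"] = (
    X["weather"]
    .astype(object)
    .replace(to_replace="heavy_rain", value="rain")
    .astype("category")
)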
mask_training = X["year"] == 0.0
X = X.drop(columns=["year"])
X_train, y_train = X[mask_training], y[mask_training]
X_test, y_test = X[~mask_training], y[~mask_training]
numerical_features = [
    "temp",
    "feel_temp",
    "humidity",
    "windspeed",
]
categorical_features = X_train.columns.drop(numerical_features)
hgbdt_preprocessor = ColumnTransformer(
    transformers=[
        ("cat", OrdinalEncoder(), categorical_features),
        ("num", "passthrough", numerical_features),
    ],
    sparse_threshold=1,
    verbose_feature_names_out=False,
).set_output(transform="pandas")
hgbdt_model = make_pipeline(
    hgbdt_preprocessor,
    HistGradientBoostingRegressor(
        categorical_features=categorical_features, random_state=0
    ),
)
hgbdt_model.fit(X_train, y_train)
staged_predict_train = [i for i in hgbdt_model.staged_predict(X_train)]
This raises AttributeError: 'Pipeline' object has no attribute 'staged_predict'
The first thing I tried was to call it directly on the model inside the pipeline:
staged_predict_train = [i for i in hgbdt_model['histgradientboostingregressor'].staged_predict(X_train)]
This fails because X_train is no longer encoded by the preceding step of the pipeline.
1 Answer
Getting the ColumnTransformer to transform the columns first is fairly simple.