pandas  Staged predictions on a Pipeline object

nle07wnf  posted on 2023-02-17  in  Other

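I have the same problem that was outlined years ago here: https://github.com/scikit-learn/scikit-learn/issues/10197
That issue still does not seem to be resolved, so I am looking for a workaround. The example given there no longer works, so here is one I wrote based on https://scikit-learn.org/stable/auto_examples/inspection/plot_partial_dependence.html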

from sklearn.datasets import fetch_openml
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OrdinalEncoder
from sklearn.pipeline import make_pipeline
from sklearn.ensemble import HistGradientBoostingRegressor

bikes = fetch_openml("Bike_Sharing_Demand", version=2, as_frame=True, parser="pandas")
# Make an explicit copy to avoid "SettingWithCopyWarning" from pandas
X, y = bikes.data.copy(), bikes.target

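# Assign the result back instead of a chained inplace replace, which may not update X
X["weather"] = X["weather"].astype(object).replace("heavy_rain", "rain").astype("category")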

mask_training = X["year"] == 0.0
X = X.drop(columns=["year"])
X_train, y_train = X[mask_training], y[mask_training]
X_test, y_test = X[~mask_training], y[~mask_training]

numerical_features = [
    "temp",
    "feel_temp",
    "humidity",
    "windspeed",
]
categorical_features = X_train.columns.drop(numerical_features)

hgbdt_preprocessor = ColumnTransformer(
    transformers=[
        ("cat", OrdinalEncoder(), categorical_features),
        ("num", "passthrough", numerical_features),
    ],
    sparse_threshold=1,
    verbose_feature_names_out=False,
).set_output(transform="pandas")


hgbdt_model = make_pipeline(
    hgbdt_preprocessor,
    HistGradientBoostingRegressor(
        categorical_features=categorical_features, random_state=0
    ),
)
hgbdt_model.fit(X_train, y_train)

staged_predict_train = [i for i in hgbdt_model.staged_predict(X_train)]

This raises AttributeError: 'Pipeline' object has no attribute 'staged_predict'.
The first thing I tried was to call it directly on the model inside the pipeline:

staged_predict_train = [i for i in hgbdt_model['histgradientboostingregressor'].staged_predict(X_train)]

This fails because X_train is then no longer encoded by the preceding step of the pipeline.

ar7v8xwq  #1

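It is quite simple to have the ColumnTransformer transform the columns first and then pass the result to the final estimator: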

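# The pipeline is already fitted, so reuse its fitted ColumnTransformer.
enc = hgbdt_model['columntransformer']
# Use transform(), not fit_transform(): refitting on the test set would change the encoding.
X_train_encoded = enc.transform(X_train)
X_test_encoded = enc.transform(X_test)

# For a classifier, staged_predict_proba can be called in the same way.
staged_predict_train = list(hgbdt_model['histgradientboostingregressor'].staged_predict(X_train_encoded))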
