Python: adding a column-dropping step to a scikit-learn Pipeline

rjee0c15, posted 2023-09-29 in Python

Normally we use df.drop('column_name', axis=1) to delete a column from a DataFrame. I would like to add this as a transformer step in a pipeline.
Example:

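from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
numerical_transformer = Pipeline(steps=[('imputer', SimpleImputer(strategy='mean')),
                                        ('scaler', StandardScaler(with_mean=False))
                                        ])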

How can I do this?


Answer #1 (bxfogqkk)

You can write a custom transformer like this:

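from sklearn.base import BaseEstimator, TransformerMixin

# BaseEstimator adds get_params/set_params (needed for cloning, e.g. in GridSearchCV);
# TransformerMixin adds fit_transform
class columnDropperTransformer(BaseEstimator, TransformerMixin):
    def __init__(self, columns):
        # names of the columns to drop
        self.columns = columns

    def transform(self, X, y=None):
        # return a new DataFrame without the listed columns
        return X.drop(self.columns, axis=1)

    def fit(self, X, y=None):
        # nothing to learn; fit only exists to satisfy the Pipeline API
        return self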

Then use it in a pipeline:

import pandas as pd
from sklearn.pipeline import Pipeline

# sample dataframe
df = pd.DataFrame({
    "col_1": ["a", "b", "c", "d"],
    "col_2": ["e", "f", "g", "h"],
    "col_3": [1, 2, 3, 4],
    "col_4": [5, 6, 7, 8]
})

# your pipeline
pipeline = Pipeline([
    ("columnDropper", columnDropperTransformer(['col_2', 'col_3']))
])

# apply the pipeline to the dataframe
pipeline.fit_transform(df)

Output:

  col_1  col_4
0     a      5
1     b      6
2     c      7
3     d      8
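To tie this back to the original question, the dropper can simply be placed in front of the imputer and scaler. A minimal sketch, assuming the columns to drop are known by name; the class name ColumnDropper and the sample data are hypothetical:

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class ColumnDropper(BaseEstimator, TransformerMixin):
    """Hypothetical helper: drops the named columns from a DataFrame."""

    def __init__(self, columns):
        self.columns = columns

    def fit(self, X, y=None):
        # Nothing to learn; fit only exists to satisfy the Pipeline API
        return self

    def transform(self, X):
        # Return a new DataFrame without the listed columns
        return X.drop(columns=self.columns)


# Hypothetical sample data: 'id' should not be scaled, 'x' has a missing value
df = pd.DataFrame({
    "id": [101, 102, 103, 104],
    "x": [1.0, np.nan, 3.0, 5.0],
    "y": [2.0, 4.0, 6.0, 8.0],
})

numerical_transformer = Pipeline(steps=[
    ("dropper", ColumnDropper(columns=["id"])),
    ("imputer", SimpleImputer(strategy="mean")),
    ("scaler", StandardScaler(with_mean=False)),
])

result = numerical_transformer.fit_transform(df)
print(result.shape)  # (4, 2): 'id' is gone, 'x' and 'y' remain
```

Because the dropper runs first, the imputer and scaler never see the dropped column. Note that SimpleImputer returns a NumPy array, so column names are not carried past that step.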

Answer #2 (j5fpnvbx)

You can wrap the Pipeline in a ColumnTransformer, which lets you choose which columns are sent through the pipeline, as follows:

import pandas as pd

from sklearn.compose import make_column_selector, make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

col_to_exclude = 'A'
df = pd.DataFrame({'A': [0]*10, 'B': [1]*10, 'C': [2]*10})

numerical_transformer = make_pipeline(
    SimpleImputer(strategy='mean'),
    StandardScaler(with_mean=False),
)

# Send every column except the one named exactly col_to_exclude through the pipeline;
# unselected columns are dropped because remainder='drop' is the default
transform = make_column_transformer(
    (numerical_transformer, make_column_selector(pattern=f'^(?!{col_to_exclude}$)')),
)

transform.fit_transform(df)

Note: I use a regex pattern (a negative lookahead) here to exclude column A.
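Whether the lookahead is anchored with $ matters once other column names share the excluded prefix: ^(?!A) rejects every name that merely starts with A, whereas ^(?!A$) rejects only the column named exactly A. A small sketch with hypothetical column names, calling the selector directly (a make_column_selector object is callable on a DataFrame and returns the matching column names):

```python
import pandas as pd
from sklearn.compose import make_column_selector

# Hypothetical columns: 'AB' starts with 'A' but should NOT be excluded
df = pd.DataFrame({"A": [0, 0], "AB": [1, 1], "B": [2, 2]})

# Without '$' the lookahead rejects every name that starts with 'A'
prefix_pattern = make_column_selector(pattern="^(?!A)")
# With '$' only the name that is exactly 'A' is rejected
exact_pattern = make_column_selector(pattern="^(?!A$)")

print(prefix_pattern(df))  # ['B']
print(exact_pattern(df))   # ['AB', 'B']
```

If the excluded name contains regex metacharacters, passing it through re.escape before building the pattern avoids surprises.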


Answer #3 (dly7yett)

The simplest way is to use the special value 'drop' as the transformer in sklearn.compose.ColumnTransformer:

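import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline

# Hypothetical sample DataFrame
df = pd.DataFrame({'feature1': [1, 2], 'feature2': [3, 4], 'feature3': [5, 6]})

# Specify columns to drop
columns_to_drop = ['feature1', 'feature3']

# Create a ColumnTransformer that drops the listed columns.
# remainder='passthrough' keeps all other columns; the default ('drop')
# would discard them too and leave nothing
preprocessor = ColumnTransformer(
    transformers=[
        ('column_dropper', 'drop', columns_to_drop),
    ],
    remainder='passthrough',
)

pipeline = Pipeline(
    steps=[
        ('preprocessing', preprocessor),
    ]
)

# Transform the DataFrame using the pipeline (returns a NumPy array by default)
transformed_data = pipeline.fit_transform(df)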

Answer #4 (wgmfuz8q)

I think all the answers here are overcomplicated. FunctionTransformer is meant for exactly this kind of operation:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer

columns_to_drop = ['col_2', 'col_3']  # columns to remove
pipeline = make_pipeline(
    FunctionTransformer(lambda df: df.drop(columns_to_drop, axis=1)),
)

As the name suggests, you can pass any function. Since the operation here is so basic, I don't think it needs a dedicated class.
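One caveat: a lambda cannot be serialized with the standard pickle module, which matters if the fitted pipeline is saved with pickle or joblib.dump. A sketch that uses a module-level function instead and passes the column list through FunctionTransformer's kw_args parameter; the function name drop_columns is hypothetical, and the sample DataFrame is the one from answer #1:

```python
import pandas as pd
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer


def drop_columns(df, columns):
    """Hypothetical helper: return df without the given columns."""
    return df.drop(columns=columns)


pipeline = make_pipeline(
    # kw_args is forwarded to drop_columns on every transform call
    FunctionTransformer(drop_columns, kw_args={"columns": ["col_2", "col_3"]}),
)

df = pd.DataFrame({
    "col_1": ["a", "b", "c", "d"],
    "col_2": ["e", "f", "g", "h"],
    "col_3": [1, 2, 3, 4],
    "col_4": [5, 6, 7, 8],
})

print(pipeline.fit_transform(df))
```

Passing the columns through kw_args also exposes them as a pipeline parameter (functiontransformer__kw_args), so they can be changed with set_params.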
