pandas — Simplest way to get feature names after running SelectKBest in scikit-learn

4ioopgfo · asked on 2022-11-05 · 8 answers

I want to do supervised learning.
So far, I know how to run supervised learning on all the features.
However, I would also like to experiment with only the K best features.
I read the documentation and found the SelectKBest method in scikit-learn.
Unfortunately, I don't know how to create a new DataFrame after finding those best features.
Let's assume I want to experiment with the 5 best features:

from sklearn.feature_selection import SelectKBest, f_classif
select_k_best_classifier = SelectKBest(score_func=f_classif, k=5).fit_transform(features_dataframe, targeted_class)

Now, if I add the next line:

dataframe = pd.DataFrame(select_k_best_classifier)

I will get a new DataFrame without feature names (only indices from 0 to 4).
I should replace that with:

dataframe = pd.DataFrame(fit_transformed_features, columns=features_names)

My question is: how do I create the features_names list?
I know I should use:

select_k_best_classifier.get_support()

which returns an array of booleans. A True value in the array marks a column to keep.
How should I use this boolean array together with the array of all feature names, which I can get via:

feature_names = list(features_dataframe.columns.values)

xyhw6mcr1#

No loop is needed for this.


# Create and fit selector

selector = SelectKBest(f_classif, k=5)
selector.fit(features_df, target)

# Get columns to keep and create new dataframe with those only

cols = selector.get_support(indices=True)
features_df_new = features_df.iloc[:,cols]
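As a runnable illustration of this approach, here is a minimal sketch on synthetic data (the `feat_*` column names and the `make_classification` parameters are placeholders, not from the question):

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Hypothetical synthetic stand-ins for the asker's features_df / target
X, y = make_classification(n_samples=100, n_features=8, n_informative=5,
                           random_state=0)
features_df = pd.DataFrame(X, columns=[f"feat_{i}" for i in range(8)])

selector = SelectKBest(f_classif, k=5)
selector.fit(features_df, y)

# get_support(indices=True) returns the integer positions of the kept columns
cols = selector.get_support(indices=True)
features_df_new = features_df.iloc[:, cols]
print(features_df_new.shape)  # (100, 5) -- original column names preserved
```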

sauutmhj2#

For me, this code works well and feels more Pythonic:

mask = select_k_best_classifier.get_support()
new_features = features_dataframe.columns[mask]
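The boolean mask can also index the DataFrame directly to get the reduced frame with names intact. A self-contained sketch, using hypothetical toy data in place of the asker's variables:

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif

# Hypothetical toy data standing in for features_dataframe / targeted_class
rng = np.random.default_rng(0)
features_dataframe = pd.DataFrame(rng.normal(size=(50, 6)),
                                  columns=list("abcdef"))
targeted_class = np.tile([0, 1], 25)

select_k_best_classifier = SelectKBest(f_classif, k=3).fit(
    features_dataframe, targeted_class)

mask = select_k_best_classifier.get_support()    # one boolean per column
new_features = features_dataframe.columns[mask]  # Index of kept column names
reduced = features_dataframe[new_features]       # DataFrame with 3 named columns
```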

zsohkypk3#

You can do the following:

mask = select_k_best_classifier.get_support()  # list of booleans
new_features = []  # the list of your K best features

for keep, feature in zip(mask, feature_names):
    if keep:
        new_features.append(feature)

Then set the column names accordingly:

dataframe = pd.DataFrame(fit_transformed_features, columns=new_features)
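The loop in this answer can also be collapsed into a single list comprehension. A minimal sketch, with hypothetical mask and feature_names values standing in for the real ones:

```python
# Hypothetical stand-ins for the mask and feature_names from the answer
mask = [True, False, True, False, True]
feature_names = ["age", "height", "weight", "income", "score"]

# Same logic as the for-loop above, in one line
new_features = [feature for keep, feature in zip(mask, feature_names) if keep]
print(new_features)  # ['age', 'weight', 'score']
```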

ecbunoof4#

The following code will help you find the top K features and their F-scores. Let X be a pandas DataFrame whose columns are all the features, and let y be the list of class labels.

import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif

# Suppose we select the 5 features with the top 5 ANOVA F-scores

selector = SelectKBest(f_classif, k = 5)

# New dataframe with the selected features for later use in the classifier. fit() method works too, if you want only the feature names and their corresponding scores

X_new = selector.fit_transform(X, y)
names = X.columns.values[selector.get_support()]
scores = selector.scores_[selector.get_support()]
names_scores = list(zip(names, scores))
ns_df = pd.DataFrame(data = names_scores, columns=['Feat_names', 'F_Scores'])

# Sort the dataframe for better visualization

ns_df_sorted = ns_df.sort_values(['F_Scores', 'Feat_names'], ascending = [False, True])
print(ns_df_sorted)

pu82cl6c5#

Select the best 10 features according to chi2:

from sklearn.feature_selection import SelectKBest, chi2

KBest = SelectKBest(chi2, k=10).fit(X, y)

Get the features using get_support():

f = KBest.get_support(indices=True)  # integer indices of the most important features

Create a new DataFrame called X_new:

X_new = X[X.columns[f]]  # final features
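One caveat with chi2: it requires non-negative feature values, so real-valued data usually needs scaling first. A sketch under that assumption, using hypothetical synthetic data and MinMaxScaler:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data; MinMaxScaler maps every feature into [0, 1] so chi2 accepts it
X_raw, y = make_classification(n_samples=80, n_features=12, random_state=1)
X = pd.DataFrame(MinMaxScaler().fit_transform(X_raw),
                 columns=[f"f{i}" for i in range(12)])

KBest = SelectKBest(chi2, k=10).fit(X, y)
f = KBest.get_support(indices=True)  # integer positions of the top features
X_new = X[X.columns[f]]
```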

eblbsuwk6#

Since scikit-learn 1.0, transformers have a get_feature_names_out method, which means you can write:

dataframe = pd.DataFrame(fit_transformed_features, columns=transformer.get_feature_names_out())

zbq4xfa07#

There is another alternative, although this method is not as fast as the solutions above.


# Use the selector to retrieve the best features

X_new = select_k_best_classifier.fit_transform(train[feature_cols],train['is_attributed'])

# Get back the kept features as a DataFrame with dropped columns as all 0s

selected_features = pd.DataFrame(select_k_best_classifier.inverse_transform(X_new),
                            index=train.index,
                            columns= feature_cols)
selected_columns = selected_features.columns[selected_features.var() !=0]

mrfwxfqh8#


# Fit the SelectKBest instance

select_k_best_classifier = SelectKBest(score_func=f_classif, k=5).fit(features_dataframe, targeted_class)

# Extract the required features

new_features  = select_k_best_classifier.get_feature_names_out(features_names)
