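The Gender column has some missing values, and I want to fill them in using KNN imputation. But I am not getting the imputed result! Can anyone help?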
import pandas as pd
import numpy as np
from sklearn.impute import KNNImputer
data = {'ID': [1, 2, 3, 4, 5],
'Age': [20, 25, 30, 35, 40],
'Gender': ['M', 'F', np.nan, 'F', np.nan]}
df = pd.DataFrame(data)
imputer = KNNImputer(n_neighbors=2)
df['Gendermap'] = pd.factorize(df['Gender'])[0]
df['Gender_imputed_factorized'] = imputer.fit_transform(df[['Gendermap']])
df['Gender_imputed'] = pd.unique(df['Gender'])[df['Gender_imputed_factorized'].astype(int)]
df
Output:
   ID  Age Gender  Gendermap  Gender_imputed_factorized Gender_imputed
0   1   20      M          0                        0.0              M
1   2   25      F          1                        1.0              F
2   3   30    NaN         -1                       -1.0            NaN
3   4   35      F          1                        1.0              F
4   5   40    NaN         -1                       -1.0            NaN
The Gender_imputed column should not contain NaN values.
2 Answers
xwmevbvl1#
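I think the factorize function you are using is causing the problem. It does not leave the NaN values as NaN; it encodes them as -1, so when you call fit_transform there is nothing left to impute. Try using map to convert the gender to a numeric column, as follows: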
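A minimal sketch of the map approach, rebuilt from the question's data rather than taken from the original answer. The code mapping (M to 0, F to 1), the use of Age as a second feature so that KNNImputer has something to measure distances on, and the rounding of the imputed value back to a whole code are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5],
    'Age': [20, 25, 30, 35, 40],
    'Gender': ['M', 'F', np.nan, 'F', np.nan],
})

# Hypothetical code mapping; any distinct numeric codes would do.
codes = {'M': 0, 'F': 1}
labels = {code: label for label, code in codes.items()}

# Unlike factorize, map leaves missing values as NaN,
# which is what KNNImputer looks for by default.
df['Gendermap'] = df['Gender'].map(codes).astype(float)

# Assumption: Age is included so neighbours can be found by distance.
imputer = KNNImputer(n_neighbors=2)
imputed = imputer.fit_transform(df[['Age', 'Gendermap']])

# The imputed value is an average of neighbour codes, so round it
# to the nearest code before mapping back to a label.
gender_codes = pd.Series(imputed[:, 1], index=df.index).round().astype(int)
df['Gender_imputed'] = gender_codes.map(labels)

print(df[['ID', 'Age', 'Gender', 'Gender_imputed']])
```

With this data both missing rows come out as F, because their two nearest neighbours by Age that have a known Gender are both F. Averaging category codes only makes sense for a two-valued column like this one; with more categories a different encoding would be needed.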
d6kp6zgx2#
Found a way, thanks.