pandas: How to use both string and float data types in the sklearn KNN fit() method

cqoc49vn  posted on 2023-09-29 in Other

I have a dataset that contains both string and float data types. I want to train my KNN model on this dataset, but it raises a ValueError:

could not convert string to float
inputs=data.drop(['HeartDisease'],axis='columns')
output=data.drop(['Age', 'Sex', 'ChestPainType', 'RestingBP', 'Cholesterol', 'FastingBS', 'RestingECG', 'MaxHR', 'ExerciseAngina', 'Oldpeak', 'ST_Slope'],axis='columns')

from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(inputs,output,train_size=0.8)

from sklearn.neighbors import KNeighborsClassifier
model=KNeighborsClassifier(n_neighbors=31)
model.fit(x_train,y_train)

I expected the model to be trained on this dataset.

fnx2tebb


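In almost every ML model, you can't use string data as is. You have to preprocess your inputs and convert them to a numeric type. Outside of natural language processing, text columns usually hold only a few distinct values (categorical features).
For example, the 'ChestPainType' column should have only 4 values: ['ATA', 'NAP', 'ASY', 'TA']. You have to convert these strings to numbers: 'ATA': 0, 'NAP': 1, 'ASY': 2, 'TA': 3. In pandas, you can use pd.factorize or pd.get_dummies to do this, but if you use sklearn, try LabelEncoder (mainly for the y target, when it needs encoding) or OneHotEncoder (sometimes OrdinalEncoder).
The simplest way is to use ColumnTransformer.
Reproducible example: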
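As a rough illustration of the two pandas options mentioned above, here is a minimal sketch on a small hand-written Series. The sample values and the variable names (chest_pain, codes, dummies) are assumptions made for the example and are not taken from the asker's data.

```python
import pandas as pd

# Hypothetical sample of the 'ChestPainType' column (made-up rows)
chest_pain = pd.Series(['ATA', 'NAP', 'ASY', 'TA', 'ASY'], name='ChestPainType')

# pd.factorize assigns one integer code per distinct value,
# numbered in order of first appearance
codes, uniques = pd.factorize(chest_pain)
print(codes.tolist())   # [0, 1, 2, 3, 2]
print(list(uniques))    # ['ATA', 'NAP', 'ASY', 'TA']

# pd.get_dummies builds one 0/1 indicator column per distinct value
dummies = pd.get_dummies(chest_pain, dtype=int)
print(dummies)
```

Integer codes imply an ordering between categories, which a distance-based model such as KNN will take literally, so indicator columns are often the safer choice for unordered categories.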

import pandas as pd
from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import train_test_split
from sklearn.compose import ColumnTransformer
from sklearn.neighbors import KNeighborsClassifier

# https://www.kaggle.com/datasets/fedesoriano/heart-failure-prediction
data = pd.read_csv('heart.csv')

features = data.drop(columns=['HeartDisease'])
target = data['HeartDisease']

# Text features to convert to numeric. 'F': [1, 0], 'M': [0, 1]
feat_cols = ['Sex', 'ChestPainType', 'RestingECG', 'ExerciseAngina', 'ST_Slope']

# One-hot encode the text columns and pass the numeric columns through unchanged
ct = ColumnTransformer(
    transformers=[('ohe', OneHotEncoder(), feat_cols)],
    remainder='passthrough'
)

# Convert your data to numeric values
X = ct.fit_transform(features)
y = target.to_numpy()

# Create 2 datasets for train and test
# (train_test_split returns X_train, X_test, y_train, y_test, in that order)
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.8)

# Missing step, use `StandardScaler` to normalize numeric values

# Train your model
model = KNeighborsClassifier(n_neighbors=31)
model.fit(X_train, y_train)

# Evaluate your model (63% reported here; the exact value depends on the random split)
model.score(X_test, y_test)
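
The `StandardScaler` step noted as missing in the code above could be added roughly as follows. This is only a sketch under assumptions: the tiny `toy` frame is made up so the snippet runs without heart.csv, and the names `preprocess` and `knn_pipeline`, as well as `n_neighbors=3`, are choices made for the illustration rather than part of the original answer.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Tiny made-up frame standing in for heart.csv (values are illustrative only)
toy = pd.DataFrame({
    'Age': [40, 49, 37, 48, 54, 39, 45, 58],
    'Sex': ['M', 'F', 'M', 'F', 'M', 'M', 'F', 'M'],
    'MaxHR': [172, 156, 98, 108, 122, 170, 130, 99],
    'HeartDisease': [0, 1, 0, 1, 0, 0, 1, 1],
})
X_toy = toy.drop(columns=['HeartDisease'])
y_toy = toy['HeartDisease']

# One-hot encode the text column and standardize the numeric columns
preprocess = ColumnTransformer(
    transformers=[
        ('cat', OneHotEncoder(handle_unknown='ignore'), ['Sex']),
        ('num', StandardScaler(), ['Age', 'MaxHR']),
    ]
)

# Chain preprocessing and the classifier so both are fitted together
knn_pipeline = Pipeline(steps=[
    ('preprocess', preprocess),
    ('knn', KNeighborsClassifier(n_neighbors=3)),
])
knn_pipeline.fit(X_toy, y_toy)
predictions = knn_pipeline.predict(X_toy)
```

Putting the scaler inside a Pipeline means its mean and variance are learned only from the data passed to fit, so on the real dataset you would call fit on the training split and score on the test split.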
