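I added more data to my X_train and y_train data so that I could retrain the model with more data. I did this with pd.concat(). However, when I train the model on the concatenated dataset, I get the following error: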
/usr/local/lib/python3.7/dist-packages/sklearn/utils/validation.py:1692:
FutureWarning: Feature names only support names that are all strings. Got feature
names with dtypes: ['int', 'str']. An error will be raised in 1.2.
FutureWarning,
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-166-a11464987b97> in <module>
----> 1 model1_pool_preds = model1(LinearSVC(class_weight='balanced',
random_state=42), OneVsRestClassifier, X_train_init_new, y_train_init_new,
X_test_init, y_test_init, X_pool)
6 frames
/usr/local/lib/python3.7/dist-packages/pandas/core/generic.py in __array__(self,
dtype)
1991
1992 def __array__(self, dtype: NpDtype | None = None) -> np.ndarray:
-> 1993 return np.asarray(self._values, dtype=dtype)
1994
1995 def __array_wrap__(
ValueError: could not convert string to float:
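I think this is because the data I added to the existing DataFrame contains some strings instead of floats. How can I convert the entire dataset to floats? My code is below: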
y_train_init_new = pd.concat([y_train_init, X_pool_labeled.iloc[:, -7:]])
X_train_init_new = pd.concat([X_train_init, X_pool_labeled.iloc[:, 0:27446]])

def model1(model, classifier, X, y, X_test, y_test, X_pool):
    m = model
    clf = classifier(m)
    clf.fit(X, y)
    clf_predictions = clf.predict(X_test)
    C_report = classification_report(y_test, clf_predictions, zero_division=0)
    print(C_report)
    clf_roc_auc = roc_auc_score(y_test, clf_predictions, multi_class='ovr')
    print('AUC: ', clf_roc_auc)
    clf_predictions_pool = clf.predict(X_pool)
    return clf_predictions_pool

model1_pool_preds = model1(LinearSVC(class_weight='balanced', random_state=42),
                           OneVsRestClassifier, X_train_init, y_train_init,
                           X_test_init, y_test_init, X_pool)
How can I convert all the data in the concatenated dataset to float?
1 Answer
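Given a DataFrame that consists entirely of strings, but whose values can all be converted to numbers without error, you can simply call df.astype(float) on the whole thing. If it also contains non-numeric columns, this is harder. Such columns cannot be used by the model anyway, so just drop them and then call astype(float) on the remaining columns.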
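The drop-then-convert approach described in this answer could be sketched roughly as follows. The toy DataFrame and the helper name to_float_frame are hypothetical, not from the question. Using pd.to_numeric with errors="coerce" to find the columns that cannot be converted is one possible way to decide what to drop. The last step, turning all column names into strings, is an assumption about what would silence the "Feature names only support names that are all strings" warning shown in the traceback.

```python
import pandas as pd

# Hypothetical toy data: numbers stored as strings, a float column,
# and one text column that cannot be converted.
df = pd.DataFrame({
    0: ["1.5", "2", "3"],
    1: [0.1, 0.2, 0.3],
    "comment": ["a", "b", "c"],
})

def to_float_frame(frame):
    """Drop columns that cannot be parsed as numbers, cast the rest to float."""
    # errors="coerce" turns unparseable values into NaN instead of raising.
    coerced = frame.apply(pd.to_numeric, errors="coerce")
    # A cell is "bad" if coercion produced NaN where the original had a value.
    bad = coerced.isna() & frame.notna()
    bad_columns = bad.any(axis=0)
    cleaned = frame.loc[:, ~bad_columns].astype(float)
    # Mixed int/str column names trigger the scikit-learn warning,
    # so make every column name a string.
    cleaned.columns = cleaned.columns.astype(str)
    return cleaned

X = to_float_frame(df)
print(X.dtypes)
```

Applied to the question, the same helper would be called on X_train_init_new before passing it to model1. Whether dropping the offending columns is acceptable depends on what they contain, so it is worth inspecting bad_columns first.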