ValueError on enc.transform with numpy/pandas, where enc is OneHotEncoder(sparse_output=False)

vnjpjtjt  posted on 2023-04-30 in Other

I have a time-series dataset named temp with 4 columns: Date, Minutes, Issue, and REASON NO, where

temp['REASON NO'].value_counts()

shows the following output.

R13    158
R14    123
R4     101
R7      81
R2      40
R3      35
R5      31
R8      11
R15      9
R12      3
R6       2
R10      2
R9       1

I ran this code earlier and it worked fine:

reason_no = enc.fit_transform(temp['REASON NO'].values.reshape(-1, 1))

After building the model, **I want to predict the values of Minutes, Issue, and REASON NO for the next week.** I tried this code:

seq_length=7
last_week = df.iloc[-seq_length:, :]
last_reason_no = enc.transform(last_week['REASON NO'].values.reshape(-1, 1))
last_issue = enc.transform(last_week['Issue'].values.reshape(-1, 1))
last_minutes = scaler.transform(last_week['Minutes'].values.reshape(-1, 1))
last_X = np.hstack([last_reason_no, last_issue, last_minutes])
next_X = last_X.reshape(1, last_X.shape[0], last_X.shape[1])
for i in range(7):
    pred = model.predict(next_X)
    pred_minutes = scaler.inverse_transform(pred[:, 2].reshape(-1, 1))[0][0]
    pred_issue = enc.inverse_transform([np.argmax(pred[:, 1])])[0]
    pred_reason_no = enc.inverse_transform([np.argmax(pred[:, 0])])[0]
    print(f'Date: {last_week.iloc[-1, 0]}')
    print(f'Predicted Reason Number: {pred_reason_no}')
    print(f'Predicted Issue: {pred_issue}')
    print(f'Predicted Minutes: {pred_minutes}')

But when I run this code, I get an error:

ValueError                                Traceback (most recent call last)
in <cell line: 1>()
----> 1 last_reason_no = enc.transform(last_week['REASON NO'].values.reshape(-1, 1))

2 frames
/usr/local/lib/python3.10/dist-packages/sklearn/preprocessing/_encoders.py in _transform(self, X, handle_unknown, force_all_finite, warn_on_unknown)
    172                         "during transform".format(diff, i)
    173                     )
--> 174                     raise ValueError(msg)
    175                 else:
    176                     if warn_on_unknown:

**ValueError: Found unknown categories ['R5', 'R4'] in column 0 during transform.** I would appreciate help understanding why I get this error and how to fix it.

brjng4g3  #1

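You cannot encode categories that the encoder never saw when it was fitted. At transform time, any value outside the fitted categories raises this error by default: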

import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Stand-ins for a split such as X_train, X_test = train_test_split(X, ...)
X_train = pd.DataFrame({'REASON NO': ['R13', 'R14', 'R7']})
X_test = pd.DataFrame({'REASON NO': ['R4', 'R7', 'R5']})

enc = OneHotEncoder()

Output:

>>> enc.fit_transform(X_train).toarray()
array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

>>> enc.transform(X_test)
...
ValueError: Found unknown categories ['R5', 'R4'] in column 0 during transform
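
Two things may help here, sketched below on made-up toy data (the frames `train` and `new` and the names `reason_enc` and `issue_enc` are illustrative, not from the question). First, the question reuses a single `enc` for both 'REASON NO' and 'Issue'. If that encoder was re-fitted on another column or on a subset after the first `fit_transform`, its learned categories would have been replaced, which could explain why 'R4' and 'R5' are reported as unknown even though they appear in `value_counts()`. The post does not confirm this, so treat it as a guess. Keeping one encoder per column avoids the problem. Second, if genuinely unseen categories can show up at prediction time, `handle_unknown="ignore"` makes the encoder emit an all-zero row instead of raising.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Toy data for illustration only (not the asker's dataset).
train = pd.DataFrame({
    "REASON NO": ["R13", "R14", "R7", "R4"],
    "Issue": ["A", "B", "A", "C"],
})
new = pd.DataFrame({"REASON NO": ["R4", "R5"], "Issue": ["B", "A"]})

# One encoder per column, so fitting on 'Issue' cannot overwrite
# the categories learned from 'REASON NO'.
reason_enc = OneHotEncoder(handle_unknown="ignore")
issue_enc = OneHotEncoder(handle_unknown="ignore")

# Fit each encoder once, on the full training column.
reason_enc.fit(train[["REASON NO"]])
issue_enc.fit(train[["Issue"]])

# 'R4' was seen during fit and gets a normal one-hot row.
# 'R5' was not seen, so its row is all zeros instead of a ValueError.
encoded = reason_enc.transform(new[["REASON NO"]]).toarray()

print(reason_enc.categories_[0])
print(encoded)
```

Note that an all-zero row carries no category information, so the model sees "none of the known reasons" for that sample. If every category is known in advance, passing them explicitly through the `categories` parameter of `OneHotEncoder` is an alternative.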
