pandas: training an RNN/LSTM model raises a KeyError equal to the value of `length`

bihw5rsg  posted on 2022-11-20  in Other

I am trying to train this model:

scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)

length = 60
n_features = X_train_s.shape[1]
batch_size = 1

early_stop = EarlyStopping(monitor = 'val_accuracy', mode = 'max', verbose = 1, patience = 5)

generator = TimeseriesGenerator(data = X_train_s, 
                                targets = Y_train[['TARGET_KEEP_LONG', 
                                                   'TARGET_KEEP_SHORT', 
                                                   'TARGET_STAY_FLAT']], 
                                length = length, 
                                batch_size = batch_size)

RNN_model = Sequential()
RNN_model.add(LSTM(180, activation = 'relu', input_shape = (length, n_features)))
RNN_model.add(Dense(3))
RNN_model.compile(optimizer = 'adam', loss = 'binary_crossentropy')

validation_generator = TimeseriesGenerator(data = X_test_s, 
                                           targets = Y_test[['TARGET_KEEP_LONG', 
                                                             'TARGET_KEEP_SHORT', 
                                                             'TARGET_STAY_FLAT']], 
                                           length = length, 
                                           batch_size = batch_size)

RNN_model.fit(generator, 
              epochs=20, 
              validation_data = validation_generator,
              callbacks = [early_stop])

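I get the error message "KeyError: 60", where 60 is actually the value of the variable `length` (if I change it, the error changes accordingly).
The shape of the dataset is: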

X_test_s.shape
(114125, 89)

The same holds for X_train_s.shape, and n_features == 89.

rlcwz9us #1

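Tracking down the cause was exhausting because the error message is poor and misleading. In short, the problem is the form of the target data: TimeseriesGenerator does not accept a pandas DataFrame as targets, only NumPy arrays. So this call: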

generator = TimeseriesGenerator(data = X_train_s,
                                targets = Y_train[['TARGET_KEEP_LONG', 'TARGET_KEEP_SHORT', 'TARGET_STAY_FLAT']],
                                length = length,
                                batch_size = batch_size)

should be written as:

targets = Y_train[['TARGET_KEEP_LONG', 'TARGET_KEEP_SHORT', 'TARGET_STAY_FLAT']].to_numpy()
generator = TimeseriesGenerator(X_train_s, targets, length=length, batch_size=batch_size)
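As for why the key in the error equals `length`: as far as I can tell, the generator fetches each target with plain integer indexing (`targets[row]`), and the first row it needs is the one right after the first window, i.e. row `length`. On a DataFrame, `df[60]` is a column lookup rather than a row lookup, so pandas reports that there is no column named 60. The sketch below reproduces only that indexing difference with pandas and NumPy; `Y_demo` is a made-up stand-in for `Y_train`, not the real data, and the snippet does not call Keras at all.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for Y_train: 100 rows, three target columns.
cols = ['TARGET_KEEP_LONG', 'TARGET_KEEP_SHORT', 'TARGET_STAY_FLAT']
Y_demo = pd.DataFrame(np.zeros((100, 3)), columns=cols)

length = 60  # first target row needed once one full window has passed

# Integer indexing on a DataFrame looks for a COLUMN labelled 60.
try:
    Y_demo[cols][length]
    error_message = None
except KeyError as exc:
    error_message = f"KeyError: {exc}"

# The same integer on a NumPy array selects ROW 60, which is what is wanted.
row = Y_demo[cols].to_numpy()[length]

print(error_message)
print(row.shape)
```

Changing `length` in this sketch changes the key in the message in the same way the question describes.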

With a single target,

generator = TimeseriesGenerator(data = X_train_s, targets = Y_train['TARGET_KEEP_LONG'], length = length, batch_size = batch_size)

use only one pair of square brackets rather than two.
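One caveat on the single-target form, offered as a hedge rather than a correction: single brackets give a pandas Series, and integer indexing on a Series with an integer index goes by label, not by position. If the index does not run 0, 1, 2, ... (for example a test split taken from the tail of a larger frame), the lookup can fail or pick a different row. Converting with `.to_numpy()` sidesteps this. The Series below is invented purely for illustration.

```python
import pandas as pd

# Hypothetical single-target column whose index starts at 100,
# e.g. the tail of a larger frame after a chronological split.
y_demo = pd.Series([0.0, 1.0, 0.0, 1.0],
                   index=[100, 101, 102, 103],
                   name='TARGET_KEEP_LONG')

# After conversion, indexing is purely positional.
y_array = y_demo.to_numpy()
first_by_position = y_array[0]

# On the Series itself, 0 is treated as a label, and no such label exists.
try:
    y_demo[0]
    series_lookup_failed = False
except KeyError:
    series_lookup_failed = True

print(first_by_position, series_lookup_failed)
```

Passing `Y_train['TARGET_KEEP_LONG'].to_numpy()` as `targets` therefore seems the safer habit for both the training and the validation generator.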
