I am trying to train a model on imbalanced data using a CNN architecture I developed, and then evaluate its performance.
Here is my code:
# Import necessary libraries
from sklearn.utils import shuffle
from sklearn.metrics import classification_report
import numpy as np
import tensorflow as tf
# Convert one-hot encoded labels back to their integer form
train_labels_copy = np.argmax(train_labels, axis=1)
# Split data into airplane/car and others
airplane_car_indices = np.where((train_labels_copy == 0) | (train_labels_copy == 1))[0]
other_indices = np.where((train_labels_copy != 0) & (train_labels_copy != 1))[0]
# Separate airplane/car and other images and labels
airplane_car_images = train_images[airplane_car_indices]
airplane_car_labels = train_labels[airplane_car_indices] # use train_labels instead of train_labels_copy
other_images = train_images[other_indices]
other_labels = train_labels[other_indices] # use train_labels instead of train_labels_copy
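# Shuffle the airplane/car images and labels together before subsampling
# (shuffling airplane_car_indices here would have no effect, since the arrays were already indexed)
airplane_car_images, airplane_car_labels = shuffle(airplane_car_images, airplane_car_labels)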
# Calculate 20% of the airplane and car class data
remove_n = int(0.2 * len(airplane_car_indices))
# Keep only 20% of airplane and car class data
airplane_car_images = airplane_car_images[:remove_n]
airplane_car_labels = airplane_car_labels[:remove_n]
# Combine imbalanced airplane/car and other data
train_images_imbalanced = np.concatenate((airplane_car_images, other_images))
train_labels_imbalanced = np.concatenate((airplane_car_labels, other_labels))
# Shuffle the imbalanced data
train_images_imbalanced, train_labels_imbalanced = shuffle(train_images_imbalanced, train_labels_imbalanced)
# Train the same model on the imbalanced data
history_imbalanced = cnn.fit(train_images_imbalanced, train_labels_imbalanced, batch_size=32, epochs=20, validation_data=(test_images, test_labels))
# Plot the loss and accuracy graphs
plot_loss_and_accuracy(history_imbalanced)
# Predict the test data
predictions_imbalanced = cnn.predict(test_images)
# Convert prediction probabilities to class labels
predictions_imbalanced = np.argmax(predictions_imbalanced, axis=1)
# Print the classification report
print(classification_report(np.argmax(test_labels, axis=1), predictions_imbalanced, target_names=class_names))
The error is:
Epoch 1/20
1312/1313 [============================>.] - ETA: 0s - loss: 0.4442 - accuracy: 0.8443
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[15], line 37
34 train_images_imbalanced, train_labels_imbalanced = shuffle(train_images_imbalanced, train_labels_imbalanced)
36 # Train the same model on the imbalanced data
---> 37 history_imbalanced = cnn.fit(train_images_imbalanced, train_labels_imbalanced, batch_size=32, epochs=20, validation_data=(test_images, test_labels))
39 # Plot the loss and accuracy graphs
40 plot_loss_and_accuracy(history_imbalanced)
File c:\Python311\Lib\site-packages\keras\utils\traceback_utils.py:70, in filter_traceback.<locals>.error_handler(*args, **kwargs)
67 filtered_tb = _process_traceback_frames(e.__traceback__)
68 # To get the full stack trace, call:
69 # `tf.debugging.disable_traceback_filtering()`
---> 70 raise e.with_traceback(filtered_tb) from None
71 finally:
72 del filtered_tb
File ~\AppData\Local\Temp\__autograph_generated_filesf8xdq9z.py:15, in outer_factory.<locals>.inner_factory.<locals>.tf__test_function(iterator)
13 try:
14 do_return = True
---> 15 retval_ = ag__.converted_call(ag__.ld(step_function), (ag__.ld(self), ag__.ld(iterator)), None, fscope)
16 except:
17 do_return = False
ValueError: in user code:
...
File "c:\Python311\Lib\site-packages\keras\backend.py", line 5559, in categorical_crossentropy
target.shape.assert_is_compatible_with(output.shape)
I tried to fix this by using the to_categorical function from Keras utils to convert the labels in train_labels_imbalanced to categorical format. I replaced the following line:
train_labels_imbalanced = np.concatenate((airplane_car_labels, other_labels))
with:
train_labels_imbalanced = np.concatenate((to_categorical(airplane_car_labels, num_classes=10), other_labels))
But then I got this error:
ValueError Traceback (most recent call last)
Cell In[14], line 31
29 # Combine imbalanced airplane/car and other data
30 train_images_imbalanced = np.concatenate((airplane_car_images, other_images))
---> 31 train_labels_imbalanced = np.concatenate((to_categorical(airplane_car_labels, num_classes=10), other_labels))
33 # Shuffle the imbalanced data
34 train_images_imbalanced, train_labels_imbalanced = shuffle(train_images_imbalanced, train_labels_imbalanced)
File <__array_function__ internals>:200, in concatenate(*args, **kwargs)
ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 3 dimension(s) and the array at index 1 has 2 dimension(s)
1 answer
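I suspect the problem is the shape of your validation data: history_imbalanced = cnn.fit(train_images_imbalanced, train_labels_imbalanced, batch_size=32, epochs=20, validation_data=(test_images, test_labels))
So how are you creating test_images and test_labels? As the first log shows, the training steps of epoch 1 had essentially finished (1312/1313), and the traceback passes through tf__test_function, so the error was raised when evaluation on the validation data started. Perhaps you forgot to one-hot encode test_labels.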
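To check this, you could print test_labels.shape next to train_labels.shape before calling fit. Below is a minimal sketch of the fix, assuming test_labels holds integer class ids of shape (n, 1) and that there are 10 classes, as in your to_categorical call. The helper ensure_one_hot and the sample arrays are hypothetical and use plain NumPy so the snippet runs on its own. The last lines also illustrate what probably happened in your second attempt: encoding labels that are already one-hot adds a third axis.

```python
import numpy as np

NUM_CLASSES = 10  # assumption: 10 classes, as in the to_categorical call in the question


def ensure_one_hot(labels, num_classes=NUM_CLASSES):
    """Return labels as a (n_samples, num_classes) one-hot float32 array.

    Hypothetical helper: accepts integer labels of shape (n,) or (n, 1)
    and passes through labels that are already one-hot.
    """
    labels = np.asarray(labels)
    if labels.ndim == 2 and labels.shape[1] == num_classes:
        return labels.astype("float32")  # already one-hot, nothing to do
    flat = labels.reshape(-1).astype(int)  # (n, 1) -> (n,)
    return np.eye(num_classes, dtype="float32")[flat]


# Stand-in for test_labels stored as integer class ids, shape (n, 1)
test_labels_int = np.array([[3], [0], [9], [1]])
test_labels_fixed = ensure_one_hot(test_labels_int)
print(test_labels_int.shape, "->", test_labels_fixed.shape)

# Encoding labels that are already one-hot adds a third axis, which matches
# the "3 dimension(s)" vs "2 dimension(s)" message from np.concatenate.
double_encoded = np.eye(NUM_CLASSES)[test_labels_fixed.astype(int)]
print(double_encoded.shape)
```

If this is the cause, passing validation_data=(test_images, ensure_one_hot(test_labels)) to cnn.fit should give the loss targets of the same shape as the model output. You would also need to revert the to_categorical change on airplane_car_labels, since those labels are already one-hot.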