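I am using a CNN to classify apple types. I get high accuracy on the training data but low accuracy on the test data. The data is split 80:20. I am not sure whether my model is overfitting.
I have 2 folders, TraningData and TestData, and each folder has 4 subfolders: braeburn, red_apple, red_delicious, rotten (containing the corresponding images).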
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Both constants point at the same directory.
TRAIN_DIR = 'apple_fruit'
TEST_DIR = 'apple_fruit'
classes = ['braeburn', 'red_apples', 'red_delicious', 'rotten']

# Training images are rescaled and randomly augmented; test images are only rescaled.
train_datagen = ImageDataGenerator(rescale=1./255,
                                   shear_range=0.2,
                                   zoom_range=0.2,
                                   horizontal_flip=True,
                                   fill_mode='nearest')
test_datagen = ImageDataGenerator(rescale=1./255)

training_set = train_datagen.flow_from_directory(TRAIN_DIR,
                                                 shuffle=True,
                                                 target_size=(100, 100),
                                                 batch_size=25,
                                                 classes=['braeburn', 'red_apples', 'red_delicious', 'rotten'])
# shuffle=True: the test batches are yielded in random order.
test_set = test_datagen.flow_from_directory(TEST_DIR,
                                            target_size=(100, 100),
                                            shuffle=True,
                                            batch_size=25,
                                            classes=classes)

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Activation, Dropout

# Two conv/pool stages, then a dense classifier with a 4-way softmax output.
model = Sequential()
model.add(Conv2D(filters=128, kernel_size=(3, 3), input_shape=(100, 100, 3),
                 activation='relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(filters=16, kernel_size=(3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(256))
model.add(Activation('relu'))
model.add(Dropout(0.6))
model.add(Dense(4, activation='softmax'))

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
history = model.fit(x=training_set,  # y=training_set.labels,
                    steps_per_epoch=len(training_set),
                    epochs=10)
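
import numpy as np
from tensorflow import keras
from sklearn.metrics import confusion_matrix

model.save('Ripe2_model6.h5')  # creates the HDF5 file 'Ripe2_model6.h5'
model_path = "Ripe2_model6.h5"
loaded_model = keras.models.load_model(model_path)

classes = ['braeburn', 'red_apples', 'red_delicious', 'rotten']
# Note: predictions come from the in-memory model, not from loaded_model.
predictions = model.predict(x=test_set, steps=len(test_set), verbose=True)
pred = np.round(predictions)        # probabilities rounded to 0/1 before argmax
y_true = test_set.classes           # labels in directory (file) order
y_pred = np.argmax(pred, axis=-1)   # predicted class index per yielded sample
cm = confusion_matrix(y_true=test_set.classes, y_pred=np.argmax(pred, axis=-1))
test_set.classes
np.argmax(pred, axis=-1)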

import itertools
import matplotlib.pyplot as plt
import numpy as np

def plot_confusion_matrix(cm, classes,
                          normalize=False,
                          title='Confusion matrix',
                          cmap=plt.cm.Blues):
    """
    This function prints and plots the confusion matrix.
    Normalization can be applied by setting `normalize=True`.
    """
    accuracy = np.trace(cm) / float(np.sum(cm))
    misclass = 1 - accuracy

    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title, color='white')
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45, color='white')
    plt.yticks(tick_marks, classes, color='white')

    # The hard-coded names below replace the tick labels set from `classes`.
    target_names = ['braeburn', 'red_apples', 'red_delicious', 'rotten']
    if target_names is not None:
        tick_marks = np.arange(len(target_names))
        plt.xticks(tick_marks, target_names, rotation=45)
        plt.yticks(tick_marks, target_names)

    if normalize:
        # Divide each row by its total so cells show per-class fractions.
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]

    # Cells above the threshold get white text, the rest black.
    thresh = cm.max() / 1.5 if normalize else cm.max() / 2
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        if normalize:
            plt.text(j, i, "{:0.4f}".format(cm[i, j]),
                     horizontalalignment="center",
                     color="white" if cm[i, j] > thresh else "black")
        else:
            plt.text(j, i, "{:,}".format(cm[i, j]),
                     horizontalalignment="center",
                     color="white" if cm[i, j] > thresh else "black")

    plt.tight_layout()
    plt.ylabel('True label', color='white')
    plt.xlabel('Predicted label', color='white')

cm_plot_labels = ['braeburn', 'red_apples', 'red_delicious', 'rotten']
plot_confusion_matrix(cm=cm, classes=cm_plot_labels, title='Confusion Matrix')
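
from sklearn.metrics import accuracy_score, recall_score, precision_score

print(accuracy_score(y_true, y_pred))
print(recall_score(y_true, y_pred, average=None))      # one value per class
print(precision_score(y_true, y_pred, average=None))   # one value per class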

Confusion matrix:
- Accuracy: 0.29090909090909
- Recall: [0.23484848 0.32319392 0.151515 0.36213992]
- Precision: [0.23308271 0.32319392 0.151515 0.36363636]
I have tried changing many of the settings, but I am still not making any progress.
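
One thing that may be worth ruling out before concluding that the model overfits: an accuracy close to 0.25 on four classes is roughly chance level, and that can also appear when predictions and labels are compared in different orders. In Keras, a directory generator created with `shuffle=True` yields its batches in shuffled order, while its `.classes` attribute stays in file order. The sketch below does not use Keras at all; it relies on made-up labels and a hypothetical perfect classifier purely to illustrate how a misaligned comparison looks.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of positions where the two label sequences agree."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical data: 4 classes, 100 samples each, listed in directory order.
y_true = [label for label in range(4) for _ in range(100)]

# A hypothetical perfect classifier: every sample receives its true label.
aligned_pred = list(y_true)

# The same perfect predictions, but emitted in a shuffled sample order
# while y_true is still read in directory order.
rng = random.Random(0)
order = list(range(len(y_true)))
rng.shuffle(order)
shuffled_pred = [y_true[i] for i in order]

aligned_acc = accuracy(y_true, aligned_pred)
shuffled_acc = accuracy(y_true, shuffled_pred)
print(f"aligned: {aligned_acc:.2f}, misaligned: {shuffled_acc:.2f}")
```

If this turns out to be the cause, building the test generator with `shuffle=False` keeps the prediction order the same as `test_set.classes`; whether that explains the numbers above is an assumption that has to be checked against the actual data.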

2 Answers
#1 jucafojl
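It indicates that the data in the test set differs substantially from what the model learned. To find out whether this is overfitting or a single unlucky split, do the following:
1. Check whether the result depends on the initial train/test split. To do this, you can:
2. Do you have enough samples? Try adding more samples and check the effect on performance. You can also apply data augmentation techniques.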
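
One possible way to run the check in step 1 is to repeat the split with several random seeds and compare the scores. The sketch below is an assumption, not the answerer's own method: `stratified_split`, `scores_over_splits`, and `placeholder_score` are hypothetical helpers, the labels are made up, and the scorer is a stand-in that would need to be replaced by code that trains the CNN and returns its test accuracy.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with every class split in the same ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def scores_over_splits(labels, train_and_score, n_repeats=5):
    """Call train_and_score(train_idx, test_idx) once per random split."""
    return [train_and_score(*stratified_split(labels, seed=s))
            for s in range(n_repeats)]


# Hypothetical labels: 4 classes with 50 images each.
class_names = ("braeburn", "red_apples", "red_delicious", "rotten")
labels = [name for name in class_names for _ in range(50)]


def placeholder_score(train_idx, test_idx):
    """Stand-in scorer: returns the test fraction so the sketch runs.

    A real scorer would train on train_idx and return test accuracy.
    """
    return len(test_idx) / (len(train_idx) + len(test_idx))


scores = scores_over_splits(labels, placeholder_score)
print(scores)
```

With a real scorer, a wide spread across seeds would point toward an unlucky split, while consistently low test scores would point toward something else, such as overfitting or an evaluation problem.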

#2 xytpbqjk
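If accuracy is high on the training data but low on the test data, the model may be overfitting. The cause may be a simple dataset on which the model tries to capture every data point, including the noise. In that case, try tuning the parameters, using a larger batch size, running cross-validation to understand the performance, and applying data augmentation.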
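
The cross-validation suggestion can be sketched as follows. This is a minimal illustration under stated assumptions: `k_fold_indices` is a hypothetical helper written here with the standard library only, the sample count of 103 is arbitrary, and the training step is left as a comment because it depends on how the images are loaded.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs; each sample lands in one val fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Slicing with a stride of k gives k folds whose sizes differ by at most 1.
    folds = [indices[i::k] for i in range(k)]
    for i, val_idx in enumerate(folds):
        train_idx = [idx for j, fold in enumerate(folds) if j != i
                     for idx in fold]
        yield train_idx, val_idx


fold_sizes = []
for train_idx, val_idx in k_fold_indices(n_samples=103, k=5):
    # A real run would train a fresh model on train_idx here
    # and record its accuracy on val_idx.
    fold_sizes.append((len(train_idx), len(val_idx)))
print(fold_sizes)
```

Averaging the per-fold accuracies gives a steadier estimate than a single 80:20 split, at the cost of training the model k times.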