tensorflow: How can I split the MNIST dataset into a smaller size and add augmentation?

Posted by avwztpqn on 2023-01-21 in Other

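I'm having trouble splitting the MNIST dataset and adding augmented data. Out of the 70000 images in MNIST, I want to use only 22000 in total (training and test sets combined). MNIST has 10 labels. The only augmentation methods I use are shear, rotation, width shift, and height shift.
Training set --> 20000 (total) --> 20 images + 1980 augmented images (per label)
Test set --> 2000 (total) --> 200 images (per label)
I also want to make sure the class distribution is preserved in the split.
I'm really confused about how to split this data. I'd be glad if someone could provide code.
I tried this code: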

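import numpy as np
from tensorflow import keras
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Load the MNIST dataset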
(x_train_full, y_train_full), (x_test_full, y_test_full) = keras.datasets.mnist.load_data()

# Normalize the data
x_train_full = x_train_full / 255.0
x_test_full = x_test_full / 255.0

# Create a data generator for data augmentation
data_gen = ImageDataGenerator(shear_range=0.2, rotation_range=20,
                              width_shift_range=0.2, height_shift_range=0.2)

# Initialize empty lists for the training and test sets
x_train, y_train, x_test, y_test = [], [], [], []

# Loop through each class/label
for class_n in range(10):
    # Get the indices of the images for this class
    class_indices = np.where(y_train_full == class_n)[0]

    # Select 20 images for training
    train_indices = np.random.choice(class_indices, 20, replace=False)

    # Append the training images and labels to the respective lists
    x_train.append(x_train_full[train_indices])
    y_train.append(y_train_full[train_indices])

    # Select 200 images for test
    test_indices = np.random.choice(class_indices, 200, replace=False)

    # Append the test images and labels to the respective lists
    x_test.append(x_test_full[test_indices])
    y_test.append(y_test_full[test_indices])

    # Generate 100 augmented images for training
    x_augmented = data_gen.flow(x_train_full[train_indices], y_train_full[train_indices], batch_size=100)

    # Append the augmented images and labels to the respective lists
    x_train.append(x_augmented[0])
    y_train.append(x_augmented[1])

# Concatenate the list of images and labels to form the final training and test sets
x_train = np.concatenate(x_train)
y_train = np.concatenate(y_train)
x_test = np.concatenate(x_test)
y_test = np.concatenate(y_test)

print("training set shape: ", x_train.shape)
print("training label shape: ", y_train.shape)
print("test set shape: ", x_test.shape)
print("test label shape: ", y_test.shape)

But it keeps failing with this error:

IndexError: index 15753 is out of bounds for axis 0 with size 10000

tensorflow

Source: https://stackoverflow.com/questions/75175819/how-to-split-mnist-dataset-into-smaller-size-and-adding-augmentation-to-it

1 Answer

abithluo1#

You are mixing up the training and test sets. In the loop, you get class_indices from the training set:

# Get the indices of the images for this class
class_indices = np.where(y_train_full == class_n)[0]

But a few lines further down, you then use these training indices (which may be larger than 10000!) to index into the test set, which has only 10000 samples:

# Select 200 images for test
test_indices = np.random.choice(class_indices, 200, replace=False)
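
To make the mismatch concrete, here is a minimal sketch that reproduces the same kind of IndexError with NumPy alone. The label arrays are synthetic stand-ins (an assumption, chosen only to match the 60000/10000 sizes of the MNIST split), so nothing has to be downloaded. Note that even an index that happens to be below 10000 would point at a test image of an arbitrary class, so the selection would be wrong even when it does not crash.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for the MNIST label arrays (assumption: same sizes as
# the real split, 60000 training labels and 10000 test labels).
y_train_full = rng.integers(0, 10, size=60000)
y_test_full = rng.integers(0, 10, size=10000)

# Positions of class 0 inside the *training* labels.
class_indices = np.where(y_train_full == 0)[0]

# These positions range up to 59999, so many of them do not exist in the
# test set, which only has positions 0..9999.
n_out_of_range = int(np.sum(class_indices >= len(y_test_full)))
print("largest training index:", class_indices.max())
print("indices too large for the test set:", n_out_of_range)

# Indexing the test labels with training positions fails.
try:
    y_test_full[class_indices]
    raised = False
except IndexError as err:
    raised = True
    print("IndexError:", err)
```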

So you need to perform the same index selection on the test-set labels inside the loop, and then it should work.
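
One way this fix might look is sketched below. The helper name sample_per_class is hypothetical (it is not part of the question or of any library), and the label arrays are again synthetic stand-ins so the snippet runs without downloading MNIST. The key point is that the positions for each split are computed from that split's own labels.

```python
import numpy as np

def sample_per_class(labels, n_per_class, rng, num_classes=10):
    """Pick n_per_class random positions in `labels` for every class.

    Hypothetical helper: the returned positions are only valid for the
    array that `labels` belongs to.
    """
    picked = []
    for class_n in range(num_classes):
        class_indices = np.where(labels == class_n)[0]
        picked.append(rng.choice(class_indices, n_per_class, replace=False))
    return np.concatenate(picked)

rng = np.random.default_rng(0)

# Synthetic stand-ins for y_train_full and y_test_full (assumption).
y_train_full = rng.integers(0, 10, size=60000)
y_test_full = rng.integers(0, 10, size=10000)

# Each split is indexed with positions derived from its own labels.
train_idx = sample_per_class(y_train_full, 20, rng)
test_idx = sample_per_class(y_test_full, 200, rng)

y_train = y_train_full[train_idx]
y_test = y_test_full[test_idx]

# Per-class counts, to check that the class distribution is balanced.
print("train counts per class:", np.bincount(y_train, minlength=10))
print("test counts per class:", np.bincount(y_test, minlength=10))
```

With the real data, the same positions would select the images as well, for example x_train_full[train_idx] and x_test_full[test_idx]. The augmented images would then be generated from the selected training images only.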

2023-01-21
