Loading train/val/test datasets with images in separate folders using PyTorch

bvn4nwqk posted on 2023-04-30 in Other

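For my first PyTorch project, I have to do image classification on a dataset of JPG cloud images. I'm struggling to load the data because it isn't split into training/validation/test sets; instead, the images sit in different folders according to their class. The folder structure looks like this: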

-dataset_folder
    -Class_1
        img1
        img2
        ...
    -Class_2
        img1
        img2
        ...
    -Class_3
        img1
        img2
        ...
    -Class_4
        img1
        img2
        ...

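I've seen that the ImageFolder() class can handle this folder structure, but I don't know how to combine it with splitting the dataset into three parts.
Can someone show me how to do this?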

Answer #1 (ttcibm8c)

You can write a custom Dataset class that loads the data and use it in your project:

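import os
import glob
from torch.utils.data import Dataset
from PIL import Image
from torchvision.transforms import ToTensor

class CustomImageDataset(Dataset):
    def __init__(self, root_dir, transform=None):
        self.root_dir = root_dir
        self.transform = transform
        # Each sub-folder of root_dir is one class. Sorting makes the
        # class -> label mapping stable, since os.listdir() order is arbitrary.
        self.class_folders = sorted(
            f for f in os.listdir(root_dir)
            if os.path.isdir(os.path.join(root_dir, f))
        )
        self.image_paths = []
        self.labels = []

        # Collect every .jpg file and label it with its class folder's index.
        # The pattern is case-sensitive on most file systems ('*.JPG' is skipped).
        for label, class_folder in enumerate(self.class_folders):
            img_paths = sorted(glob.glob(os.path.join(root_dir, class_folder, '*.jpg')))
            self.image_paths.extend(img_paths)
            self.labels.extend([label] * len(img_paths))

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        img_path = self.image_paths[idx]
        label = self.labels[idx]
        # Images are loaded lazily, one per call, and forced to 3 channels.
        image = Image.open(img_path).convert('RGB')

        if self.transform is not None:
            image = self.transform(image)

        return image, label

# Example: dataset = CustomImageDataset('dataset_folder', transform=ToTensor())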

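For more details on custom datasets, see this link: www.example.com
After that, you can split the dataset into as many parts as you like. Here is a good answer showing how to split a custom dataset into different sets with SubsetRandomSampler: How do I split a custom dataset into training and test datasets?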
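A minimal sketch of the SubsetRandomSampler approach, assuming a 70/20/10 split with the remainder going to test. This is not the code from the linked answer; `make_split_loaders` and its default fractions, batch size and seed are hypothetical. It works with any Dataset, including the CustomImageDataset above or an ImageFolder:

```python
import torch
from torch.utils.data import DataLoader, Dataset, SubsetRandomSampler


def make_split_loaders(dataset: Dataset, train_frac=0.7, val_frac=0.2,
                       batch_size=32, seed=0):
    """Hypothetical helper: split `dataset` into train/val/test DataLoaders.

    Fractions, batch size and seed are illustrative. Whatever is left after
    the train and val portions goes to test.
    """
    n = len(dataset)
    # Shuffle all indices once with a fixed seed so the split is reproducible.
    gen = torch.Generator().manual_seed(seed)
    indices = torch.randperm(n, generator=gen).tolist()

    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    splits = {
        "train": indices[:n_train],
        "val": indices[n_train:n_train + n_val],
        "test": indices[n_train + n_val:],
    }

    # Each sampler draws only from its own disjoint index list, in a new
    # random order every epoch. Do not also pass shuffle=True: DataLoader
    # rejects sampler and shuffle together.
    return {
        name: DataLoader(dataset, batch_size=batch_size,
                         sampler=SubsetRandomSampler(idx))
        for name, idx in splits.items()
    }
```

Usage: `loaders = make_split_loaders(CustomImageDataset('dataset_folder', transform=ToTensor()))`, then iterate over `loaders["train"]`, `loaders["val"]` and `loaders["test"]`.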

Answer #2 (btxsgosb)

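You can create the dataset with ImageFolder and then pass it to torch.utils.data.random_split, which takes the dataset plus a list of split lengths and returns one randomly drawn, non-overlapping Subset per length.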

Answer #3 (e3bfsja2)

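You may be thinking of this article or something similar, but the simplest way to solve the problem is to put all the images into a single folder, create train, validation and test folders, and then run this: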

import os
import random
from shutil import copyfile

# Placeholder paths: replace with your own folders.
source_folder = 'path/to/all_images'
train_folder = 'path/to/train'
validation_folder = 'path/to/validation'
test_folder = 'path/to/test'

# Make sure the target folders exist before copying into them.
for folder in (train_folder, validation_folder, test_folder):
    os.makedirs(folder, exist_ok=True)

image_filenames = os.listdir(source_folder)
random.shuffle(image_filenames)

# 70% train, 20% validation; whatever remains (about 10%) goes to test.
# Explicit end indices are used because a negative slice such as [-0:]
# would select the whole list if the test count rounded down to 0.
num_train_images = int(len(image_filenames) * 0.7)
num_validation_images = int(len(image_filenames) * 0.2)
validation_end = num_train_images + num_validation_images

for filename in image_filenames[:num_train_images]:
    source_path = os.path.join(source_folder, filename)
    target_path = os.path.join(train_folder, filename)
    copyfile(source_path, target_path)

for filename in image_filenames[num_train_images:validation_end]:
    source_path = os.path.join(source_folder, filename)
    target_path = os.path.join(validation_folder, filename)
    copyfile(source_path, target_path)

for filename in image_filenames[validation_end:]:
    source_path = os.path.join(source_folder, filename)
    target_path = os.path.join(test_folder, filename)
    copyfile(source_path, target_path)
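Pooling every image into one folder drops the class sub-folders that ImageFolder uses for labels, and files with the same name in different classes (img1, img2, ...) would overwrite each other. A per-class variant of the same copy-based idea keeps the structure intact, so ImageFolder can be pointed at each split directory. This is a sketch; the helper `split_class_folders` and the output folder names `train`, `val` and `test` are hypothetical:

```python
import os
import random
from shutil import copyfile


def split_class_folders(source_root, target_root, fractions=(0.7, 0.2), seed=0):
    """Hypothetical helper: copy source_root/<class>/* into
    target_root/{train,val,test}/<class>/ and return per-split file counts.

    fractions gives the train and val shares; test gets the remainder.
    """
    rng = random.Random(seed)  # seeded so the split is reproducible
    counts = {"train": 0, "val": 0, "test": 0}
    for class_name in sorted(os.listdir(source_root)):
        class_dir = os.path.join(source_root, class_name)
        if not os.path.isdir(class_dir):
            continue
        files = sorted(os.listdir(class_dir))
        rng.shuffle(files)  # shuffle within each class (a stratified split)
        n_train = int(len(files) * fractions[0])
        n_val = int(len(files) * fractions[1])
        splits = {
            "train": files[:n_train],
            "val": files[n_train:n_train + n_val],
            "test": files[n_train + n_val:],
        }
        for split, names in splits.items():
            out_dir = os.path.join(target_root, split, class_name)
            os.makedirs(out_dir, exist_ok=True)
            for name in names:
                copyfile(os.path.join(class_dir, name), os.path.join(out_dir, name))
            counts[split] += len(names)
    return counts
```

Usage: after `split_class_folders('dataset_folder', 'dataset_split')`, each of `torchvision.datasets.ImageFolder('dataset_split/train')`, `.../val` and `.../test` sees the same four classes, and each split can get its own transform.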
