I'm using SubsetRandomSampler to split a classification dataset into test and validation sets. Can the dataset be split per class instead?
import numpy as np
import torch
from torchvision import datasets, transforms
from torch.utils.data.sampler import SubsetRandomSampler
train_transforms = transforms.Compose([transforms.ToTensor(),
                                       transforms.Normalize([0.485, 0.456, 0.406],
                                                            [0.229, 0.224, 0.225])])
dataset = datasets.ImageFolder('/data/images/train', transform=train_transforms)
validation_split = .2
shuffle_dataset = True
random_seed = 42
batch_size = 20
dataset_size = len(dataset)  # 4996
indices = list(range(dataset_size))
split = int(np.floor(validation_split * dataset_size))
if shuffle_dataset:
    np.random.seed(random_seed)
    np.random.shuffle(indices)
train_indices, val_indices = indices[split:], indices[:split]
train_sampler = SubsetRandomSampler(train_indices)
valid_sampler = SubsetRandomSampler(val_indices)
train_loader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, sampler=train_sampler)
validation_loader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, sampler=valid_sampler)
2 answers
vqlkdk9b (answer #1):
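Do you mean training and validation rather than test and validation?
If so: SubsetRandomSampler draws samples at random from the indices you give it, so you can randomly split the indices of each class separately before putting them into train_indices and val_indices.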
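The per-class split described above can be sketched as follows. This is an illustrative sketch, not the answerer's original code: the helper name stratified_split is hypothetical, it assumes the labels are available as a sequence of integer class indices (for a torchvision ImageFolder, dataset.targets holds these), and it uses a local NumPy Generator instead of the global np.random.seed so the shuffle does not touch global random state. It applies the question's floor-based rule to each class on its own, so every class contributes roughly validation_split of its samples to the validation set.

```python
import numpy as np


def stratified_split(targets, validation_split=0.2, seed=42):
    """Split sample indices per class (hypothetical helper, illustrative only).

    targets: sequence of integer class labels, one per sample
             (e.g. dataset.targets for a torchvision ImageFolder).
    Returns (train_indices, val_indices) as lists of Python ints.
    """
    targets = np.asarray(targets)
    rng = np.random.default_rng(seed)  # local RNG instead of np.random.seed
    train_indices, val_indices = [], []
    for cls in np.unique(targets):
        # Positions of every sample that belongs to this class, shuffled in place
        cls_indices = np.flatnonzero(targets == cls)
        rng.shuffle(cls_indices)
        # Same floor-based rule as the question, but applied to each class
        split = int(np.floor(validation_split * len(cls_indices)))
        val_indices.extend(cls_indices[:split].tolist())
        train_indices.extend(cls_indices[split:].tolist())
    return train_indices, val_indices


if __name__ == "__main__":
    # Toy labels: 10 samples of class 0, 5 samples of class 1
    labels = [0] * 10 + [1] * 5
    train_idx, val_idx = stratified_split(labels, validation_split=0.2, seed=0)
    print(len(train_idx), len(val_idx))  # 12 3
```

With the question's variables, this would replace the global shuffle: train_indices, val_indices = stratified_split(dataset.targets, validation_split, random_seed), followed by the same two SubsetRandomSampler lines. Because the floor is taken per class, the validation set can end up a few samples smaller than floor(0.2 * 4996).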
7qhs6swi (answer #2):
About the code comment "# You can't use [[]] * len(dataset.classes). There may be a better way, but I don't know it": [[]*len(dataset.classes)] works.
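For reference, here is what the list expressions mentioned in this thread evaluate to in plain Python; n stands in for len(dataset.classes) and the variable names are illustrative. [[]] * n repeats a reference to one list object, so appending to one bucket appends to all of them; [] * n is itself an empty list, so [[] * n] is the one-element list [[]]; a list comprehension creates n independent lists.

```python
n = 3  # stands in for len(dataset.classes)

aliased = [[]] * n          # n references to the SAME list object
aliased[0].append(7)
print(aliased)              # [[7], [7], [7]]

single = [[] * n]           # [] * n is just [], so this is [[]]
print(single, len(single))  # [[]] 1

buckets = [[] for _ in range(n)]  # n independent empty lists
buckets[0].append(7)
print(buckets)              # [[7], [], []]
```

Only the comprehension gives one separate bucket per class.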