How do I attach labels to a dataset in PyTorch?

Posted by rggaifut on 2023-10-20 in Other


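I'm trying to build machine learning models (Gaussian Naive Bayes and a decision tree) using pytorch + tensorflow. The dataset consists of images in PNG format, plus a CSV file with the labels for each image. Each image is referenced in the CSV file by its file name, e.g. img0001.png. I need the labels attached to the images, rather than kept separately, so that I can feed them to the models. What is the best way to do this?

labels = np.array(labels)
labels = labels.reshape(1, -1)
model = GaussianNB()
model.fit(IMG_BLOG, labels)

Error:
1203     "y should be a 1d array, got an array of shape {} instead.".format(shape)
1204 )
ValueError: y should be a 1d array, got an array of shape (1, 656) instead.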

pytorch

Source: https://stackoverflow.com/questions/77169367/how-to-attach-labels-to-dataset-in-pytorch

2 answers


euoag5mw1#

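I'm not sure exactly what you're trying to achieve, but it looks like you want something like a Dataset class in Torch. Here is a link to some earlier work of mine; I hope it helps: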
https://github.com/Khaliladib11/INM705-CW-Khalil-Aziz/blob/main/Semnatic%20Segmentation/CityScapes.py
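
For readers who don't want to follow the link, here is a minimal sketch of what such a Dataset class might look like. It is not taken from the linked repository. The class name `CsvImageDataset`, the column names `filename` and `label`, the integer labels, and the `load_image` callable are all assumptions made for illustration; swap in your own CSV layout and image loader.

```python
import csv
import io

import torch
from torch.utils.data import Dataset


class CsvImageDataset(Dataset):
    """Pairs each image file name listed in a CSV with its label (hypothetical helper)."""

    def __init__(self, csv_file, load_image, name_col="filename", label_col="label"):
        # csv_file: an open text file (or any iterable of CSV lines)
        # load_image: callable mapping a file name to a tensor (your loader of choice)
        rows = list(csv.DictReader(csv_file))
        self.names = [row[name_col] for row in rows]
        # Assumption: labels are stored as integers in the CSV
        self.labels = [int(row[label_col]) for row in rows]
        self.load_image = load_image

    def __len__(self):
        return len(self.names)

    def __getitem__(self, idx):
        # The image is loaded on access, so the whole dataset never sits in memory at once
        image = self.load_image(self.names[idx])
        return image, self.labels[idx]


# Demo with an in-memory CSV and a stand-in loader that returns a blank image
demo_csv = io.StringIO("filename,label\nimg0001.png,0\nimg0002.png,1\n")
dataset = CsvImageDataset(demo_csv, load_image=lambda name: torch.zeros(1, 8, 8))
image, label = dataset[1]
```

Because every item comes back as an (image, label) pair, the label stays attached to its image no matter how the dataset is shuffled or batched.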

2023-10-20

u4vypkhs2#

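You would usually keep these separate, but they must be in the same order. Let X be the images and y the labels; then you can do something like this: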

# join concatenates path fragments
from os.path import join
# glob finds all files matching a pattern
from glob import glob
# pandas reads and handles CSV files
import pandas as pd

IMG_DIR = "my/img/dir/"

# your img files, sorted by name
img_paths = sorted(glob(join(IMG_DIR, "*.png")))

def read_img(path):
  # your image load method of choice
  img = ...
  return img

# all loaded imgs
imgs = [read_img(p) for p in img_paths]

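# labels, ordered by the same key as the sorted image paths
df = pd.read_csv("my.csv", sep=",")
# sort the rows by image name only; sorting the label values themselves would break the pairing
labels = df.sort_values(by="identifier that is the same as the image name")["mylabelcolumn"].tolist()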

# this fails if there is no 1:1 mapping from imgs to labels
assert len(labels) == len(img_paths)

# here, labels[123] corresponds to img_paths[123] / imgs[123]
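
The sort-based pairing above only holds if the CSV and the directory list exactly the same files. A slightly more defensive variant, sketched below, looks each label up by file name instead. The column names `filename` and `label`, the sample rows, and the paths are made up for the demo; an in-memory string stands in for the real CSV file.

```python
import csv
import io
from os.path import basename

# Stand-in for the CSV file; the column names are assumptions
csv_text = "filename,label\nimg0002.png,dog\nimg0001.png,cat\nimg0003.png,cat\n"
label_by_name = {
    row["filename"]: row["label"] for row in csv.DictReader(io.StringIO(csv_text))
}

# Stand-in for the sorted glob() result
img_paths = [
    "my/img/dir/img0001.png",
    "my/img/dir/img0002.png",
    "my/img/dir/img0003.png",
]

# Fail early if an image has no row in the CSV
missing = [p for p in img_paths if basename(p) not in label_by_name]
if missing:
    raise KeyError(f"no label for: {missing}")

# labels[i] now belongs to img_paths[i], whatever the row order in the CSV
labels = [label_by_name[basename(p)] for p in img_paths]
```

This way a missing or extra row shows up as an explicit error rather than as silently shifted labels.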

Note: there are many ways to load image data (OpenCV, tensorflow, pytorch, PIL, etc.).
Note: this loads all images into memory, which will not work for large datasets.
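
As for the ValueError in the question: `labels.reshape(1, -1)` turns the 656 labels into a single row of shape (1, 656), whereas scikit-learn expects y with one entry per sample, i.e. shape (n_samples,). X in turn needs one row per image. Here is a rough sketch with random stand-in data; the image size, the number of images, and the labels are invented for the demo.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Stand-ins for the loaded data: 6 grayscale 8x8 images and one label each
rng = np.random.default_rng(0)
imgs = [rng.random((8, 8)) for _ in range(6)]
labels = [0, 1, 0, 1, 0, 1]

# X: one row per image, each image flattened into a feature vector
X = np.stack(imgs).reshape(len(imgs), -1)  # shape (6, 64)
# y: one label per image, kept 1-D (reshape(1, -1) would give shape (1, 6))
y = np.asarray(labels).ravel()  # shape (6,)

model = GaussianNB()
model.fit(X, y)
predictions = model.predict(X)
```

If the real images differ in size, they would need resizing to a common shape before `np.stack` can combine them.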

2023-10-20
