PyTorch: Speed up dataset loading on Google Colab

Posted by 6qqygrtg on 2023-06-23 in Go


I'm using PyTorch on Google Colab to do image classification on the German Traffic Sign dataset. Here is the structure of the dataset:

GTSRB
    Training
        00000/
            *.ppm
        ...
        00043/
            *.ppm
    Test
        *.ppm
        ...
        labels.csv

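I managed to upload the whole dataset to my Drive (it took a long time!!!). I used the ImageFolder class to load the training images and a Dataset class to load the test images.
However, training my model is really slow, and the GPU is not being used efficiently. After a lot of searching, I found that the file transfer from Drive to Colab is the bottleneck here.
Does anyone know how I could use an HDF5 dataset (or some other technique) to store all the training and test images up front, for later preprocessing?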
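HDF5 (through the h5py package) is one way to do this. A dependency-free alternative that targets the same bottleneck, thousands of small files read one at a time over the Drive mount, is to pack each split into a single uncompressed zip archive and read images from it by index. This is a swapped-in technique, not HDF5, and the sketch below is hypothetical: the function and class names are made up, and the class is a plain indexable container rather than a torch.utils.data.Dataset subclass.

```python
import os
import zipfile


def pack_folder_to_zip(src_folder, zip_path):
    """Store every file under src_folder in a single uncompressed zip archive."""
    # ZIP_STORED skips compression, so reading a member back is a plain byte copy.
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_STORED) as zf:
        for root, _dirs, files in os.walk(src_folder):
            for name in sorted(files):
                full_path = os.path.join(root, name)
                # Member names are relative to src_folder, e.g. "00000/img.ppm";
                # zipfile stores them with forward slashes on every platform.
                zf.write(full_path, arcname=os.path.relpath(full_path, src_folder))


class ZipImageIndex:
    """Indexable view of a zip archive: len(ds) and ds[i] -> (raw_bytes, label).

    The label is the position of the member's top-level folder (00000 ... 00043)
    in sorted order, similar to how torchvision's ImageFolder numbers classes.
    Members that are not inside a folder get label -1.
    """

    def __init__(self, zip_path):
        self.zip_path = zip_path
        with zipfile.ZipFile(zip_path) as zf:
            self.names = sorted(n for n in zf.namelist() if not n.endswith("/"))
        classes = sorted({n.split("/")[0] for n in self.names if "/" in n})
        self.class_to_idx = {c: i for i, c in enumerate(classes)}
        # Opened on first access rather than here, so DataLoader worker processes
        # can each open their own handle (if the main process has not read first).
        self._zf = None

    def __len__(self):
        return len(self.names)

    def __getitem__(self, index):
        if self._zf is None:
            self._zf = zipfile.ZipFile(self.zip_path)
        name = self.names[index]
        label = self.class_to_idx.get(name.split("/")[0], -1) if "/" in name else -1
        return self._zf.read(name), label

    def close(self):
        if self._zf is not None:
            self._zf.close()
            self._zf = None
```

You would copy the single zip from Drive to the Colab VM's local disk before training, so reads no longer go through the mount. To feed it to PyTorch, one option is a Dataset whose __getitem__ decodes the bytes with PIL.Image.open(io.BytesIO(data)) and then applies your usual transforms.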


Source: https://stackoverflow.com/questions/54049440/speed-up-datasets-loading-on-google-colab

2 Answers


Answer #1 (fcg9iug3)

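If your problem really is the network speed between Colab and Drive, you should try uploading the files directly to the Google Colab instance instead of accessing them from Drive.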

from google.colab import files

# Opens a file picker in the notebook output; returns a {filename: bytes} dict
dataset_file_dict = files.upload()

Doing this saves the files directly to the Colab instance, so your code can access them locally.
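Uploading thousands of individual images through the browser widget is impractical, so one workaround is to zip the dataset beforehand, upload that single archive, and extract it on the Colab instance. The sketch below assumes files.upload() returned a {filename: bytes} dictionary, which is its documented return shape; the helper name and the default /content/data destination are hypothetical.

```python
import io
import zipfile


def extract_uploaded_zip(uploaded, dest_dir="/content/data"):
    """Extract every .zip found in a {filename: bytes} dict into dest_dir.

    Returns the member names that were extracted. Non-zip entries are skipped.
    """
    extracted = []
    for filename, data in uploaded.items():
        if not filename.lower().endswith(".zip"):
            continue
        # The upload is already in memory, so read it through a BytesIO buffer.
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            zf.extractall(dest_dir)
            extracted.extend(zf.namelist())
    return extracted


# Usage in Colab, after the files.upload() cell:
# extract_uploaded_zip(dataset_file_dict)
```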
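However, I suspect there may be other problems besides network latency: perhaps your model has a very large number of parameters, or there is a bug in your code that keeps it from running on CUDA. Sometimes I forget to switch my runtime to a GPU runtime under Runtime > Change runtime type.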
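One way to check whether file I/O is really the bottleneck, before suspecting the model or the CUDA setup, is to time how fast the raw files can be read. The helper below is a hypothetical, standard-library-only sketch: run it once on the Drive-mounted folder and once on a local copy, then compare the numbers.

```python
import os
import time


def measure_read_throughput(folder, max_files=None):
    """Read files under `folder` once (up to max_files) and report throughput."""
    n_files = 0
    n_bytes = 0
    start = time.perf_counter()
    for root, _dirs, files in os.walk(folder):
        for name in files:
            with open(os.path.join(root, name), "rb") as f:
                n_bytes += len(f.read())
            n_files += 1
            if max_files is not None and n_files >= max_files:
                break
        if max_files is not None and n_files >= max_files:
            break
    elapsed = time.perf_counter() - start
    return {
        "files": n_files,
        "bytes": n_bytes,
        "seconds": elapsed,
        "files_per_s": n_files / elapsed if elapsed > 0 else float("inf"),
        "mb_per_s": n_bytes / 1e6 / elapsed if elapsed > 0 else float("inf"),
    }
```

The operating system may cache files after the first read, so a second run over the same folder can look much faster than the first; compare first runs, or use different subsets via max_files.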
Hope this helps!

Posted 2023-06-23

Answer #2 (brvekthn)

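The code below copies a folder from Google Drive to the Colab VM (you need to authorize Drive access as usual). Compared with reading from the Drive mount during training, this greatly reduces model training time.
I believe copying a compressed archive and then extracting it at the destination could improve the copy time further; I haven't included that here.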
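A minimal sketch of that untested idea, using only the standard library: create one archive of the dataset (for example with shutil.make_archive), keep it on Drive, then copy that single file to the VM and unpack it there. The helper name is hypothetical, and no speed-up is measured here.

```python
import os
import shutil


def copy_archive_and_extract(archive_path, local_dir):
    """Copy one archive (.zip, .tar, .tar.gz, ...) to local_dir and unpack it there."""
    os.makedirs(local_dir, exist_ok=True)
    local_archive = os.path.join(local_dir, os.path.basename(archive_path))
    # One large sequential copy instead of thousands of small per-file reads.
    shutil.copyfile(archive_path, local_archive)
    # unpack_archive picks the format from the file extension.
    shutil.unpack_archive(local_archive, local_dir)
    os.remove(local_archive)  # free local disk space once extracted
    return local_dir


# Placeholder paths in the same style as the answer's code:
# copy_archive_and_extract('/content/drive/My Drive/xxx_folder.zip', '/content/xxx_folder')
```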

import os
import shutil

from google.colab import drive

# Mount Google Drive under /content/drive (prompts for authorization)
drive.mount('/content/drive')

def copy_files_recursive(source_folder, destination_folder):
    for root, dirs, files in os.walk(source_folder):
        for file in files:
            source_path = os.path.join(root, file)
            destination_path = os.path.join(destination_folder, os.path.relpath(source_path, source_folder))
            
            # Create destination directories if they don't exist
            os.makedirs(os.path.dirname(destination_path), exist_ok=True)
            
            shutil.copyfile(source_path, destination_path)

# Placeholder paths: replace xxx_folder with your dataset folder
source_folder = '/content/drive/My Drive/xxx_folder'
destination_folder = '/content/xxx_folder'
copy_files_recursive(source_folder, destination_folder)
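On Python 3.8 or later, the standard library can do the same recursive copy in one call with shutil.copytree and dirs_exist_ok=True. Unlike the loop above, it also recreates empty directories and copies file metadata (it uses shutil.copy2 by default). The wrapper name is hypothetical.

```python
import shutil


def copy_tree(source_folder, destination_folder):
    """Recursively copy source_folder into destination_folder (Python 3.8+)."""
    # dirs_exist_ok=True lets the copy proceed if the destination already exists.
    return shutil.copytree(source_folder, destination_folder, dirs_exist_ok=True)


# Same placeholder paths as in the answer's code:
# copy_tree('/content/drive/My Drive/xxx_folder', '/content/xxx_folder')
```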

Posted 2023-06-23
