Disk full error when running Azure ML jobs using custom environments from DevOps

Posted by wmomyfyw on 2022-11-17 in Other


Update: I have tried scaling up the compute instance in Azure ML Studio, but the error persists. There is no separate VM; only a compute instance is used, and the Azure agent on the DevOps side launches the training job.

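I need a way to mitigate the disk full error so that I can use a custom environment from Azure ML Studio in my training jobs.
I get a disk full error when running model training jobs with the Azure ML SDK, launched from Azure DevOps. I created a custom environment in the Azure ML workspace and used it for these jobs.
I launch the training jobs with an Azure CLI task in Azure DevOps. How can I resolve the disk full issue?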
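One way to see how close a machine is to the limit before the heavy work starts is a pre-flight disk check. The following is a minimal sketch using only Python's standard library; `has_enough_disk`, its `required_mb` estimate and the 512 MB default headroom are hypothetical, and it only inspects the filesystem of the machine it runs on (for example, called at the top of the training script on the compute), not the DevOps agent.

```python
import shutil


def has_enough_disk(path: str, required_mb: float, headroom_mb: float = 512.0) -> bool:
    """Return True if the filesystem holding `path` has room for `required_mb`
    plus a safety margin.

    `required_mb` and `headroom_mb` are hypothetical inputs: estimate them from
    the size of the data the job downloads plus the outputs it writes.
    """
    free_mb = shutil.disk_usage(path).free / (1024 * 1024)
    return free_mb >= required_mb + headroom_mb


if __name__ == "__main__":
    # Example: check the working directory before fetching a dataset that is
    # (hypothetically) estimated at 2 GB.
    print(has_enough_disk(".", required_mb=2048))
```

Failing fast with a clear message is cheaper than discovering the shortage after a long download.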
Error message shown in the DevOps training task:

"error": {
        "code": "UserError",
        "message": "{\"Compliant\":\"Disk full while running job. Please consider reducing amount of data accessed, or upgrading VM SKU. Total space: 14045 MB, available space: 1103 MB.\"}\n{\n  \"code\": \"DiskFullError\",\n  \"target\": \"\",\n  \"category\": \"UserError\",\n  \"error_details\": []\n}",
        "messageParameters": {},
        "details": []
    },
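The `message` field above packs two JSON objects into one escaped string, so the disk figures are easy to overlook. As a minimal standard-library sketch (`parse_disk_full` is a hypothetical helper, and `MESSAGE` is the decoded string copied from this log), the figures can be extracted like this:

```python
import json
import re

# Decoded value of the "message" field from the log above: two JSON objects
# separated by a newline.
MESSAGE = (
    '{"Compliant":"Disk full while running job. Please consider reducing amount '
    'of data accessed, or upgrading VM SKU. Total space: 14045 MB, available '
    'space: 1103 MB."}\n'
    '{\n  "code": "DiskFullError",\n  "target": "",\n  "category": "UserError",\n'
    '  "error_details": []\n}'
)


def parse_disk_full(message: str) -> dict:
    """Split the concatenated JSON objects and pull out the disk figures."""
    decoder = json.JSONDecoder()
    objects, idx = [], 0
    while idx < len(message):
        # raw_decode does not skip leading whitespace, so do it manually.
        while idx < len(message) and message[idx].isspace():
            idx += 1
        if idx >= len(message):
            break
        obj, idx = decoder.raw_decode(message, idx)
        objects.append(obj)

    text = objects[0]["Compliant"]
    match = re.search(r"Total space: (\d+) MB, available space: (\d+) MB", text)
    if match is None:
        raise ValueError("disk figures not found in message")
    total, available = int(match.group(1)), int(match.group(2))
    return {
        "code": objects[1]["code"],
        "total_mb": total,
        "available_mb": available,
        "used_pct": round(100 * (total - available) / total, 1),
    }


if __name__ == "__main__":
    print(parse_disk_full(MESSAGE))
```

For this log it prints `{'code': 'DiskFullError', 'total_mb': 14045, 'available_mb': 1103, 'used_pct': 92.1}`, i.e. about 92% of the reported space was in use when the job failed.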

The training job's .runconfig file:

framework: Python
script: cnn_training.py
communicator: None
autoPrepareEnvironment: true
maxRunDurationSeconds:
nodeCount: 1
environment:
  name: cnn_training
  python:
    userManagedDependencies: true
    interpreterPath: python
  docker:
    enabled: true
    baseImage: 54646eeace594cf19143dad3c7f31661.azurecr.io/azureml/azureml_b17300b63a1c2abb86b2e774835153ee
    sharedVolumes: true
    gpuSupport: false
    shmSize: 2g
    arguments: []
history:
  outputCollection: true
  snapshotProject: true
  directoriesToWatch:
  - logs
dataReferences:
  workspaceblobstore:
    dataStoreName: workspaceblobstore
    pathOnDataStore: dataname
    mode: download
    overwrite: true
    pathOnCompute:
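With `mode: download`, the files under `pathOnDataStore` are copied onto the compute's local disk before the script runs, so the whole dataset counts against the space reported in the error. The Azure ML SDK v1 data reference configuration also accepts `mount`, which exposes the datastore as a mounted path instead of a local copy. The sketch below works on a hypothetical in-memory dict mirroring the `dataReferences` section above (it does not parse the .runconfig file, which would need a YAML library); `download_refs` and `with_mount_mode` are hypothetical helpers. Whether switching to `mount` actually avoids the error here is an assumption that also depends on how much the script itself writes to disk.

```python
# Hypothetical mirror of the dataReferences section of the .runconfig above.
DATA_REFERENCES = {
    "workspaceblobstore": {
        "dataStoreName": "workspaceblobstore",
        "pathOnDataStore": "dataname",
        "mode": "download",
        "overwrite": True,
        "pathOnCompute": None,
    }
}


def download_refs(refs: dict) -> list:
    """Names of data references that copy data onto the compute's local disk."""
    return [name for name, ref in refs.items() if ref.get("mode") == "download"]


def with_mount_mode(refs: dict) -> dict:
    """Return a copy in which every 'download' reference uses 'mount' instead.

    The input dict is left unchanged.
    """
    result = {}
    for name, ref in refs.items():
        new_ref = dict(ref)
        if new_ref.get("mode") == "download":
            new_ref["mode"] = "mount"
        result[name] = new_ref
    return result


if __name__ == "__main__":
    print(download_refs(DATA_REFERENCES))
    print(with_mount_mode(DATA_REFERENCES)["workspaceblobstore"]["mode"])
```

Returning a copy keeps the original configuration available for comparison when deciding which references to change.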

Is any additional configuration needed to deal with the disk full issue? Do I need to change anything in the .runconfig file?

Azure

Source: https://stackoverflow.com/questions/74360262/disk-full-error-when-running-azure-ml-jobs-using-custom-environemnts-from-devops