Python: Azure ML Studio job resources

Posted by t9aqgxwy on 2023-05-16 in Python

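I am currently training a Top2Vec ML model on the CommonCrawl News dataset in Azure ML Studio. When I run my Python code in an ipynb notebook inside ML Studio itself (online), the CPU is fully utilised (100% workload), but when I execute my script as a job, the CPU utilisation shown under Monitoring does not exceed 25%.
I noticed that the "containerInstance" section of the full JSON job definition holds the resource settings for this container instance, configured as follows: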

"containerInstance": {
    "region": null,
    "cpuCores": 2,
    "memoryGb": 3.5
}

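However, I am somehow unable to start a job with more than 2 CPU cores and 3.5 GB of RAM. My compute is a STANDARD_F4S_V2 instance with 4 vCPUs and 8 GB of RAM, so I would expect my container instance to use all available resources rather than only 50%.
These are the hyperparameters I use to train the model: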

hyperparameters = {
    'min_count': 50,
    'topic_merge_delta': 0.1,
    'embedding_model': 'doc2vec',
    'embedding_batch_size': 32,
    'split_documents': False,
    'document_chunker': 'sequential',
    'chunk_length': 100,
    'chunk_overlap_ratio': 0.5,
    'chunk_len_coverage_ratio': 1,
    'speed': 'learn',
    'use_corpus_file': False,
    'keep_documents': True,
    'workers': 4,
    'verbose': True
}

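Is it possible to edit the containerInstance options? I saw that I can configure "Process count per node", but that sounds like it controls how many times my script is executed in parallel.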

Source: https://stackoverflow.com/questions/76198647/azure-ml-studio-job-resources

1 answer

euoag5mw (#1)

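I finally found the root of the problem. It was not the Docker container instance failing to use all cores; it was my Python script. The script relied on Python's threading library for parallel execution, and at the time I did not know about the GIL (Global Interpreter Lock), which lets only one thread execute in the Python interpreter at a time, so CPU-bound threads cannot run on several cores at once. That certainly taught me something about threads in Python. After I rewrote the script with the multiprocessing library, the Docker container instance used all available resources.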
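For illustration, here is a minimal sketch of that kind of change, not the original training script. `cpu_bound_task` and `run_in_processes` are hypothetical names standing in for whatever CPU-heavy work the script does. Each worker process has its own interpreter and its own GIL, so the tasks can run on separate cores.

```python
import math
import os
from multiprocessing import Pool


def cpu_bound_task(n: int) -> int:
    """Hypothetical CPU-heavy work: sum of integer square roots below n."""
    return sum(math.isqrt(i) for i in range(n))


def run_in_processes(inputs, workers=None):
    """Map cpu_bound_task over inputs using a pool of worker processes."""
    # Default to one worker per CPU reported by the OS.
    workers = workers or os.cpu_count() or 1
    with Pool(processes=workers) as pool:
        return pool.map(cpu_bound_task, inputs)


if __name__ == "__main__":
    # The guard is required on platforms that start workers by
    # re-importing this module (spawn or forkserver start methods).
    results = run_in_processes([200_000] * 8)
    print(f"finished {len(results)} tasks")
```

With the same work dispatched through `threading.Thread`, the tasks would take turns holding the GIL, which matches the low CPU utilisation described in the question.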
Nonetheless, if you plan to define the number of CPU cores and the amount of RAM manually, you can use the Python script below to start a custom Azure ML job:

# Install azureml-core package first: pip install azureml-core

from azureml.core import RunConfiguration, Experiment, Workspace, ScriptRunConfig, Environment
from azureml.core.runconfig import DockerConfiguration

workspace = Workspace("<SUBSCRIPTION_ID>", "<RESOURCE_GROUP_NAME>", "<AZURE_ML_WORKSPACE_NAME>")

# 'Default' is the name of the ML experiment, change this if you need to.
experiment = Experiment(workspace, "Default") 
# Define the environment to be used.
env = Environment.get(workspace, name="top2vec-env", version="1") 
# If you have a compute cluster set up, enter the cluster name. Otherwise comment this line out and, in the `run_config.target = cluster` line below, replace `cluster` with the name of your compute instance.
cluster = workspace.compute_targets['<COMPUTE_CLUSTER_NAME>']
run_config = RunConfiguration()
# Define the number of CPU cores and the amount of memory to be used by the Docker container instance. 
run_config.docker = DockerConfiguration(use_docker=True, arguments=["--cpus=16", "--memory=128g"], shm_size="64M") 
run_config.environment = env
run_config.target = cluster
run_config.command = "python main_file_of_your_python_script.py"
# Pass the required environment variables to run your script.
run_config.environment_variables = {} 
# Enter the relative or absolute path to your source directory. Everything in it will be uploaded to the computing VM.
config = ScriptRunConfig("<RELATIVE_PATH_TO_SOURCE_DIR>", run_config=run_config) 

script_run = experiment.submit(config)
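To check what the job actually has available, a small diagnostic like the following can be run at the start of the script. It is a sketch that uses only the standard library, and `visible_cpu_count` is a hypothetical helper name. Note that Docker's `--cpus` flag is enforced as a CPU time quota, so it is generally not reflected in these counts, which report the CPUs the process may be scheduled on.

```python
import os


def visible_cpu_count() -> int:
    """Return the number of CPUs this process may be scheduled on."""
    # sched_getaffinity exists on Linux only; fall back to cpu_count elsewhere.
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


if __name__ == "__main__":
    print(f"os.cpu_count(): {os.cpu_count()}")
    print(f"usable CPUs:    {visible_cpu_count()}")
```

If the usable count matches the VM's vCPU count but utilisation stays low, the bottleneck is more likely in the script, as it was here, than in the container settings.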

Answered 2023-05-16
