Ludwig MLflow integration fails to log metadata


Describe the bug

I am running Ludwig 0.6.4 with MLflow 2.1.1, and I get a warning that Ludwig cannot log model metadata because of an MLflow limitation.

To Reproduce

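from ludwig.api import LudwigModel
from ludwig.contribs import MlflowCallback

# MLFLOW_URL (tracking server URI) and data (training dataset) are defined elsewhere.
ludwig_config = {}
model = LudwigModel(config=ludwig_config, callbacks=[MlflowCallback(tracking_uri=MLFLOW_URL)])
model.train(data)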

Expected behavior

Metadata is logged to MLflow successfully.

Screenshots

mlflow.exceptions.RestException: INVALID_PARAMETER_VALUE: Tag value '[{"run_id": "1ec3b4d585774bc683804a805da0fa82", "artifact_path": "model", "utc_time_created": "2023-02-03 15:15:43.143908", "flavors": {"python_function": {"env": "conda.yaml", "loader_module": "ludwig.contribs.mlflow.model", "python_version": "3.9.1' had length 6364, which exceeded length limit of 5000
2023/02/03 15:19:16 WARNING mlflow.models.model: Logging model metadata to the tracking server has failed, possibly due older server version. The model artifacts have been logged successfully under production-mlflow-artifacts/7/1ec3b4d585774bc683804a805da0fa82/artifacts. In addition to exporting model artifacts, MLflow clients 1.7.0 and above attempt to record model metadata to the tracking store. If logging to a mlflow server via REST, consider upgrading the server version to MLflow 1.7.0 or above. Set logging level to DEBUG via `logging.getLogger("mlflow").setLevel(logging.DEBUG)` to see the full traceback.

And the traceback:

Traceback (most recent call last):
  File "/Users/dragos/opt/anaconda3/lib/python3.9/site-packages/mlflow/models/model.py", line 489, in log
    mlflow.tracking.fluent._record_logged_model(mlflow_model)
  File "/Users/dragos/opt/anaconda3/lib/python3.9/site-packages/mlflow/tracking/fluent.py", line 985, in _record_logged_model
    MlflowClient()._record_logged_model(run_id, mlflow_model)
  File "/Users/dragos/opt/anaconda3/lib/python3.9/site-packages/mlflow/tracking/client.py", line 1370, in _record_logged_model
    self._tracking_client._record_logged_model(run_id, mlflow_model)
  File "/Users/dragos/opt/anaconda3/lib/python3.9/site-packages/mlflow/tracking/_tracking_service/client.py", line 404, in _record_logged_model
    self.store.record_logged_model(run_id, mlflow_model)
  File "/Users/dragos/opt/anaconda3/lib/python3.9/site-packages/mlflow/store/tracking/rest_store.py", line 325, in record_logged_model
    self._call_endpoint(LogModel, req_body)
  File "/Users/dragos/opt/anaconda3/lib/python3.9/site-packages/mlflow/store/tracking/rest_store.py", line 56, in _call_endpoint
    return call_endpoint(self.get_host_creds(), endpoint, method, json_body, response_proto)
  File "/Users/dragos/opt/anaconda3/lib/python3.9/site-packages/mlflow/utils/rest_utils.py", line 281, in call_endpoint
    response = verify_rest_response(response, endpoint)
  File "/Users/dragos/opt/anaconda3/lib/python3.9/site-packages/mlflow/utils/rest_utils.py", line 207, in verify_rest_response
    raise RestException(json.loads(response.text))
mlflow.exceptions.RestException: INVALID_PARAMETER_VALUE: Tag value '[{"run_id": "1ec3b4d585774bc683804a805da0fa82", "artifact_path": "model", "utc_time_created": "2023-02-03 15:15:43.143908", "flavors": {"python_function": {"env": "conda.yaml", "loader_module": "ludwig.contribs.mlflow.model", "python_version": "3.9.1' had length 6364, which exceeded length limit of 5000
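
For anyone reproducing this: the warning above suggests raising the `mlflow` logger to DEBUG to see the full traceback. A minimal sketch using only the standard library `logging` module; where exactly this goes in your script is an assumption (it should run before `model.train(...)`):

```python
import logging

# Raise verbosity for the "mlflow" logger only, as the warning message suggests.
mlflow_logger = logging.getLogger("mlflow")
mlflow_logger.setLevel(logging.DEBUG)
```

With this in place, the failure in the metadata-logging step should be reported with its traceback rather than only as a one-line warning.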

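Environment (please complete the following information):

  • OS: macOS
  • Version: 13.2
  • Python version: 3.9.1
  • Ludwig version: 0.6.4
  • MLflow version: 2.1.1

Additional context

Related issue on the MLflow side, which will not be fixed: mlflow/mlflow#2892
I can also help fix this issue.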

x759pob2 #1

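Hi @dragosmc, thanks for flagging this! This is actually an issue we have run into ourselves, and we plan to fix it in upcoming work.
For now, to unblock you, could you downgrade MLflow to 1.30.0? That should work; please let me know if it does.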
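
Before and after a downgrade (for example with `pip install "mlflow==1.30.0"`), it can help to confirm which MLflow version the Python environment actually sees. A small sketch using the standard library; the helper names `installed_version` and `major_version` are made up for this example:

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def major_version(version: str) -> int:
    """Extract the leading major number from a version string like '2.1.1'."""
    return int(version.split(".")[0])


version = installed_version("mlflow")
if version is None:
    print("mlflow is not installed in this environment")
elif major_version(version) >= 2:
    print(f"mlflow {version} is a 2.x or later release; the suggested downgrade target is 1.30.0")
else:
    print(f"mlflow {version} is a pre-2.0 release")
```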


uqzxnwby #2

> I can also help fix this issue.

@dragosmc, if you would like to take a stab at fixing this in Ludwig, that would be great.

h79rfbju #3

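> Hi @dragosmc, thanks for flagging this! This is actually an issue we have run into ourselves, and we plan to fix it in upcoming work.
>
> For now, to unblock you, could you downgrade MLflow to 1.30.0? That should work; please let me know if it does.

Unfortunately, I cannot downgrade to 1.30, but I am happy to help fix this. I will take a closer look at the code and submit a PR as soon as I can.

Thanks.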

w6lpcovy #4

Thanks @dragosmc! All of the relevant code is in https://github.com/ludwig-ai/ludwig/blob/master/ludwig/contribs/mlflow/init.py#L38.

lhcgjxsq #5

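I looked into this, and after some digging I believe the problem is on the MLflow side. From what I can see, Ludwig calls Model.log(), which then splits/processes the data as needed.
Also, the error message is misleading: the exception is raised during the /2.0/mlflow/runs/log-model call, not while explicitly creating a tag.
I need to dig into MLflow itself to understand how the JSON payload is split into tags and non-tags when logging, but as it stands I could not get this to work with either 2.1.1 or 1.30.0.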
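
One possible reading of why a log-model call surfaces a tag-length error: judging by the tag value in the error (a JSON list of model entries), the server appears to keep logged-model metadata in a run tag and to validate that tag's length. The following is a simplified simulation of that behavior in plain Python, not MLflow's actual code. The function name, the tag key, and the sample payloads are assumptions for illustration; only the 5000-character limit comes from the error message above.

```python
import json

TAG_VALUE_LIMIT = 5000  # limit reported in the error message
HISTORY_TAG = "mlflow.log-model.history"  # assumed tag key


def record_logged_model(tags: dict, model_meta: dict) -> None:
    """Append model metadata to the history tag, enforcing the length limit."""
    history = json.loads(tags.get(HISTORY_TAG, "[]"))
    history.append(model_meta)
    value = json.dumps(history)
    if len(value) > TAG_VALUE_LIMIT:
        raise ValueError(
            f"Tag value had length {len(value)}, "
            f"which exceeded length limit of {TAG_VALUE_LIMIT}"
        )
    tags[HISTORY_TAG] = value


run_tags = {}

# Small metadata fits within the limit.
record_logged_model(run_tags, {"artifact_path": "model", "flavors": {}})

# Metadata that embeds a large config (hypothetical payload) does not.
big_meta = {"artifact_path": "model", "flavors": {"example_flavor": {"config": "x" * 6000}}}
try:
    record_logged_model(run_tags, big_meta)
except ValueError as err:
    print(err)
```

If this reading is right, the size of the serialized model metadata is what matters, which would be consistent with the failure showing up on both 2.1.1 and 1.30.0.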
