Describe the bug
I am running Ludwig 0.6.4 with MLflow 2.1.1, and I get a warning that Ludwig cannot log model metadata because of an MLflow limitation.
To reproduce
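from ludwig.api import LudwigModel
from ludwig.contribs import MlflowCallback

# MLFLOW_URL (tracking server URI) and data (training dataset) are placeholders not defined in this report
ludwig_config = {}
model = LudwigModel(config=ludwig_config, callbacks=[MlflowCallback(tracking_uri=MLFLOW_URL)])
model.train(data)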
Expected behavior
Metadata is logged to MLflow successfully.
Screenshots
mlflow.exceptions.RestException: INVALID_PARAMETER_VALUE: Tag value '[{"run_id": "1ec3b4d585774bc683804a805da0fa82", "artifact_path": "model", "utc_time_created": "2023-02-03 15:15:43.143908", "flavors": {"python_function": {"env": "conda.yaml", "loader_module": "ludwig.contribs.mlflow.model", "python_version": "3.9.1' had length 6364, which exceeded length limit of 5000
2023/02/03 15:19:16 WARNING mlflow.models.model: Logging model metadata to the tracking server has failed, possibly due older server version. The model artifacts have been logged successfully under production-mlflow-artifacts/7/1ec3b4d585774bc683804a805da0fa82/artifacts. In addition to exporting model artifacts, MLflow clients 1.7.0 and above attempt to record model metadata to the tracking store. If logging to a mlflow server via REST, consider upgrading the server version to MLflow 1.7.0 or above. Set logging level to DEBUG via `logging.getLogger("mlflow").setLevel(logging.DEBUG)` to see the full traceback.
And the traceback:
Traceback (most recent call last):
  File "/Users/dragos/opt/anaconda3/lib/python3.9/site-packages/mlflow/models/model.py", line 489, in log
    mlflow.tracking.fluent._record_logged_model(mlflow_model)
  File "/Users/dragos/opt/anaconda3/lib/python3.9/site-packages/mlflow/tracking/fluent.py", line 985, in _record_logged_model
    MlflowClient()._record_logged_model(run_id, mlflow_model)
  File "/Users/dragos/opt/anaconda3/lib/python3.9/site-packages/mlflow/tracking/client.py", line 1370, in _record_logged_model
    self._tracking_client._record_logged_model(run_id, mlflow_model)
  File "/Users/dragos/opt/anaconda3/lib/python3.9/site-packages/mlflow/tracking/_tracking_service/client.py", line 404, in _record_logged_model
    self.store.record_logged_model(run_id, mlflow_model)
  File "/Users/dragos/opt/anaconda3/lib/python3.9/site-packages/mlflow/store/tracking/rest_store.py", line 325, in record_logged_model
    self._call_endpoint(LogModel, req_body)
  File "/Users/dragos/opt/anaconda3/lib/python3.9/site-packages/mlflow/store/tracking/rest_store.py", line 56, in _call_endpoint
    return call_endpoint(self.get_host_creds(), endpoint, method, json_body, response_proto)
  File "/Users/dragos/opt/anaconda3/lib/python3.9/site-packages/mlflow/utils/rest_utils.py", line 281, in call_endpoint
    response = verify_rest_response(response, endpoint)
  File "/Users/dragos/opt/anaconda3/lib/python3.9/site-packages/mlflow/utils/rest_utils.py", line 207, in verify_rest_response
    raise RestException(json.loads(response.text))
mlflow.exceptions.RestException: INVALID_PARAMETER_VALUE: Tag value '[{"run_id": "1ec3b4d585774bc683804a805da0fa82", "artifact_path": "model", "utc_time_created": "2023-02-03 15:15:43.143908", "flavors": {"python_function": {"env": "conda.yaml", "loader_module": "ludwig.contribs.mlflow.model", "python_version": "3.9.1' had length 6364, which exceeded length limit of 5000
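As the warning itself suggests, the full traceback is only printed once the "mlflow" logger is set to DEBUG. A minimal sketch of doing that before training, using only the standard logging module; the variable name mlflow_logger is illustrative, and because MLflow may configure its own loggers when it is imported, it is probably safest to run this after the MLflow import:

```python
import logging

# Make sure log records are actually emitted somewhere (stderr by default).
logging.basicConfig(level=logging.INFO)

# Raise the "mlflow" logger to DEBUG, as suggested by the warning text.
# If MLflow resets its logger level on import, repeat this after importing it.
mlflow_logger = logging.getLogger("mlflow")
mlflow_logger.setLevel(logging.DEBUG)
```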
Environment (please complete the following information):
- OS: macOS
- Version: 13.2
- Python version: 3.9.1
- Ludwig version: 0.6.4
- MLflow version: 2.1.1
Additional context
Related issue on the MLflow side that will not be fixed: mlflow/mlflow#2892
I can also help with fixing this.
5 answers
x759pob21#
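Hi @dragosmc, thanks for flagging this! This is actually an issue we have run into ourselves, and we will fix it in upcoming work.
For now, to unblock you, could you downgrade MLflow to 1.30.0? That should work. Please let me know if it does.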
uqzxnwby2#
"I can also help with fixing this."
@dragosmc If you would like to take a stab at fixing this in Ludwig, that would be great.
h79rfbju3#
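"Hi @dragosmc, thanks for flagging this! This is actually an issue we have run into ourselves, and we will fix it in upcoming work.
For now, to unblock you, could you downgrade MLflow to 1.30.0? That should work. Please let me know if it does."
Unfortunately I cannot downgrade to 1.30, but I am happy to help fix this. I will take a closer look at the code and submit a PR as soon as I can.
Thanks.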
w6lpcovy4#
Thanks @dragosmc! All of the relevant code is in https://github.com/ludwig-ai/ludwig/blob/master/ludwig/contribs/mlflow/init.py#L38.
lhcgjxsq5#
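I had a go at this, and from my digging I believe the problem is in MLflow. From what I can see, Ludwig calls Model.log(), which then splits and processes the data as needed. The error message is also misleading: the exception is raised during the /2.0/mlflow/runs/log-model call, not while creating tags specifically. I would need to dig into MLflow itself to understand how the JSON payload is split into tags and non-tags when logging, but as it stands I cannot get this to work with either 2.1.1 or 1.30.0.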
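To make the limit in the error message concrete: the server rejected a value of length 6364 against a limit of 5000, and the rejected value looks like a JSON-serialized list of logged-model metadata. The sketch below does not call MLflow at all. It only illustrates, under that assumption, how a large flavor section can push the serialized value past the limit. The names TAG_VALUE_LIMIT, tag_value_length and fits_in_tag, the "example_flavor" key, and the metadata contents are hypothetical.

```python
import json

# Limit quoted in the server's error message; an assumption about the
# tracking server, not a value read from MLflow itself.
TAG_VALUE_LIMIT = 5000


def tag_value_length(logged_models):
    """Length of the JSON string that would be stored as a single value."""
    return len(json.dumps(logged_models))


def fits_in_tag(logged_models, limit=TAG_VALUE_LIMIT):
    """True if the serialized metadata stays within the length limit."""
    return tag_value_length(logged_models) <= limit


# Made-up entry shaped like the truncated value shown in the error message.
small_entry = {
    "run_id": "1ec3b4d585774bc683804a805da0fa82",
    "artifact_path": "model",
    "flavors": {"python_function": {"env": "conda.yaml", "python_version": "3.9.1"}},
}

# A hypothetical flavor that embeds a large configuration inflates the value.
large_entry = dict(small_entry)
large_entry["flavors"] = {
    "python_function": small_entry["flavors"]["python_function"],
    "example_flavor": {
        "config": {f"feature_{i}": {"type": "number", "preprocessing": {}} for i in range(200)}
    },
}

print(fits_in_tag([small_entry]))  # True
print(fits_in_tag([large_entry]))  # False
```

A check like this could be run on the metadata before logging to see which part of it is responsible for the size. Whether that metadata can then be trimmed safely depends on how MLflow and Ludwig use it, which this thread does not settle.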