Ludwig does not gracefully handle empty partitions during saving

cmssoen2  posted 5 months ago  in Other

If any DataFrame partitions are empty after the dataset is split and preprocessed (when training with the Ray/Dask backend), Ray throws the following error.

E                       ray.exceptions.RayTaskError(AssertionError): ray::_get_read_tasks() (pid=10328, ip=127.0.0.1)
E                         File "/Users/geoffreyangus/repositories/predibase/ludwig/venv39_fresh/lib/python3.9/site-packages/ray/data/read_api.py", line 1136, in _get_read_tasks
E                           reader = ds.create_reader(**kwargs)
E                         File "/Users/geoffreyangus/repositories/predibase/ludwig/venv39_fresh/lib/python3.9/site-packages/ray/data/datasource/parquet_datasource.py", line 167, in create_reader
E                           return _ParquetDatasourceReader(**kwargs)
E                         File "/Users/geoffreyangus/repositories/predibase/ludwig/venv39_fresh/lib/python3.9/site-packages/ray/data/datasource/parquet_datasource.py", line 230, in __init__
E                           self._encoding_ratio = self._estimate_files_encoding_ratio()
E                         File "/Users/geoffreyangus/repositories/predibase/ludwig/venv39_fresh/lib/python3.9/site-packages/ray/data/datasource/parquet_datasource.py", line 318, in _estimate_files_encoding_ratio
E                           sample_ratios = ray.get(futures)
E                       ray.exceptions.RayTaskError(AssertionError): ray::_sample_piece() (pid=10352, ip=127.0.0.1)
E                         File "/Users/geoffreyangus/repositories/predibase/ludwig/venv39_fresh/lib/python3.9/site-packages/ray/data/datasource/parquet_datasource.py", line 437, in _sample_piece
E                           assert num_rows > 0 and metadata.num_rows > 0, (
E                       AssertionError: Sampled number of rows: 0 and total number of rows: 0 should be positive

To reproduce

Run the following unit test with num_examples=20 and npartitions=10.

pytest -xsrP tests/integration_tests/test_preprocessing.py::test_dask_known_divisions
  • OS: macOS
  • Version: 12.3.1
  • Python version: 3.9
  • Ludwig version: 0.6.dev0
  • Ray version: nightly (July 28, 2022)
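
The reproduction parameters make empty partitions likely for a simple counting reason: a split with fewer rows than partitions must leave some partitions empty. The sketch below works through that arithmetic. The 14/2/4 split sizes are an assumption for illustration (a 70/10/20 split of 20 examples), as is the premise that each split keeps all 10 partitions; `min_empty_partitions` is a hypothetical helper.

```python
def min_empty_partitions(num_rows: int, npartitions: int) -> int:
    """Lower bound on empty partitions when num_rows rows sit in npartitions partitions."""
    # Each row occupies at most one partition, so at most num_rows are non-empty.
    return max(0, npartitions - num_rows)


npartitions = 10
# Assumed split sizes for 20 examples (illustrative only).
split_rows = {"train": 14, "validation": 2, "test": 4}

for name, rows in split_rows.items():
    empty = min_empty_partitions(rows, npartitions)
    print(f"{name}: {rows} rows, at least {empty} empty partitions")
```

This is only a lower bound; the actual number of empty partitions depends on how rows land in partitions after the split.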

72qzrwbm  #1

A more permanent fix is being added in this PR: #2328
