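I am trying to load a dataset with the datasets Python module in a local
Python notebook. I am running a Python 3.10.13 kernel, the same version
I use in my virtual environment.
I cannot load the dataset from the tutorial I am following. The error is: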
---------------------------------------------------------------------------
NotImplementedError Traceback (most recent call last)
/Users/ari/Downloads/00-fine-tuning.ipynb Celda 2 line 3
1 from datasets import load_dataset
----> 3 data = load_dataset(
4 "jamescalam/agent-conversations-retrieval-tool",
5 split="train"
6 )
7 data
File ~/Documents/fastapi_language_tutor/env/lib/python3.10/site-packages/datasets/load.py:2149, in load_dataset(path, name, data_dir, data_files, split, cache_dir, features, download_config, download_mode, verification_mode, ignore_verifications, keep_in_memory, save_infos, revision, token, use_auth_token, task, streaming, num_proc, storage_options, **config_kwargs)
2145 # Build dataset for splits
2146 keep_in_memory = (
2147 keep_in_memory if keep_in_memory is not None else is_small_dataset(builder_instance.info.dataset_size)
2148 )
-> 2149 ds = builder_instance.as_dataset(split=split, verification_mode=verification_mode, in_memory=keep_in_memory)
2150 # Rename and cast features to match task schema
2151 if task is not None:
2152 # To avoid issuing the same warning twice
File ~/Documents/fastapi_language_tutor/env/lib/python3.10/site-packages/datasets/builder.py:1173, in DatasetBuilder.as_dataset(self, split, run_post_process, verification_mode, ignore_verifications, in_memory)
1171 is_local = not is_remote_filesystem(self._fs)
1172 if not is_local:
-> 1173 raise NotImplementedError(f"Loading a dataset cached in a {type(self._fs).__name__} is not supported.")
1174 if not os.path.exists(self._output_dir):
1175 raise FileNotFoundError(
1176 f"Dataset {self.dataset_name}: could not find data in {self._output_dir}. Please make sure to call "
1177 "builder.download_and_prepare(), or use "
1178 "datasets.load_dataset() before trying to access the Dataset object."
1179 )
NotImplementedError: Loading a dataset cached in a LocalFileSystem is not supported.
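How can I fix this? I don't understand how this error applies: the dataset is something I am fetching, so it cannot already be cached in my LocalFileSystem in the first place.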
1 Answer
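Try upgrading the datasets package.
This error stems from a breaking change in fsspec. It was fixed in datasets 2.14.6, the latest release at the time of this answer. Updating your installation with pip install -U datasets should fix the problem.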
GitHub issue: https://github.com/huggingface/datasets/issues/6352
If you pin fsspec yourself, use a release earlier than 2023.10.0; fsspec==2023.10.0 is the version with the problem.
GitHub issue: https://github.com/huggingface/datasets/issues/6330
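To check whether an environment has the combination described above, here is a minimal sketch that compares the installed versions. The helper names (parse, is_affected, installed) are hypothetical, and treating every fsspec release from 2023.10.0 onward as affected is an assumption; the answer only names 2023.10.0 itself.

```python
from importlib.metadata import PackageNotFoundError, version


def parse(v: str) -> tuple:
    """Turn '2.14.6' into (2, 14, 6); parsing stops at the first non-numeric piece."""
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_affected(datasets_version: str, fsspec_version: str) -> bool:
    """True when datasets predates the 2.14.6 fix and fsspec is 2023.10.0 or newer.

    Assumption: fsspec releases after 2023.10.0 keep the breaking change.
    """
    return parse(datasets_version) < (2, 14, 6) and parse(fsspec_version) >= (2023, 10, 0)


def installed(name: str):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    ds, fs = installed("datasets"), installed("fsspec")
    print(f"datasets={ds} fsspec={fs}")
    if ds and fs and is_affected(ds, fs):
        print("Likely affected: upgrade datasets or use an older fsspec.")
```

Run it with the same kernel the notebook uses, since a notebook can point at a different environment than the shell where pip was run.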