Why do I get an error when loading the IMDB dataset in PyTorch?

ykejflvf · asked 12 months ago
from torchtext.datasets import WikiText2, IMDB
from torchtext.data.utils import get_tokenizer
from torchtext.vocab import build_vocab_from_iterator

tkzer = get_tokenizer('basic_english')

tr_iter = WikiText2(split='train')
vocabulary = build_vocab_from_iterator(map(tkzer, tr_iter), specials=['<unk>'])

tr_iter_imdb = IMDB(split='train')
vocabulary = build_vocab_from_iterator(map(tkzer, tr_iter_imdb), specials=['<unk>'])

The WikiText2 code runs fine, but with IMDB I get the following error when calling build_vocab_from_iterator:
"'tuple' object has no attribute 'lower'"
Can someone help me understand why this happens? I suspect it is because the IMDB data structure differs from WikiText2's. If so, how do I build a vocab for the IMDB dataset?

ghg1uchk 1#

IMDB() yields tuples containing an int and a str. From the docstring:

IMDB Dataset

For additional details refer to http://ai.stanford.edu/~amaas/data/sentiment/

Number of lines per split:

train: 25000
test: 25000
Args:
    root: Directory where the datasets are saved. Default: os.path.expanduser('~/.torchtext/cache')
    split: split or splits to be returned. Can be a string or tuple of strings. Default: (train, test)

:returns: DataPipe that yields tuple of label (1 to 2) and text containing the movie review
:rtype: (int, str)

I suggest you check that the text field of the tuple is what you want, then update your map call as follows: map(lambda x: tkzer(x[1]), tr_iter_imdb)
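
Putting it together, here is a minimal sketch of the corrected vocab-building step, assuming the same tokenizer and variable names as in your question; the lambda drops the label (x[0]) and tokenizes only the review text (x[1]):

from torchtext.datasets import IMDB
from torchtext.data.utils import get_tokenizer
from torchtext.vocab import build_vocab_from_iterator

tkzer = get_tokenizer('basic_english')
tr_iter_imdb = IMDB(split='train')

vocabulary = build_vocab_from_iterator(
    map(lambda x: tkzer(x[1]), tr_iter_imdb),  # x is (label, text); tokenize the text only
    specials=['<unk>'],
)
vocabulary.set_default_index(vocabulary['<unk>'])  # optional: map unknown tokens to '<unk>'

WikiText2 works without this because it yields plain strings, so tkzer can be mapped over it directly.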
