Error in generating Behemoth corpus

Posted by wkyowqbh on 2021-06-03 in Hadoop

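I am new to Hadoop and Behemoth. I followed the Hadoop/Behemoth tutorial at https://github.com/digitalpebble/behemoth/wiki/tutorial to generate a Behemoth corpus for text documents, using the following command:

```
sudo bin/hadoop jar /home/madhumita/behemoth/core/target/behemoth-core-*-job.jar com.digitalpebble.behemoth.util.CorpusGenerator -i /home/madhumita/Documents/testFile -o /home/madhumita/behemoth/testgateopcorts
```

Every time I run the command, I get this error:

ERROR util.CorpusGenerator: Input does not exist: /home/madhumita/Documents/testFile

even though I have checked with gedit that the path is correct. I searched online for similar problems but found none. Do you know why this happens? If the .txt file format is not acceptable, what file format is required?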

hadoop behemoth

Source: https://stackoverflow.com/questions/15470103/error-in-generating-behemoth-corpus

2 answers

fcwjkofz1#

To generate a Behemoth corpus directly from the local filesystem, reference the input with the file protocol (file:///):

```
hadoop jar core/target/behemoth-core-*-job.jar com.digitalpebble.behemoth.util.CorpusGenerator -i "file:///home/madhumita/Documents/testFile/test.txt" -o "/docs/behemoth/test"
```
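The file:/// form is just `file://` followed by an absolute local path. As a minimal sketch, a small shell function can build that URI from a local path so the input is read from the local filesystem instead of the default filesystem. The name `local_uri` is hypothetical and not part of Hadoop or Behemoth, and it does not percent-encode spaces or other special characters.

```shell
# Hypothetical helper (not part of Hadoop or Behemoth): turn a local path
# into a file:// URI so Hadoop reads it from the local filesystem instead
# of resolving it against the default filesystem (e.g. HDFS).
# Note: spaces and other special characters are NOT percent-encoded.
local_uri() {
  case "$1" in
    /*) printf 'file://%s\n' "$1" ;;          # absolute: file:// + /home/... gives file:///home/...
    *)  printf 'file://%s/%s\n' "$PWD" "$1" ;; # relative: anchor it at the current directory
  esac
}
```

It could then be used as `-i "$(local_uri /home/madhumita/Documents/testFile/test.txt)"` in the command above.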

enyaitl32#

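OK, I managed to solve the problem. The input path is resolved on the Hadoop Distributed File System (HDFS), the cluster's default filesystem, so it must be a path on HDFS rather than a path on the local machine.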
So I first copied the local file to /docs/test.txt on HDFS and passed that path as the input argument. The commands are:

```
sudo bin/hadoop fs -copyFromLocal /home/madhumita/Documents/testFile/test.txt /docs/test.txt
sudo bin/hadoop jar /home/madhumita/behemoth/core/target/behemoth-core-*-job.jar com.digitalpebble.behemoth.util.CorpusGenerator -i /docs/test.txt -o /docs/behemoth/test
```
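Since the input must exist on HDFS, a pre-flight check can catch the "Input does not exist" error before the job is launched. This is a minimal sketch, assuming `hadoop` is on the PATH and the default filesystem is HDFS; `check_input` is a hypothetical name. It relies on `hadoop fs -test -e`, which exits with status 0 when the path exists.

```shell
# Hypothetical pre-flight check: verify that the input path exists on the
# default filesystem (HDFS in this setup) before running CorpusGenerator.
# `hadoop fs -test -e PATH` exits with status 0 if PATH exists.
check_input() {
  if hadoop fs -test -e "$1"; then
    echo "found: $1"
  else
    echo "missing on the default filesystem: $1" >&2
    return 1
  fi
}
```

For example, `check_input /docs/test.txt && bin/hadoop jar ...` only starts the job once the copied file is visible on HDFS.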

This solved the problem. Thanks to everyone who tried to help.