How does the mapper class identify a SequenceFile as its input file in Hadoop?

Posted by zujrkrfu on 2021-06-03 in Hadoop


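In one of my MapReduce jobs, I overrode BytesWritable as KeyBytesWritable and ByteWritable as ValueBytesWritable, then wrote the results out with SequenceFileOutputFormat.
My question: when I start the next MapReduce job, I want to use this SequenceFile as its input. How should I set up the job class, and how does the mapper class recognize the keys and values of the custom classes I wrote into the SequenceFile?
I know I can use SequenceFile.Reader to read the keys and values: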
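For context, a minimal sketch of what the first job's side might look like, assuming KeyBytesWritable and ValueBytesWritable are thin subclasses of BytesWritable and ByteWritable (their real definitions are not shown in the question). The class name FirstJobOutput and the method configureOutput are hypothetical. The detail that matters for the second job is the no-argument constructor: when a SequenceFile is read back, Hadoop instantiates the key and value classes named in the file header by reflection.

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.ByteWritable;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class FirstJobOutput {

    // Hypothetical stand-in for the question's KeyBytesWritable.
    // The no-argument constructor lets Hadoop instantiate it by reflection
    // when the SequenceFile is read back.
    public static class KeyBytesWritable extends BytesWritable {
        public KeyBytesWritable() {
            super();
        }

        public KeyBytesWritable(byte[] bytes) {
            super(bytes);
        }
    }

    // Hypothetical stand-in for the question's ValueBytesWritable,
    // extending ByteWritable as the question describes.
    public static class ValueBytesWritable extends ByteWritable {
        public ValueBytesWritable() {
            super();
        }

        public ValueBytesWritable(byte value) {
            super(value);
        }
    }

    // Makes the first job write a SequenceFile whose header records
    // KeyBytesWritable and ValueBytesWritable as the key/value classes.
    public static void configureOutput(Job job, Path outputDir) {
        job.setOutputFormatClass(SequenceFileOutputFormat.class);
        job.setOutputKeyClass(KeyBytesWritable.class);
        job.setOutputValueClass(ValueBytesWritable.class);
        FileOutputFormat.setOutputPath(job, outputDir);
    }
}
```

The output key and value classes must be exactly the classes you append; SequenceFile's writer rejects records whose class differs from the one recorded in the header.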

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.io.*;
Configuration config = new Configuration();
Path path = new Path(PATH_TO_YOUR_FILE);
SequenceFile.Reader reader = new SequenceFile.Reader(FileSystem.get(config), path, config);
WritableComparable key = (WritableComparable) reader.getKeyClass().newInstance();
Writable value = (Writable) reader.getValueClass().newInstance();
while (reader.next(key, value)) {
    // process the current key/value pair here
}
reader.close();
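The same reader can also report which key and value classes the file header records, which is a quick way to check what the next job's mapper must be parameterized with. A minimal sketch, using the option-based SequenceFile.Reader constructor available in Hadoop 2.x and later; the class name InspectSequenceFile and its method are hypothetical.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;

public class InspectSequenceFile {

    // Returns {keyClassName, valueClassName} as recorded in the SequenceFile header.
    public static String[] keyValueClassNames(Configuration conf, Path path) throws IOException {
        // try-with-resources closes the reader; only the header is needed here.
        try (SequenceFile.Reader reader =
                 new SequenceFile.Reader(conf, SequenceFile.Reader.file(path))) {
            return new String[] { reader.getKeyClassName(), reader.getValueClassName() };
        }
    }

    public static void main(String[] args) throws IOException {
        String[] names = keyValueClassNames(new Configuration(), new Path(args[0]));
        System.out.println("key class:   " + names[0]);
        System.out.println("value class: " + names[1]);
    }
}
```

Run against one of the first job's part files, this should print the fully qualified names of the custom key and value classes.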

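But I don't know how to use this reader to pass the keys and values into the mapper class as parameters. How do I set conf.setInputFormat to SequenceFileInputFormat and then have the mapper receive the keys and values?
Thanks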

Tags: hadoop, sequencefile, mapper

Source: https://stackoverflow.com/questions/15179456/how-does-mapper-class-identify-the-sequencefile-as-inputfile-in-hadoop

1 Answer


Answer #1 by 6tdlim6h

You don't need to read the sequence file manually. Just set the input format class to SequenceFileInputFormat:

job.setInputFormatClass(SequenceFileInputFormat.class);

and set the input path to the directory containing your sequence files:

FileInputFormat.setInputPaths(job, new Path("<path to the dir containing your sequence files>"));

Make sure the input (key, value) type parameters of your Mapper class match the (key, value) types of the records stored in the sequence file.
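Putting the two calls together, a minimal sketch of the second job, assuming the newer org.apache.hadoop.mapreduce API (the one job.setInputFormatClass belongs to) and the question's class names. SecondJob, ReadMapper, createJob and the map-only output are hypothetical illustration. The two Writable classes are redeclared only so the snippet compiles on its own; in a real project both jobs must share the exact classes the first job wrote, because the record reader instantiates the classes named in the file header, and a mismatch with the mapper's type parameters typically fails with a ClassCastException when map() is called.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.ByteWritable;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SecondJob {

    // Redeclared only so this snippet compiles alone; a real project must use
    // the same classes the first job recorded in the SequenceFile header.
    public static class KeyBytesWritable extends BytesWritable {
        public KeyBytesWritable() {
            super();
        }
    }

    public static class ValueBytesWritable extends ByteWritable {
        public ValueBytesWritable() {
            super();
        }
    }

    // KEYIN/VALUEIN match the key/value classes stored in the SequenceFile;
    // SequenceFileInputFormat deserializes each record and hands it to map().
    public static class ReadMapper
            extends Mapper<KeyBytesWritable, ValueBytesWritable, IntWritable, IntWritable> {
        private final IntWritable keyLength = new IntWritable();
        private final IntWritable byteValue = new IntWritable();

        @Override
        protected void map(KeyBytesWritable key, ValueBytesWritable value, Context context)
                throws IOException, InterruptedException {
            // Illustrative output only: (number of bytes in the key, the value's byte).
            keyLength.set(key.getLength());
            byteValue.set(value.get());
            context.write(keyLength, byteValue);
        }
    }

    public static Job createJob(Configuration conf, Path inputDir, Path outputDir)
            throws IOException {
        Job job = Job.getInstance(conf, "read sequence file");
        job.setJarByClass(SecondJob.class);
        job.setInputFormatClass(SequenceFileInputFormat.class); // read SequenceFile records
        FileInputFormat.setInputPaths(job, inputDir);           // directory of SequenceFiles
        job.setMapperClass(ReadMapper.class);
        job.setNumReduceTasks(0);                               // map-only, for brevity
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(IntWritable.class);
        FileOutputFormat.setOutputPath(job, outputDir);
        return job;
    }

    public static void main(String[] args) throws Exception {
        Job job = createJob(new Configuration(), new Path(args[0]), new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

If you are on the older org.apache.hadoop.mapred API that the question's conf.setInputFormat call belongs to, the equivalent setting is conf.setInputFormat(SequenceFileInputFormat.class) with org.apache.hadoop.mapred.SequenceFileInputFormat; the rule about matching the mapper's input types is the same.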

Answered 2021-06-03
