Hadoop MapReduce whole file input format

Posted by cyvaqqii on 2021-05-30 in Hadoop


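I'm trying to use Hadoop MapReduce, but instead of mapping one line at a time in my Mapper, I'd like to map a whole file at once.
So I found these two classes (https://code.google.com/p/hadoop-course/source/browse/hadoopsamples/src/main/java/mr/wholefile/?r=3), which I hoped would help me do that.
I got a compile error saying:
The method setInputFormat(Class<? extends InputFormat>) in the type JobConf is not applicable for the arguments (Class<WholeFileInputFormat>) Driver.java /ex2/src line 33 Java Problem
I changed my Driver class to: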

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.InputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

import forma.WholeFileInputFormat;

/*
 * Driver
 * The Driver class is responsible for creating the job and submitting it.
 */
public class Driver {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(Driver.class);
        conf.setJobName("Get minimum for each month");

        conf.setOutputKeyClass(IntWritable.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);

        // Previously this was:
        // conf.setInputFormat(TextInputFormat.class);
        // It was changed to:
        conf.setInputFormat(WholeFileInputFormat.class);

        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf,new Path("input"));
        FileOutputFormat.setOutputPath(conf,new Path("output"));

        System.out.println("Starting Job...");
        JobClient.runJob(conf);
        System.out.println("Job Done!");
    }

}

What am I doing wrong?

Java hadoop mapreduce

Source: https://stackoverflow.com/questions/29684747/hadoop-map-reduce-whole-file-input-format

3 Answers


wlp8pajw #1

Make sure your WholeFileInputFormat class has the correct imports. You are using the old MapReduce API in your job driver. I think you imported the new-API FileInputFormat in your WholeFileInputFormat class. If I'm right, you should import org.apache.hadoop.mapred.FileInputFormat in your WholeFileInputFormat class instead of org.apache.hadoop.mapreduce.lib.input.FileInputFormat.
Hope this helps.
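To see why the compiler rejects the call, here is a minimal sketch in plain Java. It does not use Hadoop at all: OldInputFormat and NewInputFormat are hypothetical stand-ins for org.apache.hadoop.mapred.InputFormat (an interface) and org.apache.hadoop.mapreduce.InputFormat (an abstract class), and setInputFormat only models the bounded signature of JobConf.setInputFormat. The two whole-file classes are made up for illustration.

```java
// Stand-ins for the two Hadoop APIs. These are NOT the real Hadoop classes.
final class ApiMismatchDemo {
    // Models org.apache.hadoop.mapred.InputFormat (old API, an interface).
    interface OldInputFormat {}

    // Models org.apache.hadoop.mapreduce.InputFormat (new API, an abstract class).
    static abstract class NewInputFormat {}

    // A whole-file format written against the new API by mistake (hypothetical).
    static class NewApiWholeFileInputFormat extends NewInputFormat {}

    // The same format written against the old API (hypothetical).
    static class OldApiWholeFileInputFormat implements OldInputFormat {}

    // Models the bound in JobConf.setInputFormat(Class<? extends InputFormat>).
    static String setInputFormat(Class<? extends OldInputFormat> formatClass) {
        return formatClass.getSimpleName();
    }

    // Runtime equivalent of the bound check the compiler performs.
    static boolean acceptedByOldApi(Class<?> formatClass) {
        return OldInputFormat.class.isAssignableFrom(formatClass);
    }

    public static void main(String[] args) {
        // Compiles: the class sits in the old-API hierarchy.
        System.out.println(setInputFormat(OldApiWholeFileInputFormat.class));

        // Would not compile, for the same reason as the error in the question:
        // setInputFormat(NewApiWholeFileInputFormat.class);

        System.out.println(acceptedByOldApi(NewApiWholeFileInputFormat.class));
    }
}
```

The two hierarchies share simple names but are unrelated types, so a class built on one can never satisfy a bound declared on the other. Either the input format or the driver has to move so that both use the same API.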

2021-05-30

lqfhib0f #2

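The simplest way is to gzip your input files. That makes isSplitable() return false (TextInputFormat overrides FileInputFormat.isSplitable() and rejects non-splittable codecs such as gzip), so each file goes to a single mapper instead of being split. Note that the mapper is still called once per line.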
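If the inputs are not compressed yet, a small helper like the following could produce the .gz copies before they are uploaded. This is only a sketch using the JDK's java.util.zip; the class and method names are made up for illustration, and it works on local files rather than HDFS. The ".gz" suffix matters because Hadoop picks the compression codec from the file extension.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class GzipInput {
    // Writes a gzip copy of `source` next to it, with ".gz" appended to the name.
    static Path gzip(Path source) {
        Path target = source.resolveSibling(source.getFileName() + ".gz");
        try (InputStream in = Files.newInputStream(source);
             OutputStream out = new GZIPOutputStream(Files.newOutputStream(target))) {
            in.transferTo(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return target;
    }

    // Reads a gzip file back in full, to verify the round trip.
    static byte[] gunzip(Path gz) {
        try (InputStream in = new GZIPInputStream(Files.newInputStream(gz))) {
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Compress every file named on the command line.
        for (String arg : args) {
            System.out.println(gzip(Path.of(arg)));
        }
    }
}
```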

2021-05-30

sg3maiej #3

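We ran into something similar and took an alternative, unconventional approach.
Say you need to process 100 large files (f1, f2, ..., f100), and each one has to be read in full inside the map function. Instead of using a "WholeFileInputFormat" reader, we created 10 text files (p1, p2, ..., p10), each containing the HDFS URLs or web URLs of some of the files f1-f100.
So p1 contains the URLs of f1-f10, p2 contains the URLs of f11-f20, and so on.
These new files p1 through p10 are then used as the input to the mappers. Mapper m1, processing file p1, opens the files f1 to f10 one at a time and processes each of them as a whole.
This approach lets us control the number of mappers and write more elaborate application logic in the MapReduce job. For example, we could run NLP on PDF files this way.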
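The manifest idea can be sketched in plain Java as follows. This is an illustration under assumptions, not the answerer's code: the class and method names are invented, and local file paths stand in for the HDFS or web URLs. In a real job, readWholeFile would be the body of the map call for one manifest line and would open the file through Hadoop's FileSystem API rather than java.nio.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class ManifestDemo {
    // Splits the file list into manifests of at most `perManifest` entries
    // (f1-f10 go to p1, f11-f20 to p2, and so on).
    static List<List<String>> partition(List<String> files, int perManifest) {
        List<List<String>> manifests = new ArrayList<>();
        for (int i = 0; i < files.size(); i += perManifest) {
            int end = Math.min(i + perManifest, files.size());
            manifests.add(new ArrayList<>(files.subList(i, end)));
        }
        return manifests;
    }

    // What one map call would do for one manifest line: open the file the
    // line names and read it in full. Local paths stand in for URLs here.
    static byte[] readWholeFile(String manifestLine) {
        try {
            return Files.readAllBytes(Path.of(manifestLine.trim()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<String> files = new ArrayList<>();
        for (int i = 1; i <= 100; i++) {
            files.add("f" + i);
        }
        List<List<String>> manifests = partition(files, 10);
        System.out.println(manifests.size() + " manifests; p2 starts with " + manifests.get(1).get(0));
    }
}
```

Because each manifest is a small text file with one URL per line, the default line-oriented input format is enough: every map call receives one URL and decides for itself how to read the file behind it.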

2021-05-30
