java: Read files in a directory matching a particular pattern through MapReduce and output the name of each file

Posted by huwehgph on 2021-05-29 in Hadoop

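I am trying to read the files in a directory whose path is passed as an argument to a MapReduce program. The goal is to perform some computation on each file (for example, counting the occurrences of a particular word). The file names must also match a pattern (for example, .java files). The output of the program is the file name and the computed value.
So far I have been able to implement a very basic map program that reads the contents of the directory without any particular pattern and outputs the file name and a constant. The mapper code is shown below: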

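import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class CCMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable complexityCount = new IntWritable(1);
    private final Text result = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Name of the file that the current input split belongs to.
        String fileName = ((FileSplit) context.getInputSplit()).getPath().getName();
        result.set(fileName);
        context.write(result, complexityCount);
    }
}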

The input directory has 3 files: file1, file2, and file3. However, the output of this program looks like this:

file1.txt   1
file1.txt   1
file1.txt   1
file1.txt   1
file1.txt   1
file1.txt   1
file1.txt   1
file2.txt   1
file2.txt   1
file2.txt   1
file2.txt   1
file3.txt   1

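How can I make the program output a single record for each file? Also, is there a way to read one file at a time, perform the computation on that file, and output the file name and the result? How do I modify the input split so that it matches the size of each individual file?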

Java hadoop mapreduce

Source: https://stackoverflow.com/questions/38918355/read-files-in-a-directory-matching-a-particular-pattern-through-mapreduce-and-ou

1 Answer

j9per5c41#

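As I understand it, your code is reading the contents of each file, and the mapper runs once per line. file1 must have 7 lines, so the key-value pair "file1.txt 1" is emitted once for each line. Likewise, file2.txt must have 4 lines and file3.txt must have 1 line.
To output a single record per file, you have to write code in the reduce function that sums the values for each key.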

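import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Declared as a static nested class, so it lives inside another class (for example the job's driver).
public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // All values emitted for the same file name arrive together; add them up.
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }

        // One output record per file name.
        context.write(key, new IntWritable(sum));
    }
}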
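
To see why summing by key collapses the repeated lines, here is a minimal plain-Java sketch that simulates the two phases without Hadoop. It is only an illustration: the class name PerFileCountSketch and the methods mapPhase and reducePhase are hypothetical, and the line counts (7, 4, 1) are taken from the output shown in the question.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PerFileCountSketch {

    // Simulates the map phase: the mapper runs once per input line,
    // so one (fileName, 1) pair is emitted for every line of every file.
    public static List<Map.Entry<String, Integer>> mapPhase(Map<String, Integer> linesPerFile) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (Map.Entry<String, Integer> file : linesPerFile.entrySet()) {
            for (int line = 0; line < file.getValue(); line++) {
                emitted.add(new AbstractMap.SimpleEntry<>(file.getKey(), 1));
            }
        }
        return emitted;
    }

    // Simulates the reduce phase: values that share a key are summed,
    // which leaves exactly one record per file name.
    public static Map<String, Integer> reducePhase(List<Map.Entry<String, Integer>> emitted) {
        Map<String, Integer> totals = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> pair : emitted) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Assumed line counts, inferred from the question's output.
        Map<String, Integer> linesPerFile = new LinkedHashMap<>();
        linesPerFile.put("file1.txt", 7);
        linesPerFile.put("file2.txt", 4);
        linesPerFile.put("file3.txt", 1);

        List<Map.Entry<String, Integer>> emitted = mapPhase(linesPerFile);
        System.out.println("pairs emitted by map: " + emitted.size());

        reducePhase(emitted).forEach((name, total) -> System.out.println(name + "\t" + total));
    }
}
```

With these assumed inputs, the map phase emits 12 pairs and the reduce phase leaves three records, one per file. As for the pattern requirement in the question, Hadoop's FileInputFormat accepts glob patterns in input paths (for example a path ending in *.java), so non-matching files can be excluded before the mapper runs.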

Posted 2021-05-30
