Hadoop: choosing an input file from within the input folder

Posted by euoag5mw on 2021-06-02 in Hadoop

In the training_set folder, the files are laid out as follows:

mv_000000
mv_000001
mv_000002
...

The index in each file name is the movie ID, which can be looked up in movie_title.txt. The movie_title.txt file looks like this:

1,2003,Dinosaur Planet
2,2004,Isle of Man TT 2004 Review
3,1997,Character   
4,1994,Paula Abdul's Get Up & Dance
5,2004,The Rise and Fall of ECW 
...

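The first column is the ID of the movie whose title appears on that line.
I am practicing Hadoop on the Netflix Prize dataset. Suppose I pass in a specific movie title, such as "Sick". The program should then open the movie_titles.txt file, search for the movie ID of "Sick", and finally set the input path according to that movie ID.
For example, if I launch the Hadoop program as: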

hadoop jar ~ [input path] [output path] [movieA name]

then the input path must be set to training_set/mv_movieAIndex.
As I said, the movie ID information is stored in movie_title.txt.
Please give me a hint for solving this problem.

Java hadoop

Source: https://stackoverflow.com/questions/26825570/hadoop-choose-input-file-among-input-folder

1 Answer

92dk7w1h1#

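Your requirement does not really seem to be Hadoop-specific. You only need to look up the ID that corresponds to the title passed as an argument to the hadoop jar command. The following snippet does the job: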

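import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Builds a title -> movie ID map from lines of the form "id,year,title".
private static Map<String, Integer> getMovieMappings(String filePath)
        throws IOException {
    Map<String, Integer> movieMap = new HashMap<String, Integer>();
    // try-with-resources closes the reader even if reading fails
    try (BufferedReader br = new BufferedReader(new FileReader(filePath))) {
        String line;
        while ((line = br.readLine()) != null) {
            // A limit of 3 keeps titles that themselves contain commas intact
            String[] temp = line.split(",", 3);
            if (temp.length < 3) {
                continue; // skip blank or malformed lines
            }
            movieMap.put(temp[2].trim(), Integer.parseInt(temp[0].trim()));
        }
    }
    return movieMap;
}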
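To make the per-line parsing concrete, here is a minimal sketch that applies the same split-and-trim logic to single lines. The class name MovieLineDemo, the helper parseLine, and the second sample line (ID 99) are made up for illustration; only the "id,year,title" layout comes from the question.

```java
import java.util.AbstractMap;
import java.util.Map;

public class MovieLineDemo {
    // Hypothetical helper: turns one "id,year,title" line into a (title, id) pair.
    static Map.Entry<String, Integer> parseLine(String line) {
        // Splitting with a limit of 3 stops after the year,
        // so any commas inside the title are left untouched.
        String[] parts = line.split(",", 3);
        if (parts.length < 3) {
            throw new IllegalArgumentException("Malformed line: " + line);
        }
        String title = parts[2].trim();               // drop trailing spaces
        int id = Integer.parseInt(parts[0].trim());
        return new AbstractMap.SimpleEntry<>(title, id);
    }

    public static void main(String[] args) {
        // Line taken from the question; note the trailing spaces.
        System.out.println(parseLine("3,1997,Character   "));      // Character=3
        // Made-up line with a comma inside the title.
        System.out.println(parseLine("99,2001,Title, With Comma")); // Title, With Comma=99
    }
}
```

Without the limit argument, the second line would be split into four parts and only "Title" would end up as the key.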

Now, in the driver, just fetch the map and set the input path accordingly:

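Map<String, Integer> movieMap = getMovieMappings("/pathTo/movie_title.txt");
Integer movieId = movieMap.get(args[2]);
if (movieId == null) {
    // Unboxing a missing entry straight into an int would throw a NullPointerException
    System.err.println("Unknown movie title: " + args[2]);
    System.exit(1);
}
String fileName = String.format("mv_%06d", movieId);
System.out.println(fileName);
FileInputFormat.addInputPath(job, new Path("training_set", fileName));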
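The title-to-path step can also be tried without a Hadoop installation. The sketch below is only an illustration: the class InputPathResolver and its resolve method are hypothetical names, and the map is filled by hand with two entries from the question instead of being read from movie_title.txt.

```java
import java.util.HashMap;
import java.util.Map;

public class InputPathResolver {
    // Hypothetical helper: maps a movie title to "<baseDir>/mv_<6-digit id>".
    static String resolve(Map<String, Integer> movieMap, String baseDir, String title) {
        Integer id = movieMap.get(title);
        if (id == null) {
            // Fail early with a clear message instead of a NullPointerException
            throw new IllegalArgumentException("Unknown movie title: " + title);
        }
        // %06d zero-pads the ID to six digits, matching names such as mv_000001
        return baseDir + "/" + String.format("mv_%06d", id);
    }

    public static void main(String[] args) {
        Map<String, Integer> movieMap = new HashMap<>();
        movieMap.put("Dinosaur Planet", 1);
        movieMap.put("Character", 3);

        System.out.println(resolve(movieMap, "training_set", "Character"));
        // prints training_set/mv_000003
    }
}
```

In a real driver, the resulting string would presumably be wrapped in a Path and handed to FileInputFormat.addInputPath, as in the answer's snippet.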

Hope it helps.

2021-06-03
