Am I spawning too many threads in my mapper?

gk7wooem · posted 2021-06-03 in Hadoop

I'm trying to build a web parser with Hadoop MapReduce. Since there is naturally downtime while the program retrieves documents, I made the document retrieval multithreaded: the idea is that my threads pull URLs from a shared URL pile. When I run the program on EMR with medium instances it runs about three times faster, but on large instances I hit an out-of-memory error. Do I simply need fewer threads, or is the number of threads less of a constraint than I think? Here is my Mapper:

import java.util.StringTokenizer;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// URLPile, MyThread, Config and LoggerProvider are my own / library classes,
// so their imports are omitted here.
public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, Text> {
    private Text word = new Text();
    private URLPile pile = new URLPile();

    @Override
    public void map(LongWritable key, Text value, OutputCollector<Text, Text> output, Reporter reporter) {

        // non-English encoding list; all others are considered English
        // to avoid missing any
        String url = value.toString();
        StringTokenizer urls = new StringTokenizer(url);
        Config.LoggerProvider = LoggerProvider.DISABLED;

        // spawn 8 worker threads that consume URLs from the shared pile
        MyThread[] threads = new MyThread[8];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new MyThread(output, pile);
            threads[i].start();
        }

        // feed every whitespace-separated URL in this input line to the pile
        while (urls.hasMoreTokens()) {
            try {
                if (urls.hasMoreTokens()) {
                    word.set(urls.nextToken());
                    String currenturl = word.toString();
                    pile.addUrl(currenturl);
                } else {
                    System.out.println("out of tokens");
                    pile.waitTillDone();
                }
            } catch (Exception e) {
                // ignore the error and move on to the next token
                continue;
            }
        }
    }
}
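
The post does not show URLPile or MyThread. For readers following the setup described above, here is a minimal sketch of what such a shared pile and fetcher thread could look like, assuming the pile is a bounded BlockingQueue and each worker loops taking URLs until told to stop. The class names match the mapper, but the bodies (queue bound, poll timeout, fetch placeholder) are assumptions, not the poster's actual code.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.OutputCollector;

// Hypothetical URLPile: a bounded, thread-safe queue of URLs shared by the
// mapper (producer) and the fetcher threads (consumers). The bound keeps the
// mapper from buffering an entire input split in memory at once.
class URLPile {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>(1000);
    private final AtomicInteger inFlight = new AtomicInteger(0);

    // Called by the mapper; blocks when the queue is full (back-pressure).
    public void addUrl(String url) throws InterruptedException {
        inFlight.incrementAndGet();
        queue.put(url);
    }

    // Called by worker threads; returns null after a timeout so idle workers
    // can periodically re-check whether they should stop.
    public String takeUrl() throws InterruptedException {
        return queue.poll(1, TimeUnit.SECONDS);
    }

    // Called by a worker when it has finished processing one URL.
    public void markDone() {
        inFlight.decrementAndGet();
    }

    // Called by the mapper after the last addUrl; waits until every queued
    // URL has been taken and processed.
    public void waitTillDone() throws InterruptedException {
        while (inFlight.get() > 0) {
            Thread.sleep(100);
        }
    }
}

// Hypothetical MyThread: pulls URLs from the pile, fetches them, and emits a
// result. OutputCollector may not be thread-safe, so emits are serialized.
class MyThread extends Thread {
    private final OutputCollector<Text, Text> output;
    private final URLPile pile;
    private volatile boolean stopped = false;

    MyThread(OutputCollector<Text, Text> output, URLPile pile) {
        this.output = output;
        this.pile = pile;
    }

    public void shutdown() { stopped = true; }

    @Override
    public void run() {
        try {
            while (!stopped) {
                String url = pile.takeUrl();
                if (url == null) continue;        // timed out; re-check stop flag
                try {
                    String body = fetch(url);     // placeholder for the real fetch/parse
                    synchronized (output) {
                        output.collect(new Text(url), new Text(body));
                    }
                } finally {
                    pile.markDone();
                }
            }
        } catch (Exception e) {
            // swallow the exception and let the worker exit
        }
    }

    private String fetch(String url) {
        return "";  // stand-in for the actual HTTP fetch and parse
    }
}

With a bounded queue like this, memory use is limited by the queue capacity rather than by how fast the mapper can tokenize its input, which is the usual way such producer/consumer setups keep a crawler's heap footprint under control.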

No answers yet.
