Combiners, reducers and ecosystem projects in Hadoop

kxxlusnw · published 2021-06-02 in Hadoop
Follow (0) | Answers (2) | Views (396)

What do you think the correct answer to question 4 below is? Is the given answer right or wrong?

Question 4:

In the standard word count MapReduce algorithm, why might using a combiner reduce the overall job running time?

A. Because combiners perform local aggregation of word counts, thereby allowing the mappers to process input data faster.
B. Because combiners perform local aggregation of word counts, thereby reducing the number of mappers that need to run.
C. Because combiners perform local aggregation of word counts, and then transfer that data to reducers without writing the intermediate data to disk.
D. Because combiners perform local aggregation of word counts, thereby reducing the number of key-value pairs that need to be shuffled across the network to the reducers.

Answer: A

And:

Question 3:

What happens in a MapReduce job when you set the number of reducers to one?

A. A single reducer gathers and processes all the output from all the mappers. The output is written in as many separate files as there are mappers.
B. A single reducer gathers and processes all the output from all the mappers. The output is written to a single file in HDFS.
C. Setting the number of reducers to one creates a processing bottleneck, and since the number of reducers as specified by the programmer is used as a reference value only, the MapReduce runtime provides a default setting for the number of reducers.
D. Setting the number of reducers to one is invalid, and an exception is thrown.
Answer: A

Based on my understanding, the answers to the questions above should be:

Question 4: D
Question 3: B
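A tiny Python simulation (not Hadoop code; all names here are illustrative) shows the effect that option D describes: the combiner collapses each mapper's (word, 1) pairs into (word, partial_count) pairs, so fewer key-value pairs are shuffled across the network to the reducers.

```python
from collections import Counter

def map_words(line):
    """Map phase of word count: emit a (word, 1) pair per token."""
    return [(w, 1) for w in line.split()]

def combine(pairs):
    """Combiner: locally sum one mapper's pairs, so far fewer
    (word, partial_count) pairs have to cross the network."""
    agg = Counter()
    for word, n in pairs:
        agg[word] += n
    return sorted(agg.items())

# Each line stands in for one mapper's input split.
lines = ["to be or not to be", "to be is to do"]
raw = [p for line in lines for p in map_words(line)]
combined = [p for line in lines for p in combine(map_words(line))]

print(len(raw), "pairs shuffled without a combiner")    # 11
print(len(combined), "pairs shuffled with a combiner")  # 8
```

The per-word totals are unchanged (the reducer still sums the partial counts); only the shuffle volume shrinks, which is exactly what option D says.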

Update

You have user profile records in your OLTP database that you want to join with weblogs you have already ingested into HDFS. How will you obtain these user records?
Options
A. HDFS commands
B. Pig load
C. Sqoop import
D. Hive
Answer: B

For the updated question, my answers are B and C.

Edit

The correct answer is: Sqoop.
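For reference, a Sqoop import along these lines would pull the user profile records out of the OLTP database into HDFS, ready to be joined with the weblogs. The JDBC URL, credentials, table name and target directory below are all hypothetical placeholders:

```shell
# Hypothetical connection details; adjust to the real OLTP database
sqoop import \
  --connect jdbc:mysql://oltp-host/userdb \
  --username etl_user -P \
  --table user_profiles \
  --target-dir /data/user_profiles
```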


9nvpjoqh1#

To me, the given answers to questions 4 and 3 look correct. For question 4 it makes sense, because when a combiner is used, the map output is first collected and aggregated in memory, then flushed once the buffer fills. To back this up, here is a link: http://wiki.apache.org/hadoop/hadoopmapreduce
It clearly explains why a combiner speeds the process up.
Also, I think the given answer to Q3 is correct too, because in general that is the basic configuration and also the default one. To back this up, another informative link: https://www.inkling.com/read/hadoop-definitive-guide-tom-white-3rd/chapter-7/mapreduce-types


uujelgoq2#

As far as I can tell, both of the given answers are wrong.
I haven't worked much with Combiner, but everywhere I have seen it used, it operates on the output of the Mapper. So the answer to question 4 should be D.
From practical experience, I have found that the number of output files is always equal to the number of Reducers. So the answer to question 3 should be B. That may not hold when MultipleOutputs is used, but that is not common.
Finally, I don't think Apache would lie about MapReduce (though exceptions do happen :)). The answers to both questions can be found on their wiki pages. Take a look.
By the way, I loved the "100% pass guarantee or your money back!!!" quote on the link you provided ;-)
Edit
Since I know very little about Pig and Sqoop, I'm not entirely sure about the question in the update. But it can certainly also be done with Hive, by creating an external table over the HDFS data and then joining.
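As a sketch of that Hive approach (table names, columns and the HDFS path are hypothetical), an external table can be laid over the weblogs already in HDFS and then joined, although the user profiles would still have to be landed in Hive somehow first:

```sql
-- Hypothetical layout: weblogs already ingested under /data/weblogs, tab-delimited
CREATE EXTERNAL TABLE weblogs (user_id STRING, url STRING, ts STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/weblogs';

-- Assumes the user profiles have also been made available as a Hive table
SELECT p.user_id, p.name, w.url
FROM user_profiles p
JOIN weblogs w ON (p.user_id = w.user_id);
```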
Update
After the comments from user milk3422 and the owner, I did some searching and realized that my assumption that Hive was the answer to the last question was wrong, since another OLTP database is involved. The correct answer should be C, because Sqoop is designed to transfer data between HDFS and relational databases.
