How do I calculate how many mappers a MapReduce job needs?

eoxn13cs · asked 2021-05-27 · in Hadoop

I have a question that gives us the following information:

Suppose the program presented in 2a) will be executed on a dataset of 200 million
recorded inspections, collecting 2000 days of data. In total there are 1,000,000 unique
establishments. The total input size is 1 Terabyte. The cluster has 100 worker nodes
(all of them idle), and HDFS is configured with a block size of 128MB.
Using that information, provide a reasoned answer to the following questions. State
any assumptions you feel necessary when presenting your answer.

Based on that, I'm asked to answer these questions:

1) How many worker nodes will be involved during the execution of the Map and Reduce
tasks of the job? 
2) How many times does the map method run on each physical worker?
3) How many input splits are processed at each node? 
4) How many times will the reduce method be invoked at each reducer?

Can someone confirm whether my answers are correct?
Q1) Am I essentially calculating how many mappers I need? My calculation is the input size (1 TB) divided by the block size (128 MB):
1 TB / 128 MB = 7812.5. Since 7812.5 mappers are needed and we only have 100 worker nodes, all 100 nodes will be used — is that right?
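A quick sanity check of the split arithmetic above, as a sketch. It assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes), which is what produces 7812.5; with binary units (1 TiB / 128 MiB) the result would be exactly 8192 instead:

```python
import math

# Assumption: decimal units, matching the 7812.5 figure in the question.
input_size = 10**12        # total input: 1 TB
block_size = 128 * 10**6   # HDFS block size: 128 MB

splits = input_size / block_size
print(splits)              # 7812.5

# A partial final block still gets its own split (and mapper),
# so the count is rounded up.
mappers = math.ceil(splits)
print(mappers)             # 7813
```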
Q2) From Q1 I found that 7812.5 mappers are needed, so the map method will run 7812.5 times (rounded up to 7813) on each physical worker.
Q3) The number of input splits is the same as the number of mappers, so there will be 7813 splits.
Q4) Since I'm told there are 1,000,000 unique values and the default number of reducers is 2, the reduce method will run 500,000 times at each reducer.
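The arithmetic behind Q4, as a sketch under the question's own assumptions (1,000,000 unique keys spread evenly across 2 reducers by the hash partitioner; note that out of the box Hadoop actually defaults to a single reducer unless `mapreduce.job.reduces` is set):

```python
# Assumptions taken from the question: 1,000,000 unique keys, 2 reducers,
# keys distributed evenly by the default hash partitioner.
unique_keys = 1_000_000
num_reducers = 2

# reduce() is invoked once per unique key that lands on a given reducer.
calls_per_reducer = unique_keys // num_reducers
print(calls_per_reducer)   # 500000
```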
Can someone look carefully through my reasoning and tell me whether I'm correct? Thanks.
