I created a staging table with 20 million records that has only two fields, viewerid and viewedid. I am trying to load it into an ORC table that is dynamically partitioned on the viewerid column, but the map tasks never complete, as shown in the attached image.
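The target table I am trying to create looks roughly like this (a sketch only; the table name bmviews_part is a placeholder, the column names are the ones from my staging table):

-- ORC table partitioned on viewerid; viewedid stays as a regular column
CREATE TABLE bmviews_part (
  viewedid INT
)
PARTITIONED BY (viewerid INT)
STORED AS ORC;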
**mapred-site.xml**
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>localhost:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>localhost:19888</value>
  </property>
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>4096</value>
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>8192</value>
  </property>
  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx3072m</value>
  </property>
  <property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx6144m</value>
  </property>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>4</value>
  </property>
</configuration>
**yarn-site.xml**
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>hadoop-master:8030</value>
</property>
<property>
  <name>yarn.resourcemanager.address</name>
  <value>hadoop-master:8032</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address</name>
  <value>hadoop-master:8088</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>hadoop-master:8031</value>
</property>
Job status:
My staging table:
hive> desc formatted bmviews;
OK
# col_name data_type comment
viewerid int
viewedid int
# Detailed Table Information
Database: bm
Owner: sudheer
CreateTime: Tue Aug 29 18:22:34 IST 2017
LastAccessTime: UNKNOWN
Retention: 0
Location: hdfs://hadoop-master:54311/user/hive/warehouse/bm.db/bmviews
Table Type: MANAGED_TABLE
Table Parameters:
numFiles 9
numRows 0
rawDataSize 0
totalSize 539543256
transient_lastDdlTime 1504070146
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
My partitioned table description:
I have already raised the dynamic partitions per node to 200k, but I am still facing this issue. I have two data nodes with 8 GB and 6 GB of memory respectively, and a namenode with 16 GB of memory.
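The session settings for the dynamic-partition load are along these lines (a sketch; the 200k figure is the per-node limit mentioned above, and the overall hive.exec.max.dynamic.partitions line is my assumption of what would also be raised):

-- enable dynamic partitioning for this session
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
-- per-node partition limit raised to 200k as described above
SET hive.exec.max.dynamic.partitions.pernode=200000;
-- overall limit (assumption: raised to at least the per-node value)
SET hive.exec.max.dynamic.partitions=200000;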
How can I insert the data into the partitioned table?
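For reference, the insert statement I am attempting is along these lines (a sketch using the placeholder table name bmviews_part from above; in Hive the dynamic partition column must come last in the SELECT list):

-- load from the staging table, letting Hive create one partition per viewerid value
INSERT OVERWRITE TABLE bmviews_part PARTITION (viewerid)
SELECT viewedid, viewerid
FROM bmviews;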