Linux: split a file into multiple small files in HDFS

Posted by 7dl7o3gd on 2021-05-29 in Hadoop

I have a file named test.txt in HDFS. It contains 1000 records.
I want to split test.txt into 10 small files, each containing the same number of records.
On Linux I can do this as follows:

split -l $(($(wc -l < test.txt )/10 + 1)) test.txt

Is there similar functionality in HDFS?
How can I achieve this in HDFS?
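
For reference, the arithmetic behind the local `split` command can be sketched as follows. With 1000 lines, `1000/10 + 1` evaluates to 101, so the command in the question produces nine files of 101 lines and one of 91. Ceiling division gives exactly 100 lines per file. The helper name `lines_per_chunk`, the temporary directory, and the `part_` prefix are illustrative assumptions, not part of the original question.

```shell
# Ceiling division: lines per chunk so that at most `parts` files are produced.
lines_per_chunk() {
  total=$1
  parts=$2
  echo $(( (total + parts - 1) / parts ))
}

# Build a 1000-line sample file in a scratch directory (hypothetical data).
workdir=$(mktemp -d)
seq 1 1000 > "$workdir/test.txt"

# Count the lines; the outer $(( )) strips any padding that wc may emit.
total=$(( $(wc -l < "$workdir/test.txt") ))
n=$(lines_per_chunk "$total" 10)

# Split into chunks of $n lines, written as part_aa, part_ab, ...
split -l "$n" "$workdir/test.txt" "$workdir/part_"

# Count the resulting chunk files.
ls "$workdir"/part_* | wc -l
```

When the line count is not a multiple of 10, the last file is simply shorter than the others.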

hadoop linux hdfs split bash

Source: https://stackoverflow.com/questions/43747834/split-a-file-into-no-of-small-files-in-hdfs

1 Answer

zdwk9cvp #1

A simple Hadoop Streaming job with NLineInputFormat as the input format can accomplish this.

hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-<version>.jar \
   -Dmapreduce.input.lineinputformat.linespermap=100 \
   -Dmapreduce.job.reduces=0 \
   -inputformat org.apache.hadoop.mapred.lib.NLineInputFormat \
   -mapper org.apache.hadoop.mapred.lib.IdentityMapper \
   -input /test.txt \
   -output /splitted_output

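Here, the property mapreduce.input.lineinputformat.linespermap determines the number of lines each split must contain. With zero reducers, each split is written by its own map task as a separate output file, so set the value to the total line count divided by the desired number of files (100 for 1000 records in 10 files).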
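
Rather than hard-coding the value, the lines-per-map setting could be derived from the file's actual line count. The following is a minimal sketch under stated assumptions: `linespermap_opt` is a hypothetical helper, and `STREAMING_JAR` is an assumed environment variable that you would point at the streaming jar of your own installation. The cluster commands run only when the `hdfs` CLI is on the PATH and `STREAMING_JAR` is set.

```shell
# Print the -D option that gives each map task ceil(total / parts) lines.
linespermap_opt() {
  total=$1
  parts=$2
  echo "-Dmapreduce.input.lineinputformat.linespermap=$(( (total + parts - 1) / parts ))"
}

# Only touch the cluster when the Hadoop CLI and STREAMING_JAR (assumed) exist.
if command -v hdfs >/dev/null 2>&1 && [ -n "${STREAMING_JAR:-}" ]; then
  # Count the lines of the HDFS file by streaming it through wc.
  total=$(( $(hdfs dfs -cat /test.txt | wc -l) ))

  # Same job as above, with the computed lines-per-map option for 10 files.
  hadoop jar "$STREAMING_JAR" \
    "$(linespermap_opt "$total" 10)" \
    -Dmapreduce.job.reduces=0 \
    -inputformat org.apache.hadoop.mapred.lib.NLineInputFormat \
    -mapper org.apache.hadoop.mapred.lib.IdentityMapper \
    -input /test.txt \
    -output /splitted_output
fi
```

Afterwards, `hdfs dfs -ls /splitted_output` should list one part file per map task, assuming the job completed successfully.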

Answered 2021-05-29
