Moving and merging directories in HDFS

Posted by 6bc51xsx on 2021-05-29 in Hadoop


I am changing an HDFS directory structure. The current layout is:

.../customers/customers1/2016-05-16-10/lots_of_files1.csv
.../customers/customers2/2016-05-16-10/lots_of_files2.csv
.../customers/customers3/2016-05-16-10/lots_of_files1.csv
.../customers/customers4/2016-05-16-10/...
.../customers/customers5/2016-05-16-10/...
.../customers/customers6/2016-05-16-10/...
.../customers/customers7/2016-05-16-10/...

I want to get rid of the customers(1-7) level:

.../customers/2016-05-16-10/lots_of_files1.csv
.../customers/2016-05-16-10/lots_of_files2.csv
.../customers/2016-05-16-10/lots_of_files1(1).csv

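I wanted to use the snakebite Python HDFS library, but there are a lot of edge cases: 1. The same date directory can appear under several customer directories. 2. The same CSV name can appear more than once with different data, and every copy has to be moved.
What is the cleanest way to do this?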

hadoop hdfs python snakebite

Source: https://stackoverflow.com/questions/37079017/moving-and-merging-directories-in-hdfs

1 answer


qlzsbp2j1#

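If you are not concerned about preserving the file names, you can easily use Apache Drill. Apache Drill supports reading and writing files through SQL. Something like this: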

create table dfs.`/myfolder/customers/2016-05-16-10` as select * from dfs.`/myfolder/customers` where dir1 = '2016-05-16-10';

All files under /*/2016-05-16-10 will be written to the target table.
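Since the same rewrite has to happen for every date directory, the statement above could be generated rather than typed by hand. The following is only a sketch: `build_ctas` is a hypothetical helper, `/myfolder/customers` is the example path from the statement above, and it assumes the target workspace is writable and that Drill's output format has been configured as needed.

```python
def build_ctas(base_dir, date_dir):
    """Return a Drill CTAS statement merging every */<date_dir> under base_dir."""
    # Backticks quote file-system paths in Drill. dir1 is the second directory
    # level below base_dir (dir0 would be customers1, customers2, ...).
    target = f"{base_dir}/{date_dir}"
    return (
        f"create table dfs.`{target}` as "
        f"select * from dfs.`{base_dir}` where dir1 = '{date_dir}'"
    )


# One statement per date directory; the list of dates is an assumption here.
for date_dir in ["2016-05-16-10"]:
    print(build_ctas("/myfolder/customers", date_dir) + ";")
```

The printed statements can then be run through whichever Drill client is already in use.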
https://drill.apache.org/docs/
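If the file names do need to be preserved, one possible alternative is to plan the moves first and resolve name collisions before touching HDFS. The sketch below is not part of the Drill approach: `plan_moves` and the base path `/data/customers` are hypothetical, the listing of source paths is assumed to come from whatever HDFS client is in use, and files that already exist in the destination directory are not taken into account.

```python
import posixpath
from collections import defaultdict


def plan_moves(source_paths, base_dir):
    """Map each source path to a collision-free destination under base_dir.

    Expects paths shaped like <base_dir>/<customer>/<date>/<file>.
    """
    taken = defaultdict(set)  # date directory -> file names already assigned
    moves = []
    for src in sorted(source_paths):
        rel = posixpath.relpath(src, base_dir)
        _customer, date_dir, name = rel.split("/", 2)
        stem, ext = posixpath.splitext(name)
        candidate, n = name, 0
        # Append (1), (2), ... until the name is unused in this date directory.
        while candidate in taken[date_dir]:
            n += 1
            candidate = f"{stem}({n}){ext}"
        taken[date_dir].add(candidate)
        moves.append((src, posixpath.join(base_dir, date_dir, candidate)))
    return moves


paths = [
    "/data/customers/customers1/2016-05-16-10/lots_of_files1.csv",
    "/data/customers/customers2/2016-05-16-10/lots_of_files2.csv",
    "/data/customers/customers3/2016-05-16-10/lots_of_files1.csv",
]
for src, dst in plan_moves(paths, "/data/customers"):
    print(src, "->", dst)
```

Each resulting pair could then be passed to `hdfs dfs -mv` or to the equivalent rename call of the client library. Keeping the planning step free of I/O makes the two edge cases (repeated dates, repeated CSV names) easy to check before anything is moved.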

Posted 2021-05-29
