In HDFS: how to check whether two directories have the same parent directory

Posted by jobtbby3 on 2021-06-01 in Hadoop

Is there an HDFS command to check whether two directories in HDFS have a common parent directory?
For example:

$ hadoop fs -ls -R  /user/username/data/
/user/username/data/LIST_1539724717/SUBLIST_1533057294, 
/user/username/data/LIST_1539724717/SUBLIST_1533873826/UI,
/user/username/data/LIST_1539724717/SUBLIST_1533873826/NEWDATA/A,
/user/username/data/LIST_1539724717/SUBLIST_1533873826/NEWDATA/A/N,
/user/username/data/LIST_1539724717/SUBLIST_1533873826/NEWDATA/M/K/L,
/user/username/data/LIST_1539724717/SUBLIST_1533873826/NEWDATA/O/P/P,
/user/username/data/LIST_1539724717/SUBLIST_1533873826/NEWDATA/wkejdhew,
/user/username/data/LIST_1539724717/SUBLIST_1533873826/NEWDATA/oi32u,
/user/username/data/ARRAY_1539724717/SUBLIST_1533057294, 
/user/username/data/ARRAY_1539724717/SUBLIST_1533873826/UI,
/user/username/data/ARRAY_1539724717/SUBLIST_1533873826/NEWDATA/A,
/user/username/data/ARRAY_1539724717/SUBLIST_1533873826/NEWDATA/A/N,
/user/username/data/ARRAY_1539724717/SUBLIST_1533873826/NEWDATA/M/K/L,
/user/username/data/ARRAY_1539724717/SUBLIST_1533873826/NEWDATA/O/P/P,
/user/username/data/ARRAY_1539724717/SUBLIST_1533873826/NEWDATA/wkejdhew,
/user/username/data/ARRAY_1539724717/SUBLIST_1533873826/NEWDATA/oi32u,

All of these directories share the same parent directories, /user/username/data/LIST_1539724717/SUBLIST_1533057294 and /user/username/data/ARRAY_1539724717/SUBLIST_1533057294. How can we find that out in bash?


vxqlmq5t #1

# Minimal logger; the original snippet calls log without defining it.
log() { printf '[%s] %s\n' "$1" "$2"; }

results=()
directory=()

# Collect the entries directly under $DIR. The last field of each line is the
# path; the "Found N items" header ends in "items" and is skipped.
for value in $(hadoop fs -ls "${DIR}" | awk '{print $NF}')
do
    if [ "$value" != "items" ]; then
        # add values into the "results" array
        log "info" "$value"
        results+=("$value")
    fi
done

# Loop through each value inside the array, i.e. each entry under $DIR
for i in "${results[@]}"
do
    oldVal=""
    log "info" "Checking sub-directories under $i"
    # List recursively, keep only directories (permission string starts with "d")
    # whose path contains the run ID, and print the path (paths without spaces assumed)
    for val in $(hadoop fs -ls -R "$i" | grep 1539724717 | awk '$1 ~ /^d/ {print $NF}')
    do
        # Record val unless it lies below the previously recorded directory
        if [[ -z "$oldVal" || "$val" != "$oldVal"/* ]]; then
            oldVal=$val
            directory+=("${oldVal}")
        fi
    done
done
The `directory` array then contains all the required directories.
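
If a single common parent is wanted for a list of paths such as the contents of `directory`, one possible sketch is shown here. The function name `common_parent` is hypothetical and not part of any Hadoop tooling; it works on the path strings only, assumes absolute paths without trailing slashes, and never contacts HDFS.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the deepest directory that is equal to, or an
# ancestor of, every path passed as an argument. Pure string handling.
common_parent() {
    local prefix="$1" path
    shift
    for path in "$@"; do
        # Shorten the candidate until the path equals it or lies below it
        while [[ "$path" != "$prefix" && "$path" != "$prefix"/* ]]; do
            # A candidate without any slash means a relative path was given
            [[ "$prefix" == */* ]] || return 1
            prefix="${prefix%/*}"
        done
    done
    # An empty candidate means the only shared ancestor is the root
    printf '%s\n' "${prefix:-/}"
}
```

It could then be called as `common_parent "${directory[@]}"` once the array has been filled.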

b4qexyjb #2

By writing a shell script, the directory names can be passed in as variables, and we can check whether both belong to the same parent.
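
A minimal sketch of that idea, assuming the check is about the immediate parent and that comparing the path strings is enough. The function name `same_parent` is hypothetical; it relies only on the standard `dirname` utility and does not verify that the paths exist in HDFS.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit status 0) when both paths have the
# same immediate parent directory. String comparison only, no HDFS access.
same_parent() {
    [ "$(dirname "$1")" = "$(dirname "$2")" ]
}
```

For example, `same_parent "$dir1" "$dir2" && echo "same parent"` prints the message only when both variables point to siblings. Existence of each path could be checked separately with `hadoop fs -test -d`.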
