My purge script takes too long to delete tmp/log files older than 1 day in HDFS

wwodge7n  posted on 2021-07-13  in Hadoop

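I am trying to delete the tmp/log files in HDFS because they take up too much space. I use the shell script below to purge the logs, but it only deletes 40-50 files per hour, which is very slow. Is there anything I can change to speed it up?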


#!/bin/bash

# Use YYYY-MM-DD dates so they compare correctly as strings
# against the date column of `hdfs dfs -ls`.
current_date=$(date +'%Y-%m-%d')
retention_date=$(date -d "$current_date 1 day ago" '+%Y-%m-%d')
echo "current_date:$current_date"
echo "retention_date:$retention_date"
sum=0
total_size=0
# Feed the loop through process substitution rather than a pipe,
# so sum and total_size are still set after the loop ends.
while read -r line ; do
        username=$(echo "${line}" | awk '{ print $3 }')
        # echo "username:$username"
        if [ "$username" = "XYZ" ]; then
                created_date=$(echo "${line}" | awk '{ print $6 }')
                echo "createddate:$created_date"
                size=$(echo "${line}" | awk '{ print $5 }')
                filename=$(echo "${line}" | awk '{ print $8 }')
                if [[ "$retention_date" > "$created_date" ]]; then
                        # echo "inside if"
                        sum=$((sum + 1))
                        # echo "sum:$sum"
                        total_size=$((total_size + size))
                        hdfs dfs -rm -r "$filename"
                fi
        fi
done < <(hdfs dfs -ls /tmp/logs/)
hdfs dfs -rm -r /user/ABC_USER/.Trash/Current/tmp/logs/
echo "Total Folders qualified: $sum"

# echo "Total size: $total_size"
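
One likely cause of the slowness is that the loop runs a separate `hdfs dfs -rm` for every path, and each call has to start a new JVM. A minimal sketch of a batched variant follows. It filters the listing in one awk pass and hands many paths to each `hdfs dfs -rm` call. The function names `select_expired` and `purge_expired`, the batch size of 100, and the use of `-skipTrash` are assumptions for illustration and are not part of the original script. Note that `-skipTrash` deletes permanently, and that, like the original, this assumes GNU `date` and `xargs` and paths without spaces.

```shell
#!/bin/bash
# Sketch only: names, batch size and -skipTrash are illustrative assumptions.

# Reads `hdfs dfs -ls` output on stdin and prints the paths (column 8)
# owned by user $1 whose date column (column 6, YYYY-MM-DD) sorts
# before the cutoff date $2. Appending "" forces a string comparison.
select_expired() {
    awk -v owner="$1" -v cutoff="$2" \
        '$3 == owner && ($6 "") < (cutoff "") { print $8 }'
}

# Lists /tmp/logs/ once, then removes the expired paths with up to
# 100 paths per `hdfs dfs -rm` call instead of one call per path.
purge_expired() {
    local owner="$1" cutoff
    cutoff=$(date -d "1 day ago" '+%Y-%m-%d')
    hdfs dfs -ls /tmp/logs/ \
        | select_expired "$owner" "$cutoff" \
        | xargs -r -n 100 hdfs dfs -rm -r -skipTrash
}
```

Calling `purge_expired XYZ` would then do the same job as the loop above for user XYZ. If the trash should still be used, dropping `-skipTrash` keeps the original behavior, at the cost of having to clear the `.Trash` directory afterwards.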
