bash: How to get the latest valid partition date from HDFS when running a shell script on a specific job?

Posted by kx5bkwkv on 2021-05-29 in Hadoop


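My task is to process all the tables assigned to a specific Spark job. I need to write a script that, for every table assigned to that job, prints its path and its timestamp. In other words, I need to collect the timestamps of all the tables associated with the job.
This is the script I have written.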


#!/usr/bin/env bash

JOB_NAME=${1}
 inputDirListings=$(awk -F: -v key="$1" '$1==key {print $2}' test_paths.txt)
for dir in  $(echo $inputDirListings | tr "," "\n");
do
    path=$dir
    echo "dir is $path"
    cmd2='hdfs dfs -du -h $path'
    ev1=`eval $cmd2 | tail -1`
    echo "ev1 value is $ev1"

    hdfsPath=`echo $ev1 | cut -d";" -f3- `
    echo "partition is $hdfsPath"

    latestPartition=`echo $hdfsPath | grep -Eo '[[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2}'`
    echo "latest partition is $latestPartition"

    dt1="$(echo $ev1 | cut -d'=' -f2)"
    arr[i]=`date -d $dt1 +%Y%m%d`

    #---Getting minimum date from array---------
    max=${arr[0]}
    min=${arr[0]}

    for i in ${arr[@]}
    do
    if [[ $i > $max ]] ; then                           
    max=$i-1
    fi
    if [[ $i < $min ]] ; then
    min=$i
    fi
    echo "dt1"
    for (( c=$dt1; c<=$currDate; c++ ))
    do
        echo -n "$c "
        sleep 1
    done 
done
 echo "Max value is $max  , minimal value is $min"
dt2=`date -d $min +%Y-%m-%d`
done

The output shows the same value for both the maximum and the minimum,
for example: Max size is 9999-12-31, Min size is 9999-12-31. Basically, I need to get the latest partition date before 9999-12-31.

hadoop shell apache-spark bash

Source: https://stackoverflow.com/questions/51449355/how-to-get-latest-valid-partition-date-from-hdfs-when-running-a-shell-script-on

1 Answer


htzpubme1#

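Your code stores only the partition value of the last directory in the array, because the entry is overwritten on every pass through the loop.
The array needs to be defined outside the loop, i needs to be incremented inside the loop, and the inner min/max loop needs to be moved out of the directory loop, as shown below: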


#!/usr/bin/env bash

JOB_NAME=${1}
arr=()   # defined once, outside the loop, so every directory's date is kept
i=0

inputDirListings=$(awk -F: -v key="$JOB_NAME" '$1==key {print $2}' test_paths.txt)
for dir in $(echo "$inputDirListings" | tr "," "\n")
do
    path=$dir
    echo "dir is $path"
    # The last line of the listing is treated as this table's newest partition
    ev1=$(hdfs dfs -du -h "$path" | tail -1)
    echo "ev1 value is $ev1"

    hdfsPath=$(echo "$ev1" | cut -d";" -f3-)
    echo "partition is $hdfsPath"

    latestPartition=$(echo "$hdfsPath" | grep -Eo '[[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2}')
    echo "latest partition is $latestPartition"

    # The text after the first "=" is the partition date; store it as yyyymmdd
    dt1=$(echo "$ev1" | cut -d'=' -f2)
    arr[i]=$(date -d "$dt1" +%Y%m%d)

    i=$((i + 1))
done

# ---Getting minimum and maximum date from array---------

max=${arr[0]}
min=${arr[0]}

for d in "${arr[@]}"
do
    # Equal-length yyyymmdd strings compare lexicographically in date order
    if [[ $d > $max ]] ; then
        max=$d
    fi
    if [[ $d < $min ]] ; then
        min=$d
    fi
done

echo "Max value is $max  , minimal value is $min"
dt2=$(date -d "$min" +%Y-%m-%d)
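
The loop above finds the overall minimum and maximum, but the question also asks for the latest partition date before the 9999-12-31 placeholder. A minimal sketch of one way to do that follows. It is not part of the original answer: the helper name latest_valid_partition and the sample paths are hypothetical, and it assumes partition dates appear in the listing as yyyy-mm-dd. The function reads listing lines on standard input, so in the real script it could be fed from `hdfs dfs -du -h "$path"`.

```shell
#!/usr/bin/env bash

# Hypothetical helper (an assumption, not from the original answer): reads
# listing lines on stdin and prints the newest yyyy-mm-dd date that is
# strictly earlier than the sentinel given as $1 (default 9999-12-31).
latest_valid_partition() {
    local sentinel=${1:-9999-12-31}
    grep -Eo '[0-9]{4}-[0-9]{2}-[0-9]{2}' \
        | awk -v s="$sentinel" '($0 "") < (s "")' \
        | sort \
        | tail -n 1
}

# Made-up lines, roughly shaped like a directory listing with dt= partitions.
sample_listing='1.2 G  /data/sales/dt=2018-07-18
1.3 G  /data/sales/dt=2018-07-19
0      /data/sales/dt=9999-12-31'

latest=$(printf '%s\n' "$sample_listing" | latest_valid_partition)
echo "latest valid partition is $latest"
```

Because yyyy-mm-dd strings have a fixed width, plain string comparison and `sort` order them chronologically, so no call to `date` is needed to drop the placeholder. If every partition is the placeholder, the function prints nothing, which the caller can test with `[[ -z $latest ]]`.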

Posted 2021-05-29
