kubernetes spark-submit exit code on Kubernetes

bvjxkvbb posted on 2022-11-02 in Kubernetes

How can I programmatically check whether a Spark job succeeded or failed when running spark-submit? Normally one would use the Unix exit code.

phase: Failed
 container status:
     container name: spark-kubernetes-driver
     container image: <registry>/spark-py:spark3.2.1
     container state: terminated
     container started at: 2022-03-25T19:10:51Z
     container finished at: 2022-03-25T19:10:57Z
     exit code: 1
     termination reason: Error

2022-03-25 15:10:58,457 INFO submit.LoggingPodStatusWatcherImpl: Application Postgres-Minio-Kubernetes.py with submission ID spark:postgres-minio-kubernetes-py-b70d3f7fc27829ec-driver finished
2022-03-25 15:10:58,465 INFO util.ShutdownHookManager: Shutdown hook called
2022-03-25 15:10:58,466 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-3321e67c-73d5-422d-a26d-642a0235cf23

The process failed, but when I checked the exit code in Unix with echo $?, it returned zero!

$ echo $?
0

The pod name is also randomly generated. Apart from using the spark-on-k8s-operator, how else can this be handled with spark-submit?
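(One way to work around the random pod name, as a sketch: pin the driver pod name with Spark's spark.kubernetes.driver.pod.name property, then ask Kubernetes for the driver container's exit code directly. The pod name spark-driver and the API server address below are placeholders, and this assumes kubectl access to the cluster.)

# submit with a fixed, predictable driver pod name
spark-submit \
  --master k8s://https://<api-server> \
  --deploy-mode cluster \
  --conf spark.kubernetes.driver.pod.name=spark-driver \
  ...

# read the terminal exit code straight from the pod status
kubectl get pod spark-driver \
  -o jsonpath='{.status.containerStatuses[0].state.terminated.exitCode}'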


crcmnpdw1#

If you are using bash, one approach is to grep the output. You may need to grep stderr or stdout, depending on where the log output is sent.
Something like this:

# capture both stdout and stderr, since the relevant lines may land on either
OUTPUT=$(spark-submit ... 2>&1)
if echo "$OUTPUT" | grep -q "exit code: 1"; then
    exit 1
fi
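A variant that keeps the log streaming to the console instead of buffering it all in a variable (a sketch; the log path is arbitrary):

spark-submit ... 2>&1 | tee /tmp/spark-submit.log
if grep -q "exit code: 1" /tmp/spark-submit.log; then
    exit 1
fi

This is essentially what the next answer bakes into the spark-submit script itself.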

t5zmwmid2#

In addition to what @Rico mentioned, I also handle both the cluster and client deploy modes by changing the spark-submit shell script in the $SPARK_HOME/bin directory, as follows.


#!/usr/bin/env bash
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

if [ -z "${SPARK_HOME}" ]; then
  source "$(dirname "$0")"/find-spark-home
fi

# disable randomized hash for string in Python 3.3+
export PYTHONHASHSEED=0

# check the deployment mode.
if echo "$@" | grep -q "\-\-deploy-mode cluster"; then
    echo "cluster mode.."
    # temp log file for the spark job.
    TMP_LOG="/tmp/spark-job-log-$(date '+%Y-%m-%d-%H-%M-%S').log"
    # note: exec inside a pipeline runs in a subshell, so this script
    # continues after the pipeline finishes.
    exec "${SPARK_HOME}"/bin/spark-class org.apache.spark.deploy.SparkSubmit "$@" |& tee "${TMP_LOG}"
    # when "exit code: 1" appears in the spark log, return exit status 1.
    if grep -q "exit code: 1" "${TMP_LOG}"; then
        echo "exit code: 1"
        rm -f "${TMP_LOG}"
        exit 1
    else
        echo "job succeeded."
        rm -f "${TMP_LOG}"
        exit 0
    fi
elif echo "$@" | grep -q "\-\-conf spark.submit.deployMode=cluster"; then
    echo "cluster mode.."
    # temp log file for the spark job.
    TMP_LOG="/tmp/spark-job-log-$(date '+%Y-%m-%d-%H-%M-%S').log"
    exec "${SPARK_HOME}"/bin/spark-class org.apache.spark.deploy.SparkSubmit "$@" |& tee "${TMP_LOG}"
    # when "exit code: 1" appears in the spark log, return exit status 1.
    if grep -q "exit code: 1" "${TMP_LOG}"; then
        echo "exit code: 1"
        rm -f "${TMP_LOG}"
        exit 1
    else
        echo "job succeeded."
        rm -f "${TMP_LOG}"
        exit 0
    fi
else
    echo "client mode.."
    exec "${SPARK_HOME}"/bin/spark-class org.apache.spark.deploy.SparkSubmit "$@"
fi
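With this patched script in place on the machine (or container) that runs the submission, the shell exit status finally reflects the job outcome and can be checked the usual way (the submit arguments are elided here):

$SPARK_HOME/bin/spark-submit --master k8s://https://<api-server> --deploy-mode cluster ...
echo $?   # 1 if "exit code: 1" appeared in the driver log, 0 otherwise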

Then I built and pushed my Spark Docker image.
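A sketch of that build-and-push step using Spark's bundled docker-image-tool.sh (the registry and tag are placeholders, and it assumes the patched bin/spark-submit has already been copied into the Spark distribution the image is built from):

# from the root of the Spark distribution
./bin/docker-image-tool.sh -r <registry> -t spark3.2.1-patched \
    -p kubernetes/dockerfiles/spark/bindings/python/Dockerfile build
./bin/docker-image-tool.sh -r <registry> -t spark3.2.1-patched push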
For more details, please refer to the following link:
