Kubernetes: Exception in thread "main" java.nio.file.NoSuchFileException for my Spark job's jar file

qojgxg4l  posted on 2023-10-17 in Kubernetes

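I'm running Spark 3.5.0 on AKS and trying to run one of the example jobs that ships with the Spark distribution, JavaSparkSQLExample to be precise. When the driver pod starts, I get a NoSuchFileException: for some reason it cannot find the jar file that should have been uploaded into the container. I build the image with the Dockerfile that ships with Spark: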

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
ARG java_image_tag=17-jre

FROM eclipse-temurin:${java_image_tag}

ARG spark_uid=185

# Before building the docker image, first build and make a Spark distribution following
# the instructions in https://spark.apache.org/docs/latest/building-spark.html.
# If this docker file is being used in the context of building your images from a Spark
# distribution, the docker build command should be invoked from the top level directory
# of the Spark distribution. E.g.:
# docker build -t spark:latest -f kubernetes/dockerfiles/spark/Dockerfile .

RUN set -ex && \
    apt-get update && \
    ln -s /lib /lib64 && \
    apt install -y bash tini libc6 libpam-modules krb5-user libnss3 procps net-tools && \
    mkdir -p /opt/spark && \
    mkdir -p /opt/spark/examples && \
    mkdir -p /opt/spark/work-dir && \
    touch /opt/spark/RELEASE && \
    rm /bin/sh && \
    ln -sv /bin/bash /bin/sh && \
    echo "auth required pam_wheel.so use_uid" >> /etc/pam.d/su && \
    chgrp root /etc/passwd && chmod ug+rw /etc/passwd && \
    rm -rf /var/cache/apt/* && rm -rf /var/lib/apt/lists/*

COPY jars /opt/spark/jars
# Copy RELEASE file if exists
COPY RELEAS[E] /opt/spark/RELEASE
COPY bin /opt/spark/bin
COPY sbin /opt/spark/sbin
COPY kubernetes/dockerfiles/spark/entrypoint.sh /opt/
COPY kubernetes/dockerfiles/spark/decom.sh /opt/
COPY examples /opt/spark/examples
COPY kubernetes/tests /opt/spark/tests
COPY data /opt/spark/data

ENV SPARK_HOME /opt/spark

WORKDIR /opt/spark/work-dir
RUN chmod g+w /opt/spark/work-dir
RUN chmod a+x /opt/decom.sh

ENTRYPOINT [ "/opt/entrypoint.sh" ]

# Specify the User that the actual main process will run as
USER ${spark_uid}

This is my K8s configuration:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: spark-sql-proto-app
  namespace: spark-sql-proto
spec:
  replicas: 1
  selector:
    matchLabels:
      app: spark-sql-proto-app
  template:
    metadata:
      labels:
        app: spark-sql-proto-app
    spec:
      nodeSelector:
        beta.kubernetes.io/os: linux
      containers:
        - name: spark-proto-app
          image: mycontainerregistry.azurecr.io/mycustomsparkimage:latest
          command:
          - /opt/entrypoint.sh
          args:
          - /opt/spark/bin/spark-submit
          - --verbose
          - --master
          - k8s://https://sparkakscluster-dns-5i9mchtu.hcp.westeurope.azmk8s.io:443
          - --deploy-mode
          - cluster
          - --conf
          - spark.kubernetes.namespace=spark-sql-proto
          - --conf
          - spark.kubernetes.container.image.pullPolicy=Always
          - --conf
          - spark.executor.instances=5
          - --conf
          - spark.driver.memory=3G
          - --conf
          - spark.executor.memory=3G
          - --conf
          - spark.kubernetes.submission.waitAppCompletion=true
          - --conf
          - spark.kubernetes.driverEnv.HTTP2_DISABLE=true
          - --name
          - spark-proto-app
          - --conf
          - spark.kubernetes.file.upload.path=/opt/spark/work-dir
          - --conf
          - spark.kubernetes.container.image=mycontainerregistry.azurecr.io/mycustomsparkimage:latest
          - --class
          - main.java.org.apache.spark.examples.sql.JavaSparkSQLExample
          - file:///opt/spark/examples/jars/spark-examples_2.12-3.5.0.jar
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 8080
            - containerPort: 4040
            - containerPort: 9870
            - containerPort: 8088
            - containerPort: 8042

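I applied this K8s configuration to my AKS cluster expecting it to start all the required pods, but the driver pod fails at startup with a NoSuchFileException. I tried several different values for the spark.kubernetes.file.upload.path property, but the error is always the same.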

t40tm48m  #1

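OK, I found where I went wrong: I assumed the pods could exchange files with each other directly. In fact, every pod in the cluster needs access to a shared directory so that the driver and the executors can upload and download the Spark job's files. Since I'm working in Azure, I used an Azure Storage account, so the spark.kubernetes.file.upload.path property had to be set to an ABFS URI of the form abfs://<container>@<storage-account>.dfs.core.windows.net/ (the actual container and account names were stripped from this page by email obfuscation).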
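
As a rough sketch of what that change might look like in the container args from the question: only the upload path entry changes, and one extra setting supplies credentials. The names `<container>`, `<storage-account>` and `<access-key>` are placeholders, not values from the original post. Account-key authentication is an assumption (other authentication methods exist), and the fragment also assumes the image has the hadoop-azure connector jars on its classpath.

```yaml
# Sketch only: fragment of the container spec, not a complete manifest.
# <container>, <storage-account> and <access-key> are placeholders.
args:
  - /opt/spark/bin/spark-submit
  # ... all other options stay as in the question ...
  - --conf
  # Shared location reachable by the submitter, the driver and the executors.
  - spark.kubernetes.file.upload.path=abfs://<container>@<storage-account>.dfs.core.windows.net/
  - --conf
  # Assumption: account-key auth; the spark.hadoop. prefix forwards the key to the Hadoop configuration.
  - spark.hadoop.fs.azure.account.key.<storage-account>.dfs.core.windows.net=<access-key>
```

In a real deployment the access key would more likely come from a Kubernetes Secret than from a literal value in the manifest.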
