Flink SQL Client & Iceberg on MinIO: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found

wljmcqd8 · asked 2023-08-01 in Apache

I am running into problems while setting up a local environment that streams data from Flink into an Iceberg table on MinIO:

[ERROR] Could not execute SQL statement. Reason:
org.apache.hadoop.hive.metastore.api.MetaException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
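
For context, this failure typically appears as soon as a statement forces Flink to resolve an s3a:// path. A minimal reproduction might be a catalog definition along these lines (the catalog name, metastore URI, and bucket are hypothetical placeholders, not taken from the original post):

-- Hypothetical Iceberg catalog backed by the Hive metastore, with its
-- warehouse on MinIO; resolving the s3a:// warehouse path is what raises
-- the S3AFileSystem ClassNotFoundException.
CREATE CATALOG iceberg_catalog WITH (
  'type' = 'iceberg',
  'catalog-type' = 'hive',
  'uri' = 'thrift://hive-metastore:9083',
  'warehouse' = 's3a://warehouse/'
);

USE CATALOG iceberg_catalog;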

This is the Dockerfile I use for the Flink jobmanager/taskmanager and the SQL client:

FROM flink:1.16.2-scala_2.12-java11

ENV HADOOP_VERSION=3.3.2

# Download and unpack a matching Hadoop distribution next to the Flink
# install (the base image's working directory is /opt/flink).
RUN APACHE_HADOOP_URL=https://archive.apache.org/dist/hadoop \
    && wget ${APACHE_HADOOP_URL}/common/hadoop-${HADOOP_VERSION}/hadoop-${HADOOP_VERSION}.tar.gz \
    && tar xzvf hadoop-${HADOOP_VERSION}.tar.gz \
    && rm hadoop-${HADOOP_VERSION}.tar.gz

# A shell assignment inside RUN does not survive the layer, so set
# HADOOP_HOME with ENV rather than at the end of the RUN step.
ENV HADOOP_HOME=/opt/flink/hadoop-${HADOOP_VERSION}

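# Expose the unpacked Hadoop jars to Flink via HADOOP_CLASSPATH.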
ENV HADOOP_CLASSPATH=/opt/flink/hadoop-${HADOOP_VERSION}/etc/hadoop:/opt/flink/hadoop-${HADOOP_VERSION}/share/hadoop/common/lib/*:/opt/flink/hadoop-${HADOOP_VERSION}/share/hadoop/common/*:/opt/flink/hadoop-${HADOOP_VERSION}/share/hadoop/hdfs:/opt/flink/hadoop-${HADOOP_VERSION}/share/hadoop/hdfs/lib/*:/opt/flink/hadoop-${HADOOP_VERSION}/share/hadoop/hdfs/*:/opt/flink/hadoop-${HADOOP_VERSION}/share/hadoop/mapreduce/*:/opt/flink/hadoop-${HADOOP_VERSION}/share/hadoop/yarn:/opt/flink/hadoop-${HADOOP_VERSION}/share/hadoop/yarn/lib/*:/opt/flink/hadoop-${HADOOP_VERSION}/share/hadoop/yarn/*

COPY lib/flink-sql-connector-hive-3.1.2_2.12-1.16.2.jar /opt/flink/lib/
COPY lib/flink-sql-connector-kafka-1.16.2.jar /opt/flink/lib/
COPY lib/iceberg-flink-runtime-1.16-1.3.0.jar /opt/flink/lib/
COPY lib/iceberg-hive-runtime-1.3.0.jar /opt/flink/lib/
COPY lib/hive-metastore-3.1.3.jar /opt/flink/lib/
COPY lib/hadoop-aws-3.3.2.jar /opt/flink/lib/
COPY lib/aws-java-sdk-bundle-1.11.1026.jar /opt/flink/lib/

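# NOTE: this copies the S3 filesystem jar into the plugins *root*; as the
# answer below explains, it must live in its own subdirectory instead.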
COPY lib/flink-s3-fs-hadoop-1.16.2.jar /opt/flink/plugins/

WORKDIR /opt/flink


Below is the docker-compose service definition:

  sqlclient:
    container_name: sqlclient
    build: flink
    command:
      - /opt/flink/bin/sql-client.sh
      - embedded
    depends_on:
      - jobmanager
    environment:
      - ENABLE_BUILT_IN_PLUGINS=flink-s3-fs-hadoop-1.16.2.jar
      - JOB_MANAGER_RPC_ADDRESS=jobmanager
      - AWS_ACCESS_KEY_ID=minio
      - AWS_SECRET_ACCESS_KEY=minio123
      - AWS_REGION=us-east-1
    volumes:
      - ./flink-sql:/etc/sql

  jobmanager:
    build: flink
    hostname: "jobmanager"
    container_name: "jobmanager"
    expose:
      - "6123"
    ports:
      - "8081:8081"
    command: jobmanager
    environment:
      - ENABLE_BUILT_IN_PLUGINS=flink-s3-fs-hadoop-1.16.2.jar
      - JOB_MANAGER_RPC_ADDRESS=jobmanager
      - AWS_ACCESS_KEY_ID=minio
      - AWS_SECRET_ACCESS_KEY=minio123
      - AWS_REGION=us-east-1

  taskmanager:
    build: flink
    hostname: "taskmanager"
    container_name: "taskmanager"
    expose:
      - "6121"
      - "6122"
    depends_on:
      - jobmanager
    command: taskmanager
    links:
      - jobmanager:jobmanager
    environment:
      - ENABLE_BUILT_IN_PLUGINS=flink-s3-fs-hadoop-1.16.2.jar
      - JOB_MANAGER_RPC_ADDRESS=jobmanager
      - AWS_ACCESS_KEY_ID=minio
      - AWS_SECRET_ACCESS_KEY=minio123
      - AWS_REGION=us-east-1
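
Independent of the classpath problem, the S3A connector also has to be pointed at MinIO. A minimal sketch of the Hadoop properties involved, assuming MinIO is reachable as http://minio:9000 inside the compose network (where exactly this core-site.xml must live depends on how Flink and the Hive metastore load their Hadoop configuration):

<?xml version="1.0"?>
<!-- Hypothetical core-site.xml; credentials mirror the compose environment above. -->
<configuration>
  <property>
    <name>fs.s3a.endpoint</name>
    <value>http://minio:9000</value>
  </property>
  <property>
    <!-- MinIO buckets are addressed path-style, not virtual-host-style -->
    <name>fs.s3a.path.style.access</name>
    <value>true</value>
  </property>
  <property>
    <name>fs.s3a.access.key</name>
    <value>minio</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>minio123</value>
  </property>
</configuration>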

a6b3iqyw1#

You are copying the S3 plugin into the root of the plugins folder. To use a pluggable file system, the corresponding JAR file has to be copied into its own subdirectory under the plugins directory of the Flink distribution before Flink is started.
More on this at https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/filesystems/plugins/
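
Applied to the Dockerfile above, that means replacing the last COPY with something like this (the subdirectory name is free to choose; s3-fs-hadoop follows the naming used in the Flink docs):

# Each pluggable filesystem needs its own folder under plugins/.
RUN mkdir -p /opt/flink/plugins/s3-fs-hadoop
COPY lib/flink-s3-fs-hadoop-1.16.2.jar /opt/flink/plugins/s3-fs-hadoop/

Baking the jar into its own plugin folder at build time also means the containers no longer depend on the ENABLE_BUILT_IN_PLUGINS variable to set the plugin up at startup.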
