Unable to load libhdfs.so

r1wp621o  posted 2022-12-09 in HDFS

I am trying to use the pyarrow filesystem interface with HDFS. When I call the fs.HadoopFileSystem constructor I get a "libhdfs.so not found" error, even though libhdfs.so is clearly at the specified location.

from pyarrow import fs
hfs = fs.HadoopFileSystem(host="10.10.0.167", port=9870)

OSError: Unable to load libhdfs: /hadoop-3.3.1/lib/native/libhdfs.so: cannot open shared object file: No such file or directory
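For reference, pyarrow looks for libhdfs first in the directory named by ARROW_LIBHDFS_DIR and otherwise under $HADOOP_HOME/lib/native. A rough sketch of that lookup order (`find_libhdfs` is a hypothetical helper name, not pyarrow's actual API):

```python
import os
from pathlib import Path

def find_libhdfs(env):
    """Roughly mimic the order in which pyarrow searches for libhdfs.so:
    ARROW_LIBHDFS_DIR first, then $HADOOP_HOME/lib/native.
    (Sketch only; not pyarrow's actual implementation.)"""
    candidates = []
    if env.get("ARROW_LIBHDFS_DIR"):
        candidates.append(Path(env["ARROW_LIBHDFS_DIR"], "libhdfs.so"))
    if env.get("HADOOP_HOME"):
        candidates.append(Path(env["HADOOP_HOME"], "lib", "native", "libhdfs.so"))
    for c in candidates:
        if c.is_file():
            return str(c)
    # When nothing is found, pyarrow raises "OSError: Unable to load libhdfs"
    return None
```

Note that finding the file is only the first step: the loader must also be able to open it, which is where "cannot open shared object file" comes from.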

I have tried different Python and pyarrow versions and set ARROW_LIBHDFS_DIR. For testing I used the Dockerfile below on Linux Mint.

FROM openjdk:11

RUN apt-get update &&\
  apt-get install wget -y

RUN wget -nv https://dlcdn.apache.org/hadoop/common/hadoop-3.3.1/hadoop-3.3.1-aarch64.tar.gz &&\
  tar -xf hadoop-3.3.1-aarch64.tar.gz

ENV PATH=/miniconda/bin:${PATH}
RUN wget -nv https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh &&\
  bash miniconda.sh -b -p /miniconda &&\
  conda init 

RUN conda install -c conda-forge python=3.9.6
RUN conda install -c conda-forge pyarrow=4.0.1

ENV JAVA_HOME=/usr/local/openjdk-11
ENV HADOOP_HOME=/hadoop-3.3.1  

RUN  printf 'from pyarrow import fs\nhfs = fs.HadoopFileSystem(host="10.10.0.167", port=9870)\n' > test_arrow.py

# 'python test_arrow.py' fails with ... 
# OSError: Unable to load libhdfs: /hadoop-3.3.1/lib/native/libhdfs.so: cannot open shared object file: No such file or directory
RUN python test_arrow.py || true

CMD ["/bin/bash"]
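When the file exists but the loader still reports "cannot open shared object file", it is worth checking that the .so matches the host architecture (the Dockerfile above fetches the aarch64 Hadoop tarball) and that its own dependencies, such as libjvm.so, resolve. A diagnostic sketch, with the path assumed from the error message:

```shell
# Compare the host architecture with the one libhdfs.so was built for,
# and list its unresolved dependencies. The path is taken from the
# error message above; adjust it for your layout.
LIBHDFS=/hadoop-3.3.1/lib/native/libhdfs.so
uname -m                       # host architecture
if [ -f "$LIBHDFS" ]; then
  file "$LIBHDFS"              # architecture the .so was built for
  ldd "$LIBHDFS"               # dependencies marked "not found" are missing
else
  echo "libhdfs.so not found at $LIBHDFS"
fi
```

An aarch64 build on an x86_64 host fails with exactly this "No such file or directory" wording even though the file is present.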

vuktfyat 1#

I have created a Dockerfile for the pyarrow fs HadoopFileSystem client. Hadoop needs to be installed in order to use the libhdfs.so file.

RUN mkdir -p /data/hadoop
RUN apt-get -q update
RUN apt-get install software-properties-common -y
RUN add-apt-repository "deb http://deb.debian.org/debian/ sid main"
RUN apt-get -q update
RUN apt-get install openjdk-8-jdk -y
RUN apt-get clean
RUN rm -rf /var/lib/apt/lists/*
RUN wget "https://dlcdn.apache.org/hadoop/common/hadoop-3.3.2/hadoop-3.3.2.tar.gz" -O hadoop-3.3.2.tar.gz
RUN tar xzf hadoop-3.3.2.tar.gz
ENV HADOOP_HOME=/app/hadoop-3.3.2
ENV HADOOP_INSTALL=$HADOOP_HOME
ENV HADOOP_MAPRED_HOME=$HADOOP_HOME
ENV HADOOP_COMMON_HOME=$HADOOP_HOME
ENV HADOOP_HDFS_HOME=$HADOOP_HOME
ENV YARN_HOME=$HADOOP_HOME
ENV HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
ENV PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
ENV HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
ENV JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
ENV CLASSPATH="$HADOOP_HOME/bin/hadoop classpath --glob"
ENV ARROW_LIBHDFS_DIR=$HADOOP_HOME/lib/native
ADD pyarrow-app.py /app/
CMD ["python3", "-u", "/app/pyarrow-app.py"]
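One caveat about the CLASSPATH line above: Docker's ENV instruction stores the value as a literal string and never executes the quoted command, so libhdfs may still see the command text rather than the jar list it would print. A small sketch of the difference, with a hypothetical entrypoint shown in comments:

```shell
# ENV in a Dockerfile stores a literal string; the command inside the
# quotes is never run. Only a shell's $(...) performs the substitution.
literal='$HADOOP_HOME/bin/hadoop classpath --glob'  # what ENV stores
expanded="$(printf 'a.jar:b.jar')"                  # $(...) runs the command
echo "literal : $literal"
echo "expanded: $expanded"
# A hypothetical entrypoint that expands the real classpath at
# container start, before launching the app:
#   export CLASSPATH="$("$HADOOP_HOME"/bin/hadoop classpath --glob)"
#   exec python3 -u /app/pyarrow-app.py
```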
