python-3.x: Initialization action fails when creating a Dataproc cluster

Posted by dgiusagp on 2023-02-10 in Python

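I run an initialization script when creating a Dataproc cluster. The script copies a Python wheel package from GCS to the cluster and then installs the wheel on the cluster. This worked fine a few weeks ago, but today, when I created a cluster with a new version of the wheel, it failed with the error below in the Dataproc logs. The only changes to the wheel are in the Python package code; it is a pure-Python wheel. I am using Dataproc image version 1.5.53-debian10.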

pip is already installed.
Traceback (most recent call last):
    sys.exit(main())
  File "/opt/conda/miniconda3/lib/python3.7/site-packages/pip/_internal/main.py", line 45, in main
    command = create_command(cmd_name, isolated=("--isolated" in cmd_args))
  File "/opt/conda/miniconda3/lib/python3.7/site-packages/pip/_internal/commands/__init__.py", line 96, in create_command
    module = importlib.import_module(module_path)
  File "/opt/conda/miniconda3/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/opt/conda/miniconda3/lib/python3.7/site-packages/pip/_internal/commands/install.py", line 23, in <module>
    from pip._internal.cli.req_command import RequirementCommand
  File "/opt/conda/miniconda3/lib/python3.7/site-packages/pip/_internal/cli/req_command.py", line 20, in <module>
    from pip._internal.network.session import PipSession
  File "/opt/conda/miniconda3/lib/python3.7/site-packages/pip/_internal/network/session.py", line 17, in <module>
    from pip._vendor import requests, six, urllib3
  File "/opt/conda/miniconda3/lib/python3.7/site-packages/pip/_vendor/requests/__init__.py", line 97, in <module>
    from pip._vendor.urllib3.contrib import pyopenssl
  File "/opt/conda/miniconda3/lib/python3.7/site-packages/pip/_vendor/urllib3/contrib/pyopenssl.py", line 46, in <module>
    import OpenSSL.SSL
  File "/opt/conda/miniconda3/lib/python3.7/site-packages/OpenSSL/__init__.py", line 8, in <module>
    from OpenSSL import crypto, SSL
  File "/opt/conda/miniconda3/lib/python3.7/site-packages/OpenSSL/crypto.py", line 1553, in <module>
    class X509StoreFlags(object):
  File "/opt/conda/miniconda3/lib/python3.7/site-packages/OpenSSL/crypto.py", line 1573, in X509StoreFlags
    CB_ISSUER_CHECK = _lib.X509_V_FLAG_CB_ISSUER_CHECK
AttributeError: module 'lib' has no attribute 'X509_V_FLAG_CB_ISSUER_CHECK'
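The last frames show pip failing while it imports pyOpenSSL (`OpenSSL/crypto.py`), which reads `X509_V_FLAG_CB_ISSUER_CHECK` from the compiled bindings shipped by the `cryptography` package; the `AttributeError` means the installed `cryptography` build does not expose that constant, i.e. the two packages are out of step. A minimal diagnostic sketch, assuming you can SSH into a node and run it with `/opt/conda/miniconda3/bin/python` (the interpreter in the traceback). The helper name `module_version` is hypothetical; the script only reports versions and catches the failing import instead of crashing:

```python
"""Report the cryptography / pyOpenSSL versions seen by one Python interpreter.

Run it with the interpreter from the traceback, e.g.
    /opt/conda/miniconda3/bin/python check_openssl.py
(check_openssl.py is just an example file name).
"""
import importlib


def module_version(name):
    """Import module `name` and return its __version__.

    Returns "unknown" if the module has no __version__ attribute, and an
    "import failed: ..." message instead of raising if the import itself fails,
    as `import OpenSSL` does in the traceback above.
    """
    try:
        module = importlib.import_module(name)
    except Exception as exc:  # ImportError, or the AttributeError raised in OpenSSL/crypto.py
        return f"import failed: {type(exc).__name__}: {exc}"
    return getattr(module, "__version__", "unknown")


if __name__ == "__main__":
    # `cryptography` provides the bindings that pyOpenSSL (import name `OpenSSL`) uses.
    for name in ("cryptography", "OpenSSL"):
        print(f"{name:<13} {module_version(name)}")
```

If `cryptography` prints a version while `OpenSSL` prints the same `AttributeError`, the mismatch is between those two packages rather than in the wheel being installed.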

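It looks like a problem with the Python and pip packages that come with Dataproc. Can anyone suggest how to fix this? The initialization script I am using is below.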

#!/bin/bash
function install_pip() {
  if command -v pip >/dev/null; then
    echo "pip is already installed."
    return 0
  fi
  if command -v easy_install >/dev/null; then
    echo "Installing pip with easy_install..."
    easy_install pip
    return 0
  fi
  echo "Installing python-pip..."
  apt-get update
  apt-get install -y python-pip
}
# Install pip in the cluster
install_pip

# Get the GCS location of the SDK from the cluster metadata attributes
SDK_GCS_LOCATION="$(/usr/share/google/get_metadata_value attributes/sdk-gcs-location)"
SDK_FILE_NAME="$(/usr/share/google/get_metadata_value attributes/sdk-file-name)"
readonly SDK_GCS_LOCATION
readonly SDK_FILE_NAME
SDK_WHEEL="${SDK_GCS_LOCATION}/${SDK_FILE_NAME}"

# Copy the wheel file from GCS to the cluster
gsutil cp "${SDK_WHEEL}" .

# Install the wheel on the cluster
pip install "${SDK_FILE_NAME}"
Answer #1

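I tried adding the latest version of this package, pyOpenSSL==23.0.0, to the dataproc:pip.packages property in the Airflow DAG that creates the cluster, alongside the other properties shown below: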

"software_config": {
    "image_version": "1.5.53-debian10",
    "properties": {
        "dataproc:efm.spark.shuffle": "primary-worker",
        "dataproc:efm.mapreduce.shuffle": "hcfs",
        "dataproc:pip.packages": "pyOpenSSL==23.0.0",
    },
},

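This worked: once pyOpenSSL was upgraded to the latest version, which is compatible with pip, the initialization action succeeded. Still, I would expect Dataproc / Google Cloud Platform to manage these libraries, since Python and pip come preinstalled on Dataproc clusters.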
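If the DAG later needs more than one pinned package, the `software_config` fragment above can be assembled from a list of pins. This is a minimal sketch: `build_software_config` is a hypothetical helper, not part of Airflow or the Dataproc client, and it assumes `dataproc:pip.packages` takes a comma-separated list of pip requirement specifiers, which is how I read the Dataproc cluster-properties documentation (check it against your image version):

```python
"""Assemble the Dataproc software_config dict used in the Airflow DAG."""


def build_software_config(image_version, pip_packages, extra_properties=None):
    """Return a software_config dict, setting dataproc:pip.packages from a list of pins.

    pip_packages: pip requirement strings such as "pyOpenSSL==23.0.0"; they are
    joined with commas (assumed format of the dataproc:pip.packages property).
    extra_properties: other cluster properties, copied so the caller's dict is not modified.
    """
    properties = dict(extra_properties or {})
    if pip_packages:
        properties["dataproc:pip.packages"] = ",".join(pip_packages)
    return {"image_version": image_version, "properties": properties}


# Reproduces the configuration from the answer above.
software_config = build_software_config(
    image_version="1.5.53-debian10",
    pip_packages=["pyOpenSSL==23.0.0"],
    extra_properties={
        "dataproc:efm.spark.shuffle": "primary-worker",
        "dataproc:efm.mapreduce.shuffle": "hcfs",
    },
)
```

The resulting dict can be placed under the `"software_config"` key of the cluster config the DAG already passes to Dataproc.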
