Flake8 linting for Databricks Python code in GitHub using workflows

Posted by 33qvvth1 on 2023-01-31 in Apache


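My Databricks Python code lives in GitHub. I set up a basic workflow that lints the Python code with flake8. It fails because names that are implicitly available when the script runs on Databricks (such as spark, sc, dbutils, getArgument) are not available when flake8 lints it outside Databricks (on the GitHub Ubuntu VM).
How can I lint Databricks notebooks in GitHub using flake8?
For example, these are the errors I get: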

test.py:1:1: F821 undefined name 'dbutils'
test.py:3:11: F821 undefined name 'getArgument'
test.py:5:1: F821 undefined name 'dbutils'
test.py:7:11: F821 undefined name 'spark'

My notebook in GitHub:

dbutils.widgets.text("my_jdbcurl", "default my_jdbcurl")

jdbcurl = getArgument("my_jdbcurl")

dbutils.fs.ls(".")

df_node = spark.read.format("jdbc")\
  .option("driver", "org.mariadb.jdbc.Driver")\
  .option("url", jdbcurl)\
  .option("dbtable", "my_table")\
  .option("user", "my_username")\
  .option("password", "my_pswd")\
  .load()

My .github/workflows/lint.yml:

on:
  pull_request:
    branches: [ master ]

jobs:
  build:

    runs-on: ubuntu-latest

    steps:
    - uses: actions/checkout@v2
    - uses: actions/setup-python@v1
      with:
        python-version: 3.8
    - run: |
        python -m pip install --upgrade pip
        pip install -r requirements.txt
    - name: Lint with flake8
      run: |
        pip install flake8
        flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics

apache-spark

Source: https://stackoverflow.com/questions/61019498/flake8-linting-for-databricks-python-code-in-github-using-workflows

3 Answers


flvlnr441#

One thing you can do is:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

This works with or without Databricks, in plain Python or in a pyspark client.
To detect whether you are in a file or in a Databricks notebook, you can run:

try:
    __file__
    print("We are in a file, like in our IDE or being tested by flake8.")
except NameError:
    print("We are in a Databricks notebook. Act accordingly.")

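You can then conditionally initialize display() and other tools, or create dummy variables for them.
This is only a partial solution. I am working on a better one and will keep this answer updated.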
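The idea of conditionally creating dummy variables could look roughly like the sketch below. It is only an illustration under assumptions: `_databricks_only` and the shape of the stand-in `dbutils` object are hypothetical and not part of any Databricks API, and only the attributes used in the question's notebook are stubbed.

```python
from types import SimpleNamespace


def _databricks_only(*args, **kwargs):
    """Hypothetical placeholder: fail loudly if a stub is actually called."""
    raise RuntimeError("This helper is only available inside Databricks.")


try:
    # On Databricks the name already exists, so this lookup succeeds
    # and the real object is left untouched.
    dbutils
except NameError:
    # Outside Databricks (IDE, flake8, unit tests) bind a stand-in so the
    # name is defined at module level.
    dbutils = SimpleNamespace(
        fs=SimpleNamespace(ls=_databricks_only),
        widgets=SimpleNamespace(text=_databricks_only),
    )
```

Because the fallback branch binds `dbutils` in the module, later uses of the name should no longer look undefined to flake8, though that is worth confirming against your flake8 version. The same pattern would have to be repeated for `spark`, `display`, and any other implicit name the notebook uses.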


eimct9ow2#

In my opinion, not every linter suits every use case, so here is what I did: I used a pre-commit hook and ignored rule F821.

# Flake rules: https://lintlyci.github.io/Flake8Rules/
- repo: https://github.com/pycqa/flake8
  rev: 3.8.4
  hooks:
    - id: flake8
      exclude: (^docs/)
      additional_dependencies: [flake8-typing-imports==1.7.0]
      # F821 undefined name
      args:
        [
          "--max-line-length=127",
          "--config=setup.cfg",
          "--ignore=F821",
        ]

To match your syntax, add the --ignore flag:

flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --ignore=C901,F821 --statistics
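Ignoring F821 everywhere also hides genuine typos in names. A narrower variant, offered here as an assumption rather than something from this answer, is flake8's per-line `# noqa: F821` comment on only the lines that touch Databricks-provided names. The function name `load_table` below is hypothetical; the JDBC options are copied from the question.

```python
def load_table(jdbcurl, table):
    """Read one table over JDBC using the session that Databricks injects."""
    # `spark` is not defined in this file; silence F821 for this line only.
    reader = spark.read.format("jdbc")  # noqa: F821
    return (
        reader.option("driver", "org.mariadb.jdbc.Driver")
        .option("url", jdbcurl)
        .option("dbtable", table)
        .load()
    )
```

With this approach any other undefined name in the file is still reported, at the cost of one comment per affected line.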


siv3szwd3#

You can add --builtins=dbutils,spark,display so that flake8 treats the variables built into the Databricks IDE as defined.
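As a rough sketch of how this fits the question's lint step, the helper below assembles the same flake8 invocation with a `--builtins` list appended. The names `DATABRICKS_GLOBALS` and `flake8_command` are hypothetical, and the list is only the set of names mentioned in this thread, so it may need extending for other notebooks.

```python
import sys

# Names that Databricks injects into notebooks, as mentioned in this thread.
DATABRICKS_GLOBALS = ["dbutils", "spark", "sc", "display", "getArgument"]


def flake8_command(path=".", extra_builtins=DATABRICKS_GLOBALS):
    """Build the flake8 invocation from the question plus --builtins."""
    return [
        sys.executable, "-m", "flake8", path,
        "--count",
        "--select=E9,F63,F7,F82",
        "--show-source",
        "--statistics",
        # Comma-separated, no spaces: these names are treated as built in.
        "--builtins=" + ",".join(extra_builtins),
    ]
```

Passing the result to `subprocess.run(flake8_command())` would run the check locally. In the workflow itself it is probably simpler to append the same `--builtins=...` argument to the existing flake8 line in lint.yml.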

