pyspark - ImportError: No module named src.etl.spark

14ifxucb · asked 2021-05-29 · Spark

I am working on a PySpark project, and below is my project directory structure.

project_dir/
    src/
        etl/
            __init__.py
            etl_1.py
            spark.py
        config/
            __init__.py
        utils/
            __init__.py
    test/
        test_etl_1.py
    setup.py
    README.md
    requirements.txt

When I run the unit test with the command below, I get this error:

python test_etl_1.py

Traceback (most recent call last):
  File "test_etl_1.py", line 1, in <module>
    from src.etl.spark import get_spark
ImportError: No module named src.etl.spark

Here is my unit test file:

from src.etl.spark import get_spark
from src.etl.addcol import with_status

class TestAppendCol(object):

  def test_with_status(self):

    source_data = [
        ("p", "w", "pw@sample.com"),
        ("j", "b", "jb@sample.com")
    ]
    source_df = get_spark().createDataFrame(
        source_data,
        ["first_name", "last_name", "email"]
    )

    actual_df = with_status(source_df)

    expected_data = [
        ("p", "w", "pw@sample.com", "added"),
        ("j", "b", "jb@sample.com", "added")
    ]
    expected_df = get_spark().createDataFrame(
        expected_data,
        ["first_name", "last_name", "email", "status"]
    )

    assert(expected_df.collect() == actual_df.collect())

I need to run this file via pytest, but it doesn't work because of the module error. Can you help me fix this error?


tcbh2hod 1#

Your source root is src, and the modules under it are etl, config, and utils. So update the imports as below:

from etl.spark import get_spark
from etl.addcol import with_status

Make sure PYTHONPATH points to the project_dir/src directory.
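For example, starting from project_dir in a POSIX shell:

export PYTHONPATH="$PWD/src"
pytest test/test_etl_1.py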

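Alternatively, since the file is meant to be run through pytest anyway, a conftest.py at the project root can put src on sys.path for every run, so you don't have to remember to set PYTHONPATH. This is a minimal sketch, assuming project_dir is the directory that contains both src and test:

# conftest.py, placed in project_dir/ (pytest imports it before collecting tests)
import os
import sys

# Prepend project_dir/src so that `from etl.spark import get_spark` resolves
# without installing the package.
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "src"))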