Pytest temporary folder not created when testing a Docker project

yhuiod9q  posted on 2023-05-06  in  Docker

The error is: AttributeError: 'function' object has no attribute '_get_object_id'
The relevant code in test_functions.py is:

import urllib.request as urllib
import os
import pandas as pd
import pyspark.sql.functions as psf
from pyspark.sql import SparkSession
from src.functions import save, write, query
import pytest

global spark

spark = SparkSession.builder \
    .master("local") \
    .appName("load_parquet") \
    .config("spark.jars", "/opt/spark/jars/postgresql-42.2.5.jar") \
    .getOrCreate()

@pytest.fixture(scope="session")
def cache_dir(tmp_path_factory):
    return tmp_path_factory.mktemp("files") / "test.parquet"

def test_save(url='https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2023-01.parquet',filename=cache_dir):
    print(cache_dir)
    assert save(url,filename), "Error saving file"

def test_write(write_method='psycopg2'):

    df = spark.read.parquet(cache_dir)
    if 'filename' not in df.columns:
        df = df.withColumn('filename',psf.lit('20-03'))

.....................

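What am I doing wrong? I have spent a lot of time trying to make this work.
I have tried several pages found through Google and various approaches of my own to get this working. Nothing helped. I expect the assert in the first test to pass, and if the second test can read the file, it should pass too.
The code ends with: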

#%% Run tests
if __name__ == "__main__":
    print(cache_dir)
    test_save()
    test_write()
    test_query()
    test_percentile()
    print("Everything passed")
# %%

After Magus's reply about test_save not working, only test_write still has a problem:

def test_write(cache_dir):
    write_method='psycopg2'
    url='https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2023-01.parquet'
    save(url,cache_dir)

    df = spark.read.parquet(cache_dir)
    if 'filename' not in df.columns:
        df = df.withColumn('filename',psf.lit('20-03'))

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

parameter = PosixPath('/tmp/pytest-of-root/pytest-0/files0/test.
python_proxy_pool = <py4j.java_gateway.PythonProxyPool object at 0x7ff238c2b580>

    def get_command_part(parameter, python_proxy_pool=None):
        """Converts a Python object into a string representation respecting the
        Py4J protocol.

        For example, the integer `1` is converted to `u"i1"`

        :param parameter: the object to convert
        :rtype: the string representing the command part
        """
        command_part = ""

        if parameter is None:
            command_part = NULL_TYPE
        elif isinstance(parameter, bool):
            command_part = BOOLEAN_TYPE + smart_decode(parameter)
        elif isinstance(parameter, Decimal):
            command_part = DECIMAL_TYPE + smart_decode(parameter)
        elif isinstance(parameter, int) and parameter <= JAVA_MAX_INT\
                and parameter >= JAVA_MIN_INT:
            command_part = INTEGER_TYPE + smart_decode(parameter)
        elif isinstance(parameter, long) or isinstance(parameter, int):
            command_part = LONG_TYPE + smart_decode(parameter)
        elif isinstance(parameter, float):
            command_part = DOUBLE_TYPE + encode_float(parameter)
        elif isbytearray(parameter):
            command_part = BYTES_TYPE + encode_bytearray(parameter)
        elif ispython3bytestr(parameter):
            command_part = BYTES_TYPE + encode_bytearray(parameter)
        elif isinstance(parameter, basestring):
            command_part = STRING_TYPE + escape_new_line(parameter)
        elif is_python_proxy(parameter):
            command_part = PYTHON_PROXY_TYPE + python_proxy_pool.put(parameter)
            for interface in parameter.Java.implements:
                command_part += ";" + interface
        else:
>           command_part = REFERENCE_TYPE + parameter._get_object_id()
E           AttributeError: 'PosixPath' object has no attribute '_get_object_id'

/usr/local/lib/python3.10/dist-packages/py4j/protocol.py:298: AttributeError
----------------------------- Captured stdout call -----------------------------

#EDIT

After some testing, it seems there is an issue with how Spark is configured to use the temp folders. Switching to pandas to read the file in that folder worked.
qcbq4gxm1#

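Here is what you are doing wrong:

  • The temporary folder is short-lived and is cleaned up once the process using it is gone. Use pytest's setup/teardown or a yield fixture so the mktemp path stays valid for as long as the tests need it.
  • A fixture should be declared as a dependency in the test's parameters.
  • Tests should not depend on each other. A test should be stateless, or should create the state it needs through a fixture. In your case, you should have a fixture that creates the file test_read tries to access, instead of relying on the result of another test.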
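
The last two points can be sketched as follows. In the question, `filename=cache_dir` binds the fixture function object itself as a default value, which would explain the "'function' object" error. pytest only injects a fixture's value when the fixture name appears as a test parameter. This is a minimal sketch, not the poster's code: `fake_save` is a hypothetical stand-in for `save` from `src.functions`, and the URL is a placeholder.

```python
import pytest


def fake_save(url, filename):
    """Hypothetical stand-in for src.functions.save (assumption):
    writes placeholder bytes instead of downloading from `url`."""
    with open(filename, "wb") as fh:
        fh.write(b"placeholder")
    return True


@pytest.fixture(scope="session")
def saved_file(tmp_path_factory):
    # Create the state the tests need once per session,
    # instead of relying on test_save having run first.
    path = tmp_path_factory.mktemp("files") / "test.parquet"
    assert fake_save("https://example.invalid/data.parquet", path)
    return path


def test_save(saved_file):
    # The fixture is requested by naming it as a parameter.
    assert saved_file.exists()


def test_write(saved_file):
    # Independent of test_save: the fixture guarantees the file exists.
    assert saved_file.stat().st_size > 0
```

Run this with `pytest test_functions.py` rather than calling the test functions from an `if __name__ == "__main__":` block, because fixtures are only injected by the pytest runner.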

Using yield with the tmp_path fixture:

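@pytest.fixture(scope="session")
def cache_dir(tmp_path_factory):
    # mktemp returns a plain pathlib.Path to a fresh directory managed by
    # pytest, so no `with` block is needed
    tmp_dir = tmp_path_factory.mktemp("files")
    # yield the file path; the fixture stays alive until the session ends
    yield tmp_dir / "test.parquet"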

def test_save(cache_dir):
    url='https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2023-01.parquet'
    print(cache_dir)
    assert save(url,cache_dir), "Error saving file"
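
For the remaining test_write failure, the traceback in the question shows Py4J receiving a `PosixPath` and failing to serialize it. One possible fix, not confirmed by the poster, is to convert the path to a plain string before handing it to Spark. The helper name `to_spark_path` is made up for this illustration.

```python
import os
from pathlib import Path


def to_spark_path(path):
    """Convert a str or os.PathLike (e.g. pathlib.Path) to a plain str,
    which Py4J can pass to the JVM."""
    return os.fspath(path)


# Assumed usage inside the test, with `spark` and `cache_dir` as in the question:
#     df = spark.read.parquet(to_spark_path(cache_dir))

print(type(to_spark_path(Path("files") / "test.parquet")).__name__)  # prints: str
```

Calling `str(cache_dir)` directly would work the same way. `os.fspath` is used here only because it rejects objects that are not paths.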
