The error is: AttributeError: 'function' object has no attribute '_get_object_id'
The relevant code in test_functions.py is:
import urllib.request as urllib
import os
import pandas as pd
import pyspark.sql.functions as psf
from pyspark.sql import SparkSession
from src.functions import save, write, query
import pytest

global spark
spark = SparkSession.builder \
    .master("local") \
    .appName("load_parquet") \
    .config("spark.jars", "/opt/spark/jars/postgresql-42.2.5.jar") \
    .getOrCreate()

@pytest.fixture(scope="session")
def cache_dir(tmp_path_factory):
    return tmp_path_factory.mktemp("files") / "test.parquet"

def test_save(url='https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2023-01.parquet',filename=cache_dir):
    print(cache_dir)
    assert save(url,filename), "Error saving file"

def test_write(write_method='psycopg2'):
    df = spark.read.parquet(cache_dir)
    if 'filename' not in df.columns:
        df = df.withColumn('filename',psf.lit('20-03'))
    .....................
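What am I doing wrong? I have spent a lot of time trying to get this to work.
I have gone through several pages of Google results and tried different approaches of my own to make this work. Nothing helped. I expect the assert in the first test to pass, and if the second test can read the file, it should pass too.
The code ends with: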
#%% Run tests
if __name__ == "__main__":
    print(cache_dir)
    test_save()
    test_write()
    test_query()
    test_percentile()
    print("Everything passed")
# %%
After Magus's reply, only test_write still has a problem:
def test_write(cache_dir):
    write_method='psycopg2'
    url='https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2023-01.parquet'
    save(url,cache_dir)
    df = spark.read.parquet(cache_dir)
    if 'filename' not in df.columns:
        df = df.withColumn('filename',psf.lit('20-03'))
_ _ _ ________________________________ _ _ _

parameter = PosixPath('/tmp/pytest-of-root/pytest-0/files0/test.
python_proxy_pool = <py4j.java_gateway.PythonProxyPool object at 0x7ff238c2b580>

    def get_command_part(parameter, python_proxy_pool=None):
        """Converts a Python object into a string representation respecting the
        Py4J protocol.

        For example, the integer `1` is converted to `u"i1"`

        :param parameter: the object to convert
        :rtype: the string representing the command part
        """
        command_part = ""

        if parameter is None:
            command_part = NULL_TYPE
        elif isinstance(parameter, bool):
            command_part = BOOLEAN_TYPE + smart_decode(parameter)
        elif isinstance(parameter, Decimal):
            command_part = DECIMAL_TYPE + smart_decode(parameter)
        elif isinstance(parameter, int) and parameter <= JAVA_MAX_INT\
                and parameter >= JAVA_MIN_INT:
            command_part = INTEGER_TYPE + smart_decode(parameter)
        elif isinstance(parameter, long) or isinstance(parameter, int):
            command_part = LONG_TYPE + smart_decode(parameter)
        elif isinstance(parameter, float):
            command_part = DOUBLE_TYPE + encode_float(parameter)
        elif isbytearray(parameter):
            command_part = BYTES_TYPE + encode_bytearray(parameter)
        elif ispython3bytestr(parameter):
            command_part = BYTES_TYPE + encode_bytearray(parameter)
        elif isinstance(parameter, basestring):
            command_part = STRING_TYPE + escape_new_line(parameter)
        elif is_python_proxy(parameter):
            command_part = PYTHON_PROXY_TYPE + python_proxy_pool.put(parameter)
            for interface in parameter.Java.implements:
                command_part += ";" + interface
        else:
>           command_part = REFERENCE_TYPE + parameter._get_object_id()
E           AttributeError: 'PosixPath' object has no attribute '_get_object_id'

/usr/local/lib/python3.10/dist-packages/py4j/protocol.py:298: AttributeError
----------------------------- Captured stdout call -----------------------------
#EDIT
After some testing, it seems there is an issue in the Spark configuration when it uses the temp folders; switching to pandas to read the file in that folder worked.
1 Answer
Here is what you are doing wrong:
Use yield with the tmp_path fixture
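A minimal sketch of that idea, under some assumptions. It keeps the question's session-scoped `tmp_path_factory` fixture (the built-in `tmp_path` is function-scoped), and `as_spark_path` is a hypothetical helper name, not part of pytest or PySpark. Two things appear to be going on in the question: `filename=cache_dir` as a default argument binds the fixture function itself, because pytest only injects fixtures for parameters without a default value; and the traceback shows Py4J falling through to its `else` branch for a `PosixPath`, while plain strings take the `STRING_TYPE` branch, so converting the path with `str()` before handing it to Spark should avoid the `_get_object_id` error.

```python
import pathlib

import pytest


@pytest.fixture(scope="session")
def cache_dir(tmp_path_factory):
    # Build the temp file path once per session and yield it to any test
    # that lists `cache_dir` as a parameter.
    path = tmp_path_factory.mktemp("files") / "test.parquet"
    yield path
    # Anything placed after the yield would run as teardown.


def as_spark_path(path):
    """Hypothetical helper: turn a pathlib path into a plain str for Spark."""
    return str(path)


def test_cache_dir_is_injected(cache_dir):
    # No default value here, so pytest supplies the fixture's value
    # (a Path), not the fixture function object.
    assert isinstance(cache_dir, pathlib.Path)
    # A str is what would be passed on,
    # e.g. spark.read.parquet(as_spark_path(cache_dir))
    assert isinstance(as_spark_path(cache_dir), str)
```

Run this with `pytest test_functions.py` rather than calling the test functions from an `if __name__ == "__main__":` block, since direct calls bypass fixture injection.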