PySpark Python unit test TypeError: '>' not supported between instances of 'MagicMock' and 'int'

Posted by f1tvaqid on 2022-12-22 in Spark

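I'm trying to mock DataFrame creation in Python with MagicMock(), but part of my code fails when it is called from the unit test function, because MagicMock doesn't support the comparison it performs.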
Here is my testcase.py:

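import sys
from unittest.mock import MagicMock

# Replace the pyspark.sql package with a MagicMock before anything imports it.
# process_batch is imported from the module under test (import not shown).
sys.modules["pyspark.sql"] = MagicMock()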

def test_process_batch():
    df = (
        [
            (1, "foo"),
            (2, "bar"),
        ],
        ["id", "label"]
    )
    from pyspark.sql import SparkSession
    spark = SparkSession.builder.getOrCreate()
    spark_df = spark.createDataFrame(df)

    process_batch(spark_df, "123")
    assert True

In the main program, process_batch() contains a line that compares the DataFrame's row count with a number:

def process_batch(data_frame, batchId):
    """Process streamed batch dataframe"""

    if (data_frame.count() > 0):
        ...

The unit test fails with the following error:

[CPython38-test] =================================== FAILURES ===================================
[CPython38-test] ______________________________ test_process_batch ______________________________
[CPython38-test] 
[CPython38-test]     def test_process_batch():
[CPython38-test]         df = (
[CPython38-test]             [
[CPython38-test]                 (1, "foo"),
[CPython38-test]                 (2, "bar"),
[CPython38-test]             ],
[CPython38-test]             ["id", "label"]
[CPython38-test]         )
[CPython38-test]         from pyspark.sql import SparkSession
[CPython38-test]         spark = SparkSession.builder.getOrCreate()
[CPython38-test]         spark_df = spark.createDataFrame(df)
[CPython38-test]     
[CPython38-test] >       process_batch(spark_df, "123")
[CPython38-test] 
[CPython38-test] test/test_cia_optics_ingestion_glue_spark_streaming.py:54: 
[CPython38-test] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[CPython38-test] 
[CPython38-test] data_frame = <MagicMock name='mock.SparkSession.builder.getOrCreate().createDataFrame()' id='140188327287056'>
[CPython38-test] batchId = '123'
[CPython38-test] 
[CPython38-test]     def process_batch(data_frame, batchId):
[CPython38-test]         """Process streamed batch dataframe"""
[CPython38-test]     
[CPython38-test] >       if (data_frame.count() > 0):
[CPython38-test] E       TypeError: '>' not supported between instances of 'MagicMock' and 'int'
[CPython38-test]
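
The failure does not depend on PySpark at all. Every call on a MagicMock returns another MagicMock, so data_frame.count() is itself a MagicMock, and MagicMock's default __gt__ returns NotImplemented, which makes Python raise the TypeError. A minimal standard-library reproduction (the helper name reproduce is only for illustration):

```python
from unittest.mock import MagicMock


def reproduce():
    """Compare a MagicMock-returned count with an int and capture the error message."""
    data_frame = MagicMock()          # stands in for the mocked Spark DataFrame
    count = data_frame.count()        # calling a MagicMock attribute returns another MagicMock
    assert isinstance(count, MagicMock)
    try:
        _ = count > 0                 # MagicMock.__gt__ returns NotImplemented by default
    except TypeError as exc:
        return str(exc)
    return None


print(reproduce())
# '>' not supported between instances of 'MagicMock' and 'int'
```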

Can you advise how to get past this?

Answer 1 (myss37ts)

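As Samwise pointed out, you can also mock the count() method itself. By default, data_frame.count() returns yet another MagicMock, and MagicMock does not support ordering comparisons with an int; setting spark_df.count.return_value = 10 makes count() return a real integer. The full working code is: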

import sys
from unittest.mock import MagicMock

sys.modules["pyspark.sql"] = MagicMock()

def process_batch(data_frame, batchId):
    """Process streamed batch dataframe"""

    if data_frame.count() > 0:
        ...

def test_process_batch():
    df = ([(1, "foo"), (2, "bar")], ["id", "label"])
    from pyspark.sql import SparkSession
    spark = SparkSession.builder.getOrCreate()
    spark_df = spark.createDataFrame(df)
    # count() now returns a real int, so "data_frame.count() > 0" can be evaluated
    spark_df.count.return_value = 10

    process_batch(spark_df, "123")
    assert True
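
A variant of the test above, sketched under some assumptions: it uses unittest.mock.patch.dict so the fake pyspark.sql module is installed only inside the with block (sys.modules is restored afterwards, so other tests are unaffected), and it exercises both branches of the count check. The process_batch here is a hypothetical stand-in that returns True or False; the real function's body is elided in the question.

```python
import sys
from unittest.mock import MagicMock, patch


def process_batch(data_frame, batchId):
    """Hypothetical stand-in for the real function (its body is elided in the question)."""
    # Returning a bool is an assumption made only so the test has something to assert on.
    if data_frame.count() > 0:
        return True
    return False


def test_process_batch_both_branches():
    # Install the fake pyspark.sql only for the duration of this block;
    # patch.dict restores sys.modules on exit.
    with patch.dict(sys.modules, {"pyspark.sql": MagicMock()}):
        from pyspark.sql import SparkSession

        spark = SparkSession.builder.getOrCreate()
        spark_df = spark.createDataFrame([(1, "foo"), (2, "bar")], ["id", "label"])

        spark_df.count.return_value = 2    # non-empty batch
        assert process_batch(spark_df, "123") is True

        spark_df.count.return_value = 0    # empty batch
        assert process_batch(spark_df, "124") is False

        # count() was called once per process_batch call
        assert spark_df.count.call_count == 2
```

Under pytest the test function is collected automatically. Since process_batch only calls count() on its argument, passing MagicMock(**{"count.return_value": 2}) straight to it would also work without going through SparkSession at all.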
