Comparing PySpark schemas using dataframe.schema vs. dataframe.printSchema()

Posted by zkure5ic on 2021-05-27 in Spark


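I ran into a problem when trying to compare the schemas of two PySpark DataFrames.
If I use df1.schema == df2.schema, it sometimes returns True and sometimes returns False (I am sure the schemas match).
However, when I use df1.printSchema() == df2.printSchema(), the result is always True.
I know that df.schema is of type pyspark.sql.types.StructType, but why does comparing it sometimes give the wrong result? Is this a bug in PySpark?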

apache-spark pyspark apache-spark-sql pyspark-dataframes types

Source: https://stackoverflow.com/questions/63727599/comparing-pyspark-schema-using-dataframe-schema-vs-dataframe-printschema

1 answer


Answer #1 by 63lcw9qa

If you are using PySpark, df.dtypes returns a list of (column_name, data_type) tuples, which you can compare as follows:

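# df.dtypes is a list of (column_name, type_string) tuples
if len(df1.dtypes) != len(df2.dtypes):
    raise ValueError("Schemas have different numbers of columns")

# Compare the columns position by position
for (name1, type1), (name2, type2) in zip(df1.dtypes, df2.dtypes):
    if name1 != name2 or type1 != type2:
        raise ValueError("Schemas don't match for columns {0} and {1}".format(name1, name2))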

Answered 2021-05-27
