Apache Spark: JAR's signer information does not match signer information of another class

gmol1639  posted on 2022-11-16  in Apache

I am trying to load two jars into my AWS Glue/Spark read method, but I get this error:

An error occurred while calling o142.save.
: java.lang.SecurityException: class "com.microsoft.sqlserver.jdbc.ISQLServerBulkData"'s signer information does not match signer information of other classes in the same package
    at java.lang.ClassLoader.checkCerts(ClassLoader.java:891)
    at java.lang.ClassLoader.preDefineClass(ClassLoader.java:661)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:754)
    at java.security.SecureClas...
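The exception points at the likely cause: two jars on the classpath ship classes in the same `com.microsoft.sqlserver.jdbc` package, but only one of them is signed, so the JVM refuses to define the second class. As a diagnostic aid (not part of the original post), here is a small sketch that scans a list of jars for a given class and reports which of those jars are signed (signed jars carry `META-INF/*.SF` signature entries):

```python
import zipfile

def find_class_providers(class_name, jar_paths):
    """Map each jar that contains the class to whether that jar is signed.

    Two jars providing the same package, one signed and one not, is the
    classic trigger for the "signer information does not match" error.
    """
    entry = class_name.replace(".", "/") + ".class"
    providers = {}
    for jar in jar_paths:
        with zipfile.ZipFile(jar) as zf:
            names = zf.namelist()
            if entry in names:
                # Signed jars contain META-INF/*.SF signature files.
                signed = any(n.startswith("META-INF/") and n.endswith(".SF")
                             for n in names)
                providers[jar] = signed
    return providers
```

If this reports the conflicting class in two jars with differing signed status, one of the two dependencies has to go (or be replaced by a compatible version, as the answer below does).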

My code is below. I have tried several glue_dynamicFrame write methods, but bulk inserting into SQL Server does not work. According to Microsoft, these drivers should be compatible.
Any suggestions for fixing this are very welcome!

def write_df_to_target(self, df, schema_table):
    spark = self.gc.spark_session
    spark.builder.config('spark.jars.packages', 'com.microsoft.sqlserver:mssql-jdbc:8.4.1.jre8,com.microsoft.azure:spark-mssql-connector_2.12:1.1.0').getOrCreate()
    credentials = self.get_credentials(self.replica_connection_name)

    df.write \
        .format("com.microsoft.sqlserver.jdbc.spark") \
        .option("url", credentials["url"] + ";databaseName=" + self.database_name) \
        .option("dbtable", schema_table) \
        .option("user", credentials["user"]) \
        .option("password", credentials["password"]) \
        .option("batchsize","50000") \
        .option("numPartitions","150") \
        .option("bulkCopyTableLock","true") \
        .save()

waxmsbnn1#

Using com.microsoft.sqlserver:mssql-jdbc:8.4.1.jre8 is one thing, but you also need the right version of MS' Spark SQL Connector.
The combination com.microsoft.azure:spark-mssql-connector_2.12_3.0:1.0.0-alpha + com.microsoft.sqlserver:mssql-jdbc:8.4.1.jre8 did not work in my case, because I am using AWS Glue 3.0 (i.e. Spark 3.1).
I had to switch to com.microsoft.azure:spark-mssql-connector_2.12:1.2.0, since it is Spark 3.1 compatible.

def write_df_to_target(self, df, schema_table):
    spark = self.gc.spark_session
    spark.builder.config('spark.jars.packages', 'com.microsoft.sqlserver:mssql-jdbc:8.4.1.jre8,com.microsoft.azure:spark-mssql-connector_2.12:1.2.0').getOrCreate()
    credentials = self.get_credentials(self.replica_connection_name)

    df.write \
        .format("com.microsoft.sqlserver.jdbc.spark") \
        .option("url", credentials["url"] + ";databaseName=" + self.database_name) \
        .option("dbtable", schema_table) \
        .option("user", credentials["user"]) \
        .option("password", credentials["password"]) \
        .option("batchsize","100000") \
        .option("numPartitions","15") \
        .save()
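One caveat about both snippets above, as editorial context rather than part of the answer: `spark.jars.packages` only influences dependency resolution when it is set before the SparkSession (and its JVM) starts, so calling `spark.builder.config(...).getOrCreate()` on an already-running Glue session will not fetch new jars. On Glue the packages are therefore usually supplied through job parameters such as `--extra-jars` instead. A minimal sketch of the set-before-create pattern for plain PySpark (the app name is illustrative; the Maven coordinates are the ones from the answer):

```python
# Maven coordinates from the answer: JDBC driver + Spark-3.1-compatible connector.
MSSQL_PACKAGES = ",".join([
    "com.microsoft.sqlserver:mssql-jdbc:8.4.1.jre8",
    "com.microsoft.azure:spark-mssql-connector_2.12:1.2.0",
])

def build_session(app_name="mssql-bulk-write"):
    # spark.jars.packages is only honoured when set before the session exists;
    # reconfiguring a running session does not download new jars.
    from pyspark.sql import SparkSession  # assumes pyspark is installed
    return (
        SparkSession.builder
        .appName(app_name)
        .config("spark.jars.packages", MSSQL_PACKAGES)
        .getOrCreate()
    )
```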
