mysql在客户端出现警告,但来自spark的jdbc连接出现错误

lmvvr0a8  于 2021-05-27  发布在  Spark
关注(0)|答案(1)|浏览(447)

我正在编写一个spark代码,从一个dataframe插入到一个mysql表中,我得到了一个错误

df_to_write.write.format("jdbc").option("url", jdbc_url).option("driver", "com.mysql.jdbc.Driver")\
          .option("dbtable", tbl_name).option("user", DBUser).option("password", DBPassword)\
          .option("numPartitions",32).mode('append').save()

org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 18.0 failed 4 times, most recent failure: Lost task 1.3 in stage 18.0 (TID 170, 10.151.244.77, executor 0): java.sql.BatchUpdateException: Data truncation: Incorrect datetime value: '1970-01-01 00:00:00' for column

我发现这个是因为列的值是1970-01-01 00:00:00
但是,如果我在mysql workbench中运行insert语句,它将发出警告,并且值将作为0000插入
示例如下:

create table test_time (processing_ts timestamp NULL DEFAULT NULL);

insert into test_time values ('1990-01-01T00:00:00.000+0000'),('1990-01-01T00:00:00.000+0000'),('1970-01-01T00:00:00.000+0000');

表的输出值如下所示

1990-01-01 00:00:00
1990-01-01 00:00:00
0000-00-00 00:00:00     --> no error only warning and changed the value to 0000:00:00 00:00:00

想知道我可以对spark应用什么设置来获得相同的行为,即应该没有错误,并且值应该设置为0000-00-00:00:00。写入表时,是否仍要在spark中应用insert ignore选项。
根据注解,粘贴模拟代码
sql语句

mysql> create table test (processing_ts timestamp null);
Query OK, 0 rows affected (0.05 sec)

mysql> select * from test;
+---------------------+
| processing_ts       |
+---------------------+
| 1997-02-28 10:30:00 |
+---------------------+
1 row in set (0.00 sec)
df = spark.createDataFrame([('1970-01-01 00:00:00',)], ['processing_ts'])
df2 =  df.select(f.to_timestamp(df.processing_ts, 'yyyy-MM-dd HH:mm:ss').alias('processing_ts'))

db_host = '127.0.0.1'
DBName = 'test'
jdbc_url = "jdbc:mysql://{}/{}".format(db_host ,DBName)
DBUser = 'XXXXXX'
DBPassword = 'XXXXXXX123'
tbl_name = 'test'

df2.write.format("jdbc").option("url", jdbc_url).option("driver", "com.mysql.jdbc.Driver")\
          .option("dbtable", tbl_name).option("user", DBUser).option("password", DBPassword)\
          .option("numPartitions",32).mode('append').save()
8wigbo56

8wigbo561#

时间戳值的范围是 1970-01-01 00:00:01.0000002038-01-19 03:14:07.999999 mysql文档。
jdbc驱动程序可能无法处理相同的操作,因此它正在中止插入作业。这里你唯一能做的就是替换时间戳 1970-01-01 00:00:00 因为支持的范围是 1000-01-01 00:00:009999-12-31 23:59:59 .

相关问题