How do I fix this error when uploading a CSV to BigQuery?

Posted by cwdobuhd on 2023-02-14 in Other

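I get the following error when uploading a CSV to BigQuery with Python:
BadRequest: 400 Error while reading data, error message: Could not parse '80:00:00' as TIME for field global_time_for_first_response_goal (position 36) starting at location 11602908 with message 'Invalid time string "80:00:00"' File: gs://mybucket/mytickets/2023-02-1309:58:11:865588.csv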

from google.cloud import bigquery


def upload_csv_bigquery_dataset():
    # logging.info(">>> Uploading CSV to Big Query")
    client = bigquery.Client()
    table_id = "myproject-dev.tickets.ticket"
    job_config = bigquery.LoadJobConfig(
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
        source_format=bigquery.SourceFormat.CSV,
        # As posted; presumably stands in for a list of bigquery.SchemaField entries.
        schema=[bigquery.table_schema],
        skip_leading_rows=1,
        autodetect=True,
        allow_quoted_newlines=True,
    )
    uri = "gs://mybucket/mytickets/2023-02-1309:58:11:865588.csv"
    load_job = client.load_table_from_uri(
        uri, table_id, job_config=job_config
    )  # Make an API request.

    load_job.result()  # Waits for the job to complete.

    destination_table = client.get_table(table_id)
    print(">>> Loaded {} rows.".format(destination_table.num_rows))

Can anyone suggest a solution or a workaround? I'm stuck on this.
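For context on the message above: BigQuery's TIME type holds a time of day from 00:00:00 to 23:59:59.999999, so "80:00:00" (which looks like an 80-hour duration) cannot be parsed as TIME. A minimal sketch for listing such values in a local copy of the CSV before loading is shown here. The two-column sample data is made up for illustration, and the pattern is only an approximation of what BigQuery accepts.

```python
import csv
import io
import re

# Approximate shape of a clock value: H:MM:SS or HH:MM:SS, optional fraction.
_CLOCK = re.compile(r"^(\d+):(\d{2}):(\d{2})(\.\d+)?$")


def is_valid_bigquery_time(value):
    """Return True if value looks like a time of day up to 23:59:59."""
    match = _CLOCK.match(value.strip())
    if not match:
        return False
    hours, minutes, seconds = (int(match.group(i)) for i in (1, 2, 3))
    return hours <= 23 and minutes <= 59 and seconds <= 59


def find_bad_times(csv_file, column):
    """Yield (line_number, value) for non-empty values that are not valid times."""
    reader = csv.DictReader(csv_file)
    for row in reader:
        value = row[column]
        if value and not is_valid_bigquery_time(value):
            yield reader.line_num, value


# Made-up sample; only the column name comes from the error message.
sample = io.StringIO(
    "id,global_time_for_first_response_goal\n"
    "1,08:00:00\n"
    "2,80:00:00\n"
)
bad = list(find_bad_times(sample, "global_time_for_first_response_goal"))
print(bad)
```

Running this against the real file shows whether the out-of-range values are a handful of bad rows or a column that really stores durations, which affects which workaround makes sense.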

bfrts1fy #1

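If you consider these rows invalid, you can set the allow_jagged_rows and ignore_unknown_values options to True. Note that these two options tolerate rows with missing trailing columns and extra values that are not in the schema, respectively. For the load to skip rows whose values cannot be parsed, such as "80:00:00" in a TIME column, max_bad_records also has to be raised above its default of 0.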

job_config = bigquery.LoadJobConfig(
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    source_format=bigquery.SourceFormat.CSV,
    schema=[bigquery.table_schema],  # as posted in the question
    skip_leading_rows=1,
    autodetect=True,
    allow_quoted_newlines=True,
    allow_jagged_rows=True,  # accept rows that are missing trailing optional columns
    ignore_unknown_values=True,  # ignore extra values not represented in the schema
)
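Another possible workaround, if the rows should be kept rather than skipped, is to turn off autodetect and declare the schema explicitly so the column is loaded as STRING. This assumes the TIME type came from schema autodetection, which the question does not confirm. The sketch below declares every column as STRING from the CSV header; the helper names and the two-column header are hypothetical, and the real file needs its full header line.

```python
import csv


def string_schema_from_header(header_line):
    """Return [(column_name, "STRING"), ...] for every column in a CSV header line."""
    names = next(csv.reader([header_line]))
    return [(name.strip(), "STRING") for name in names]


def make_job_config(fields):
    """Build a LoadJobConfig with an explicit schema instead of autodetect."""
    # Imported here so the helper above can be used without the client library.
    from google.cloud import bigquery

    return bigquery.LoadJobConfig(
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
        source_format=bigquery.SourceFormat.CSV,
        schema=[bigquery.SchemaField(name, field_type) for name, field_type in fields],
        skip_leading_rows=1,
        allow_quoted_newlines=True,
    )


# Hypothetical two-column header; the real file has many more columns.
fields = string_schema_from_header("id,global_time_for_first_response_goal")
print(fields)
```

The resulting config would be passed to client.load_table_from_uri as in the question. Columns loaded as STRING can be cast to their intended types later in SQL.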

Documentation here
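If the column actually holds durations rather than times of day (an assumption based on the value "80:00:00"), one more option is to rewrite the CSV before uploading so the column contains whole seconds, which load as an integer. A rough sketch, with hypothetical function names and made-up sample data:

```python
import csv
import io


def duration_to_seconds(value):
    """Convert an 'H:MM:SS' duration string to a whole number of seconds."""
    hours, minutes, seconds = (int(part) for part in value.strip().split(":"))
    return hours * 3600 + minutes * 60 + seconds


def convert_column(src, dst, column):
    """Copy CSV rows from src to dst, replacing durations in `column` with seconds."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if row[column]:  # leave empty cells untouched
            row[column] = duration_to_seconds(row[column])
        writer.writerow(row)


# Made-up sample standing in for the real file.
src = io.StringIO("id,goal\n1,80:00:00\n2,\n")
dst = io.StringIO()
convert_column(src, dst, "goal")
print(dst.getvalue())
```

With real files, src and dst would be opened with open(path, newline="") as the csv module documentation recommends, and the converted file uploaded to the bucket in place of the original.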
