How to export BQ tables to GCS in JSON format without a change in encoding

Posted by tvokkenx on 2023-07-01 in Other


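I'm using Python to export BQ tables to GCS in JSON format. The export succeeds, but when I download the JSON files from GCS, I notice that special characters have changed. For example,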

Shirt & Trouser Presses

in BQ has changed to

Shirt \u0026 Trouser Presses

in GCS.
Is there a way to ensure that the encoding does not change when exporting from BQ to GCS in JSON format?
Here is the code snippet I'm using:

from google.cloud import bigquery

# BQ_PROJECT, dataset_id and BUCKET are assumed to be defined elsewhere.
client = bigquery.Client(project=BQ_PROJECT)
dataset_ref = bigquery.DatasetReference(BQ_PROJECT, dataset_id)

# Export every table as newline-delimited JSON.
job_config = bigquery.job.ExtractJobConfig()
job_config.destination_format = bigquery.DestinationFormat.NEWLINE_DELIMITED_JSON

for table in client.list_tables(dataset_id):
    # Skip views and other non-table objects.
    if table.table_type != "TABLE":
        continue

    table_id = table.table_id
    table_ref = dataset_ref.table(table_id)
    # The blob in GCS is named after the table.
    destination_uri = "gs://{}/{}".format(BUCKET, table_id)

    extract_job = client.extract_table(
        table_ref,
        destination_uri,
        job_config=job_config,
        # Location must match that of the source table.
        location="EU",
    )  # API request
    extract_job.result()  # Waits for job to complete.
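
For context, `\u0026` is a standard JSON escape sequence for `&`, so any compliant JSON parser should read the exported value back as the original string. The following is a minimal sketch of that, assuming an exported file has been downloaded locally; the helper name `rewrite_ndjson_line` and the sample record are hypothetical and not part of the code above. It parses one exported line and, if a literal `&` is wanted in the file itself, re-serializes it with Python's `json` module.

```python
import json


def rewrite_ndjson_line(line: str) -> str:
    """Parse one newline-delimited JSON record and re-serialize it.

    Hypothetical helper: json.loads decodes escapes such as \\u0026,
    and json.dumps with ensure_ascii=False writes characters literally.
    """
    record = json.loads(line)
    return json.dumps(record, ensure_ascii=False)


# Hypothetical sample line, shaped like one record of the exported file.
exported = '{"category":"Shirt \\u0026 Trouser Presses"}'

# A JSON parser decodes the escape back to the original character.
print(json.loads(exported)["category"])

# Re-serializing yields a line that contains a literal ampersand.
print(rewrite_ndjson_line(exported))
```

If the consumer of the files parses them as JSON, no rewriting should be needed; the second step only matters when something reads the file as raw text.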


Source: https://stackoverflow.com/questions/76551022/how-to-export-bq-tables-to-gcs-in-json-format-without-change-in-encoding