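I am using Python to export BQ tables to GCS in JSON format. The export succeeds, but when I download the JSON file from GCS, I notice that special characters have changed. For example,
Shirt & Trouser Presses
in BQ has become
Shirt \u0026 Trouser Presses
in GCS.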
Is there a way to make sure the encoding does not change when exporting from BQ to GCS in JSON format?
Here is the code snippet I am using:
from google.cloud import bigquery

# BQ_PROJECT, dataset_id and BUCKET are defined elsewhere.
dataset_ref = bigquery.DatasetReference(BQ_PROJECT, dataset_id)
client = bigquery.Client(project=BQ_PROJECT)
tables = client.list_tables(dataset_id)

# Export as newline-delimited JSON.
job_config = bigquery.job.ExtractJobConfig()
job_config.destination_format = bigquery.DestinationFormat.NEWLINE_DELIMITED_JSON

for table in tables:
    # Skip views and other non-table objects.
    if table.table_type == "TABLE":
        table_id = table.table_id
        destination_blob = table_id
        table_ref = dataset_ref.table(table_id)
        destination_uri = "gs://{}/{}".format(BUCKET, destination_blob)
        extract_job = client.extract_table(
            table_ref,
            destination_uri,
            job_config=job_config,
            # Location must match that of the source table.
            location="EU",
        )  # API request
        extract_job.result()  # Waits for job to complete.
1 Answer
With help from @johnHanley, I found that when I read the data from GCS with pandas, I get the correct encoding. So
"Shirt \u0026 Trouser Presses"
is read by pandas as "Shirt & Trouser Presses".
That solved the problem.
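The reason this works is that \u0026 is a standard JSON escape sequence for the & character, so any compliant JSON parser should decode it back to the original text. Below is a minimal sketch of that behaviour using only Python's built-in json module. The field name "category" and the sample line are assumptions made up for illustration, not taken from the actual export.

```python
import json

# One line as it might appear in the exported newline-delimited JSON file.
# The field name "category" is hypothetical; the raw string keeps the
# backslash escape exactly as it would be stored in the file.
exported_line = r'{"category": "Shirt \u0026 Trouser Presses"}'

# Parsing the line decodes the \u0026 escape back to "&".
record = json.loads(exported_line)
decoded = record["category"]

print(decoded)  # Shirt & Trouser Presses
```

With pandas, reading the downloaded file with pandas.read_json(path, lines=True) should apply the same decoding to every row, since each line of the export is a separate JSON object.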