How do I append data to a Delta table in PySpark?

b5lpy0ml · posted 2022-12-11 in Spark

Here is the code I have tried so far:

import gzip
import json
import csv

# The source file is gzip-compressed, so it must be opened with
# gzip.open (in text mode) rather than the plain built-in open
with gzip.open('/dbfs/mnt/costtransparency/HealthPlans/UHC/2022-11/2022-11-01_United-HealthCare-Services--Inc-_Third-Party-Administrator_Winstead_CSP-903-C746_in-network-rates.json.gz', 'rt') as f:
    d = json.load(f)
employee_data = d['in_network']  # d['provider_references']

# Open the output CSV file for writing
with open('/dbfs/FileStore/data_file.csv', 'w', newline='') as data_file:
    csv_writer = csv.writer(data_file)

    # Counter used to write the header row only once
    count = 0
    for emp in employee_data:
        if count == 0:
            # Write the CSV header from the first record's keys
            csv_writer.writerow(emp.keys())
            count += 1

        # Write one row of values
        csv_writer.writerow(emp.values())

The problem is that right now I am writing the CSV file to storage, but I want to write the data into a Delta table instead.
Can anyone help me with this?
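As a side note, the header-counter pattern in the code above can be replaced with `csv.DictWriter`, which writes the header once and maps each dict to a row by key. A minimal sketch with hypothetical sample records standing in for `d['in_network']` (written to an in-memory buffer so it is self-contained):

```python
import csv
import io

# Hypothetical sample records standing in for d['in_network']
records = [
    {"negotiation_arrangement": "ffs", "billing_code": "0001"},
    {"negotiation_arrangement": "ffs", "billing_code": "0002"},
]

buf = io.StringIO()
# DictWriter writes the header row once, then one row per dict
writer = csv.DictWriter(buf, fieldnames=list(records[0].keys()))
writer.writeheader()
writer.writerows(records)

csv_text = buf.getvalue()
print(csv_text)
```

For a real file, replace the `StringIO` buffer with `open(path, 'w', newline='')`.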

e0bqpujr · answer #1

How do I append data to a Delta table?
If you want to append data to a Delta table, use the following code:

  • First, read the data into a DataFrame:
df = spark.read.format("csv").option("header", "true").load("dbfs:/FileStore/vm_name3.csv")
  • Then append the data to the Delta table with the write method, using saveAsTable for a named table:
permanent_table_name = "demo123"  # table name
df.write.mode("append").format("delta").saveAsTable(permanent_table_name)

Alternatively:

  • If you want to save the Delta table at a specific location, use this code:
permanent_table_name = "dbfs:/user/hive/del"
df.write.mode("append").format("delta").save(permanent_table_name)
