import pandas as pd
import pyarrow.parquet as pq
import pyarrow as pa
# Read the existing Parquet file
existing_df = pd.read_parquet('existing_file.parquet')
# Create a new DataFrame with new data (alternatively, read from another source)
# value1, value2, ... are placeholders; substitute your own values
new_data = {'column1': [value1, value2, ...],
            'column2': [value1, value2, ...],
            # ... further columns as needed
            }
new_df = pd.DataFrame(new_data)
# Concatenate the existing DataFrame with the new DataFrame
updated_df = pd.concat([existing_df, new_df], ignore_index=True)
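# Write the updated DataFrame back to the same Parquet file.
# pq.write_table overwrites a single file; pq.write_to_dataset would instead
# treat the path as a dataset directory.
table = pa.Table.from_pandas(updated_df)
pq.write_table(table, 'existing_file.parquet', compression='snappy', use_dictionary=True)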
1 Answer
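The solution is to read the data, append the new rows, and then write the result back to the file.
The example code assumes pandas and that the data fits in memory; if it does not, you can use dask.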