I have a use case where I need to seed a Flink application (both RocksDB state and broadcast state) using bounded S3 sources, and then read other unbounded/bounded S3 sources after the seeding is complete.
I was trying to achieve this in 2 steps:
- Seeding: Trigger a Flink job with only the seeding data bounded source and take a savepoint after the job finishes.
- Regular Processing: Restore from the seeded savepoint on a new Flink job graph to process the other unbounded/bounded S3 sources.
Questions:
- For Step 1: Does Flink support taking a savepoint automatically after the job finishes in streaming mode?
- If only a manual savepoint trigger is supported, what can be used as a done signal that all the seeding data has been processed completely and all the tasks have finished processing?
Any other approaches to achieve the seeding use case are appreciated as well. Note: Approaches where we buffer the regular data until the seeding data is processed are not feasible for my use case.
Thanks
1 Answer
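1. With an unbounded source, you can use externalized checkpoints, and you will be able to start/restore the job from a checkpoint. When this feature is enabled, you must have a process that cleans up the checkpoints when the job is cancelled, because otherwise Flink will not delete them.
2. You can do this with the new feature available in Flink 1.15 (checkpoints with finished tasks).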
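The two options above mostly come down to a handful of configuration entries. As a rough sketch only, the plain-Java helper below collects the keys I believe Flink 1.15 uses for retained (externalized) checkpoints and for checkpointing after tasks finish, and prints them in `flink-conf.yaml` form. The class name `SeedingJobConfig`, the bucket path, and the 60-second interval are hypothetical placeholders, not something from the question. Whether the last checkpoint survives when a bounded job finishes on its own (as opposed to being cancelled) is worth verifying against your Flink version.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SeedingJobConfig {

    /**
     * Collects checkpoint-related Flink settings for the seeding job.
     * checkpointDir and intervalMs are illustrative inputs (assumptions).
     */
    public static Map<String, String> checkpointSettings(String checkpointDir, long intervalMs) {
        Map<String, String> conf = new LinkedHashMap<>();
        // RocksDB state backend, as mentioned in the question.
        conf.put("state.backend", "rocksdb");
        // Where checkpoint data and metadata are written.
        conf.put("state.checkpoints.dir", checkpointDir);
        // Periodic checkpointing must be on for either option to apply.
        conf.put("execution.checkpointing.interval", intervalMs + "ms");
        // Option 1: keep checkpoints when the job is cancelled (manual cleanup needed).
        conf.put("execution.checkpointing.externalized-checkpoint-retention",
                "RETAIN_ON_CANCELLATION");
        // Option 2: allow checkpoints while some tasks have already finished.
        conf.put("execution.checkpointing.checkpoints-after-tasks-finish.enabled", "true");
        return conf;
    }

    /** Renders the settings as "key: value" lines, the layout flink-conf.yaml uses. */
    public static String toFlinkConfYaml(Map<String, String> conf) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : conf.entrySet()) {
            sb.append(e.getKey()).append(": ").append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical bucket path and interval, for illustration only.
        Map<String, String> conf =
                checkpointSettings("s3://my-bucket/flink/checkpoints", 60_000L);
        System.out.print(toFlinkConfYaml(conf));
    }
}
```

The printed lines can go into `flink-conf.yaml`, or the same map could probably be handed to Flink programmatically (for example through `Configuration.fromMap`, if your version provides it). The second job would then be started from the retained checkpoint path, typically with `flink run -s <checkpoint-path> ...`.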