Flink -keyBy()函数是否在内部对flink操作符进行分区?

1l5u6lss  于 2023-09-28  发布在  Apache
关注(0)|答案(1)|浏览(103)

在下面的代码中,摘自文档:

package spendreport;

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.walkthrough.common.sink.AlertSink;
import org.apache.flink.walkthrough.common.entity.Alert;
import org.apache.flink.walkthrough.common.entity.Transaction;
import org.apache.flink.walkthrough.common.source.TransactionSource;

public class FraudDetectionJob {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<Transaction> transactions = env
            .addSource(new TransactionSource())
            .name("transactions");
        
        DataStream<Alert> alerts = transactions
            .keyBy(Transaction::getAccountId)
            .process(new FraudDetector())
            .name("fraud-detector");

        alerts
            .addSink(new AlertSink())
            .name("send-alerts");

        env.execute("Fraud Detection");
    }
}

应用于数据流transactions.keyBy(Transaction::getAccountId)函数是否对传入数据进行分区(基于帐户ID),并在每个分区上执行process(new FraudDetector()),如下图所示?
2)如果是,如何写一个Map器来处理从每个分区流的数据?使用.keyBy(Transaction::getAccountId)

n1bvdmb6

n1bvdmb61#

keyBy被应用于数据流transactions
在应用keyBy之后,来自transactions的记录与相同的account ID将在同一个分区中,并且您可以应用来自KeyedStream的函数,如process(不推荐,因为它被标记为已弃用),windowreducemin/max/sum等。
有很多使用keyBy的例子,例如:此链接

相关问题