如何等待有限流批量结果

5jdjgkvh  于 2021-06-04  发布在  Kafka
关注(0)|答案(1)|浏览(313)

我有一个用springcloudstreams和kafka streams构建的流处理应用程序,这个系统从一个应用程序中获取日志,并将它们与另一个流处理器的观察结果进行比较,生成一个分数,然后将日志流除以分数(高于或低于某个阈值)。
拓扑结构:

问题是:
因此,我的问题是如何正确地实现“日志最佳观测选择器处理器”,在处理日志的那一刻,观测值是有限的,但是可能有很多。
所以我想出了两个解决办法。。。
组和窗口日志按日志id对观察主题进行评分,然后减少以获得最高分数(问题:为所有观察结果打分可能比窗口时间长)
每次评分后发出评分完成消息,加入日志相关观察,使用日志评分观察全局表和交互式查询检查每个观察id是否在全局表存储中,当所有id都在存储中时,Map到得分最高的观察(问题:全局表仅用于交互式查询时似乎不起作用)
要达到我的目标,最好的方法是什么?
我希望不会造成任何分区、磁盘或内存瓶颈。
当值从log&observation连接起来时,任何东西都有唯一的id和相关id的元组。
(编辑:带有图表和更改标题的拓扑的切换文本描述)

ycl3bljg

ycl3bljg1#

解决方案#2似乎可以工作,但它发出了警告,因为交互式查询需要一些时间才能准备好-所以我用一个转换器实现了相同的解决方案:

@Slf4j
@Configuration
@RequiredArgsConstructor
@SuppressWarnings("unchecked")
public class LogBestObservationsSelectorProcessorConfig {
    private String logScoredObservationsStore = "log-scored-observations-store";

    private final Serde<LogEntryRelevantObservationIdTuple> logEntryRelevantObservationIdTupleSerde;
    private final Serde<LogRelevantObservationIdsTuple> logRelevantObservationIdsTupleSerde;
    private final Serde<LogEntryObservationMatchTuple> logEntryObservationMatchTupleSerde;
    private final Serde<LogEntryObservationMatchIdsRelevantObservationsTuple> logEntryObservationMatchIdsRelevantObservationsTupleSerde;

    @Bean
    public Function<
            GlobalKTable<LogEntryRelevantObservationIdTuple, LogEntryObservationMatchTuple>,
                Function<
                    KStream<LogEntryRelevantObservationIdTuple, LogEntryRelevantObservationIdTuple>,
                    Function<
                            KTable<String, LogRelevantObservationIds>,
                            KStream<String, LogEntryObservationMatchTuple>
                    >
                >
            >
    logBestObservationSelectorProcessor() {
        return (GlobalKTable<LogEntryRelevantObservationIdTuple, LogEntryObservationMatchTuple> logScoredObservationsTable) ->
                (KStream<LogEntryRelevantObservationIdTuple, LogEntryRelevantObservationIdTuple> logScoredObservationProcessedStream) ->
                        (KTable<String, LogRelevantObservationIdsTuple> logRelevantObservationIdsTable) -> {
            return logScoredObservationProcessedStream
                    .selectKey((k, v) -> k.getLogId())
                    .leftJoin(
                            logRelevantObservationIdsTable,
                            LogEntryObservationMatchIdsRelevantObservationsTuple::new,
                            Joined.with(
                                    Serdes.String(),
                                    logEntryRelevantObservationIdTupleSerde,
                                    logRelevantObservationIdsTupleSerde
                            )
                    )
                    .transform(() -> new LogEntryObservationMatchTransformer(logScoredObservationsStore))
                    .groupByKey(
                            Grouped.with(
                                Serdes.String(),
                                logEntryObservationMatchTupleSerde
                            )
                    )
                    .reduce(
                            (match1, match2) -> Double.compare(match1.getScore(), match2.getScore()) != -1 ? match1 : match2,
                            Materialized.with(
                                    Serdes.String(),
                                    logEntryObservationMatchTupleSerde
                            )
                    )
                    .toStream()
                    ;
        };
    }

    @RequiredArgsConstructor
    private static class LogEntryObservationMatchTransformer implements Transformer<String, LogEntryObservationMatchIdsRelevantObservationsTuple, KeyValue<String, LogEntryObservationMatchTuple>> {
        private final String stateStoreName;
        private ProcessorContext context;
        private TimestampedKeyValueStore<LogEntryRelevantObservationIdTuple, LogEntryObservationMatchTuple> kvStore;

        @Override
        public void init(ProcessorContext context) {
            this.context = context;
            this.kvStore = (TimestampedKeyValueStore<LogEntryRelevantObservationIdTuple, LogEntryObservationMatchTuple>) context.getStateStore(stateStoreName);
        }

        @Override
        public KeyValue<String, LogEntryObservationMatchTuple> transform(String logId, LogEntryObservationMatchIdsRelevantObservationsTuple value) {
            val observationIds = value.getLogEntryRelevantObservationsTuple().getRelevantObservations().getObservationIds();
            val allObservationsProcessed = observationIds.stream()
                    .allMatch((observationId) -> {
                        val key = LogEntryRelevantObservationIdTuple.newBuilder()
                                .setLogId(logId)
                                .setRelevantObservationId(observationId)
                                .build();
                        return kvStore.get(key) != null;
                    });
            if (!allObservationsProcessed) {
                return null;
            }

            val observationId = value.getLogEntryRelevantObservationIdTuple().getObservationId();
            val key = LogEntryRelevantObservationIdTuple.newBuilder()
                    .setLogId(logId)
                    .setRelevantObservationId(observationId)
                    .build();
            ValueAndTimestamp<LogEntryObservationMatchTuple> observationMatchValueAndTimestamp = kvStore.get(key);
            return new KeyValue<>(logId, observationMatchValueAndTimestamp.value());
        }

        @Override
        public void close() {

        }
    }
}

相关问题