Apache Flink: reduce produces many values instead of one

Posted by toe95027 on 2021-06-21 in Flink

I am trying to implement a reduce on a WindowedStream, like this:

.keyBy(t -> t.key)
            .timeWindow(Time.of(15, MINUTES), Time.of(1, MINUTES))
            .reduce(new ReduceFunction<TwitterSentiments>() {
                @Override
                public TwitterSentiments reduce(TwitterSentiments t2, TwitterSentiments t1) throws Exception {
                    t2.positive += t1.positive;
                    t2.neutral += t1.neutral;
                    t2.negative += t1.negative;

                    return t2;
                }
            });

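The problem is that when I call stream.print(), I get many values (it looks like one value per Twitter object rather than a single aggregated object).
I also tried an AggregateFunction like this, but ran into the same problem: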

.aggregate(new AggregateFunction<TwitterSentiments, Tuple3<Long, Long, Long>, Tuple3<Long, Long, Long>>() {
                @Override
                public Tuple3<Long, Long, Long> createAccumulator() {
                    return new Tuple3<Long, Long, Long>(0L,0L,0L);
                }

                @Override
                public Tuple3<Long, Long, Long> add(TwitterSentiments ts, Tuple3<Long, Long, Long> accumulator) {
                    return new Tuple3<Long, Long, Long>(
                            accumulator.f0 + ts.positive.longValue(),
                            accumulator.f1 + ts.neutral.longValue(),
                            accumulator.f2 + ts.negative.longValue()
                    );
                }

                @Override
                public Tuple3<Long, Long, Long> getResult(Tuple3<Long, Long, Long> accumulator) {
                    return accumulator;
                }

                @Override
                public Tuple3<Long, Long, Long> merge(Tuple3<Long, Long, Long> accumulator1, Tuple3<Long, Long, Long> accumulator2) {
                    return new Tuple3<Long, Long, Long>(
                            accumulator1.f0 + accumulator2.f0,
                            accumulator1.f1 + accumulator2.f1,
                            accumulator1.f2 + accumulator2.f2);
                }
            });

Why does stream.print() still output many records after these aggregations?

Java apache-flink flink-streaming apache bigdata

Source: https://stackoverflow.com/questions/53624098/apache-flink-reduce-results-in-many-values-instead-of-one

2 Answers

ijnw1ujt #1

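It seems I misunderstood the purpose of keying. In my case I don't need a KeyedStream, because I only want one output per minute, built from all records reduced to a single value. I ended up using .timeWindowAll on the SingleOutputStreamOperator, and running reduce on that now works as expected.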
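The difference described above can be illustrated without Flink. The following is a minimal plain-Java sketch, not Flink code: the `Sentiments` class is a hypothetical stand-in for the question's `TwitterSentiments`, and `reducePerKey` and `reduceAll` are illustrative helpers that only mimic, for the contents of a single window, what a keyed window reduce and a non-keyed (all-window) reduce emit. A keyed reduce yields one result per key, which is why printing the stream shows many records.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

class KeyedVsGlobalReduce {
    // Hypothetical stand-in for the question's TwitterSentiments type.
    static final class Sentiments {
        final String key;
        final long positive;
        final long neutral;
        final long negative;

        Sentiments(String key, long positive, long neutral, long negative) {
            this.key = key;
            this.positive = positive;
            this.neutral = neutral;
            this.negative = negative;
        }
    }

    // Same logic as the question's reduce function, but without mutating inputs.
    static Sentiments combine(Sentiments a, Sentiments b) {
        return new Sentiments(a.key,
                a.positive + b.positive,
                a.neutral + b.neutral,
                a.negative + b.negative);
    }

    // Mimics a keyed window reduce for one window: one result per distinct key.
    static Map<String, Sentiments> reducePerKey(List<Sentiments> window) {
        Map<String, Sentiments> results = new TreeMap<>();
        for (Sentiments s : window) {
            results.merge(s.key, s, KeyedVsGlobalReduce::combine);
        }
        return results;
    }

    // Mimics a non-keyed (all-window) reduce for one window: a single result.
    static Optional<Sentiments> reduceAll(List<Sentiments> window) {
        return window.stream().reduce(KeyedVsGlobalReduce::combine);
    }

    public static void main(String[] args) {
        List<Sentiments> window = Arrays.asList(
                new Sentiments("a", 1, 0, 0),
                new Sentiments("b", 0, 1, 0),
                new Sentiments("a", 2, 0, 1));

        // Two distinct keys, so the keyed reduce emits two records.
        System.out.println("keyed results: " + reducePerKey(window).size());

        // The global reduce emits exactly one record for the window.
        Sentiments total = reduceAll(window).get();
        System.out.println("global: " + total.positive + "/" + total.neutral + "/" + total.negative);
    }
}
```

With many distinct keys in a window, the keyed variant produces correspondingly many records per window, which matches the behavior reported in the question.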

2021-06-21

gopyfrb3 #2

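If you don't need a result per key, you can use timeWindowAll to produce a single result. However, timeWindowAll does not run in parallel. If you want to compute the result in a more scalable way, you can do the following: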

.keyBy(t -> t.key)
    .timeWindow(<time specification>)
    .reduce(<reduce function>)
    .timeWindowAll(<same time specification>)
    .reduce(<same reduce function>)

You might expect Flink's runtime to be smart enough to do this parallel pre-aggregation for you (provided you are using a ReduceFunction or AggregateFunction), but it does not.
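The two-stage pattern in this answer can be sketched in plain Java to show why it gives the same total as a single global reduce. This is an illustration only, not Flink code: `Event`, `preAggregate`, and `combine` are hypothetical names, and the counts array is assumed to hold positive, neutral, and negative totals in that order. Stage one computes a partial sum per key (the part Flink could run in parallel across keys), and stage two folds the few partial sums into one value.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TwoStageReduce {
    // Hypothetical event: a key plus {positive, neutral, negative} counts.
    static final class Event {
        final String key;
        final long[] counts;

        Event(String key, long positive, long neutral, long negative) {
            this.key = key;
            this.counts = new long[] {positive, neutral, negative};
        }
    }

    // Element-wise sum; returns a new array so inputs are never mutated.
    static long[] add(long[] a, long[] b) {
        return new long[] {a[0] + b[0], a[1] + b[1], a[2] + b[2]};
    }

    // Stage 1: one partial sum per key (mimics the keyed window reduce).
    static Map<String, long[]> preAggregate(List<Event> window) {
        Map<String, long[]> partials = new TreeMap<>();
        for (Event e : window) {
            partials.merge(e.key, e.counts, TwoStageReduce::add);
        }
        return partials;
    }

    // Stage 2: fold the per-key partials into one total (mimics the all-window reduce).
    static long[] combine(Iterable<long[]> partials) {
        long[] total = {0, 0, 0};
        for (long[] p : partials) {
            total = add(total, p);
        }
        return total;
    }

    public static void main(String[] args) {
        List<Event> window = Arrays.asList(
                new Event("a", 1, 0, 0),
                new Event("b", 0, 1, 0),
                new Event("a", 2, 0, 1),
                new Event("c", 0, 0, 3));

        Map<String, long[]> partials = preAggregate(window);
        System.out.println(partials.size() + " partial results"); // 3 partial results
        System.out.println(Arrays.toString(combine(partials.values()))); // [3, 1, 4]
    }
}
```

Because addition is associative and commutative, summing per key first and then summing the partials gives the same result as summing every record directly; the second stage only has to handle one record per key instead of every input record.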

2021-06-21
