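Problem description
The following minimal working example generates events rapidly and uses them to update an IMap. The IMap, in turn, produces update events from its journal.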
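// Imports assume Hazelcast Jet 4.x package names.
import com.hazelcast.jet.Jet;
import com.hazelcast.jet.JetInstance;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;
import com.hazelcast.jet.pipeline.SourceBuilder;
import com.hazelcast.jet.pipeline.Sources;
import com.hazelcast.jet.pipeline.StreamSource;
import com.hazelcast.map.IMap;

import java.util.concurrent.atomic.AtomicLong;

import static com.hazelcast.jet.pipeline.JournalInitialPosition.START_FROM_OLDEST;

public class FastIMapExample {
    private static final int NUMBER_OF_GROUPS = 10;
    private static final int NUMBER_OF_EVENTS = 1000;

    public static void main(String[] args) {
        JetInstance jet = Jet.newJetInstance();
        IMap<Long, Long> groups = jet.getMap("groups");

        // p1: count events per group by updating the IMap
        Pipeline p1 = Pipeline.create();
        p1.readFrom(fastStreamOfLongs(NUMBER_OF_EVENTS))
          .withoutTimestamps()
          .writeTo(Sinks.mapWithUpdating(groups,
                  event -> event % NUMBER_OF_GROUPS,
                  (oldState, event) -> increment(oldState)));

        // p2: log every update event recorded in the map's journal
        Pipeline p2 = Pipeline.create();
        p2.readFrom(Sources.mapJournal(groups, START_FROM_OLDEST))
          .withIngestionTimestamps()
          .map(x -> x.getKey() + " -> " + x.getValue())
          .writeTo(Sinks.logger());

        jet.newJob(p2);
        jet.newJob(p1).join();
    }

    // Emits the numbers 0 .. numberOfEvents - 1 as fast as possible
    private static StreamSource<Long> fastStreamOfLongs(int numberOfEvents) {
        return SourceBuilder
                .stream("fast-longs", ctx -> new AtomicLong(0))
                .<Long>fillBufferFn((num, buf) -> {
                    long val = num.getAndIncrement();
                    if (val < numberOfEvents) buf.add(val);
                })
                .build();
    }

    // Treats a missing value as 0, so the first update yields 1
    private static long increment(Long x) {
        return x == null ? 1 : x + 1;
    }
}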
Sample output:
3 -> 7
3 -> 50
3 -> 79
7 -> 42
...
6 -> 100
0 -> 82
9 -> 41
9 -> 100
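I expected to see exactly 1000 events, one for every update. Instead, I see roughly 50-80 events. (The output seems to contain the latest update for each group, i.e. "-> 100", but apart from that it is a random subset.)
When NUMBER_OF_GROUPS equals NUMBER_OF_EVENTS (or when event generation is artificially slowed down), I receive all 1000 updates.
Is this behavior expected? Is it possible to receive all update events from a fast source?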
Solution
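Sinks.mapWithUpdating uses batching, so some updates are applied locally before the actual update entry processor is sent to the map. To send an update entry processor for every item, you need to use Sinks.mapWithEntryProcessor.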
From the JavaDoc of Sinks.mapWithEntryProcessor:
* As opposed to {@link #mapWithUpdating} and {@link #mapWithMerging},
* this sink does not use batching and submits a separate entry processor
* for each received item. For use cases that are efficiently solvable
* using those sinks, this one will perform worse. It should be used only
* when they are not applicable.
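To make the effect of batching concrete, here is a minimal plain-Java model. It is not Hazelcast code and does not reproduce Jet's actual implementation: the class `BatchingModel`, the method `journal`, and the `batchSize` parameter are all hypothetical, and the real batch boundaries depend on how fast the source delivers items. The model only assumes that updates to the same key within one batch are combined locally and written to the map once, producing a single journal entry.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model of a batching sink (an assumption, not Jet's real code).
class BatchingModel {

    // Returns the journal entries produced when `events` items, spread over
    // `groups` keys, are written to a map in batches of `batchSize` items.
    static List<String> journal(int events, int groups, int batchSize) {
        Map<Long, Long> map = new HashMap<>();
        List<String> journal = new ArrayList<>();
        for (int start = 0; start < events; start += batchSize) {
            int end = Math.min(start + batchSize, events);
            // Combine the batch locally: number of increments per key
            Map<Long, Long> pending = new HashMap<>();
            for (long e = start; e < end; e++) {
                pending.merge(e % groups, 1L, Long::sum);
            }
            // One map write, and so one journal entry, per key per batch
            for (Map.Entry<Long, Long> p : pending.entrySet()) {
                long updated = map.merge(p.getKey(), p.getValue(), Long::sum);
                journal.add(p.getKey() + " -> " + updated);
            }
        }
        return journal;
    }
}
```

In this model, a batch size of 1 (one write per item, which is the behavior the answer attributes to Sinks.mapWithEntryProcessor) yields all 1000 journal entries, while larger batches yield fewer entries but still end at "-> 100" for every group. When the number of groups equals the number of events, each key is updated only once, so no entries are lost regardless of the batch size.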
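Keep in mind that the default capacity of the event journal is 10K. With the default partition count, that leaves 36 entries per partition, which is not enough to store all the updates at once. In your case, with the default partition count, you need to set the capacity to 271K or higher to store all the updates.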
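The figures above can be checked with a little arithmetic. The sketch below assumes the default partition count of 271 and a worst case in which all 1000 updates land in a single partition; that worst-case reading is an assumption used to explain the 271K figure, and the class and method names are hypothetical.

```java
// Arithmetic behind the journal capacity figures (names are hypothetical).
class JournalCapacity {
    // Default Hazelcast partition count, assumed here
    static final int DEFAULT_PARTITION_COUNT = 271;

    // The journal capacity is shared by all partitions, so each
    // partition gets the integer share of the total.
    static int perPartition(int totalCapacity, int partitions) {
        return totalCapacity / partitions;
    }

    // Total capacity needed so that every partition can hold
    // `eventsPerPartition` entries.
    static int requiredTotal(int eventsPerPartition, int partitions) {
        return eventsPerPartition * partitions;
    }
}
```

With these helpers, `perPartition(10_000, 271)` gives 36 and `requiredTotal(1000, 271)` gives 271000, matching the 36 and 271K quoted in the answer.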