Selecting countMerge from a ReplicatedAggregatingMergeTree materialized view

Problem description

I recently started using ClickHouse and ran into some trouble. I use a cluster with 3 shards, each with one additional replica, so 6 servers in total.

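I create a local MV on the local table that holds the full data (called tbl), and then a distributed MV on top of the local MV. The local table with the full data, tbl, uses ReplicatedMergeTree as its engine. The local MV uses ReplicatedAggregatingMergeTree as its engine. I also recreated each MV in a test database using POPULATE, so duplicate inserts should not be possible.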

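The problem is that when I select countMerge from the distributed MV, I get twice the correct answer (i.e. if the correct answer is 50, I get 100), whereas selecting uniqExactMerge from the distributed MV gives the correct result.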

Here are my scripts:

Local MV script:

CREATE DATABASE IF NOT EXISTS test ON CLUSTER cc_cluster;
CREATE MATERIALIZED VIEW IF NOT EXISTS
test.user_event_stat_scene_mv_local_test_v2 ON CLUSTER cc_cluster
ENGINE = ReplicatedAggregatingMergeTree('/clickhouse/tables/{layer}-{shard}/test.user_event_stat_scene_mv_local_test_v2','{replica}')
PARTITION BY (dt)
ORDER BY (dt,scene)
POPULATE
AS select
    countState(1) as expos,
    countState(if(click>0,1,null)) as clicks,
    sumState(if(click=1,watch,0)) as dr,
    countState(if(click=1 and tbl.cost_cnt>0,1,null)) as cost_cnt,
    sumState(if(click=1,tbl.cost_total,0)) as cost_total,
    countState(if(click=1 and tbl.chat_cnt>0,1,null)) as chat_cnt,
    uniqExactState(recom_token) as item_expos,
    uniqExactState(if(click=1,recom_token,null)) as item_clicks,
    uniqExactState(uid) as user_expos,
    uniqExactState(if(click=1,uid,null)) as user_clicks,
    uniqExactState(if(click=1 and tbl.cost_cnt>0,uid,null)) as user_costs,
    uniqExactState(if(click=1 and tbl.chat_cnt>0,uid,null)) as user_chats,
    scene,
    toFixedString(dt,8) as dt
FROM recom_stats_dws.user_event_log_day_local as tbl
GROUP BY dt,scene;

Distributed MV script:

CREATE TABLE IF NOT EXISTS
test.user_event_stat_scene_mv_all_test_v2
ON CLUSTER cc_cluster
AS test.user_event_stat_scene_mv_local_test_v2
ENGINE = Distributed(cc_cluster,test,user_event_stat_scene_mv_local_test_v2,rand());

Query script:

select
    countMerge(expos) as expos,
    countMerge(clicks) as clicks,
    sumMerge(dr) as dr,
    countMerge(cost_cnt) as cost_cnt,
    sumMerge(cost_total) as cost_total,
    countMerge(chat_cnt) as chat_cnt,
    uniqExactMerge(item_expos) as item_expos,
    uniqExactMerge(item_clicks) as item_clicks,
    uniqExactMerge(user_expos) as user_expos,
    uniqExactMerge(user_clicks) as user_clicks,
    uniqExactMerge(user_costs) as user_costs,
    uniqExactMerge(user_chats) as user_chats,
    dt
FROM test.user_event_stat_scene_mv_all_test_v2 as tbl
GROUP BY dt,scene
order by dt,scene;

PS:

  • Local MV = test.user_event_stat_scene_mv_local_test_v2
  • Distributed MV = test.user_event_stat_scene_mv_all_test_v2
  • Local table with the full data = recom_stats_dws.user_event_log_day_local

Which part might I have gotten wrong? Hoping for your help XD

Solution

https://github.com/ClickHouse/ClickHouse/issues/16208

You executed POPULATE twice, once on each replica, so the data was doubled.
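This also explains why only the count and sum columns were off. A minimal sketch, using the built-in `numbers` table function as stand-in data (an assumption for illustration, not the original tables): merging the same aggregate state twice adds the counts together, while `uniqExact` states behave like sets, so merging a duplicate changes nothing.

```sql
-- Two identical states, as if the same 50 rows had been populated twice.
SELECT
    countMerge(c)     AS cnt,   -- counts add up: 50 + 50 = 100
    uniqExactMerge(u) AS uniq   -- distinct values are unioned: still 50
FROM
(
    SELECT countState() AS c, uniqExactState(number) AS u FROM numbers(50)
    UNION ALL
    SELECT countState() AS c, uniqExactState(number) AS u FROM numbers(50)
);
```

The same reasoning applies to `sumMerge`, which would also return a doubled value.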

You only need to run POPULATE on one replica.
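One possible way to follow this advice, sketched under assumptions rather than taken from the linked issue: drop `ON CLUSTER` so that each statement runs only on the server you are connected to, run the `POPULATE` variant on one replica of each shard, and run the variant without `POPULATE` on the other replica. The column list is shortened to two aggregates for readability; keep the full SELECT list in practice. The replicated engine is then expected to copy the populated parts to the second replica, since both share the same ZooKeeper path.

```sql
-- On ONE replica of each shard: create the view and backfill it.
CREATE MATERIALIZED VIEW IF NOT EXISTS test.user_event_stat_scene_mv_local_test_v2
ENGINE = ReplicatedAggregatingMergeTree('/clickhouse/tables/{layer}-{shard}/test.user_event_stat_scene_mv_local_test_v2', '{replica}')
PARTITION BY (dt)
ORDER BY (dt, scene)
POPULATE
AS SELECT
    countState(1)        AS expos,
    uniqExactState(uid)  AS user_expos,
    scene,
    toFixedString(dt, 8) AS dt
FROM recom_stats_dws.user_event_log_day_local AS tbl
GROUP BY dt, scene;

-- On the OTHER replica of the same shard: identical definition, no POPULATE.
CREATE MATERIALIZED VIEW IF NOT EXISTS test.user_event_stat_scene_mv_local_test_v2
ENGINE = ReplicatedAggregatingMergeTree('/clickhouse/tables/{layer}-{shard}/test.user_event_stat_scene_mv_local_test_v2', '{replica}')
PARTITION BY (dt)
ORDER BY (dt, scene)
AS SELECT
    countState(1)        AS expos,
    uniqExactState(uid)  AS user_expos,
    scene,
    toFixedString(dt, 8) AS dt
FROM recom_stats_dws.user_event_log_day_local AS tbl
GROUP BY dt, scene;
```

Afterwards, comparing `countMerge(expos)` on the local MV with `count()` on the source table on a single server is a quick way to check that the numbers now match.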