How to run small queries efficiently in ClickHouse

Problem description

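Our deployment has one thousand shards. Inserts go through a Distributed table with the sharding key jumpConsistentHash(colX, 1000). When I query rows with colX=... and set send_logs_level='trace', I see that the query is sent to all shards and executed on each of them. This limits our QPS (queries per second). The ClickHouse documentation states: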

SELECT queries are sent to all the shards and work regardless of how data is distributed across the shards (they can be distributed completely randomly).
When you add a new shard, you don’t have to transfer the old data to it.
You can write new data with a heavier weight – the data will be distributed slightly unevenly, but queries will work correctly and efficiently.

You should be concerned about the sharding scheme in the following cases:

* Queries are used that require joining data (IN or JOIN) by a specific key. If data is sharded by this key, you can use local IN or JOIN instead of GLOBAL IN or GLOBAL JOIN, which is much more efficient.
* A large number of servers is used (hundreds or more) with a large number of small queries (queries of individual clients - websites, advertisers, or partners).
In order for the small queries to not affect the entire cluster, it makes sense to locate data for a single client on a single shard.
Alternatively, as we’ve done in Yandex.Metrica, you can set up bi-level sharding: divide the entire cluster into “layers”, where a layer may consist of multiple shards.
Data for a single client is located on a single layer, but shards can be added to a layer as necessary, and data is randomly distributed within them.
Distributed tables are created for each layer, and a single shared distributed table is created for global queries.
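
The bi-level scheme quoted above can be sketched from the client side as follows. This is only an illustration under stated assumptions: the layer count, the modulo rule for assigning a client to a layer, and the table names events_layer_N and events_all are all hypothetical, and the documentation does not say how Yandex.Metrica actually assigns clients to layers.

```python
from typing import Optional

# Hypothetical number of layers; each layer would hold several shards.
NUM_LAYERS = 10


def layer_for_client(client_id: int, num_layers: int = NUM_LAYERS) -> int:
    """Assign a client to a layer (assumed rule: simple modulo)."""
    return client_id % num_layers


def table_for_query(client_id: Optional[int]) -> str:
    """Pick the Distributed table a query should be sent to.

    A query for one client goes to that client's per-layer table, so only
    the shards of that layer do any work. A query without a client goes to
    the shared table that spans the whole cluster.
    """
    if client_id is None:
        return "events_all"  # hypothetical global Distributed table
    return f"events_layer_{layer_for_client(client_id)}"  # hypothetical per-layer table
```

With this layout, a small per-client query touches only the shards of a single layer, while cluster-wide reports still go through the shared table.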

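For small queries like ours (the second item above), there appears to be a solution, but it is not clear to me. Does it mean that when I run a query with the predicate colX=..., I need to find the "layer" that contains its rows and then query the Distributed table for that layer?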

Is there a way to run these small queries against the global Distributed table?
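
For reference, the sharding expression jumpConsistentHash(colX, 1000) maps each key to one of 1000 buckets, so every colX value belongs to exactly one bucket. Below is a minimal Python sketch of the published jump consistent hash algorithm (Lamping and Veach). Whether it matches ClickHouse's jumpConsistentHash bit for bit is an assumption that should be checked against the server, and treating the bucket number as the shard index assumes all shards have equal weight.

```python
MASK_64 = 0xFFFFFFFFFFFFFFFF  # emulate unsigned 64-bit arithmetic


def jump_consistent_hash(key: int, num_buckets: int) -> int:
    """Map a 64-bit integer key to a bucket in [0, num_buckets)."""
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    key &= MASK_64
    b, j = -1, 0
    while j < num_buckets:
        b = j
        # Linear congruential step, truncated to 64 bits.
        key = (key * 2862933555777941757 + 1) & MASK_64
        # Jump forward; the ratio is always >= 1, so j always exceeds b.
        j = int((b + 1) * (float(1 << 31) / float((key >> 33) + 1)))
    return b


# Assumed usage: the bucket that would hold rows with colX = 42.
bucket = jump_consistent_hash(42, 1000)
```

A property of this algorithm is that growing from n to n + 1 buckets either leaves a key where it was or moves it to the new bucket, which is why it suits clusters that add shards over time.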

