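Problem description
I'm using PostgreSQL 8.3.23.
I have a table in which an IP can appear in any of three different columns.
What is the best way to aggregate by IP and get a count per column, without using multiple joins or multiple queries?
Table
source | trans | dest |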
---|---|---|
ip1 | ip2 | |
ip1 | ip1 | ip3 |
ip1 | ip2 | ip3 |
ip2 | ip4 | ip5 |
What I want to get
ip | source | trans | dest |
---|---|---|---|
ip1 | 3 | 1 | 0 |
ip2 | 1 | 2 | 0 |
ip3 | 0 | 0 | 2 |
ip4 | 0 | 1 | 0 |
ip5 | 0 | 0 | 1 |
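Solution
You can unpivot and re-aggregate:
select ip, sum(source) as source, sum(trans) as trans, sum(dest) as dest
from ((select source as ip, 1 as source, 0 as trans, 0 as dest
       from t
      ) union all
      (select trans as ip, 0 as source, 1 as trans, 0 as dest
       from t
      ) union all
      -- every branch must return the same four columns
      (select dest as ip, 0 as source, 0 as trans, 1 as dest
       from t
      )
     ) t
group by ip;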
There are certainly other ways to express this logic, but who remembers what Postgres 8.3 supports?
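As a quick sanity check, the unpivot-and-reaggregate idea can be run against the four sample rows from the question. This is only a sketch under assumptions: an in-memory SQLite database stands in for PostgreSQL 8.3, the blank `dest` cell is assumed to be NULL, the `_count` aliases are illustrative, and the `where ip is not null` filter is an addition so that the blank cell does not form a group of its own.

```python
import sqlite3

# Assumption: SQLite (in memory) stands in for PostgreSQL so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute("create table t (source text, trans text, dest text)")
conn.executemany(
    "insert into t values (?, ?, ?)",
    [
        ("ip1", "ip2", None),   # assumption: the blank dest cell is NULL
        ("ip1", "ip1", "ip3"),
        ("ip1", "ip2", "ip3"),
        ("ip2", "ip4", "ip5"),
    ],
)

# Unpivot: one branch per column, each tagging its rows with a 0/1 flag.
# Re-aggregate: sum the flags per IP.
UNPIVOT_SQL = """
select ip,
       sum(source) as source_count,
       sum(trans)  as trans_count,
       sum(dest)   as dest_count
from (select source as ip, 1 as source, 0 as trans, 0 as dest from t
      union all
      select trans  as ip, 0 as source, 1 as trans, 0 as dest from t
      union all
      select dest   as ip, 0 as source, 0 as trans, 1 as dest from t
     ) u
where ip is not null   -- added here: skip the NULL dest value
group by ip
order by ip
"""

rows = conn.execute(UNPIVOT_SQL).fetchall()
for row in rows:
    print(row)  # (ip, source_count, trans_count, dest_count)
```

Each input row contributes up to three rows to the unpivoted set, so the intermediate result is roughly three times the size of the table before the outer `group by` collapses it.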
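Using Gordon's trick, I ended up with the query below. Because it is used to summarize 5.5 billion events ingested per day, in which some IPs are heavily over-represented, it turned out to be more efficient to compute the counts inside the subqueries.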
select ip,
       sum(source_count) as source_count,
       sum(trans_count) as trans_count,
       sum(dest_count) as dest_count
from ((select source as ip, count(*) as source_count, 0 as trans_count, 0 as dest_count
       from t where source is not null
       group by 1, 3, 4
      ) union all
      (select trans as ip, 0 as source_count, count(*) as trans_count, 0 as dest_count
       from t where trans is not null
       group by 1, 2, 4
      ) union all
      -- every branch must return the same four columns
      (select dest as ip, 0 as source_count, 0 as trans_count, count(*) as dest_count
       from t where dest is not null
       group by 1, 2, 3
      )
     ) t
group by ip
order by ip;
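To illustrate why counting inside each branch helps, here is a small sketch that runs the pre-aggregated form on the question's four sample rows and also counts how many rows the outer query has to combine. It rests on assumptions: an in-memory SQLite database stands in for PostgreSQL, the blank `dest` cell is NULL, and the branches group by column name rather than by position, which should be equivalent here.

```python
import sqlite3

# Assumption: SQLite (in memory) stands in for PostgreSQL so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute("create table t (source text, trans text, dest text)")
conn.executemany(
    "insert into t values (?, ?, ?)",
    [
        ("ip1", "ip2", None),   # assumption: the blank dest cell is NULL
        ("ip1", "ip1", "ip3"),
        ("ip1", "ip2", "ip3"),
        ("ip2", "ip4", "ip5"),
    ],
)

# Each branch counts first, so it emits at most one row per distinct IP.
BRANCHES_SQL = """
select source as ip, count(*) as source_count, 0 as trans_count, 0 as dest_count
from t where source is not null group by source
union all
select trans, 0, count(*), 0
from t where trans is not null group by trans
union all
select dest, 0, 0, count(*)
from t where dest is not null group by dest
"""

# Outer query: merge the per-column counts into one row per IP.
PREAGG_SQL = f"""
select ip,
       sum(source_count) as source_count,
       sum(trans_count)  as trans_count,
       sum(dest_count)   as dest_count
from ({BRANCHES_SQL}) u
group by ip
order by ip
"""

# Number of rows the outer aggregation has to process.
intermediate = conn.execute(
    f"select count(*) from ({BRANCHES_SQL}) u"
).fetchone()[0]
totals = conn.execute(PREAGG_SQL).fetchall()

print("rows fed to the outer group by:", intermediate)
for row in totals:
    print(row)  # (ip, source_count, trans_count, dest_count)
```

With pre-aggregation, the outer `group by` sees at most three rows per distinct IP, however often a heavily repeated IP occurs in the raw events, whereas the row-level unpivot passes every non-null cell through to the outer query.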