How to aggregate by values that appear in multiple columns in PostgreSQL

Problem description

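I am using PostgreSQL 8.3.23.
I have a table in which an IP can appear in any of 3 different columns. What is the best way to aggregate by IP and get the count per column, without using multiple joins or multiple queries?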

Table

source  trans  dest
ip1            ip2
ip1     ip1    ip3
ip1     ip2    ip3
ip2     ip4    ip5

What I would like to get

ip   source  trans  dest
ip1  3       1      0
ip2  1       1      1
ip3  0       0      2
ip4  0       1      0
ip5  0       0      1

Solution

You can unpivot and re-aggregate:

select ip,sum(source),sum(trans),sum(dest)
from ((select source as ip,1 as source,0 as trans,0 as dest
       from t
      ) union all
      (select trans as ip,0 as source,1 as trans,0 as dest
       from t
      ) union all
      (select dest as ip,0 as source,0 as trans,1 as dest
       from t
      )
     ) t
group by ip;

There are certainly other ways to express this logic. But who remembers what Postgres 8.3 supports?
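To check the unpivot idea against the sample data from the question, here is a minimal sketch that runs the same kind of query in Python's built-in sqlite3 module. SQLite is only a stand-in for Postgres 8.3 (an assumption made so the example is self-contained), so the inner parentheses around each branch are dropped because SQLite does not accept them. The `where ip is not null` filter, the subquery alias `u`, and the helper name `counts_per_ip` are also my additions; the filter keeps the blank trans cell of the first row from showing up as its own group.

```python
import sqlite3

# Sample rows from the question; the blank trans cell in the first row is stored as NULL.
ROWS = [
    ("ip1", None, "ip2"),
    ("ip1", "ip1", "ip3"),
    ("ip1", "ip2", "ip3"),
    ("ip2", "ip4", "ip5"),
]

# Unpivot: one branch per column, each emitting the IP plus a 0/1 flag per column.
# Every branch returns the same four columns so that UNION ALL can stack them.
UNPIVOT_SQL = """
select ip, sum(source) as source, sum(trans) as trans, sum(dest) as dest
from (select source as ip, 1 as source, 0 as trans, 0 as dest from t
      union all
      select trans as ip, 0 as source, 1 as trans, 0 as dest from t
      union all
      select dest as ip, 0 as source, 0 as trans, 1 as dest from t
     ) u
where ip is not null
group by ip
order by ip
"""


def counts_per_ip(rows):
    """Load rows into an in-memory table t and return (ip, source, trans, dest) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t (source text, trans text, dest text)")
    conn.executemany("insert into t values (?, ?, ?)", rows)
    result = conn.execute(UNPIVOT_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in counts_per_ip(ROWS):
        print(row)
```

With the four sample rows this returns one tuple per IP, matching the table the question asks for.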


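Using Gordon's technique, I ended up with the query below. Because it is used to summarize 5.5 billion events ingested per day, in which some IPs are heavily over-represented, it is more efficient to compute the counts inside the subqueries.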

select ip,sum(source_count) as source_count,sum(trans_count) as trans_count,sum(dest_count) as dest_count
from ((select source as ip,count(*) as source_count,0 as trans_count,0 as dest_count
       from t where source is not null
       group by 1,3,4
      ) union all
      (select trans as ip,0 as source_count,count(*) as trans_count,0 as dest_count
       from t where trans is not null
       group by 1,2,4
      ) union all
      (select dest as ip,0 as source_count,0 as trans_count,count(*) as dest_count
       from t where dest is not null
       group by 1,2,3
      )
     ) t
group by ip
order by ip;
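To illustrate why counting inside each branch helps when a few IPs dominate, here is a rough sketch, again using Python's sqlite3 as a stand-in for Postgres (an assumption, not the production setup). The skewed data set, the alias `u`, and the helper name `preaggregated_counts` are made up for the example. Each branch is grouped by its own column, so the union holds at most one row per distinct IP per column instead of one row per event.

```python
import sqlite3

# Each branch counts its own column first, so the union stays small.
INNER_SQL = """
select source as ip, count(*) as source_count, 0 as trans_count, 0 as dest_count
from t where source is not null group by source
union all
select trans as ip, 0 as source_count, count(*) as trans_count, 0 as dest_count
from t where trans is not null group by trans
union all
select dest as ip, 0 as source_count, 0 as trans_count, count(*) as dest_count
from t where dest is not null group by dest
"""

# The outer query only has to merge the per-column partial counts.
FINAL_SQL = f"""
select ip, sum(source_count), sum(trans_count), sum(dest_count)
from ({INNER_SQL}) u
group by ip
order by ip
"""


def preaggregated_counts(rows):
    """Return (final per-IP counts, number of rows in the intermediate union)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t (source text, trans text, dest text)")
    conn.executemany("insert into t values (?, ?, ?)", rows)
    intermediate = conn.execute(f"select count(*) from ({INNER_SQL}) u").fetchone()[0]
    final = conn.execute(FINAL_SQL).fetchall()
    conn.close()
    return final, intermediate


if __name__ == "__main__":
    # Hypothetical skewed input: one source/dest pair repeated 1000 times.
    skewed = [("ip1", None, "ip2")] * 1000 + [("ip2", "ip4", "ip5")]
    final, intermediate = preaggregated_counts(skewed)
    print(intermediate, "intermediate rows for", len(skewed), "input rows")
    for row in final:
        print(row)
```

On this made-up input the outer aggregation sees only a handful of partial rows for 1001 input rows, whereas the plain unpivot would feed it one row per non-null cell.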