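Query takes hours to complete: very slow because of multiple joins and unions on large data sets. Are the indexes wrong?

Problem description

I have a long query that I want to save as a stored procedure and use later in an ETL job. It affects about 15+ million rows and takes an hour and a half to complete. I am using Postgres and pgAdmin.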

INSERT INTO t_temporary_id_table ( 
    ref_date,id,client,slcunda,count_single_1,count_double_2,count_e,count_g,count_m,active,valid_till_max,id_2,created,lastmodified 
    )
with 
cte_tmp as (
    select 
        a.id,mm.tenant,c.slcunda,d.single,d.count_single,e.double_1,e.count_double_1,
        SUM (CASE WHEN id_role = 'E' THEN 1 ELSE 0 END) AS "count_e",
        SUM (CASE WHEN id_role = 'G' THEN 1 ELSE 0 END) AS "count_g",
        SUM (CASE WHEN id_role = 'M' THEN 1 ELSE 0 END) AS "count_m",
        case when min(status)='active' then 1 else 0 end active,
        MAX(valid_till) as valid_till_max
        
    from schema1.struct a
    inner join 
    (
    select 
        id,max(valid_till) valid_till_max
    from 
        schema1.struct a
    group by 
        id
    ) b
    on 
    a.id=b.id and a.valid_till = b.valid_till_max
    left outer join 
    schema2.tenants mm on a.tsl_1_2 = mm.tenant 
    left outer join (
        select 
            id,key_1 as slcunda
        from    
            schema1.t_id 
        where 
            id in (Select id from schema1.t_id group by id having count(id)=1) 
    ) c
    on a.id=c.id
    left outer join(
        select 
        id,count_single,single 
        from (
            select
                id,id_2 as single,id_2_role,count(id) over(partition by id) as count_single,row_number() over(partition by id order by id,id_2_role desc) as rn
            from 
                schema1.different_id_2
            where 
                id_2_role in ('03','08','17')   
            ) a
        where rn=1
    ) d
    on a.id=d.id
    left outer join(
        select 
        id,count_double_1,double_1
        from (
            select
                id,id_2 as double_1,count(id) over(partition by id) as count_double_1,row_number() over(partition by id order by id,id_2_role desc) as rn
            from 
                schema1.different_id_2
            where 
                id_2_role in ('06','19')    
        ) a
    where rn=1
    ) e 
    on a.id=e.id    
    group by a.id,mm.client,e.count_double_1
),y as ( 
    select * 
    from (
        SELECT 
            cte_tmp.id,count_single as count_single_1,count_double_1 as count_double_2,b.id_2 
        FROM 
            cte_tmp
        inner join (
            select 
                id,id_2
            from (
                select 
                    id,id_theory as id_2,row_number() over(partition by id) rn
                from
                    schema1.struct
            ) a 
        where rn=1  
        ) b
        on cte_tmp.id=b.id
        where
            count_e=1 and count_g=0 and count_m=0 and count_single=0 
        union all   
        SELECT 
            id,single as id_2 
        FROM 
            cte_tmp
        where 
            count_e=1 and count_g=0 and count_m=0 and count_single>=1
        union all
        SELECT
            id,double_1 as id_2 
        FROM 
            cte_tmp
        where
            count_e=0 and count_g=1 and count_m>=1 and active=1 and count_double>=1
        union all
        SELECT 
            id,double_1 as id_2 
        FROM 
            cte_tmp
        where 
            count_e<>1 and count_g<>0 and count_m<>=0 and active=0 and count_double>=1
    ) a 
),z as (
    SELECT 
        cte_tmp.id,valid_till_max
    FROM cte_tmp
    except
    select 
        id,valid_till_max
    from y
),temporary_result as (
    select 
        id::bigint,'' as id_2
    from z
    union all
    select 
        id::bigint,id_2
    from y
)
select 
    Now(),id_2::bigint,Now(),Now()
from temporary_result
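
For the stored-procedure part, a minimal sketch of how the statement above might be wrapped, assuming PostgreSQL 11 or later (where CREATE PROCEDURE and CALL are available). The procedure name load_temporary_id_table is hypothetical, and the short INSERT inside it is only a stand-in for the full statement:

```sql
-- Hypothetical procedure name; requires PostgreSQL 11+ for CREATE PROCEDURE.
CREATE OR REPLACE PROCEDURE load_temporary_id_table()
LANGUAGE plpgsql
AS $$
BEGIN
    -- Stand-in for the full INSERT ... WITH ... SELECT statement shown above.
    INSERT INTO t_temporary_id_table (ref_date, id, created, lastmodified)
    SELECT now(), s.id, now(), now()
    FROM schema1.struct AS s;
END;
$$;

-- From the ETL job:
CALL load_temporary_id_table();
```

Wrapping the statement this way does not change its execution plan, so the run time would have to be addressed in the query itself.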

I have indexes on:

schema1.struct table, columns "id" and "valid_till"
schema1.t_id table, column "id"
schema1.different_id_2 table, columns "id" and "id_2_role"
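
To make the index list above checkable, here is a small sketch that reads the existing definitions from the pg_indexes system view, followed by one possible composite index. The index name struct_id_valid_till_idx is an assumption, and it is not certain that any index would help here, since the plan further down reads all three tables with sequential scans:

```sql
-- List the index definitions that actually exist on the three tables.
SELECT schemaname, tablename, indexname, indexdef
FROM pg_indexes
WHERE schemaname = 'schema1'
  AND tablename IN ('struct', 't_id', 'different_id_2')
ORDER BY tablename, indexname;

-- Hypothetical composite index (the name is an assumption) covering both
-- columns used by the "latest valid_till per id" self-join.
CREATE INDEX IF NOT EXISTS struct_id_valid_till_idx
    ON schema1.struct (id, valid_till);
```

The indexdef column shows whether "id" and "valid_till" are covered by one composite index or by two separate single-column indexes, which the description above leaves open.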

I am a beginner, so any advice would be much appreciated.

The EXPLAIN output for the query is as follows:

#   Node    Rows    Loops

Actual

Aggregate (rows=13953682 loops=1) 13953682 1
Sort (rows=15791738 loops=1) 15791738 1
Hash Left Join (rows=15791738 loops=1) Hash Cond: (a.id = a_2.id) 15791738 1
Hash Left Join (rows=15791738 loops=1) Hash Cond: (a.id = a_1.id) 15791738 1
Hash Right Join (rows=15791738 loops=1) Hash Cond: (t_id.id = a.id) 15791738 1
Hash Inner Join (rows=60629 loops=1) Hash Cond: (t_id.id = t_id_1.id) 60629 1
Seq Scan on t_id as t_id (rows=45144181 loops=1) 45144181 1
Hash (rows=60629 loops=1) Buckets: 131072 Batches: 2 Memory Usage: 2241 kB 60629 1
Aggregate (rows=60629 loops=1) Filter: (count(t_id_1.id) = 1) Rows Removed by Filter: 15056381 60629 1
Gather Merge (rows=31764065 loops=1) 31764065 1
Aggregate (rows=10588022 loops=3) 10588022 3
Sort (rows=15048060 loops=3) 15048060 3
Seq Scan on t_id as t_id_1 (rows=15048060 loops=3) 15048060 3
Hash (rows=15791738 loops=1) Buckets: 65536 Batches: 512 Memory Usage: 3585 kB 15791738 1
Gather (rows=15791738 loops=1) 15791738 1
Hash Left Join (rows=5263913 loops=3) Hash Cond: (a.tsl_1_2 = (mm.tenant)::numeric) 5263913 3
Hash Inner Join (rows=5263913 loops=3) Hash Cond: ((a.id = struct.id) AND (a.valid_till = (max(struct.valid_till)))) 5263913 3
Seq Scan on struct as a (rows=5292460 loops=3) 5292460 3
Hash (rows=13953682 loops=3) Buckets: 131072 Batches: 256 Memory Usage: 3575 kB 13953682 3
Aggregate (rows=13953682 loops=3) 13953682 3
Sort (rows=15877381 loops=3) 15877381 3
Seq Scan on struct as struct (rows=15877381 loops=3) 15877381 3
Hash (rows=54 loops=3) Buckets: 1024 Batches: 1 Memory Usage: 11 kB 54 3
Seq Scan on tenants as mm (rows=54 loops=3) 54 3
Hash (rows=7983 loops=1) Buckets: 8192 Batches: 1 Memory Usage: 634 kB 7983 1
Subquery Scan (rows=7983 loops=1) Filter: (a_1.rn = 1) Rows Removed by Filter: 12669 7983 1
Sort (rows=20652 loops=1) 20652 1
WindowAgg (rows=20652 loops=1) 20652 1
WindowAgg (rows=20652 loops=1) 20652 1
Sort (rows=20652 loops=1) 20652 1
Gather (rows=20652 loops=1) 20652 1
Seq Scan on different_id_2 as different_id_2 (rows=6884 loops=3) Filter: (id_2_role = ANY ('{3,8,17}'::numeric[])) Rows Removed by Filter: 1798703 6884 3
Hash (rows=1815522 loops=1) Buckets: 65536 Batches: 64 Memory Usage: 3585 kB 1815522 1
Subquery Scan (rows=1815522 loops=1) Filter: (a_2.rn = 1) Rows Removed by Filter: 3589410 1815522 1
Sort (rows=5404932 loops=1) 5404932 1
WindowAgg (rows=5404932 loops=1) 5404932 1
WindowAgg (rows=5404932 loops=1) 5404932 1
Sort (rows=5404932 loops=1) 5404932 1
Seq Scan on different_id_2 as different_id_2_1 (rows=5404932 loops=1) Filter: (id_2_role = ANY ('{6,19}'::numeric[])) Rows Removed by Filter: 11829 5404932 1

But I don't really understand it!
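
One detail in the plan that can be read without much background: the large Hash nodes report 512, 256 and 64 batches with roughly 3.5 MB of memory each. More than one batch usually means the hash table did not fit in work_mem and was spilled to disk. The sketch below shows one way this could be checked on a single piece of the query. The 256MB value is an arbitrary illustration, not a recommendation, and note that EXPLAIN with ANALYZE really executes the statement:

```sql
-- Show the current per-operation memory limit (PostgreSQL's default is 4MB).
SHOW work_mem;

-- Raise it for this session only; 256MB is an arbitrary illustrative value.
SET work_mem = '256MB';

-- Text-format plan with timings and buffer usage for one sub-step of the
-- query: the "latest valid_till per id" aggregate on schema1.struct.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, max(valid_till) AS valid_till_max
FROM schema1.struct
GROUP BY id;

-- Return to the configured value.
RESET work_mem;
```

The text output keeps the indentation of the plan tree, which makes it easier to see which node feeds which than the flattened table above.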

execution-time postgresql query-optimization sql