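Query takes hours to complete: it is really slow because of multiple joins and unions over large data sets. Maybe the indexes are wrong?

Problem description

I have a long query that I want to save as a stored procedure and use later in an ETL job. It affects roughly 15+ million rows and takes an hour and a half to complete. I am using Postgres and pgAdmin.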
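As a rough illustration of the "stored procedure for an ETL job" part, a wrapper could look like the sketch below. The procedure name `etl_fill_temporary_id_table` is made up for this example, and `CREATE PROCEDURE` assumes PostgreSQL 11 or later; on older versions a function returning `void` would be the usual substitute.

```sql
-- Hypothetical wrapper; the procedure name is an assumption, not part of the original setup.
CREATE OR REPLACE PROCEDURE etl_fill_temporary_id_table()
LANGUAGE plpgsql
AS $$
BEGIN
    -- The full INSERT INTO t_temporary_id_table ... statement goes here.
    NULL;  -- placeholder so that this sketch compiles on its own
END;
$$;

-- Called from the ETL job:
CALL etl_fill_temporary_id_table();
```

Wrapping the statement this way does not change how it is planned or executed, so the run time question is independent of the procedure itself.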

Code

INSERT INTO t_temporary_id_table ( 
    ref_date,id,client,slcunda,count_single_1,count_double_2,count_e,count_g,count_m,active,valid_till_max,id_2,created,lastmodified 
    )
with 
cte_tmp as (
    select 
        a.id,mm.tenant,c.slcunda,d.single,d.count_single,e.double_1,e.count_double_1,SUM (CASE WHEN id_role = 'E' THEN 1 ELSE 0 END) AS "count_e",SUM (CASE WHEN id_role = 'G' THEN 1 ELSE 0 END) AS "count_g",SUM (CASE WHEN id_role = 'M' THEN 1 ELSE 0 END) AS "count_m",case when min(status)='active' then 1 else 0 end active,MAX(valid_till) as valid_till_max
        
    from schema1.struct a
    inner join 
    (
    select 
        id,max(valid_till) valid_till_max
    from 
        schema1.struct a
    group by 
        id
    ) b
    on 
    a.id=b.id and a.valid_till = b.valid_till_max
    left outer join 
    schema2.tenants mm on a.tsl_1_2 = mm.tenant 
    left outer join (
        select 
            id,key_1 as slcunda
        from    
            schema1.t_id 
        where 
            id in (Select id from schema1.t_id group by id having count(id)=1) 
    ) c
    on a.id=c.id
    left outer join(
        select 
        id,count_single,single 
        from (
            select
                id,id_2 as single,id_2_role,count(id) over(partition by id) as count_single,row_number() over(partition by id order by id,id_2_role desc) as rn
            from 
                schema1.different_id_2
            where 
                id_2_role in ('03','08','17')   
            ) a
        where rn=1
    ) d
    on a.id=d.id
    left outer join(
        select 
        id,count_double_1,double_1
        from (
            select
                id,id_2 as double_1,count(id) over(partition by id) as count_double_1,row_number() over(partition by id order by id,id_2_role desc) as rn 
            from 
                schema1.different_id_2
            where 
                id_2_role in ('06','19')    
        ) a
    where rn=1
    ) e 
    on a.id=e.id    
    group by a.id,mm.tenant,c.slcunda,d.single,d.count_single,e.double_1,e.count_double_1
),y as ( 
    select * 
    from (
        SELECT 
            cte_tmp.id,count_single as count_single_1,count_double_1 as count_double_2,b.id_2 
        FROM 
            cte_tmp
        inner join (
            select 
                id,id_2
            from (
                select 
                    id,id_theory as id_2,row_number() over(partition by id) rn
                from
                    schema1.struct
            ) a 
        where rn=1  
        ) b
        on cte_tmp.id=b.id
        where
            count_e=1 and count_g=0 and count_m=0 and count_single=0 
        union all   
        SELECT 
            id,single as id_2 
        FROM 
            cte_tmp
        where 
            count_e=1 and count_g=0 and count_m=0 and count_single>=1
        union all
        SELECT
            id,double_1 as id_2 
        FROM 
            cte_tmp
        where
            count_e=0 and count_g=1 and count_m>=1 and active=1 and count_double_1>=1
        union all
        SELECT 
            id,double_1 as id_2 
        FROM 
            cte_tmp
        where 
            count_e<>1 and count_g<>0 and count_m<>0 and active=0 and count_double_1>=1
    ) a 
),z as (
    SELECT 
        cte_tmp.id,valid_till_max
    FROM cte_tmp
    except
    select 
        id,valid_till_max
    from y
),temporary_result as (
    select 
        id::bigint,'' as id_2
    from z
    union all
    select 
        id::bigint,id_2
    from y
)
select 
    Now(),id_2::bigint,Now(),Now()
from temporary_result
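Two of the subqueries above read the same large table twice: `schema1.struct` is scanned once for the rows and once for `max(valid_till)`, and `schema1.t_id` is scanned once for the rows and once for the `having count(id)=1` filter. The sketch below shows one possible single-pass formulation of each. It assumes that `key_1` has a type that supports `min()`, and the names `latest` and `max_valid_till` are introduced only for this example. It has not been tested against the real data.

```sql
-- Latest rows per id in one pass over schema1.struct.
-- Rows that tie on the maximum valid_till are all kept, as in the original self-join.
select *
from (
    select s.*,
           max(s.valid_till) over (partition by s.id) as max_valid_till
    from schema1.struct s
) latest
where latest.valid_till = latest.max_valid_till;

-- ids that occur exactly once in schema1.t_id, together with their key_1.
-- With a single row per group, min(key_1) is simply that row's key_1.
select id,
       min(key_1) as slcunda
from schema1.t_id
group by id
having count(id) = 1;
```

The window version still has to sort or hash the whole table, so it is not guaranteed to be faster. Comparing the plans before and after is the only way to tell.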

I have indexes on:

  1. schema1.struct table, columns "id" and "valid_till"
  2. schema1.t_id table, column 'id'
  3. schema1.different_id_2 table, columns "id" and "id_2_role"
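Written out as DDL, the indexes listed above would look roughly like this. The index names are hypothetical, and the sketch assumes that each two-column entry is a single composite index rather than two separate single-column indexes.

```sql
-- Index names are assumptions; only the tables and columns come from the list above.
CREATE INDEX IF NOT EXISTS idx_struct_id_valid_till
    ON schema1.struct (id, valid_till);

CREATE INDEX IF NOT EXISTS idx_t_id_id
    ON schema1.t_id (id);

CREATE INDEX IF NOT EXISTS idx_different_id_2_id_role
    ON schema1.different_id_2 (id, id_2_role);
```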

I'm a newbie, so any suggestions would be much appreciated.

The EXPLAIN output for the query is as follows:

#   Node    Rows    Loops

Actual

  1. Aggregate (rows=13953682 loops=1) 13953682 1
  2. Sort (rows=15791738 loops=1) 15791738 1
  3. Hash Left Join (rows=15791738 loops=1) Hash Cond: (a.id = a_2.id) 15791738 1
  4. Hash Left Join (rows=15791738 loops=1) Hash Cond: (a.id = a_1.id) 15791738 1
  5. Hash Right Join (rows=15791738 loops=1) Hash Cond: (t_id.id = a.id) 15791738 1
  6. Hash Inner Join (rows=60629 loops=1) Hash Cond: (t_id.id = t_id_1.id) 60629 1
  7. Seq Scan on t_id as t_id (rows=45144181 loops=1) 45144181 1
  8. Hash (rows=60629 loops=1) Buckets: 131072 Batches: 2 Memory Usage: 2241 kB 60629 1
  9. Aggregate (rows=60629 loops=1) Filter: (count(t_id_1.id) = 1) Rows Removed by Filter: 15056381 60629 1
  10. Gather Merge (rows=31764065 loops=1) 31764065 1
  11. Aggregate (rows=10588022 loops=3) 10588022 3
  12. Sort (rows=15048060 loops=3) 15048060 3
  13. Seq Scan on t_id as t_id_1 (rows=15048060 loops=3) 15048060 3
  14. Hash (rows=15791738 loops=1) Buckets: 65536 Batches: 512 Memory Usage: 3585 kB 15791738 1
  15. Gather (rows=15791738 loops=1) 15791738 1
  16. Hash Left Join (rows=5263913 loops=3) Hash Cond: (a.tsl_1_2 = (mm.tenant)::numeric) 5263913 3
  17. Hash Inner Join (rows=5263913 loops=3) Hash Cond: ((a.id = struct.id) AND (a.valid_till = (max(struct.valid_till)))) 5263913 3
  18. Seq Scan on struct (rows=5292460 loops=3) 5292460 3
  19. Hash (rows=13953682 loops=3) Buckets: 131072 Batches: 256 Memory Usage: 3575 kB 13953682 3
  20. Aggregate (rows=13953682 loops=3) 13953682 3
  21. Sort (rows=15877381 loops=3) 15877381 3
  22. Seq Scan on struct as struct (rows=15877381 loops=3) 15877381 3
  23. Hash (rows=54 loops=3) Buckets: 1024 Batches: 1 Memory Usage: 11 kB 54 3
  24. Seq Scan on tenants as mm (rows=54 loops=3) 54 3
  25. Hash (rows=7983 loops=1) Buckets: 8192 Batches: 1 Memory Usage: 634 kB 7983 1
  26. Subquery Scan (rows=7983 loops=1) Filter: (a_1.rn = 1) Rows Removed by Filter: 12669 7983 1
  27. Sort (rows=20652 loops=1) 20652 1
  28. WindowAgg (rows=20652 loops=1) 20652 1
  29. WindowAgg (rows=20652 loops=1) 20652 1
  30. Sort (rows=20652 loops=1) 20652 1
  31. Gather (rows=20652 loops=1) 20652 1
  32. Seq Scan on different_id_2 as different_id_2 (rows=6884 loops=3) Filter: (id_2_role = ANY ('{3,8,17}'::numeric[])) Rows Removed by Filter: 1798703 6884 3
  33. Hash (rows=1815522 loops=1) Buckets: 65536 Batches: 64 Memory Usage: 3585 kB 1815522 1
  34. Subquery Scan (rows=1815522 loops=1) Filter: (a_2.rn = 1) Rows Removed by Filter: 3589410 1815522 1
  35. Sort (rows=5404932 loops=1) 5404932 1
  36. WindowAgg (rows=5404932 loops=1) 5404932 1
  37. WindowAgg (rows=5404932 loops=1) 5404932 1
  38. Sort (rows=5404932 loops=1) 5404932 1
  39. Seq Scan on different_id_2 as different_id_2_1 (rows=5404932 loops=1) Filter: (id_2_role = ANY ('{6,19}'::numeric[])) Rows Removed by Filter: 11829 5404932 1

But I don't really understand it!
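One way to read the plan: every table is read with a Seq Scan, so none of the indexes is used, and several Hash nodes report many batches with only about 3.5 MB of memory each (for example, Batches: 512 with Memory Usage: 3585 kB). More than one batch typically means the hash table did not fit into `work_mem` and was split into pieces that go through temporary files. The sketch below raises `work_mem` for the current session only and re-checks one of the expensive sub-steps. The value `256MB` is an assumption and has to be sized against the RAM actually available.

```sql
-- Session-level setting only; the value is an assumption.
-- Each sort or hash node, in each parallel worker, may use roughly this much memory.
SET work_mem = '256MB';

-- Re-run one expensive sub-step with buffer statistics to see whether
-- the sort and hash steps still spill to temporary files.
EXPLAIN (ANALYZE, BUFFERS)
select id, max(valid_till) as valid_till_max
from schema1.struct
group by id;

-- Return to the server default afterwards.
RESET work_mem;
```

If the batch counts drop to 1 and the timings improve, the same setting can be applied at the start of the ETL session before the full statement runs.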
