Optimizing a regex join query in PostgreSQL

Problem description

How can I optimize the query below? It took 8 hours to run:

-- The plan shown further down comes from running EXPLAIN on the SELECT inside this statement.
create table rtime.rtime_calc1_jun13tojun19 as (
    select pa.api as pa_api,
           pa.action_type as pa_action_type,
           max(rt.request_time),
           avg(rt.request_time),
           percentile_cont(0.95) within group (order by rt.request_time asc) as percentile_95
    from public.public_api pa,
         -- Anti-join: keep only request rows whose proxy has no exact match in public_api,
         -- so that exact static matches do not take part in the regex join.
         (select reqtime.*
          from public.public_api puba
          right join rtime.rtime_data1_jun13tojun19 reqtime
                 on puba.api = reqtime.proxy
          where puba.api is null) as rt
    where rt.proxy ~* pa.api_regex
      and rt.method = pa.action_type
    group by pa.api, pa.action_type
)

Here is the explain plan:

GroupAggregate  (cost=1131.43..263846.61 rows=1 width=70)
  Group Key: pa.api,pa.action_type
  ->  Nested Loop  (cost=1131.43..263846.59 rows=1 width=54)
        Join Filter: (((reqtime.proxy)::text ~* (pa.api_regex)::text) AND ((reqtime.method)::text = (pa.action_type)::text))
        ->  Index Scan using primary_key_pa on public_api pa  (cost=0.28..565.81 rows=2007 width=90)
        ->  Materialize  (cost=1131.16..263245.66 rows=1 width=49)
              ->  Gather  (cost=1131.16..263245.65 rows=1 width=49)
                    Workers Planned: 2
                    ->  Hash Anti Join  (cost=131.16..262245.55 rows=1 width=49)
                          Hash Cond: ((reqtime.proxy)::text = (puba.api)::text)
                          ->  Parallel Seq Scan on rtime_data1_jun13tojun19 reqtime  (cost=0.00..218885.08 rows=5763908 width=49)
                          ->  Hash  (cost=106.07..106.07 rows=2007 width=42)
                                ->  Seq Scan on public_api puba  (cost=0.00..106.07 rows=2007 width=42)

The public.public_api table has 2007 rows.
The rtime.rtime_data1_jun13tojun19 table has 13837305 rows.
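
For a sense of scale, the plan and the row counts above give an upper bound on how often the join filter can be evaluated. The Nested Loop rescans the materialized anti-join output once for each of the 2007 public_api rows, and the anti-join output can contain at most every row of the request table. The actual number is lower by however many rows the anti-join removes, which the question does not state. The column alias max_join_filter_checks is only illustrative.

```sql
-- Upper bound on (pattern, request row) pairs tested by the join filter:
-- 2007 public_api rows times at most 13837305 request rows.
select 2007::bigint * 13837305 as max_join_filter_checks;
-- returns 27771471135
```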

Here is the DDL for the public_api table:

CREATE TABLE public.public_api (
    api varchar NOT NULL,
    "type" varchar NULL,
    api_bin varchar NULL,
    api_bin_avg_resp_time varchar NULL,
    api_bin_perc95_resp_time varchar NULL,
    max_response_time float8 NULL,
    avg_response_time float8 NULL,
    percentile_95_response_time float8 NULL,
    max_tps int4 NULL,
    min_tps int4 NULL,
    avg_tps float8 NULL,
    percentile_90_tps float8 NULL,
    percentile_99_tps float8 NULL,
    percentile_95_tps float8 NULL,
    product varchar NULL,
    action_type varchar NOT NULL,
    proxy varchar NULL,
    CONSTRAINT primary_key_pa PRIMARY KEY (api, action_type)
);

Here is the DDL for rtime.rtime_data1_jun13tojun19:

CREATE TABLE rtime.rtime_data1_jun13tojun19 (
    env varchar NULL,
    "method" varchar NULL,
    request_time float8 NULL
);

Solution

No effective solution to this problem has been found yet.
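
One direction that may be worth trying, offered as an untested sketch rather than a verified fix: whether a request row matches a pattern depends only on its (proxy, method) pair, so the regex match can be run once per distinct pair instead of once per request row, and the matches joined back with plain equality. The temporary table names rt_distinct and rt_match are hypothetical. The sketch also assumes the proxy and api_regex columns exist as used in the query, since they do not appear in the DDL shown. How much this helps depends on how many distinct pairs the data contains, which the question does not state.

```sql
-- Step 1: distinct (proxy, method) pairs with no exact match in public_api.
-- NOT EXISTS expresses the same anti-join as the RIGHT JOIN ... IS NULL form.
create temporary table rt_distinct as
select distinct reqtime.proxy, reqtime.method
from rtime.rtime_data1_jun13tojun19 reqtime
where not exists (
    select 1
    from public.public_api puba
    where puba.api = reqtime.proxy
);

analyze rt_distinct;  -- give the planner real row counts for the next step

-- Step 2: evaluate the regex once per distinct pair, not once per request row.
create temporary table rt_match as
select d.proxy, d.method, pa.api, pa.action_type
from rt_distinct d
join public.public_api pa
  on d.method = pa.action_type
 and d.proxy ~* pa.api_regex;

analyze rt_match;

-- Step 3: join the matches back on equality (hash-joinable) and aggregate.
select m.api as pa_api,
       m.action_type as pa_action_type,
       max(rt.request_time),
       avg(rt.request_time),
       percentile_cont(0.95) within group (order by rt.request_time asc) as percentile_95
from rtime.rtime_data1_jun13tojun19 rt
join rt_match m
  on rt.proxy = m.proxy
 and rt.method = m.method
group by m.api, m.action_type;
```

To materialize the result as in the original statement, the final SELECT can be wrapped in create table rtime.rtime_calc1_jun13tojun19 as (...). The plan in the question estimates rows=1 for the anti-join output; if that estimate is far from the real count, running ANALYZE on both source tables before retrying may also change the plan the optimizer picks.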
