How to fix this slow Postgres SELECT query

Problem

I need some help with a PostgreSQL query. The following SELECT query takes about 30 seconds to run against tables with roughly 100,000 and 200,000 rows.

SELECT i.id,i.debit_nr,i.pat_id,i.pat_name,i.invoice_id,i.invoice_date,i.due_date,i.client_short,i.payment,i.payment_option,i.marker,i.comment,sum(t.Sum) AS i_sum,i.import_date 
FROM invoices AS i 
   LEFT JOIN invoice_items AS t ON t.invoice_id = i.id 
   JOIN jobs AS j ON i.job_id = j.id 
GROUP BY i.id

The part that seems to be slow is just the SELECT on the invoices table, because if I run

SELECT i.id,i.import_date 
FROM invoices AS i

it takes almost the same amount of time.

GroupAggregate  (cost=63048.71..65737.16 rows=110203 width=76) (actual time=1421.792..1785.528 rows=110203 loops=1)
  Group Key: i.id
  ->  Sort  (cost=63048.71..63577.52 rows=211523 width=76) (actual time=1421.772..1573.998 rows=211527 loops=1)
        Sort Key: i.id
        Sort Method: external merge  disk: 19944kB
        ->  Hash Right Join  (cost=24793.35..34938.02 rows=211523 width=76) (actual time=473.877..1010.362 rows=211527 loops=1)
              Hash Cond: (t.invoice_id = i.id)
              ->  Seq Scan on invoice_items t  (cost=0.00..3878.23 rows=211523 width=12) (actual time=0.035..112.034 rows=211523 loops=1)
              ->  Hash  (cost=22123.81..22123.81 rows=110203 width=72) (actual time=472.566..472.566 rows=110203 loops=1)
                    Buckets: 65536  Batches: 4  Memory Usage: 3592kB
                    ->  Hash Join  (cost=777.49..22123.81 rows=110203 width=72) (actual time=7.784..334.883 rows=110203 loops=1)
                          Hash Cond: (i.job_id = j.id)
                          ->  Seq Scan on invoices i  (cost=0.00..19831.03 rows=110203 width=76) (actual time=0.005..170.120 rows=110203 loops=1)
                          ->  Hash  (cost=705.55..705.55 rows=5755 width=8) (actual time=7.707..7.707 rows=5755 loops=1)
                                Buckets: 8192  Batches: 1  Memory Usage: 289kB
                                ->  Seq Scan on jobs j  (cost=0.00..705.55 rows=5755 width=8) (actual time=0.004..4.741 rows=5755 loops=1)
Planning time: 0.874 ms
Execution time: 1824.846 ms

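My question is whether it makes a difference if I add an index on the id field only, or on all the fields needed in this SELECT.

How can I speed this up?

PS: It's PostgreSQL 9.0 on Windows Server.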

Solution

Try writing the query with a correlated subquery:

SELECT i.*,(SELECT SUM(it.Sum) 
        FROM invoice_items it
        WHERE it.invoice_id = i.id
       ) as i_sum
FROM invoices i ;

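Avoiding the outer aggregation can help performance (although Postgres has a good optimizer, so this is not always the case). You want an index on invoice_items(invoice_id, sum). I left jobs out of the query because it does not seem to be used.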
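
As a rough sketch of the suggestion above, the index could be created and the rewritten query checked as follows. The index name is made up for illustration. The EXISTS filter is an assumption: it is only needed if invoices without a matching job must be excluded, which is what the inner join to jobs in the original query would do.

```sql
-- Composite index suggested above: lets the correlated subquery find each
-- invoice's items by invoice_id instead of scanning all of invoice_items.
-- The index name is hypothetical; any name works.
CREATE INDEX invoice_items_invoice_id_sum_idx
    ON invoice_items (invoice_id, sum);

-- Refresh planner statistics for the table.
ANALYZE invoice_items;

-- Inspect the plan and actual timing of the rewritten query.
EXPLAIN ANALYZE
SELECT i.*,
       (SELECT SUM(it.sum)
        FROM invoice_items it
        WHERE it.invoice_id = i.id) AS i_sum
FROM invoices i
-- Assumption: keep this filter only if invoices with no matching job
-- should be dropped, as the original inner join to jobs would do.
WHERE EXISTS (SELECT 1 FROM jobs j WHERE j.id = i.job_id);
```

Index-only scans were introduced in PostgreSQL 9.2, so if the server really is 9.0 the trailing sum column will not save heap visits, and an index on invoice_id alone may do just as well. Comparing the EXPLAIN ANALYZE output before and after is the way to tell whether the rewrite actually helps on this data.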