Update using a join on a large table: performance tips?

Problem description

I have been struggling with this one; it never finishes:

update votings v
set Voter_id = (select pv.number from Voters pv WHERE pv.person_id = v.person_id);

The table currently holds about 96 million rows:

select count(0) from votings;
  count   
----------
 96575239
(1 row)

The update is apparently using an index:

explain update votings v                             
set Voter_id = (select pv.number from Voters pv WHERE pv.rl_person_id = v.person_id);
                                                    QUERY PLAN                                                     
-------------------------------------------------------------------------------------------------------------------
 Update on votings v  (cost=0.00..788637465.40 rows=91339856 width=1671)
   ->  Seq Scan on votings v  (cost=0.00..788637465.40 rows=91339856 width=1671)
         SubPlan 1
           ->  Index Scan using idx_Voter_rl_person_id on Voters pv  (cost=0.56..8.58 rows=1 width=9)
                 Index Cond: (rl_person_id = v.person_id)
(5 rows)

These are the indexes on votings:

Indexes:
    "votings_pkey" PRIMARY KEY, btree (id)
    "votings_election_id_Voter_id_key" UNIQUE CONSTRAINT, btree (election_id, person_id)
    "votings_external_id_external_source_key" UNIQUE CONSTRAINT, btree (external_id, external_source)
    "idx_votings_updated_at" btree (updated_at DESC)
    "idx_votings_Vote_party" btree (Vote_party)
    "idx_votings_Vote_state_Vote_party" btree (Vote_state, Vote_party)
    "idx_votings_Voter_id" btree (person_id)
Foreign-key constraints:
    "votings_election_id_fkey" FOREIGN KEY (election_id) REFERENCES elections(id)
    "votings_Voter_id_fkey" FOREIGN KEY (person_id) REFERENCES people_all(id)

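Any ideas, folks, on what contributes most to the update being slow: the number of rows, or the join being used?

Solutions

One suggestion I can offer here is a covering index for the subquery: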

CREATE INDEX idx_cover ON voters (person_id,number);

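While this may offer little advantage over the current index on person_id alone in a select context, it may matter more in the context of an update. The reason is that, for this update, the index might spare Postgres from having to create and maintain a copy of the original table in its pre-update state.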
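A minimal, self-contained sketch of the covering-index idea, assuming PostgreSQL (the INCLUDE variant in the comment needs version 11 or later). The demo_voters and demo_votings temp tables, the demo_idx_cover index name and the generated rows are hypothetical; only the columns person_id, number and voter_id come from the question. Since the subquery filters on person_id and reads only number, an index containing both columns lets the planner answer each subquery call with an Index Only Scan instead of also fetching the voters heap row, provided the visibility map is current (hence the VACUUM).

```sql
-- Hypothetical demo tables; only the column names follow the question.
DROP TABLE IF EXISTS demo_votings, demo_voters;

CREATE TEMP TABLE demo_voters (
    person_id bigint NOT NULL,
    number    bigint NOT NULL
);
CREATE TEMP TABLE demo_votings (
    id        bigint PRIMARY KEY,
    person_id bigint NOT NULL,
    voter_id  bigint
);

-- 10,000 voters; number is derived from person_id so the result is checkable.
INSERT INTO demo_voters (person_id, number)
SELECT g, g * 10 FROM generate_series(1, 10000) AS g;

INSERT INTO demo_votings (id, person_id)
SELECT g, g FROM generate_series(1, 10000) AS g;

-- Covering index: the subquery filters on person_id and reads only number.
CREATE INDEX demo_idx_cover ON demo_voters (person_id, number);
-- PostgreSQL 11+ alternative with number as a non-key payload column:
-- CREATE INDEX demo_idx_cover ON demo_voters (person_id) INCLUDE (number);

-- Index-only scans need an up-to-date visibility map; temp tables are not
-- processed by autovacuum, so vacuum and analyze them explicitly.
VACUUM ANALYZE demo_voters;
ANALYZE demo_votings;

-- Inspect the plan: look for "Index Only Scan using demo_idx_cover" under SubPlan 1.
EXPLAIN
UPDATE demo_votings v
SET voter_id = (SELECT pv.number FROM demo_voters pv WHERE pv.person_id = v.person_id);

UPDATE demo_votings v
SET voter_id = (SELECT pv.number FROM demo_voters pv WHERE pv.person_id = v.person_id);
```

Whether the real plan switches to an Index Only Scan is worth confirming with a plain EXPLAIN on the actual tables; the subquery still runs once per votings row either way.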


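If you really have 91339856 rows in votings, then the 91339856 index scans on voters are certainly the dominant cost. Reading voters once with a sequential scan would be faster.

You can improve performance if you don't force PostgreSQL into a nested loop join: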

UPDATE votings
SET voter_id = voters.number
FROM voters
WHERE votings.person_id = voters.person_id;
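A self-contained sketch of the join form on hypothetical demo tables (demo_voters, demo_votings and their generated rows are assumptions; the column names follow the question). It also makes one behavioral difference visible: for a votings row with no matching voter, the correlated-subquery UPDATE sets voter_id to NULL, while UPDATE ... FROM leaves the row untouched. Conversely, if voters held several rows for one person_id, the subquery form would fail with "more than one row returned by a subquery used as an expression", whereas UPDATE ... FROM would use one of the matching rows without saying which.

```sql
-- Hypothetical demo tables; only the column names follow the question.
DROP TABLE IF EXISTS demo_votings, demo_voters;

CREATE TEMP TABLE demo_voters (
    person_id bigint PRIMARY KEY,
    number    bigint NOT NULL
);
CREATE TEMP TABLE demo_votings (
    id        bigint PRIMARY KEY,
    person_id bigint NOT NULL,
    voter_id  bigint
);

-- Voters exist for person_id 1..9000 only; votings reference 1..10000,
-- and every voting starts with a placeholder voter_id of -1.
INSERT INTO demo_voters (person_id, number)
SELECT g, g * 10 FROM generate_series(1, 9000) AS g;

INSERT INTO demo_votings (id, person_id, voter_id)
SELECT g, g, -1 FROM generate_series(1, 10000) AS g;

ANALYZE demo_voters;
ANALYZE demo_votings;

-- Join form: the planner is free to choose a hash or merge join
-- instead of one index probe per votings row.
EXPLAIN
UPDATE demo_votings
SET voter_id = demo_voters.number
FROM demo_voters
WHERE demo_votings.person_id = demo_voters.person_id;

UPDATE demo_votings
SET voter_id = demo_voters.number
FROM demo_voters
WHERE demo_votings.person_id = demo_voters.person_id;
-- Rows with person_id 9001..10000 have no matching voter and keep voter_id = -1.
```

On the real tables, a plain EXPLAIN (without ANALYZE) of the join form is cheap and shows which join strategy the planner picks; EXPLAIN ANALYZE would actually execute the full update.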

Updating every row in a table is very expensive. I would suggest re-creating the table instead:

create table temp_votings as
    select v.*, vv.vote_id  -- vote_id presumably corresponds to voters.number in the question
    from votings v
    join voters vv on vv.person_id = v.person_id;

For this query, you want an index on voters(person_id, vote_id). I'm guessing that person_id may already be the primary key; if so, no additional index is needed.
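A sketch of the re-creation step on hypothetical demo tables (the demo_* names and generated rows are assumptions). It reads the looked-up value from voters.number, the column the question uses for what the answer calls vote_id, and lists the columns explicitly so the value lands in voter_id instead of being appended as an extra column next to v.*. It uses a LEFT JOIN on purpose: the answer's inner join would drop every votings row that has no matching voter, whereas LEFT JOIN keeps those rows with a NULL voter_id, which is what the question's correlated-subquery UPDATE produces. CREATE TABLE AS copies only data, so keys and indexes are added afterwards.

```sql
-- Hypothetical demo tables; only the column names follow the question.
DROP TABLE IF EXISTS demo_new_votings, demo_votings, demo_voters;

CREATE TEMP TABLE demo_voters (
    person_id bigint PRIMARY KEY,
    number    bigint NOT NULL
);
CREATE TEMP TABLE demo_votings (
    id        bigint PRIMARY KEY,
    person_id bigint NOT NULL,
    voter_id  bigint
);

-- Voters exist for person_id 1..9000 only; votings reference 1..10000.
INSERT INTO demo_voters (person_id, number)
SELECT g, g * 10 FROM generate_series(1, 9000) AS g;

INSERT INTO demo_votings (id, person_id)
SELECT g, g FROM generate_series(1, 10000) AS g;

-- Build the new table in a single pass over both tables.
-- LEFT JOIN keeps votings rows without a voter (their voter_id becomes NULL).
CREATE TEMP TABLE demo_new_votings AS
SELECT v.id, v.person_id, vv.number AS voter_id
FROM demo_votings v
LEFT JOIN demo_voters vv ON vv.person_id = v.person_id;

-- CREATE TABLE AS copies data only: recreate the key and indexes afterwards.
ALTER TABLE demo_new_votings ADD PRIMARY KEY (id);
CREATE INDEX demo_new_votings_person_id_idx ON demo_new_votings (person_id);
```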

Then you can replace the existing table, but back it up first.
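One possible sketch of the replacement step, again on hypothetical demo tables standing in for votings and the rebuilt table: back up the current rows, then truncate and reload inside a single transaction. TRUNCATE is transactional in PostgreSQL, so an error before COMMIT leaves the old rows in place, and the existing table keeps its indexes and constraints. Two caveats for the real table: every index is maintained while the rows are reloaded, so dropping non-essential indexes before the INSERT and recreating them afterwards is often faster at this size, and TRUNCATE refuses to run on a table referenced by foreign keys from other tables unless those tables are truncated in the same command (or CASCADE is used).

```sql
-- Hypothetical demo tables; only the column names follow the question.
DROP TABLE IF EXISTS demo_votings_backup, demo_new_votings, demo_votings;

CREATE TEMP TABLE demo_votings (
    id        bigint PRIMARY KEY,
    person_id bigint NOT NULL,
    voter_id  bigint
);
INSERT INTO demo_votings (id, person_id)
SELECT g, g FROM generate_series(1, 1000) AS g;

-- Stand-in for the rebuilt table from the previous step.
CREATE TEMP TABLE demo_new_votings AS
SELECT id, person_id, person_id * 10 AS voter_id
FROM demo_votings;

-- 1. Back up the current contents first. On real data this would be a
--    regular (non-temp) table or a pg_dump, since temp tables vanish with the session.
CREATE TEMP TABLE demo_votings_backup AS
SELECT * FROM demo_votings;

-- 2. Swap the data in one transaction; indexes and constraints on
--    demo_votings stay in place and are maintained during the INSERT.
BEGIN;
TRUNCATE demo_votings;
INSERT INTO demo_votings (id, person_id, voter_id)
SELECT id, person_id, voter_id
FROM demo_new_votings;
COMMIT;
```

Renaming the rebuilt table into place (ALTER TABLE ... RENAME TO) avoids the reload, but then indexes, constraints, foreign keys and grants have to be recreated on it by hand.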