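Total unique serial numbers per week

Problem description

I am trying to write a query that searches my database and finds the total number of unique device serial numbers for each week. My current code is: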

SELECT date_part('week', "timestamp"), count(DISTINCT serialno)
FROM eddi_minute em
GROUP BY date_part('week', "timestamp");

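Unfortunately, the dataset I am searching is very large (about 600 GB), so the query takes a very long time. I would like to be able to look at only a short window in each week, for example one minute, like this: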

select count(distinct serialno) as Devices
        from eddi_minute em where "timestamp" >= '2021-06-23 00:01:00' and "timestamp" < '2021-06-23 00:02:00';

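but for every week of a whole year, so that I can press Enter once and have it run across the entire database while avoiding unnecessary counting.

In an ideal world, my idea would be to create a table of the times I want to search and then left join it against my table to reduce the data I am searching, but I only have read access to the server, so that is not an option. Is there a simple way to do this? Apologies if anything here is unclear; I will elaborate on anything that is poorly explained.

The index on the table is: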

CREATE UNIQUE INDEX "PK_4c94f05e4de575488f4a0c2905d" ON ONLY public.eddi_minute USING btree (serialno,"timestamp")

The EXPLAIN ANALYZE output is:

GroupAggregate  (cost=41219561.55..90787854.96 rows=200 width=16) (actual time=7065790.406..8172419.446 rows=53 loops=1)
  Group Key: (date_part('week'::text,em."timestamp"))
  ->  Gather Merge  (cost=41219561.55..88747442.16 rows=408082059 width=16) (actual time=7052726.256..7834672.575 rows=408057194 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        ->  Sort  (cost=41218561.53..41643646.99 rows=170034187 width=16) (actual time=6956066.331..7201252.404 rows=136019065 loops=3)
              Sort Key: (date_part('week'::text,em."timestamp"))
              Sort Method: external merge  disk: 3368720kB
              Worker 0:  Sort Method: external merge  disk: 3640792kB
              Worker 1:  Sort Method: external merge  disk: 3371808kB
              ->  Parallel Append  (cost=0.00..9256242.79 rows=170034187 width=16) (actual time=0.435..2825202.379 rows=136019065 loops=3)
                    ->  Parallel Seq Scan on eddi_minute_p2021_05 em_11  (cost=0.00..1725776.58 rows=34898767 width=16) (actual time=0.011..1722528.987 rows=83740195 loops=1)
                    ->  Parallel Seq Scan on eddi_minute_p2021_06 em_12  (cost=0.00..1488905.33 rows=30102507 width=16) (actual time=1.266..1488189.219 rows=72252984 loops=1)
                    ->  Parallel Seq Scan on eddi_minute_p2021_04 em_10  (cost=0.00..1428581.36 rows=28905149 width=16) (actual time=149.934..1290294.249 rows=69366177 loops=1)
                    ->  Parallel Seq Scan on eddi_minute_p2021_03 em_9  (cost=0.00..1290438.50 rows=26110040 width=16) (actual time=69.475..483281.530 rows=20887814 loops=3)
                    ->  Parallel Seq Scan on eddi_minute_p2021_02 em_8  (cost=0.00..922294.02 rows=18661202 width=16) (actual time=195.734..931653.840 rows=44786882 loops=1)
                    ->  Parallel Seq Scan on eddi_minute_p2021_01 em_7  (cost=0.00..823415.96 rows=16660557 width=16) (actual time=102.708..834900.144 rows=39985282 loops=1)
                    ->  Parallel Seq Scan on eddi_minute_p2020_12 em_6  (cost=0.00..293130.95 rows=5931036 width=16) (actual time=182.465..296634.818 rows=14234537 loops=1)
                    ->  Parallel Seq Scan on eddi_minute_p2020_11 em_5  (cost=0.00..111271.35 rows=2251388 width=16) (actual time=195.367..110910.685 rows=5403366 loops=1)
                    ->  Parallel Seq Scan on eddi_minute_p2020_10 em_4  (cost=0.00..105311.10 rows=2130808 width=16) (actual time=146.920..109340.586 rows=5113938 loops=1)
                    ->  Parallel Seq Scan on eddi_minute_p2020_09 em_3  (cost=0.00..93692.39 rows=1895711 width=16) (actual time=87.456..94169.812 rows=4549714 loops=1)
                    ->  Parallel Seq Scan on eddi_minute_p2020_08 em_2  (cost=0.00..86189.97 rows=1743918 width=16) (actual time=0.007..88029.891 rows=4185403 loops=1)
                    ->  Parallel Seq Scan on eddi_minute_p2020_07 em_1  (cost=0.00..33400.45 rows=675796 width=16) (actual time=1.046..14190.279 rows=1621911 loops=1)
                    ->  Parallel Seq Scan on eddi_minute_p2021_07 em_13  (cost=0.00..3438.66 rows=88773 width=16) (actual time=0.006..51.229 rows=150887 loops=1)
                    ->  Parallel Seq Scan on eddi_minute_default em_26  (cost=0.00..45.20 rows=1456 width=16) (actual time=0.016..0.639 rows=2477 loops=1)
                    ->  Parallel Seq Scan on eddi_minute_p2021_08 em_14  (cost=0.00..15.00 rows=400 width=16) (actual time=0.000..0.000 rows=0 loops=1)
                    ->  Parallel Seq Scan on eddi_minute_p2021_09 em_15  (cost=0.00..15.00 rows=400 width=16) (actual time=0.000..0.515 rows=0 loops=1)
                    ->  Parallel Seq Scan on eddi_minute_p2021_10 em_16  (cost=0.00..15.00 rows=400 width=16) (actual time=0.000..0.000 rows=0 loops=1)
                    ->  Parallel Seq Scan on eddi_minute_p2021_11 em_17  (cost=0.00..15.00 rows=400 width=16) (actual time=0.000..0.000 rows=0 loops=1)
                    ->  Parallel Seq Scan on eddi_minute_p2021_12 em_18  (cost=0.00..15.00 rows=400 width=16) (actual time=0.000..0.000 rows=0 loops=1)
                    ->  Parallel Seq Scan on eddi_minute_p2022_01 em_19  (cost=0.00..15.00 rows=400 width=16) (actual time=0.000..0.000 rows=0 loops=1)
                    ->  Parallel Seq Scan on eddi_minute_p2022_02 em_20  (cost=0.00..15.00 rows=400 width=16) (actual time=0.000..0.000 rows=0 loops=1)
                    ->  Parallel Seq Scan on eddi_minute_p2022_03 em_21  (cost=0.00..15.00 rows=400 width=16) (actual time=0.000..0.001 rows=0 loops=1)
                    ->  Parallel Seq Scan on eddi_minute_p2022_04 em_22  (cost=0.00..15.00 rows=400 width=16) (actual time=0.000..0.000 rows=0 loops=1)
                    ->  Parallel Seq Scan on eddi_minute_p2022_05 em_23  (cost=0.00..15.00 rows=400 width=16) (actual time=0.000..0.000 rows=0 loops=1)
                    ->  Parallel Seq Scan on eddi_minute_p2022_06 em_24  (cost=0.00..15.00 rows=400 width=16) (actual time=0.000..0.000 rows=0 loops=1)
                    ->  Parallel Seq Scan on eddi_minute_p2022_07 em_25  (cost=0.00..15.00 rows=400 width=16) (actual time=0.002..0.003 rows=0 loops=1)
Planning Time: 35.809 ms
Execution Time: 8172556.078 ms

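Solution

A few thoughts:

Although "timestamp" is a valid column name, naming objects after SQL keywords or type names is considered bad practice. It may look harmless, but it can become annoying in the long run.

I think an index on the "timestamp" column should significantly improve the performance of the second query. The existing unique index leads with serialno, so a filter on "timestamp" alone cannot use it efficiently: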

CREATE INDEX idx_timestamp ON eddi_minute ("timestamp");
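
The one-minute sample from the question can also be repeated for every week in a single statement without creating a table, which matters with read-only access. The following is only a sketch: the start and end dates are assumptions taken from the partition names in the plan, and the aliases w, week_start, c and devices are made up for illustration. It uses generate_series to produce one timestamp per week and a LATERAL subquery to count within the minute that follows each one.

```sql
-- One row per week; the date range below is an assumed example, adjust as needed.
SELECT w.week_start,
       c.devices
FROM generate_series(timestamp '2020-07-01 00:01:00',
                     timestamp '2021-07-01 00:01:00',
                     interval '1 week') AS w(week_start)
CROSS JOIN LATERAL (
    -- Count distinct devices seen in the one-minute window starting at week_start.
    SELECT count(DISTINCT em.serialno) AS devices
    FROM eddi_minute em
    WHERE em."timestamp" >= w.week_start
      AND em."timestamp" <  w.week_start + interval '1 minute'
) AS c
ORDER BY w.week_start;
```

Each iteration filters on a narrow "timestamp" range, so it is the kind of lookup that an index with "timestamp" as its leading column is meant to serve. Whether the planner actually uses such an index here would need to be confirmed with EXPLAIN.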

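Regarding the first query: given that you have a 600 GB (!) table, it may be worth creating an expression index on the "timestamp" column, so that rows are indexed by the value you actually use in the query, for example the week: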

CREATE INDEX idx_timestamp_week ON eddi_minute (date_part('week',"timestamp"));
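
One thing to keep in mind with date_part('week', ...) is that it returns only the ISO week number (1 to 53), not the year. The plan in the question scans partitions from July 2020 to July 2021, so weeks with the same number in different years may end up in one group. A possible variant, sketched here with the hypothetical aliases week_start and devices, groups by date_trunc('week', ...) instead, which yields the Monday that starts each week:

```sql
-- Group by the start of each week so that weeks from different years stay separate.
SELECT date_trunc('week', "timestamp") AS week_start,
       count(DISTINCT serialno)        AS devices
FROM eddi_minute
GROUP BY date_trunc('week', "timestamp")
ORDER BY week_start;
```

If an expression index is created for this variant, it would have to be built on the same expression as the one in the query. PostgreSQL only accepts immutable expressions in an index; date_part and date_trunc are immutable for timestamp without time zone but not for timestamp with time zone. The column type is not shown in the question, so this is an assumption to verify.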

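Note: while indexes can speed up queries, they slow down other operations such as inserts, updates, and deletes. If you create a new index, test the performance of all affected operations.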
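
To check whether a new index is actually used, the query can be run under EXPLAIN before and after creating it. As a minimal sketch, this reuses the one-minute query from the question; the ANALYZE option executes the statement and reports real timings, and BUFFERS adds the number of pages read:

```sql
-- Executes the query and prints the chosen plan with timings and buffer usage.
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(DISTINCT serialno) AS devices
FROM eddi_minute em
WHERE "timestamp" >= '2021-06-23 00:01:00'
  AND "timestamp" <  '2021-06-23 00:02:00';
```

An index scan on the relevant partition in place of the Parallel Seq Scan nodes would suggest the index is being picked up.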

Demo: db<>fiddle
