How to get a moving-window argmax in PostgreSQL

Problem description

I'm trying to use PostgreSQL window functions to find the moving argmax of a column in my database. Here's what I have so far:

select *,
       max(case when price = roll_max then row_num end)
           over (partition by roll_max order by s_date) as argmax
from (
   select s_id,
          s_date,
          price,
          row_number() over (partition by s_id order by s_date) as row_num,
          max(price) over (partition by s_id order by s_date rows 10 preceding) as roll_max
   from sample_table
) tb1
order by s_date

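The code above is adapted from this answer. I had to add partitioning by s_id because there are many different s_id values; the table's unique key is (s_id, s_date). So I need the rolling argmax for each s_id across all available dates.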

Here is some sample output, using a window of ROWS 10 PRECEDING (the current row plus the 10 rows before it):

+-------+--------------+---------+---------+----------+------------------------------------------+
| s_id  |    s_date    |  price  | row_num | roll_max |                  argmax                  |
+-------+--------------+---------+---------+----------+------------------------------------------+
| "ABC" | "2020-06-10" | 322.390 |       1 |  322.390 | 1                                        |
| "ABC" | "2020-06-11" | 312.150 |       2 |  322.390 | 1                                        |
| "ABC" | "2020-06-12" | 309.080 |       3 |  322.390 | 1                                        |
| "ABC" | "2020-06-15" | 308.280 |       4 |  322.390 | 1                                        |
| "ABC" | "2020-06-16" | 315.640 |       5 |  322.390 | 1                                        |
| "ABC" | "2020-06-17" | 314.390 |       6 |  322.390 | 1                                        |
| "ABC" | "2020-06-18" | 312.300 |       7 |  322.390 | 1                                        |
| "ABC" | "2020-06-19" | 314.380 |       8 |  322.390 | 1                                        |
| "ABC" | "2020-06-22" | 311.050 |       9 |  322.390 | 1                                        |
| "ABC" | "2020-06-23" | 314.500 |      10 |  322.390 | 1                                        |
| "ABC" | "2020-06-24" | 310.510 |      11 |  322.390 | 1                                        |
| "ABC" | "2020-06-25" | 307.640 |      12 |  315.640 | NULL /* how to get row_num (5) here? */  |
| "ABC" | "2020-06-26" | 306.390 |      13 |  315.640 | NULL /* how to get row_num (5) here? */  |
| "ABC" | "2020-06-29" | 304.610 |      14 |  315.640 | NULL /* how to get row_num (5) here? */  |
| "ABC" | "2020-06-30" | 310.200 |      15 |  315.640 | NULL /* how to get row_num (5) here? */  |
| "ABC" | "2020-07-01" | 311.890 |      16 |  314.500 | NULL /* how to get row_num (10) here? */ |
| "ABC" | "2020-07-02" | 315.700 |      17 |  315.700 | 17                                       |
| "ABC" | "2020-07-06" | 317.680 |      18 |  317.680 | 18                                       |
+-------+--------------+---------+---------+----------+------------------------------------------+

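I understand that the query above only compares the current row with the max and returns its row number when they match. That doesn't always hold, as the table shows: 315.640 is the rolling max for rows 12 through 15, but that value comes from an earlier row (row 5), not from the current row.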

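My question is: in the example above, how can I get the value 5 instead of NULL? That is, for every row I want the row_num of the actual argmax (315.640 has row_num 5). The argmax row_num can be relative either to the whole table or to each window (ROWS 10 PRECEDING in this example).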
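To pin down the expected result, here is a brute-force reference in plain Python (not PostgreSQL), run over the 18 sample prices from the table above. For each row it scans the frame made of the current row plus the 10 preceding rows and returns the 1-based row_num of the first maximum. The function name rolling_argmax is illustrative only.

```python
# The 18 sample prices for s_id "ABC" from the table above, in s_date order.
PRICES = [322.390, 312.150, 309.080, 308.280, 315.640, 314.390,
          312.300, 314.380, 311.050, 314.500, 310.510, 307.640,
          306.390, 304.610, 310.200, 311.890, 315.700, 317.680]


def rolling_argmax(prices, preceding=10):
    """Return the 1-based row_num of the max price in each row's frame.

    The frame mirrors ROWS <preceding> PRECEDING: the current row plus up to
    `preceding` earlier rows. Ties resolve to the earliest row.
    """
    result = []
    for i in range(len(prices)):
        start = max(0, i - preceding)      # 0-based index of the frame's first row
        frame = prices[start:i + 1]
        pos = frame.index(max(frame))      # 0-based position of the first max
        result.append(start + pos + 1)     # convert back to a 1-based row_num
    return result


print(rolling_argmax(PRICES))
# [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 5, 5, 5, 5, 10, 17, 18]
```

This is O(n × w) for a frame of w rows, which is fine as a correctness check against whatever the SQL returns.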

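I've looked at other similar questions but still can't get the result I want, because what I need is a rolling argmax, not the argmax of a column over the whole table.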

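Can anyone suggest a solution? I'm also open to using a UDF. I only have basic knowledge of aggregate UDFs, and my approach of keeping the last 10 values in a temporary array and taking their max doesn't seem very efficient (I'm not even sure I'm doing it right). At this point I'm out of ideas :/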
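On the efficiency concern: rescanning a buffer of the last w values costs O(w) per row. The classic alternative for sliding-window maxima is a monotonic deque, which is amortized O(1) per row. Below is a plain-Python sketch of that algorithm, adapted to return the argmax; it is an illustration of the technique, not a PostgreSQL UDF, and rolling_argmax_deque is a hypothetical name.

```python
from collections import deque


def rolling_argmax_deque(prices, preceding=10):
    """Sliding-window argmax in O(n) overall (monotonic deque).

    The frame is the current row plus up to `preceding` earlier rows.
    The deque stores indices in increasing order whose prices are
    non-increasing from front to back, so the front is the argmax of the
    current frame. Only strictly smaller prices are popped, which keeps
    the earliest index on ties.
    """
    window = deque()
    result = []
    for i, price in enumerate(prices):
        # Drop indices that have slid out of the frame.
        while window and window[0] < i - preceding:
            window.popleft()
        # Drop earlier indices that can never be the frame max again.
        while window and prices[window[-1]] < price:
            window.pop()
        window.append(i)
        result.append(window[0] + 1)  # 1-based row_num of the frame max
    return result


print(rolling_argmax_deque([3.0, 3.0, 1.0], preceding=1))  # [1, 1, 2]
```

Each index is appended and popped at most once, which is where the linear bound comes from.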

Solution

It's a bit hard to read, but you can do the following:

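  1. Collect all price values in the window frame into an array with array_agg;
  2. Use array_position to find where the rolling max price sits in that array (a 1-based position; on ties it returns the first occurrence);
  3. Turn that position into a row number by adding row_number() minus the frame length. ROWS 10 PRECEDING covers the current row plus the 10 rows before it, so the frame length is 11 and the offset is row_number() - 11;
  4. Wrap the offset as GREATEST(row_number() - 11, 0) so it does not go negative for the first rows, where the frame (and the array) still starts at row 1:
WITH sample_table(s_id, s_date, price) AS (
    VALUES ('ABC', '2020-06-10'::date, 322.390), ('ABC', '2020-06-11'::date, 312.150), ('ABC', '2020-06-12'::date, 309.080),
           ('ABC', '2020-06-15'::date, 308.280), ('ABC', '2020-06-16'::date, 315.640), ('ABC', '2020-06-17'::date, 314.390),
           ('ABC', '2020-06-18'::date, 312.300), ('ABC', '2020-06-19'::date, 314.380), ('ABC', '2020-06-22'::date, 311.050),
           ('ABC', '2020-06-23'::date, 314.500), ('ABC', '2020-06-24'::date, 310.510), ('ABC', '2020-06-25'::date, 307.640),
           ('ABC', '2020-06-26'::date, 306.390), ('ABC', '2020-06-29'::date, 304.610), ('ABC', '2020-06-30'::date, 310.200),
           ('ABC', '2020-07-01'::date, 311.890), ('ABC', '2020-07-02'::date, 315.700), ('ABC', '2020-07-06'::date, 317.680)
)
SELECT s_id,
       s_date,
       price,
       row_number() OVER (PARTITION BY s_id ORDER BY s_date) AS row_num,
       max(price) OVER w AS roll_max,
       -- offset of the frame start, plus the 1-based position of the max inside the frame
       GREATEST(row_number() OVER (PARTITION BY s_id ORDER BY s_date) - 11, 0)
           + array_position(array_agg(price) OVER w, max(price) OVER w) AS argmax
FROM sample_table
WINDOW w AS (PARTITION BY s_id ORDER BY s_date ROWS 10 PRECEDING)
ORDER BY s_id, s_date;

Alternatively, with a subquery, which is easier to read:

-- Run against the real sample_table, or prepend the WITH sample_table(...) clause from the query above.
SELECT s_id,
       s_date,
       row_num,
       roll_max,
       GREATEST(row_num - 11, 0) + array_position(prices, roll_max) AS argmax
FROM (
         SELECT s_id,
                s_date,
                row_number() OVER (PARTITION BY s_id ORDER BY s_date) AS row_num,
                max(price) OVER w                                     AS roll_max,
                array_agg(price) OVER w                               AS prices
         FROM sample_table
         WINDOW w AS (PARTITION BY s_id ORDER BY s_date ROWS 10 PRECEDING)
     ) AS s
ORDER BY s_id, s_date;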
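To make the offset arithmetic in steps 1 to 4 concrete, here is a plain-Python simulation over the same 18 sample prices. It mimics array_agg over the frame and PostgreSQL's 1-based, first-match array_position; the helper names are hypothetical and nothing here runs inside PostgreSQL. The key point is that the offset equals row_num minus the frame length (preceding + 1, i.e. 11 rows): with an offset of row_num - preceding instead, row 12 would map to 2 + 4 = 6 rather than 5.

```python
PRICES = [322.390, 312.150, 309.080, 308.280, 315.640, 314.390,
          312.300, 314.380, 311.050, 314.500, 310.510, 307.640,
          306.390, 304.610, 310.200, 311.890, 315.700, 317.680]


def array_position(arr, value):
    """1-based index of the first element equal to value, or None if absent."""
    for i, x in enumerate(arr, start=1):
        if x == value:
            return i
    return None


def argmax_via_array(prices, preceding=10):
    out = []
    for row_num in range(1, len(prices) + 1):
        # Step 1: array_agg(price) over rows max(1, row_num - preceding) .. row_num.
        frame = prices[max(0, row_num - 1 - preceding):row_num]
        roll_max = max(frame)
        # Step 2: 1-based position of the rolling max inside the frame array.
        pos = array_position(frame, roll_max)
        # Steps 3-4: the full frame holds preceding + 1 rows, clamp at 0 for short frames.
        offset = max(row_num - (preceding + 1), 0)
        out.append(offset + pos)
    return out


print(argmax_via_array(PRICES))
# [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 5, 5, 5, 5, 10, 17, 18]
```

The output matches the values the question asks for: 5 for rows 12 to 15 and 10 for row 16.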
