PostgreSQL: select intervals and fill in the blanks

Problem description

I am developing a system to manage problems in different projects.

I have the following tables:

PROJECTS

id	description	country
1	3D experience	Brazil
2	Lorem Epsum	Chile

PROBLEMS

id	idProject	description
1	1	Not loading
2	1	Breaking down

PROBLEMS_STATUS

id	idProblem	status	start_date	end_date
1	1	Red	2020-10-17	2020-10-25
2	1	Yellow	2020-10-25	2020-11-20
3	1	Red	2020-11-20
4	2	Red	2020-11-01	2020-11-25
5	2	Yellow	2020-11-25	2020-12-22
6	2	Red	2020-12-22	2020-12-23
7	2	Green	2020-12-23

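In the example above, problem 1 is still Red and problem 2 is Green (neither current status has an end date).

When the user selects a specific project, I need to build a chart that shows the status of its problems by week, starting from the week in which the first problem was registered. The chart for project 1 should look like this:

I am trying to write a PostgreSQL query that returns a table like the following, so that I can populate the chart: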

Week	Green	Yellow	Red
42/20	0	0	1
43/20	0	0	1
44/20	0	1	0
...	...	...	...
04/21	1	0	1

I have tried several approaches but cannot work out how to do it. Can someone help me? Here is the db-fiddle setup to help:

CREATE TABLE projects (
  id serial NOT NULL,
  description character varying(50) NOT NULL,
  country character varying(50) NOT NULL,
  CONSTRAINT projects_pkey PRIMARY KEY (id)
);

CREATE TABLE problems (
  id serial NOT NULL,
  id_project integer NOT NULL,
  description character varying(50) NOT NULL,
  CONSTRAINT problems_pkey PRIMARY KEY (id),
  CONSTRAINT problems_id_project_fkey FOREIGN KEY (id_project)
      REFERENCES projects (id) MATCH SIMPLE
);

CREATE TABLE problems_status (
  id serial NOT NULL,
  id_problem integer NOT NULL,
  status character varying(50) NOT NULL,
  start_date date NOT NULL,
  end_date date,
  CONSTRAINT problems_status_pkey PRIMARY KEY (id),
  CONSTRAINT problems_status_id_problem_fkey FOREIGN KEY (id_problem)
      REFERENCES problems (id) MATCH SIMPLE
);

INSERT INTO projects (description,country) VALUES ('3D experience','Brazil');
INSERT INTO projects (description,country) VALUES ('Lorem Epsum','Chile');
INSERT INTO problems (id_project,description) VALUES (1,'Not loading');
INSERT INTO problems (id_project,description) VALUES (1,'Breaking down');
INSERT INTO problems_status (id_problem,status,start_date,end_date) VALUES
(1,'Red','2020-10-17','2020-10-25'),
(1,'Yellow','2020-10-25','2020-11-20'),
(1,'Red','2020-11-20',NULL),
(2,'Red','2020-11-01','2020-11-25'),
(2,'Yellow','2020-11-25','2020-12-22'),
(2,'Red','2020-12-22','2020-12-23'),
(2,'Green','2020-12-23',NULL);

Solutions

You can use COALESCE to fill in blanks: it returns the first non-null value in its argument list.

SELECT COALESCE(<some_value_that_could_be_null>,<some_value_that_will_not_be_null>);
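As a minimal sketch of COALESCE applied to this question's data, the query below treats an open-ended status (NULL end_date) as running through today. The two inline rows are copied from the sample data; using CURRENT_DATE as the substitute and the column alias effective_end_date are assumptions made for illustration.

```sql
-- Open-ended statuses (NULL end_date) are treated as lasting until today.
SELECT s.id_problem,
       s.status,
       s.start_date,
       COALESCE(s.end_date, CURRENT_DATE) AS effective_end_date
  FROM (VALUES (1, 'Yellow', DATE '2020-10-25', DATE '2020-11-20'),
               (1, 'Red',    DATE '2020-11-20', NULL)
       ) AS s(id_problem, status, start_date, end_date);
```

The Yellow row keeps its stored end date; only the Red row, whose end_date is NULL, receives the current date.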

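If you want to force the bounds of the time range into the result set, you can UNION your query with a result set that contains the specific dates.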

SELECT ... -- your data query here
UNION ALL
SELECT end_ts -- WHERE end_ts is a timestamptz type

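For a UNION, every branch must return the same number of columns, with matching types. In the bounds row you can fill in everything except the timestamp with NULL cast to the matching type.

A more concrete example: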
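A small self-contained sketch of that rule, assuming a three-column shape (integer, text, date) chosen only for illustration: the second branch contributes a boundary row in which only the date is meaningful, and the other columns are typed NULLs.

```sql
-- Both branches return (id_problem integer, status text, start_date date).
SELECT s.id_problem, s.status, s.start_date
  FROM (VALUES (1, 'Red'::text, DATE '2020-10-17')) AS s(id_problem, status, start_date)
UNION ALL
-- Boundary row: only the date is real; the other columns are typed NULLs.
SELECT NULL::integer, NULL::text, CURRENT_DATE;
```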

WITH data AS -- get raw data
(
    SELECT p.id,
           ps.status,
           ps.start_date,
           -- you can fill in NULL values with COALESCE; CURRENT_DATE is never NULL,
           -- so the trailing literal only shows an alternative default
           COALESCE(ps.end_date, CURRENT_DATE, '01-01-2025'::DATE) AS end_date,
           pj.country,
           pj.description,
           MAX(ps.start_date) OVER (PARTITION BY p.id) AS latest_update
      FROM problems p
      JOIN projects pj ON (pj.id = p.id_project)
      JOIN problems_status ps ON (p.id = ps.id_problem)
     UNION ALL -- force bounds in the following
    SELECT NULL::INTEGER, -- could be null or a defaulted value
           NULL::TEXT,    -- could be null or a defaulted value
           start_date,    -- either as an input param to a function or a hard-coded date
           end_date,      -- either as an input param to a function or a hard-coded date
           NULL::TEXT,
           NULL::TEXT,
           NULL::DATE
) -- aggregate in the following
SELECT <week>, -- you'll have to figure out how you're getting weeks out of the DATE data
       COUNT(*) FILTER (WHERE status = 'Red'),
       COUNT(*) FILTER (WHERE status = 'Yellow'),
       COUNT(*) FILTER (WHERE status = 'Green')
  FROM data
 WHERE start_date = latest_update
 GROUP BY <week>
;
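The <week> placeholder in the query above is deliberately left open. One possible way to fill it, assuming ISO weeks are wanted (which matches the 42/20 label in the question), is to_char with the IW (ISO week) and IY (two-digit ISO year) patterns. The sample dates are arbitrary.

```sql
SELECT d,
       to_char(d, 'IW/IY') AS week_label,  -- ISO week number / last two digits of the ISO year
       to_char(d, 'IYYY')  AS iso_year,    -- full ISO year, useful for ordering
       to_char(d, 'IW')    AS iso_week
  FROM (VALUES (DATE '2020-10-17'),
               (DATE '2020-12-23'),
               (DATE '2021-01-03')
       ) AS t(d);
-- 2020-10-17 falls in ISO week 42 of 2020, so its week_label is 42/20.
```

Because the label is text, ordering by it directly would put 01/21 before 42/20; grouping and ordering by the ISO year and ISO week columns avoids that.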

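Some of the features used in this query are very powerful. If you are not familiar with them and you will be writing a lot of reporting queries, you should look them up. The main ones are COALESCE, common table expressions (CTEs), window functions, and aggregate expressions.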

Aggregate Expressions

WITH Queries (CTEs)

COALESCE

Window Functions

After you updated your requirements, I wrote a dbfiddle for you to look at here.

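If I understand correctly, your goal is to produce weekly counts by problem status for a specific project over a specific period (from the earliest date in the database to the current date). In addition, if a problem status spans more than one week, it should be counted in each of those weeks. This involves two time periods, the reporting period and the status start/end dates, and checking whether they overlap. There are several overlap scenarios to check. Call the ranges A, for any week within the reporting period, and B, for the start/end of a status. Given that A must end within the reporting period but B need not, we have the following:

A starts, B starts, A ends, B ends. B overlaps the end of A.
A starts, B starts, B ends, A ends. B is completely contained in A.
B starts, A starts, B ends, A ends. B overlaps the start of A.
B starts, A starts, A ends, B ends. A is completely enclosed in B.

Fortunately, Postgres provides functionality that handles all of the above, so the query does not have to test each case separately: the DATERANGE type and the overlap operator (&&). The hard part then becomes defining each week in A. After that, apply the overlap operator between the date range of each week in A and the date range of B (start_date, end_date), and then do conditional aggregation over each detected overlap. See the full example here.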
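The scenarios above can be checked with a small sketch that needs no tables. It assumes A is ISO week 43 of 2020 and uses made-up B ranges, one per scenario, plus an open-ended B (NULL end date, as in the sample data) and a B that ends exactly where A starts.

```sql
-- A is ISO week 43 of 2020 (Mon 2020-10-19 through Sun 2020-10-25).
WITH a(week_dates) AS (
    SELECT daterange(DATE '2020-10-19', DATE '2020-10-25', '[]')
)
SELECT b.scenario,
       a.week_dates && daterange(b.start_date, b.end_date) AS overlaps
  FROM a
 CROSS JOIN (VALUES
        ('B overlaps the end of A',   DATE '2020-10-22', DATE '2020-10-30'),
        ('B contained in A',          DATE '2020-10-20', DATE '2020-10-23'),
        ('B overlaps the start of A', DATE '2020-10-10', DATE '2020-10-21'),
        ('A enclosed in B',           DATE '2020-10-01', DATE '2020-11-30'),
        ('B still open (NULL end)',   DATE '2020-10-01', NULL),
        ('B ends where A starts',     DATE '2020-10-01', DATE '2020-10-19')
       ) AS b(scenario, start_date, end_date);
```

The first five rows report an overlap. The last one does not, because daterange(start, end) excludes its upper bound by default, so a status that ends on the Monday does not reach into that week. A NULL end date makes the range unbounded above, which is why open statuses overlap every later week.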

with  problem_list( problem_id ) as 
       -- identify the specific problem_ids desired
       (select ps.id 
          from projects p
          join problems ps on (ps.id_project = p.id)
         where p.id = &selected_project   -- substitution variable: supply the project id
       )  --select * from problem_list;
    , report_period(srange,erange) as 
       -- generate the first day of week (Mon) for the
       -- oldest start date through the first day of the week of Current_Date   
       (select min(first_of_week(ps.start_date)), first_of_week(current_date)
          from problems_status ps
          join problem_list pl 
            on (pl.problem_id = ps.id_problem)
       )  --select * from report_period;
    , weekly_calendar(wk,yr,week_dates) as 
       -- expand the start,end date ranges to week dates (Mon-Sun) 
       -- and identify the week number with year
       (select extract( week from mon)::integer wk
             , extract( isoyear from mon)::integer yr
             , daterange(mon, mon+6, '[]'::text) wk_dates
          from (select generate_series(srange,erange,interval '7 days')::date mon
                  from  report_period
               ) d
       )  -- select * from weekly_calendar;
    , status_by_week(yr,wk,status) as   
       -- determine where problem start_date,end_date overlaps each calendar week
       -- then where multiple statuses exist for any week keep only the latest
       (select yr, wk, status  
          from (select wc.yr, wc.wk, ps.status 
                     --, wc.week_dates, ps.id_problem
                     -- partition by year as well as week so the same week number
                     -- in different years is not collapsed into one row
                     , row_number() over (partition by ps.id_problem, wc.yr, wc.wk
                                              order by ps.start_date desc) rn
                  from problems_status ps 
                  join problem_list    pl on (pl.problem_id = ps.id_problem)
                  join weekly_calendar wc on (wc.week_dates && daterange(ps.start_date,ps.end_date))  -- actual overlap test  
               ) ac
         where rn=1
       ) -- select * from status_by_week order by wk;
select 'Project ' || p.id || ': ' || p.description Project
     , to_char(wk,'fm09') || '/' || substr(to_char(yr,'fm0000'),3) "WK"
     , "Red"
     , "Yellow"
     , "Green"
  from projects p
 cross join (select sbw.yr
                  , sbw.wk
                  , count(*) filter (where sbw.status = 'Red')    "Red"
                  , count(*) filter (where sbw.status = 'Yellow') "Yellow"
                  , count(*) filter (where sbw.status = 'Green')  "Green" 
               from status_by_week sbw 
              group by sbw.yr, sbw.wk
            ) sr
 where p.id = &selected_project
 order by yr, wk;

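The CTEs and the main query work as follows:

problem_list: identifies the problems (id_problem) that belong to the specified project.
report_period: identifies the full reporting period, from start to end.
weekly_calendar: generates the start date (Monday) and end date (Sunday) of each week in the reporting period (A above). Along the way it also collects the week of the year and the ISO year.
status_by_week: this is the real workhorse, and it performs two tasks. First, it runs each problem through every week in the calendar and builds a row for each overlap detected. Then it enforces the "one status" rule.
Finally, the main select aggregates the statuses into the appropriate buckets and adds the syntactic sugar that produces the project name.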
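To see what the weekly_calendar step produces in isolation, here is a minimal sketch with a hard-coded reporting period (Monday 2020-10-12 through Monday 2020-11-02, chosen only for illustration) in place of the report_period CTE.

```sql
-- One row per week: ISO year, ISO week number, and the Mon-Sun range.
SELECT extract(isoyear FROM mon)::integer AS yr,
       extract(week    FROM mon)::integer AS wk,
       daterange(mon, mon + 6, '[]')      AS week_dates
  FROM (SELECT generate_series(DATE '2020-10-12',
                               DATE '2020-11-02',
                               interval '7 days')::date AS mon
       ) AS d;
```

This returns four rows, for ISO weeks 42 through 45 of 2020. Each week_dates value is displayed with an exclusive upper bound (the following Monday), which is how PostgreSQL normalizes an inclusive date range.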

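Note the function first_of_week(). It is a user-defined function, available in the example and below. I created it some time ago and have found it useful. You are free to use it, but you do so without any claim of suitability or warranty.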

create or replace
function first_of_week(date_in date)
 returns date
language sql
immutable strict
/*
 * Given a date return the first day of the week according to ISO-8601
 * 
 *    ISO-8601 Standard (in short) 
 *    1 All weeks begin on Monday.
 *    2 All Weeks have exactly 7 days.
 *    3 First week of any year begins on the Monday on or before 4-Jan.
 *      This implies that the last few days of Dec may be in the 
 *      first week of the following year and that the first few 
 *      days of Jan may be in week 52 (53) of the prior year.
 *      (Not at the same time obviously.)  
 *  
 */ 
as $$
   with wk_adj(l_days) as (values  (array[0,1,2,3,4,5,6]))
   select date_in - l_days[ extract (isodow from date_in)::integer ]
     from wk_adj;
$$;
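As a cross-check that does not depend on the user-defined function, the built-in date_trunc('week', ...) also truncates to the Monday of the ISO week. The sketch below compares it with the same array-offset arithmetic that first_of_week() uses, inlined so the block runs on its own; the sample dates and column aliases are arbitrary.

```sql
SELECT d,
       extract(isodow FROM d)::integer AS isodow,  -- 1 = Monday ... 7 = Sunday
       d - (array[0,1,2,3,4,5,6])[extract(isodow FROM d)::integer] AS monday_by_offset,
       date_trunc('week', d)::date AS monday_by_trunc
  FROM (VALUES (DATE '2020-10-17'),   -- Saturday
               (DATE '2020-10-19'),   -- Monday
               (DATE '2020-10-25')    -- Sunday
       ) AS t(d);
```

Both Monday columns agree on every row: 2020-10-12 for the Saturday, and 2020-10-19 for the Monday and the Sunday.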

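In the example I implemented the query as an SQL function, because dbfiddle seems to have problems with bind variables and substitution variables; a function also makes it possible to parameterize the query (I hate hard-coded values). For the example I added extra data for additional testing, mainly data that should not be selected. There is also an extra status column, OTHER, to show what happens if a status is encountered that is not one of the 3 expected values (Pink, in this case). It is easy to remove: just get rid of OTHER.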
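The general shape of such a wrapper might look like the following sketch, which parameterizes only the problem_list step. It assumes the tables from the question exist; the function name problem_ids_for_project and its signature are hypothetical, not taken from the linked example.

```sql
-- Hypothetical helper: the name and signature are illustrative only.
CREATE OR REPLACE FUNCTION problem_ids_for_project(selected_project integer)
RETURNS TABLE (problem_id integer)
LANGUAGE sql
STABLE
AS $$
    -- The function parameter takes the place of the &selected_project variable.
    SELECT pr.id
      FROM problems pr
     WHERE pr.id_project = selected_project;
$$;

-- Usage: list the problems that belong to project 1.
SELECT * FROM problem_ids_for_project(1);
```

With the sample data from the question, the call returns problem ids 1 and 2.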

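Your remark that "the date range covers Mon-Mon, not Mon-Sun" is incorrect, although it can look that way to someone who is not used to reading ranges. Take week 43 as an example. If you query the date range, it shows [2020-10-19,2020-10-26), and yes, both of those dates are Mondays. However, the bracket characters are significant. The leading [ means that date is included; the trailing ) means that date is excluded. As a standard condition: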

somedate <@ '[2020-10-19,2020-10-26)'::daterange
is the same as
somedate >= '2020-10-19' and somedate < '2020-10-26'

That is why, when you change the increment from "mon+6" to "mon+5", you fix week 43 but introduce errors in other weeks.
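The effect of the bounds can be confirmed directly. This sketch builds the week 43 range both ways and tests single dates against it with the containment operator <@; the column aliases are illustrative.

```sql
SELECT daterange(DATE '2020-10-19', DATE '2020-10-25', '[]')
           AS mon_plus_6,        -- displayed as [2020-10-19,2020-10-26)
       DATE '2020-10-25' <@ daterange(DATE '2020-10-19', DATE '2020-10-25', '[]')
           AS sunday_in_plus_6,  -- true: Sunday is covered
       DATE '2020-10-26' <@ daterange(DATE '2020-10-19', DATE '2020-10-25', '[]')
           AS monday_in_plus_6,  -- false: the next Monday is excluded
       DATE '2020-10-25' <@ daterange(DATE '2020-10-19', DATE '2020-10-24', '[]')
           AS sunday_in_plus_5;  -- false: with mon+5 the Sunday drops out of the week
```

With mon+5 every week loses its Sunday, so any status that starts or is active only on a Sunday stops being counted in that week.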

intervals postgresql