SQL: selecting rows by the priority of a data point

Problem description

I am joining two tables in BigQuery and filtering the result on several conditions. The code is:

SELECT d.id, d.duration, c.action, c.url
FROM
    (
        `table_action_url` c
        INNER JOIN `table_duration` d ON (d.id = c.id)
    )
WHERE c.url LIKE "https://www.mywebpage%" 
AND d.duration = '15000' 
AND c.action in ('First quartile','Midpoint','Third quartile','Complete')

The output is:

id      duration      action                    url 
1         15000        Midpoint           https://www.mywebpage_fashion
1         15000        Complete           https://www.mywebpage_fashion
2         15000        First quartile     https://www.mywebpage_home
2         15000        Midpoint           https://www.mywebpage_home

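I need to add logic that keeps only one value of action per group. The priority order is Complete, then Third quartile, and so on. So the code needs to compare ids and urls and, if the highest-priority value present is Complete (for the same id and url), take that row. The desired output is: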

id      duration      action                    url 
1         15000        Complete           https://www.mywebpage_fashion
2         15000        Midpoint           https://www.mywebpage_home

Solutions

You can use a window function and a CASE expression:

SELECT * EXCEPT(rn)
FROM (
    SELECT d.id, d.duration, c.action, c.url,
        -- rank the rows of each id by action priority (1 = highest)
        ROW_NUMBER() OVER(PARTITION BY d.id ORDER BY CASE c.action
            WHEN 'Complete' THEN 1
            WHEN 'Third quartile' THEN 2
            WHEN 'Midpoint' THEN 3
            WHEN 'First quartile' THEN 4
        END) rn
    FROM `table_action_url` c
    INNER JOIN `table_duration` d ON d.id = c.id
    WHERE 
        c.url LIKE "https://www.mywebpage%" 
        AND d.duration = '15000' 
        AND c.action in ('First quartile','Midpoint','Third quartile','Complete')
) t
WHERE rn = 1
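
The question asks for one row per id and url, while the query above partitions by id alone. A minimal sketch that partitions on both columns follows. The `joined` CTE is a hypothetical stand-in for the filtered join, filled with the sample rows from the question so the query runs without the real tables:

```sql
#standardSQL
-- Hypothetical inline rows replacing the filtered join of the two tables.
WITH joined AS (
  SELECT 1 AS id, '15000' AS duration, 'Midpoint' AS action,
         'https://www.mywebpage_fashion' AS url UNION ALL
  SELECT 1, '15000', 'Complete', 'https://www.mywebpage_fashion' UNION ALL
  SELECT 2, '15000', 'First quartile', 'https://www.mywebpage_home' UNION ALL
  SELECT 2, '15000', 'Midpoint', 'https://www.mywebpage_home'
)
SELECT * EXCEPT(rn)
FROM (
  SELECT
    j.*,
    ROW_NUMBER() OVER (
      PARTITION BY j.id, j.url        -- one winner per (id, url) pair
      ORDER BY CASE j.action          -- lower number = higher priority
        WHEN 'Complete' THEN 1
        WHEN 'Third quartile' THEN 2
        WHEN 'Midpoint' THEN 3
        WHEN 'First quartile' THEN 4
      END
    ) AS rn
  FROM joined j
) t
WHERE rn = 1
ORDER BY id
```

With these sample rows, id 1 keeps Complete and id 2 keeps Midpoint, matching the desired output. If an id can appear with several urls, partitioning by id alone would collapse them into a single row.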

In BigQuery, you can do this with aggregation:

SELECT d.id,
       d.duration,
       -- ord = 1 is the highest priority, so sort ascending and keep the first element
       ARRAY_AGG(c.action ORDER BY ao.ord LIMIT 1)[ORDINAL(1)] as action,
       ARRAY_AGG(c.url ORDER BY ao.ord LIMIT 1)[ORDINAL(1)] as url
FROM `table_action_url` c JOIN
     `table_duration` d
     ON d.id = c.id JOIN
     (SELECT 'Complete' as action,1 as ord UNION ALL
      SELECT 'Third quartile' as action,2 as ord UNION ALL
      SELECT 'Midpoint' as action,3 as ord UNION ALL
      SELECT 'First quartile' as action,4 as ord
     ) ao
     ON c.action = ao.action
WHERE c.url LIKE 'https://www.mywebpage%' AND
      d.duration = '15000'
GROUP BY d.id,d.duration;
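
The lookup subquery `ao` can also be generated from an ordered array, which keeps the priority list in one place. The sketch below assumes that lower `ord` means higher priority and therefore sorts ascending. The `joined` and `priority` CTE names are hypothetical, and the inline rows are the sample data from the question rather than the real tables:

```sql
#standardSQL
-- Hypothetical inline rows replacing the filtered join of the two tables.
WITH joined AS (
  SELECT 1 AS id, '15000' AS duration, 'Midpoint' AS action,
         'https://www.mywebpage_fashion' AS url UNION ALL
  SELECT 1, '15000', 'Complete', 'https://www.mywebpage_fashion' UNION ALL
  SELECT 2, '15000', 'First quartile', 'https://www.mywebpage_home' UNION ALL
  SELECT 2, '15000', 'Midpoint', 'https://www.mywebpage_home'
),
-- WITH OFFSET numbers the array elements from 0, so 'Complete' gets ord = 0.
priority AS (
  SELECT action, ord
  FROM UNNEST(['Complete', 'Third quartile', 'Midpoint', 'First quartile'])
       AS action WITH OFFSET AS ord
)
SELECT
  j.id,
  j.duration,
  -- keep only the action with the smallest ord in each group
  ARRAY_AGG(j.action ORDER BY p.ord LIMIT 1)[OFFSET(0)] AS action,
  j.url
FROM joined j
JOIN priority p ON p.action = j.action
GROUP BY j.id, j.duration, j.url
ORDER BY j.id
```

Because the join to `priority` is an inner join, rows whose action is not in the array drop out, so a separate `IN (...)` filter on action is not needed in this variant.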

The simplest and most generic approach I see here is to wrap your existing query as follows:

#standardSQL
SELECT AS VALUE 
  ARRAY_AGG(current_query_result 
    ORDER BY CASE action
      WHEN 'Complete' THEN 1
      WHEN 'Third quartile' THEN 2
      WHEN 'Midpoint' THEN 3
      WHEN 'First quartile' THEN 4
    END
    LIMIT 1
  )[OFFSET(0)] 
FROM (
  SELECT d.id, d.duration, c.action, c.url
  FROM `table_action_url` c
  INNER JOIN `table_duration` d USING(id)
  WHERE c.url LIKE "https://www.mywebpage%" 
  AND d.duration = '15000' 
  AND c.action in ('First quartile','Midpoint','Third quartile','Complete')
) current_query_result
GROUP BY id,url   

with output:

Row id  duration    action      url  
1   1   15000       Complete    https://www.mywebpage_fashion    
2   2   15000       Midpoint    https://www.mywebpage_home     

As you can see, ordering the candidates and picking the winner is done by the fragment below:

ORDER BY CASE action
  WHEN 'Complete' THEN 1
  WHEN 'Third quartile' THEN 2
  WHEN 'Midpoint' THEN 3
  WHEN 'First quartile' THEN 4
END
LIMIT 1    

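There is another option that achieves the same thing with less verbose, more manageable, and possibly more efficient code (this is unproven, just my feeling):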

ORDER BY STRPOS('Complete,Third quartile,Midpoint,First quartile',action)
LIMIT 1
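
One caveat with the STRPOS variant is worth checking before relying on it. STRPOS returns 0 when the value is not found, so an action missing from the list would sort ahead of Complete. Any action that happens to be a substring of a listed one would also get a position. The sketch below prints the sort key for the four listed actions plus two hypothetical extras ('Start' and 'quartile', which are not from the question):

```sql
#standardSQL
SELECT
  action,
  -- 1-based position of action inside the priority string; 0 if absent
  STRPOS('Complete,Third quartile,Midpoint,First quartile', action) AS pos
FROM UNNEST([
  'Complete', 'Third quartile', 'Midpoint', 'First quartile',
  'Start',      -- hypothetical unlisted action: pos is 0, so it sorts first
  'quartile'    -- hypothetical substring of a listed action: pos is nonzero
]) AS action
ORDER BY pos
```

In the wrapped query this is harmless as long as the inner `c.action in (...)` filter admits only the listed actions, since STRPOS then never sees anything else.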