Problem description
I am joining two tables in BigQuery and filtering the result on several conditions. The code is:
SELECT d.id, d.duration, c.action, c.url
FROM
(
`table_action_url` c
INNER JOIN `table_duration` d ON (d.id = c.id)
)
WHERE c.url LIKE "https://www.mywebpage%"
AND d.duration = '15000'
AND c.action in ('First quartile','Midpoint','Third quartile','Complete')
The output is:
id duration action url
1 15000 Midpoint https://www.mywebpage_fashion
1 15000 Complete https://www.mywebpage_fashion
2 15000 First quartile https://www.mywebpage_home
2 15000 Midpoint https://www.mywebpage_home
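I need to add logic that keeps only one action value per group of rows. The priority is Complete, then Third quartile, and so on. So the code needs to compare the ids and urls and, for rows with the same id and url, keep the row whose action ranks highest (Complete being the top).
The desired output is: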
id duration action url
1 15000 Complete https://www.mywebpage_fashion
2 15000 Midpoint https://www.mywebpage_home
Solution
You can use a window function and a CASE expression:
SELECT * EXCEPT(rn)
FROM (
SELECT d.id, d.duration, c.action, c.url,
ROW_NUMBER() OVER(PARTITION BY d.id ORDER BY CASE c.action
WHEN 'Complete' THEN 1
WHEN 'Third quartile' THEN 2
WHEN 'Midpoint' THEN 3
WHEN 'First quartile' THEN 4
END) rn
FROM `table_action_url` c
INNER JOIN `table_duration` d ON d.id = c.id
WHERE
c.url LIKE "https://www.mywebpage%"
AND d.duration = '15000'
AND c.action in ('First quartile','Midpoint','Third quartile','Complete')
) t
WHERE rn = 1
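The question asks for one row per id and url, while the query above partitions by id only. The two are equivalent when each id maps to a single url, as in the sample data. A minimal sketch of the variant that partitions by both columns follows. The `joined` CTE and its rows are assumptions for illustration, copied from the question's sample output rather than read from the real tables.

```sql
#standardSQL
-- Hypothetical inline rows standing in for the filtered join result.
WITH joined AS (
  SELECT 1 AS id, '15000' AS duration, 'Midpoint' AS action, 'https://www.mywebpage_fashion' AS url UNION ALL
  SELECT 1, '15000', 'Complete', 'https://www.mywebpage_fashion' UNION ALL
  SELECT 2, '15000', 'First quartile', 'https://www.mywebpage_home' UNION ALL
  SELECT 2, '15000', 'Midpoint', 'https://www.mywebpage_home'
)
SELECT * EXCEPT(rn)
FROM (
  SELECT
    id, duration, action, url,
    -- Rank actions within each (id, url) pair; rank 1 is the highest priority.
    ROW_NUMBER() OVER (
      PARTITION BY id, url
      ORDER BY CASE action
        WHEN 'Complete' THEN 1
        WHEN 'Third quartile' THEN 2
        WHEN 'Midpoint' THEN 3
        WHEN 'First quartile' THEN 4
      END
    ) AS rn
  FROM joined
) t
WHERE rn = 1
ORDER BY id
```

On these sample rows this should keep Complete for id 1 and Midpoint for id 2, matching the desired output in the question.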
In BigQuery, you can also do this with aggregation:
SELECT d.id, d.duration,
(ARRAY_AGG(c.action ORDER BY ao.ord LIMIT 1))[ORDINAL(1)] AS action,
(ARRAY_AGG(c.url ORDER BY ao.ord LIMIT 1))[ORDINAL(1)] AS url
FROM `table_action_url` c JOIN
`table_duration` d
ON d.id = c.id JOIN
(SELECT 'Complete' as action,1 as ord UNION ALL
SELECT 'Third quartile' as action,2 as ord UNION ALL
SELECT 'Midpoint' as action,3 as ord UNION ALL
SELECT 'First quartile' as action,4 as ord
) ao
ON c.action = ao.action
WHERE c.url LIKE 'https://www.mywebpage%' AND
d.duration = '15000'
GROUP BY d.id,d.duration;
The simplest and most generic approach I see here is to wrap your existing query with the code below:
#standardSQL
SELECT AS VALUE
ARRAY_AGG(current_query_result
ORDER BY CASE action
WHEN 'Complete' THEN 1
WHEN 'Third quartile' THEN 2
WHEN 'Midpoint' THEN 3
WHEN 'First quartile' THEN 4
END
LIMIT 1
)[OFFSET(0)]
FROM (
SELECT id, d.duration, c.action, c.url
FROM `table_action_url` c
INNER JOIN `table_duration` d USING(id)
WHERE c.url LIKE "https://www.mywebpage%"
AND d.duration = '15000'
AND c.action in ('First quartile','Midpoint','Third quartile','Complete')
) current_query_result
GROUP BY id,url
with the output:
Row id duration action url
1 1 15000 Complete https://www.mywebpage_fashion
2 2 15000 Midpoint https://www.mywebpage_home
As you can see, ordering the candidates and picking the winner is done by the snippet below:
ORDER BY CASE action
WHEN 'Complete' THEN 1
WHEN 'Third quartile' THEN 2
WHEN 'Midpoint' THEN 3
WHEN 'First quartile' THEN 4
END
LIMIT 1
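There is another option that is less verbose, easier to manage, and possibly more efficient (this is unproven, it is just my feeling) to achieve the same thing: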
ORDER BY STRPOS('Complete,Third quartile,Midpoint,First quartile',action)
LIMIT 1
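To see why the STRPOS ordering works, and where it could go wrong, here is a small sketch that lists the position each action gets. STRPOS returns the 1-based position of the first match and 0 when the value is not found. The value 'Start' is a hypothetical action added only to show the not-found case.

```sql
#standardSQL
SELECT
  action,
  -- Position of the action inside the priority string; earlier means higher priority.
  STRPOS('Complete,Third quartile,Midpoint,First quartile', action) AS pos
FROM UNNEST(['Complete', 'Third quartile', 'Midpoint', 'First quartile', 'Start']) AS action
ORDER BY pos
```

The four listed actions come back in priority order, but 'Start' gets position 0 and therefore sorts ahead of Complete. The approach is safe here only because the inner query already restricts action to the four listed values with the IN filter. It also assumes that no action name is a substring of an earlier entry in the priority string.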