Problem description
I have a dataset of hotel bookings. date_in is in the format "yyyy-MM-dd". I need to select the top 10 most-visited hotels for each month.
SELECT top_visits.date_ci,top_visits.hotel_id,top_visits.count_visits
FROM (
SELECT date_ci,hotel_id,COUNT(id) AS count_visits,RANK() OVER (
PARTITION BY date_ci,hotel_id ORDER BY COUNT(id) DESC) as rank
FROM (
SELECT id,SUBSTRING(my_tab.date_in,1,7) as date_ci
FROM my_database.my_tab) x
) top_visits
GROUP BY date_ci,hotel_id HAVING rank <= 10;
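I get the following error:
Error: Error while compiling statement: FAILED: SemanticException Failed to breakup Windowing invocations into Groups. At least 1 group must only depend on input columns. Also check for circular dependencies. Underlying error: org.apache.hadoop.hive.ql.parse.SemanticException: Line 4:13 Expression not in GROUP BY key 'hotel_id'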
Solution
I would suggest something like this:
SELECT ymh.*
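FROM (SELECT YEAR(date_in) as yyyy,MONTH(date_in) as mm,hotel_id,COUNT(*) AS count_visits,
-- number the hotels within each month, busiest first
ROW_NUMBER() OVER (PARTITION BY YEAR(date_in),MONTH(date_in) ORDER BY COUNT(*) DESC) as seqnum
FROM my_database.my_tab
-- hotel_id must be a GROUP BY key, otherwise Hive rejects it in the SELECT list
GROUP BY YEAR(date_in),MONTH(date_in),hotel_id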
) ymh
WHERE seqnum <= 10;
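That is, use one aggregation to count the visits per month and hotel, and one window function call to enumerate the hotels within each month.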
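The aggregate-plus-window pattern described above can be tried locally. What follows is only a minimal sketch, assuming Python's built-in sqlite3 module (with SQLite 3.25 or newer, which supports window functions) as a stand-in for Hive. The sample rows and the helper name top_hotels_per_month are hypothetical, and substr(date_in, 1, 7) replaces YEAR/MONTH because SQLite does not provide those functions. The extra hotel_id sort key is also an assumption, added so that ties are broken deterministically.

```python
import sqlite3

# Hypothetical sample rows (id, hotel_id, date_in); not taken from the real data set.
ROWS = [
    (1, "A", "2020-01-03"),
    (2, "A", "2020-01-15"),
    (3, "A", "2020-01-20"),
    (4, "B", "2020-01-07"),
    (5, "B", "2020-01-09"),
    (6, "C", "2020-01-30"),
    (7, "B", "2020-02-01"),
    (8, "C", "2020-02-11"),
    (9, "C", "2020-02-12"),
]

# One GROUP BY counts visits per (month, hotel); ROW_NUMBER() then numbers
# the hotels inside each month, busiest first.
QUERY = """
SELECT ym, hotel_id, count_visits
FROM (SELECT substr(date_in, 1, 7) AS ym,
             hotel_id,
             COUNT(*) AS count_visits,
             ROW_NUMBER() OVER (PARTITION BY substr(date_in, 1, 7)
                                ORDER BY COUNT(*) DESC, hotel_id) AS seqnum
      FROM my_tab
      GROUP BY substr(date_in, 1, 7), hotel_id
     ) ymh
WHERE seqnum <= ?
ORDER BY ym, seqnum
"""


def top_hotels_per_month(rows, n):
    """Return the n most-visited hotels of every month as (ym, hotel_id, count)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_tab (id INTEGER, hotel_id TEXT, date_in TEXT)")
    conn.executemany("INSERT INTO my_tab VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY, (n,)).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in top_hotels_per_month(ROWS, 2):
        print(row)
```

With n set to 2, the sample data yields hotels A and B for January and hotels C and B for February.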
Alternatively, move the COUNT(id) aggregation into a subquery and add a GROUP BY there:
SELECT top_visits.date_ci,top_visits.hotel_id,top_visits.count_visits
FROM (
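-- rank hotels within each month; partitioning by hotel_id as well would give every row rank 1
SELECT date_ci,hotel_id,count_visits,RANK() OVER (PARTITION BY date_ci ORDER BY count_visits DESC) as rank
FROM (
-- count visits per hotel and month before ranking
SELECT hotel_id,SUBSTRING(my_tab.date_in,1,7) as date_ci,COUNT(id) AS count_visits
FROM my_database.my_tab
GROUP BY hotel_id,SUBSTRING(my_tab.date_in,1,7)
) x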
) top_visits
WHERE rank <= 10;
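The subquery variant can be checked the same way. This is again only a sketch under assumptions: Python's sqlite3 module (SQLite 3.25 or newer) stands in for Hive, the sample rows and the helper name ranked_hotels are hypothetical, and the alias is spelled rnk here simply to avoid reusing a function name. It also shows one behavioural difference worth knowing: RANK() gives tied hotels the same rank, so a filter of rank <= N can return more than N rows per month, whereas ROW_NUMBER() always returns at most N.

```python
import sqlite3

# Hypothetical sample rows (id, hotel_id, date_in); hotels B and C tie in January.
ROWS = [
    (1, "A", "2020-01-03"),
    (2, "A", "2020-01-15"),
    (3, "B", "2020-01-07"),
    (4, "C", "2020-01-09"),
]

# Innermost query aggregates per (hotel, month); the middle query ranks the
# aggregated rows inside each month; the outer query keeps the top ranks.
QUERY = """
SELECT date_ci, hotel_id, count_visits, rnk
FROM (SELECT date_ci, hotel_id, count_visits,
             RANK() OVER (PARTITION BY date_ci
                          ORDER BY count_visits DESC) AS rnk
      FROM (SELECT hotel_id,
                   substr(date_in, 1, 7) AS date_ci,
                   COUNT(id) AS count_visits
            FROM my_tab
            GROUP BY hotel_id, substr(date_in, 1, 7)
           ) x
     ) top_visits
WHERE rnk <= ?
ORDER BY date_ci, rnk, hotel_id
"""


def ranked_hotels(rows, n):
    """Return (date_ci, hotel_id, count_visits, rnk) for hotels ranked n or better."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_tab (id INTEGER, hotel_id TEXT, date_in TEXT)")
    conn.executemany("INSERT INTO my_tab VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY, (n,)).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in ranked_hotels(ROWS, 2):
        print(row)
```

With n set to 2, the sample data returns three rows for January, because hotels B and C share rank 2.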