Problem description
I am trying to join two tables:
Table X
PlayerID | Name | Team
007 | Sancho | Dortmund
010 | Messi | Barcelona
011 | Werner | Chelsea
001 | De Gea | Man Utd
009 | Lewan..ki | Bayern Mun
006 | Pogba | Man Utd
017 | De Bruyne | Man City
029 | Harvertz | Chelsea
005 | Upamecano | Leipzig
Table Y
PlayerID | Name | Team
010 | Messi | Man City
007 | Sancho | Man Utd
006 | Pogba | Man Utd
017 | De Bruyne | Man City
011 | Werner | Liverpool
006 | Pogba | Real Madrid
using this query:
select avg(y.playerID is not null) as accuracy_ratio
from x
left join y
on y.playerID = x.playerID
and y.name = x.name
and y.team = x.team
However, when I run the query, I get the error "Only numeric or string type arguments are accepted but boolean is passed". I assume the query above only works in MySQL. How can I rewrite it for Hive?
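Solution
I realize this relates to your earlier post, where GMB provided a solution for MySQL. As the error message says, Hive's avg() does not accept a boolean argument, so convert the condition to 1 or 0 with a CASE expression first. This is what you need to do: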
select avg(case when y.playerID is not null then 1 else 0 end) as accuracy_ratio
from x
left join y
on y.playerID = x.playerID
and y.name = x.name
and y.team = x.team
@learning_2_code I tried the following code in Hive against your dataset. It gives me 0.22. Please let me know whether this works for you in Hive.
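-- tablex, tabley, pid and ply_name are the names used in my own test tables
-- count(y_pid) skips NULLs, so it counts only the rows of x that found a match
select count(y_pid) / count(*) as accuracy_ratio
from (
    select x.pid, y.pid as y_pid
    from tablex x
    left join tabley y
        on y.pid = x.pid
        and y.ply_name = x.ply_name
        and y.team = x.team
) A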
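
For anyone who wants to check the arithmetic without a Hive cluster, here is a rough sketch that loads the sample rows from the question into an in-memory SQLite database and runs both approaches. SQLite is only a stand-in here, not Hive, so this confirms the logic of the queries rather than Hive compatibility. The `* 1.0` in the second query is an addition needed because SQLite divides integers as integers; as far as I know, Hive's `/` returns a double, so it should not be needed there.

```python
import sqlite3

# SQLite stands in for Hive purely so the numbers can be checked locally.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE x (playerID TEXT, name TEXT, team TEXT);
    CREATE TABLE y (playerID TEXT, name TEXT, team TEXT);
""")

# Sample rows copied from the question.
x_rows = [
    ("007", "Sancho", "Dortmund"),
    ("010", "Messi", "Barcelona"),
    ("011", "Werner", "Chelsea"),
    ("001", "De Gea", "Man Utd"),
    ("009", "Lewan..ki", "Bayern Mun"),
    ("006", "Pogba", "Man Utd"),
    ("017", "De Bruyne", "Man City"),
    ("029", "Harvertz", "Chelsea"),
    ("005", "Upamecano", "Leipzig"),
]
y_rows = [
    ("010", "Messi", "Man City"),
    ("007", "Sancho", "Man Utd"),
    ("006", "Pogba", "Man Utd"),
    ("017", "De Bruyne", "Man City"),
    ("011", "Werner", "Liverpool"),
    ("006", "Pogba", "Real Madrid"),
]
conn.executemany("INSERT INTO x VALUES (?, ?, ?)", x_rows)
conn.executemany("INSERT INTO y VALUES (?, ?, ?)", y_rows)

# The same three-column left join used in both answers.
JOIN = """
    FROM x
    LEFT JOIN y
      ON y.playerID = x.playerID
     AND y.name = x.name
     AND y.team = x.team
"""

# Approach 1: average of a 1/0 flag built with CASE.
case_ratio = conn.execute(
    "SELECT AVG(CASE WHEN y.playerID IS NOT NULL THEN 1 ELSE 0 END)" + JOIN
).fetchone()[0]

# Approach 2: matched rows divided by all rows (COUNT of a column skips NULLs).
count_ratio = conn.execute(
    "SELECT COUNT(y.playerID) * 1.0 / COUNT(*)" + JOIN
).fetchone()[0]

print(round(case_ratio, 2), round(count_ratio, 2))
```

Only two of the nine rows in X (Pogba at Man Utd and De Bruyne at Man City) have an exact match in Y, so both approaches come out at 2/9, which rounds to the 0.22 reported above.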