Problem description
I have the following data:
Date Value Country
1/1/2020 0 China
1/2/2020 2 China
1/1/2020 8 Mexico
1/2/2020 9 Mexico
1/1/2020 1 Japan
1/2/2020 2 Japan
I'm trying to find a way to aggregate the values by date and country, then relabel them at the region level, so that the result looks like this:
Date Value Region
1/1/2020 1 Asia
1/2/2020 4 Asia
1/1/2020 8 Latin America
1/2/2020 9 Latin America
I've already tried if_else:
raw %>%
  group_by(Region = if_else(Region %in% c("Southern Asia","Eastern Asia","Southeast Asia","Central Asia","Western Asia"),"Asia",
                    if_else(Region %in% c("Northern Africa","Sub-Saharan Africa"),"Africa",
                    if_else(Region %in% c("Southern Europe","Eastern Europe","Western Europe"),"Europe",
                    if_else(Region %in% c("Latin America"),"Latin American and Caribbean","North America")),Date)) %>%
  summarise(Value = sum(Value))
But it doesn't work, and I'm sure I'm missing a step. Thanks to anyone who can help.
Solution
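Instead of nesting if_else calls, it is easier if we have a key/value dataset. The "world" dataset in "poliscidata" has "country" and "regionun" columns, so we can join on the country to pick up each country's region, then group by Date and Region and sum: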
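library(poliscidata)
library(dplyr)
data(world)
world %>%
  select(Country = country, Region = regionun) %>%   # key/value table: country -> region
  right_join(raw, by = "Country") %>%                # keep every row of raw and attach its Region
  group_by(Date, Region) %>%
  summarise(Value = sum(Value), .groups = 'drop')   # .groups needs dplyr >= 1.0.0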
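If poliscidata isn't available, or its region labels don't match the ones you want (e.g. "Latin America"), the same key/value idea works with a small hand-written lookup table. The sketch below uses only base R (merge() for the join, aggregate() for the grouped sum); the lookup table and its region labels are hypothetical and only cover the sample countries.

```r
# Sample data from the question
raw <- data.frame(
  Date    = c("1/1/2020", "1/2/2020", "1/1/2020", "1/2/2020", "1/1/2020", "1/2/2020"),
  Value   = c(0L, 2L, 8L, 9L, 1L, 2L),
  Country = c("China", "China", "Mexico", "Mexico", "Japan", "Japan")
)

# Hypothetical key/value table: one row per country with its region label
lookup <- data.frame(
  Country = c("China", "Japan", "Mexico"),
  Region  = c("Asia", "Asia", "Latin America")
)

# Attach a Region to every row of raw; all.x = TRUE keeps unmatched countries (Region = NA)
joined <- merge(raw, lookup, by = "Country", all.x = TRUE)

# Sum Value within each Date/Region pair; aggregate()'s default na.action drops NA regions
result <- aggregate(Value ~ Date + Region, data = joined, FUN = sum)
result <- result[order(result$Region, result$Date), ]
print(result)
```

Extending the mapping then means adding rows to the lookup table rather than another level of nesting.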
Data
raw <- structure(list(Date = c("1/1/2020", "1/2/2020", "1/1/2020", "1/2/2020", "1/1/2020", "1/2/2020"),
                      Value = c(0L, 2L, 8L, 9L, 1L, 2L),
                      Country = c("China", "China", "Mexico", "Mexico", "Japan", "Japan")),
                 class = "data.frame", row.names = c(NA, -6L))
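For comparison, the conditional approach from the question can also be made to work. As written, the attempt tests a Region column that the sample data doesn't have (it has Country), passes Date into the if_else chain instead of to group_by(), and leaves group_by( unclosed. A minimal sketch of a repair, assuming dplyr is installed: derive Region from Country with dplyr::case_when() (swapped in for the nested if_else calls), then group by both Date and Region. The country-to-region mapping and the "Other" fallback are hypothetical and only cover the sample data.

```r
library(dplyr)

raw <- data.frame(
  Date    = c("1/1/2020", "1/2/2020", "1/1/2020", "1/2/2020", "1/1/2020", "1/2/2020"),
  Value   = c(0L, 2L, 8L, 9L, 1L, 2L),
  Country = c("China", "China", "Mexico", "Mexico", "Japan", "Japan")
)

result <- raw %>%
  # Hypothetical mapping; case_when() replaces the nested if_else chain
  mutate(Region = case_when(
    Country %in% c("China", "Japan") ~ "Asia",
    Country %in% c("Mexico")         ~ "Latin America",
    TRUE                             ~ "Other"
  )) %>%
  # Group by BOTH Date and Region so each date gets its own regional total
  group_by(Date, Region) %>%
  summarise(Value = sum(Value), .groups = "drop") %>%
  arrange(Region, Date)

print(result)
```

This works for a handful of countries, but every new country means editing the conditions, which is why a lookup table is usually easier to maintain.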