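Problem description
I have df1, with country and weight (integer) columns, and df2, which has one row per weight (integer) and one column per country; the values in df2 are the costs associated with each weight/country combination.
I created two new columns in df1 to get the position of each country's column and each weight's row in df2, like this: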
df1$country.position <- match(df1$country,colnames(df2))
df1$weight.position <- match(df1$weight,rownames(df2))
So I now have two new columns in df1 holding the positions in df2. My next step is:
df1$cost <- df2[df1$weight.position,df1$country.position]
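I do get the correct weight row, but always for the same (apparently randomly chosen) country column. I don't know what to do, since I am looking up the country and the weight in the same way.
To make it clearer, what I want is similar to the Excel combination index(match(),match()).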
df1:
+-----------+--------+-----------------+------------------+
| country | weight | weight.position | country.position |
+-----------+--------+-----------------+------------------+
| france | 2 | 2 | 3 |
| venezuela | 1 | 1 | 2 |
| spain | 3 | 3 | 1 |
+-----------+--------+-----------------+------------------+
df2:
+--------+-------+-----------+--------+
| weight | spain | venezuela | france |
+--------+-------+-----------+--------+
| 1 | 3.44 | 4.56 | 3.12 |
| 2 | 4.20 | 5.80 | 4.00 |
| 3 | 5.13 | 7.00 | 4.97 |
+--------+-------+-----------+--------+
Result:
+-----------+--------+-----------------+------------------+------+
| country | weight | weight.position | country.position | cost |
+-----------+--------+-----------------+------------------+------+
| france | 2 | 2 | 3 | 4.00 |
| venezuela | 1 | 1 | 2 | 4.56 |
| spain | 3 | 3 | 1 | 5.13 |
+-----------+--------+-----------------+------------------+------+
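Solution
The problem with
df2[df1$weight.position,df1$country.position]
is that it selects all rows in df1$weight.position and all columns in df1$country.position (every row crossed with every column), rather than one row + column combination per row of df1.
Here is another approach, using dplyr and tidyr: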
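If you want to keep the position columns, one possible base R sketch is to index with a two-column matrix built by cbind(), which picks one (row, column) pair per row of df1. This assumes the weights are stored as the row names of df2, as the match() on rownames(df2) in the question suggests; the names rows, cols, cross and cost_matrix are only illustrative.

```r
# Toy data from the question; weights are assumed to be the row names of df2
df1 <- data.frame(country = c("france", "venezuela", "spain"),
                  weight  = c(2, 1, 3))
df2 <- data.frame(spain     = c(3.44, 4.20, 5.13),
                  venezuela = c(4.56, 5.80, 7.00),
                  france    = c(3.12, 4.00, 4.97),
                  row.names = c("1", "2", "3"))

rows <- match(df1$weight, rownames(df2))   # row position of each weight
cols <- match(df1$country, colnames(df2))  # column position of each country

# Two separate index vectors give every row crossed with every column
cross <- df2[rows, cols]
dim(cross)  # 3 x 3, not 3 values

# A two-column index matrix selects one cell per (row, column) pair
cost_matrix <- as.matrix(df2)
df1$cost <- cost_matrix[cbind(rows, cols)]
df1
```

Converting to a matrix first keeps the lookup a plain numeric extraction; this only makes sense when all cost columns are numeric, as they are here.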
library(dplyr)
library(tidyr)
df1 <- data.frame(country = c("france","venezuela","spain"),weight = c(2,1,3))
df2 <- data.frame(weight = c(1,2,3),spain = c(3.44,4.2,5.13),venezuela = c(4.56,5.8,7.0),france = c(3.12,4.0,4.97))
df_long <- df2 %>%
  pivot_longer(-weight,names_to = "country",values_to = "cost")
df1 %>%
  left_join(df_long,by = c("weight","country"))
country weight cost
1 france 2 4.00
2 venezuela 1 4.56
3 spain 3 5.13
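Because the join matches on the weight and country values rather than on positions, a row of df1 with no counterpart in df2 is kept and gets NA as its cost. A small sketch of that behaviour, where "italy" is a hypothetical country added only for illustration and df1_extra and result are made-up names:

```r
library(dplyr)
library(tidyr)

df2 <- data.frame(weight    = c(1, 2, 3),
                  spain     = c(3.44, 4.20, 5.13),
                  venezuela = c(4.56, 5.80, 7.00),
                  france    = c(3.12, 4.00, 4.97))

# One row per (weight, country) pair
df_long <- pivot_longer(df2, -weight, names_to = "country", values_to = "cost")

# "italy" has no column in df2 (hypothetical example row)
df1_extra <- data.frame(country = c("france", "italy"),
                        weight  = c(2, 1))

# left_join keeps every row of df1_extra; unmatched rows get NA
result <- left_join(df1_extra, df_long, by = c("weight", "country"))
result
```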