Creating random ratios that add up to 1 by group

Problem description

I have the following dataset:

panelID= c(1:50)
year= c(2005,2010)
country = c("A","B","C","D","E","F","G","H","I","J")
urban = c("A","C")
indust = c("D","F")
sizes = c(1,2,3,4,5)
n <- 2
library(AER)
library(data.table)
library(dplyr)
set.seed(123)
DT <- data.table(country = rep(sample(country, length(panelID), replace = T), each = n),
                 year = c(replicate(length(panelID), sample(year, n))),
                 sales = round(rnorm(10, 10, 10), 2),
                 industry = rep(sample(indust, length(panelID), replace = T), each = n),
                 urbanisation = rep(sample(urban, length(panelID), replace = T), each = n),
                 size = rep(sample(sizes, length(panelID), replace = T), each = n))
DT <- DT %>%
group_by(country) %>%
mutate(base_rate = as.integer(runif(1,12.5,37.5))) %>%
group_by(country,year) %>%
mutate(taxrate = base_rate + as.integer(runif(1,-2.5,+2.5)))
DT <- DT %>%
group_by(country,year) %>%
mutate(Vote = sample(c(0,1),1),
       Votewon = ifelse(Vote==1,sample(c(0,1),1),0))

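I would like to add a variable named ratio to this dataset. I want ratio to be a random number between 0 and 1, and I want the ratios within each country to sum to 1.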
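Stated formally (the symbols here are introduced only for illustration): for a country c with n_c rows, the ratios r_{c,i} must satisfy

```latex
0 < r_{c,i} < 1 \quad \text{for } i = 1, \dots, n_c,
\qquad
\sum_{i=1}^{n_c} r_{c,i} = 1 .
```

Because n_c differs between countries, each country needs its own vector of exactly n_c values.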

How would I create such a column? The only idea I had was to manually create vectors that sum to 1 and then sample from them.

Edit: the countries do not have equal numbers of entries:

> table(DT$country)

 A  B  C  D  E  F  G  H  I  J 
 6 10 14  6 14 10 10  8 10 12 

ratio_sample_6 <- c(0.1,0.2,0.3,0.05,0.15,0.2)
DT[,ratio:=sample(ratio_sample_6,replace = FALSE),by="country"]

But even then I couldn't get it to work. Any suggestions?
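One likely reason the fixed-vector attempt fails (an interpretation, not something stated in the question): `ratio_sample_6` holds only 6 values, while most countries have 8 to 14 rows, and a group of k rows needs k values. Without replacement `sample()` cannot return more values than the vector holds; with replacement the drawn values would no longer sum to 1. The base-R sketch below checks this, using group sizes copied from the `table(DT$country)` output; `group_sizes` and `fits` are hypothetical names.

```r
# Fixed 6-element vector from the question
ratio_sample_6 <- c(0.1, 0.2, 0.3, 0.05, 0.15, 0.2)

# Rows per country, copied from table(DT$country)
group_sizes <- c(A = 6, B = 10, C = 14, D = 6, E = 14,
                 F = 10, G = 10, H = 8, I = 10, J = 12)

# For each country, try to draw one value per row without replacement;
# sample() raises an error when the requested size exceeds length(x)
fits <- sapply(group_sizes, function(k) {
  res <- tryCatch(sample(ratio_sample_6, k), error = function(e) e)
  !inherits(res, "error")
})

fits
```

Only the two 6-row countries (A and D) can be filled this way, which is why a per-group random draw of length `.N` is needed instead.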

Solution

Draw random numbers, then normalize them within each country (divide each value by its country's total).
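The normalization step, r_i = u_i / sum(u) over the rows of one country with u drawn from U(0, 1), can also be sketched in base R with `ave()`, without data.table or dplyr. The toy `country` vector below is hypothetical and only mimics unequal group sizes.

```r
set.seed(1)

# Hypothetical toy data: three countries with unequal numbers of rows
country <- rep(c("A", "B", "C"), times = c(6, 10, 14))

# Step 1: one uniform draw per row (runif never returns exactly 0 or 1)
u <- runif(length(country))

# Step 2: divide each draw by the total of its own country,
# so the values within each country sum to 1
ratio <- ave(u, country, FUN = function(x) x / sum(x))

# Per-country sums: 1 up to floating-point rounding
tapply(ratio, country, sum)
```

Since every group has at least two rows and all draws are positive, each ratio lies strictly between 0 and 1.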

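## data.table version
## the dplyr pipelines above return a grouped tibble, so convert back to a data.table first
DT <- as.data.table(ungroup(DT))
DT[,ratio := runif(.N)][,ratio := ratio / sum(ratio),by = "country"]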

## dplyr version
DT %>% group_by(country) %>%
  mutate(
    ratio = runif(n()),ratio = ratio / sum(ratio)
)
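A possible variation, swapped in here and not part of the answer above: normalizing uniform draws gives valid ratios, but they are not spread evenly over all possible splits. If you want each country's ratio vector to be uniformly distributed over all vectors that sum to 1 (a flat Dirichlet(1, ..., 1) draw), normalize independent Exp(1) draws instead. The sketch reuses the hypothetical toy `country` vector.

```r
set.seed(42)

# Hypothetical toy data: three countries with unequal numbers of rows
country <- rep(c("A", "B", "C"), times = c(6, 10, 14))

# Independent Exp(1) draws instead of uniforms
e <- rexp(length(country), rate = 1)

# Normalizing i.i.d. Exp(1) draws within a group yields a flat
# Dirichlet(1, ..., 1) vector for that group
ratio_dir <- ave(e, country, FUN = function(x) x / sum(x))

# Per-country sums: 1 up to floating-point rounding
tapply(ratio_dir, country, sum)
```

In the data.table version this amounts to replacing `runif(.N)` with `rexp(.N)`: `DT[, ratio := rexp(.N)][, ratio := ratio / sum(ratio), by = "country"]`.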