问题描述
寻求关于使用ggalluvium的一些建议,以证明在澳大利亚的偏好分布。
在澳大利亚,我们享有优先投票权。假设我住在一个有4名候选人参加比赛的地区。 投票是根据您的政党/候选人的喜好在方框1-4编号来完成的。 第一次计票后选票比例最低的候选人将被淘汰,其选票将分配给选民在选票上指示的位置。重复此过程,直到剩下两名候选人并在两党的首选票数超过50%的情况下选出一名候选人为止。
我正在尝试使用流程图和ggalluvium可视化上述重复分配过程。
但是,我似乎不太想在美学方面作图,以显示在下一轮投票中将选票投给候选人的流程。
这是到目前为止我得到的:
library(tidyverse)
library(magrittr)
library(ggalluvial)
Load Data
house_of_reps <- read_csv("https://results.aec.gov.au/24310/Website/Downloads/HouseDopByDivisionDownload-24310.csv",skip = 1)
house_of_reps$BallotPosition %<>% as.factor()
house_of_reps$CountNumber %<>% as.factor()
cooper <- house_of_reps %>%
filter(DivisionNm == "Cooper") %>%
spread(CalculationType,CalculationValue) %>%
select(4,9,10,14)
cooper %>% ggplot(aes(x = CountNumber,alluvium = PartyNm,stratum = `Preference Percent`,y = `Preference Percent`,fill = PartyAb)) +
geom_alluvium(aes(fill = PartyAb),decreasing = TRUE) +
geom_stratum(decreasing = TRUE) +
geom_text(stat = "stratum",decreasing = TRUE,aes(label = after_stat(fill))) +
stat_stratum(decreasing = TRUE) +
stat_stratum(geom = "text",aes(label = PartyAb),decreasing = TRUE) +
scale_fill_viridis_d() +
theme_minimal()
对于任何如何显示每个后续计数之后的选票将流向下一个阶层的哪个政党的指导,我们将表示赞赏。
解决方法
不幸的是,您的数据集不太适合您所考虑的绘图类型。虽然绘图本身很容易,但是要获得所需的绘图,则需要“一些”数据整理和准备步骤。
普遍的问题是,您的数据集不能显示从一党到另一党的投票流。它仅显示当事方在每次计数中丢失或获得的总票数。
但是,由于每一步中只有一方退出,可以从您的数据中提取丢失的信息。基本思想是,根据选民的次要政党偏好,为每个政党或更确切地说每个政党划分观察对象,这在后来的一项统计中会消失。
不确定每个步骤是否明确,但是我添加了一些解释作为注释,并添加了数据集最终结构的图,希望可以使所有步骤的最终结果更清楚:
library(tidyverse)
library(magrittr)
library(ggalluvial)
# Load Data
house_of_reps <- read_csv("https://results.aec.gov.au/24310/Website/Downloads/HouseDopByDivisionDownload-24310.csv",skip = 1)
house_of_reps$BallotPosition %<>% as.factor()
house_of_reps$CountNumber %<>% as.factor()
cooper <- house_of_reps %>%
filter(DivisionNm == "Cooper") %>%
spread(CalculationType,CalculationValue) %>%
select(count = CountNumber,party = PartyAb,pref = `Preference Count`,trans = `Transfer Count`)
# Helper function to
make_rows <- function(x) {
# Name of party which gets dropped in this period
dropped <- filter(x,trans < 0) %>% pull(party)
if (length(dropped) > 0) {
x <- filter(x,trans >= 0)
# Replacements are added two times. Once for the period where the party drops out,# and also for the previous period
xdrop <- mutate(x,party = dropped,pref = trans,trans = 0,is_drop = FALSE)
xdrop1 <- mutate(xdrop,count = count - 1,to = party,is_drop = FALSE)
# For the parties to keep or which receive transfered votes have to adjust the number of votes
xkeep <- mutate(x,pref = pref - trans,trans = 0)
bind_rows(xdrop1,xdrop,xkeep)
} else {
x
}
}
cooper1 <- cooper %>%
# First: Convert count to a numeric. Add a "to" variable for second
# party preference or the party where votes are transferred to. This variable
# will later on be mapped on the "fill" aes
mutate(to = party,count = as.numeric(as.character(count))) %>%
group_by(party) %>%
# Add identifier of obs. to drop. Obs. to drop are obs. of parties which
# drop out in the following count
mutate(is_drop = lead(trans,default = 0) < 0) %>%
ungroup() %>%
# Split obs. to be dropped by secondary party preference,i.e. in count 0 the
# obs for party "IND" is replaced by seven obs. reflecting the secondary preference
# for one of the other seven parties
split(.$count) %>%
map(make_rows) %>%
bind_rows() %>%
# Now drop original obs.
filter(!is_drop,pref > 0) %>%
# Add a unique identifier
group_by(count,party) %>%
mutate(id = paste0(party,row_number())) %>%
ungroup() %>%
# To make the flow chart work we have make the dataset complete,i.e. add
# "empty" obs for each type of voter and each count
complete(count,id,fill = list(pref = 0,is_drop = FALSE)) %>%
# Fill up party and "to" columns
mutate(across(c(party,to),~ if_else(is.na(.),str_extract(id,"[^\\d]+"),.))) %>%
# Filling up the "to" column with last observed value for "to" if any
group_by(id) %>%
mutate(last_id = last(which(party != to)),to = if_else(count >= last_id & !is.na(last_id),to[last_id],to)) %>%
ungroup()
数据集的最终结构可以通过平铺图来说明:
cooper1 %>%
add_count(count,party) %>%
ggplot(aes(count,reorder(id,n),fill = to)) +
geom_tile(color = "white")
正如我说的那样,毕竟,繁琐的数据争夺使流程图本身是最简单的任务,并且可以像这样实现:
cooper1 %>%
ggplot(aes(x = count,alluvium = id,stratum = to,y = pref,fill = to)) +
geom_flow(decreasing = TRUE) +
geom_stratum(decreasing = TRUE) +
scale_fill_viridis_d() +
theme_minimal()