Problem description
a b c d
1 2005-01-01 0 ... ...
2 2005-02-22 1 ... ...
3 2005-04-02 0 ... ...
4 2005-12-01 3 ... ...
5 2006-03-03 0 ... ...
6 2006-06-08 1 ... ...
7 2006-10-11 0 ... ...
8 2006-12-02 4 ... ...
9 2007-03-24 0 ... ...
10 2007-04-06 2 ... ...
11 2008-01-28 0 ... ...
12 2008-08-19 0 ... ...
13 2008-09-12 0 ... ...
14 2008-12-12 2 ... ...
15 2009-05-27 0 ... ...
16 ... . ... ...
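I want to select all rows from 2005 and see how many of them have the value 0, 1, 2, 3, or 4 (that is, combine the year with column b), perhaps as proportions. For example, the result would be:
output: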
2005
0 1 2 3 4
20% 20% 20% 20% 20%
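I tried table(year(DF$a), c = DF$b), but that only gives an overview across all years, without any proportions or the like. I tried piping it into a proportions function with %>%, but that did not work.
Does anyone know how to do this?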
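Solution
You can use table and proportions to get the shares per year. Passing 1 as the margin argument of proportions computes the shares per row, so each year sums to 100: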
proportions(table(format(DF$a,"%Y"),DF$b),1) * 100
# 0 1 2 3 4
# 2005 50 25 0 25 0
# 2006 50 25 0 0 25
# 2007 50 0 50 0 0
# 2008 75 0 25 0 0
# 2009 100 0 0 0 0
Data:
# b is taken from the table in the question; c and d are placeholder strings
DF <- structure(list(
  a = structure(c(12784,12836,12875,13118,13210,13307,13432,13484,13596,13609,13906,14110,14134,14225,14391),class = "Date"),
  b = c(0L,1L,0L,3L,0L,1L,0L,4L,0L,2L,0L,0L,0L,2L,0L),
  c = rep("...",15),
  d = rep("...",15)),
  row.names = c("1","2","3","4","5","6","7","8","9","10","11","12","13","14","15"),class = "data.frame")
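The question asks for 2005 only, with a column for every value from 0 to 4 even when a value never occurs in that year. A minimal base R sketch of one way to do that follows. The small DF here is a hypothetical sample that mirrors the first five rows of the question's data, and treating 0:4 as the full set of possible values is an assumption. Note that proportions was added in R 4.0.0; on older versions prop.table does the same thing.

```r
# Hypothetical sample mirroring the first five rows of the question's data
DF <- data.frame(
  a = as.Date(c("2005-01-01", "2005-02-22", "2005-04-02",
                "2005-12-01", "2006-03-03")),
  b = c(0L, 1L, 0L, 3L, 0L)
)

# Keep only the rows whose date falls in 2005
d2005 <- DF[format(DF$a, "%Y") == "2005", ]

# Fixing the factor levels (assumed to be 0:4) keeps a column for every value,
# including values that do not occur in 2005
pct2005 <- proportions(table(factor(d2005$b, levels = 0:4))) * 100
pct2005
# percentages for b = 0, 1, 2, 3, 4: 50 25 0 25 0
```

Without the factor call, table only reports the values that actually occur, so the columns for 2 and 4 would be missing here.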
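Alternatively, you can count the occurrences of each value of b per year, convert the counts to percentages, and use pivot_wider to get the data in wide format (if needed).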
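library(dplyr)
DF %>%
  # count rows for each (year, b) combination
  count(year = lubridate::year(a), b) %>%
  group_by(year) %>%
  # convert counts to percentages within each year
  mutate(n = n / sum(n) * 100) %>%
  # sort by b so the wide columns come out in the order 0, 1, 2, ...
  arrange(b) %>%
  tidyr::pivot_wider(names_from = b, values_from = n, values_fill = 0)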
# year `0` `1` `2` `3` `4`
# <int> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 2005 50 25 0 25 0
#2 2006 50 25 0 0 25
#3 2007 50 0 50 0 0
#4 2008 75 0 25 0 0
#5 2009 100 0 0 0 0
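If only a single year is of interest, the same dplyr approach can be restricted with filter before counting. The following is a rough sketch rather than part of the original answer: the small DF is a hypothetical sample mirroring the first five rows of the question's data, the column name pct is made up for illustration, and the year is extracted with format so that lubridate is not required.

```r
library(dplyr)

# Hypothetical sample mirroring the first five rows of the question's data
DF <- data.frame(
  a = as.Date(c("2005-01-01", "2005-02-22", "2005-04-02",
                "2005-12-01", "2006-03-03")),
  b = c(0L, 1L, 0L, 3L, 0L)
)

res <- DF %>%
  # keep only rows from 2005
  filter(format(a, "%Y") == "2005") %>%
  # count how often each value of b occurs
  count(b) %>%
  # convert the counts to percentages of all 2005 rows
  mutate(pct = n / sum(n) * 100)

res
```

This stays in long format with one row per value of b that occurs in 2005; values that never occur in that year do not get a row.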