Cumulative count of unique values over time

Question

I have a data frame, mydf, that looks like this:

| Country    | Year |
| ---------- | ---- |
| Bahamas    | 1982 |
| Chile      | 1817 |
| Cuba       | 1960 |
| Finland    | 1918 |
| Kazakhstan | 1993 |

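...and so on, with many more rows.

Is there a simple way to plot the cumulative number of unique countries over time? In other words:

  • x-axis = Year (the time axis), and
  • y-axis = cumulative number of countries mentioned so far

I tried stat_ecdf(), but it plots the cumulative proportion (0 to 1) on the y-axis rather than the absolute count of countries: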

ggplot(mydf, aes(x = Year)) + stat_ecdf()

Here is a sample of mydf:

> dput(mydf)

structure(list(Country = c("Moldova","Aragon","Abu Dhabi","Uzbekistan","Sweden","Anhalt","Saudi Arabia","Montenegro","Central African Republic","Bulgaria","Argentina","Senegal","Sri Lanka","Cambodia","Benin","Colombia","Algeria","Iraq","DPRK","Italy"),Year = c(1992L,1223L,1966L,1993L,1748L,1835L,1955L,1841L,1959L,1806L,1960L,1995L,1892L,1914L,1981L,1958L,1948L,1900L)),row.names = c(NA,-20L),class = c("data.table","data.frame"))

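Answer

Sort the rows by year, then give each country an ID number based on the order of its first appearance. Because a new country always receives the next unused ID, the cumulative count of unique countries equals the cumulative maximum of that ID: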

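# Sort chronologically so IDs are assigned in order of first appearance
mydf = mydf[order(mydf$Year, mydf$Country), ]
# unique() keeps first-appearance order, so the n-th new country gets ID n
mydf$country_id = as.integer(factor(mydf$Country, levels = unique(mydf$Country)))
# The running maximum of the ID is the number of distinct countries seen so far
mydf$cum_n_country = cummax(mydf$country_id)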
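To see why the cumulative maximum works, here is a minimal sketch on a toy data frame. The `toy` values are made up for illustration and are not taken from the question's data; the steps are the same three lines as above.

```r
# Toy data (hypothetical values, chosen so that one country repeats
# in a later year and one year contains a repeated country)
toy <- data.frame(
  Country = c("Chile", "Cuba", "Chile", "Finland", "Cuba"),
  Year    = c(1817L, 1960L, 1990L, 1918L, 1960L)
)

# Same steps as in the answer
toy <- toy[order(toy$Year, toy$Country), ]
toy$country_id <- as.integer(factor(toy$Country, levels = unique(toy$Country)))
toy$cum_n_country <- cummax(toy$country_id)

print(toy)
```

After sorting, Chile reappears in 1990 with its original ID of 1, but the running maximum stays at 3, so a repeated country never increases the count.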

If a year appears more than once, summarize by year and keep the maximum cum_n_country for each year before plotting:

library(dplyr)
library(ggplot2)
mydf %>%
  group_by(Year) %>%
  summarize(cum_n_country = max(cum_n_country)) %>%
  ggplot(aes(x = Year,y = cum_n_country)) + 
  geom_line()
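As an alternative that is not part of the original answer, the same running count can be computed in base R with `cumsum(!duplicated(...))`, which adds 1 each time a country appears for the first time. The sketch below assumes the same `Country` and `Year` column names; the helper `cum_unique_by_year` and the `toy` values are hypothetical names introduced only for illustration.

```r
# Hypothetical helper (not from the original answer): cumulative number of
# distinct countries per year, using base R only
cum_unique_by_year <- function(df) {
  # Sort chronologically so "first appearance" means earliest year
  df <- df[order(df$Year, df$Country), ]
  # !duplicated() is TRUE on a country's first row, FALSE on repeats
  df$cum_n_country <- cumsum(!duplicated(df$Country))
  # One row per year: keep the largest running count within each year
  aggregate(cum_n_country ~ Year, data = df, FUN = max)
}

# Toy data (made-up values)
toy <- data.frame(
  Country = c("Chile", "Cuba", "Chile", "Finland", "Cuba"),
  Year    = c(1817L, 1960L, 1990L, 1918L, 1960L)
)

res <- cum_unique_by_year(toy)
print(res)
```

The result has one row per year and can be passed to ggplot() with aes(x = Year, y = cum_n_country) in the same way as the dplyr pipeline; geom_step() may be worth considering instead of geom_line() if the count should stay flat between years.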
