Replace NA values with values from adjacent rows

Problem description

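I need some help filling cells that contain NA with values that already exist in the surrounding rows.

I currently have a panel dataset of investors and their activities. Some rows were missing, so I completed the panel to include them and replaced the financial deal information with 0.

The other variables describe broader firm characteristics, such as region and strategy. I am not sure how to replicate these for each firm.

This is my code so far.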

df <- df %>%
  group_by(investor) %>%
  mutate(min = min(dealyear, na.rm = TRUE), max = max(dealyear, na.rm = TRUE)) %>%
  complete(investor, dealyear = min:max, fill = list(counttotal = 0, countgreen = 0, countbrown = 0))

Example of the data before completing. Note that the year 2004 is missing.

investor	dealyear	dealcounts	strategy	region
123IM	2002	5	buyout	europe
123IM	2003	5	buyout	europe
123IM	2005	5	buyout	europe
123IM	2006	5	buyout	europe

Example of the data after completing, with the missing row added

investor	dealyear	dealcounts	strategy	region
123IM	2002	5	buyout	europe
123IM	2003	5	buyout	europe
123IM	2004	0	NA	NA
123IM	2005	5	buyout	europe
123IM	2006	5	buyout	europe

How would I replace these NA values with the corresponding information for each investment firm?

Many thanks

Rory

Solution

You can use complete together with group_by, like this:

library(dplyr)
library(tidyr)

df %>%
  group_by(investor) %>%
  complete(dealyear = min(dealyear):max(dealyear),fill = list(dealcounts = 0)) %>%
  ungroup

#  investor dealyear dealcounts strategy region
#  <chr>       <int>      <dbl> <chr>    <chr> 
#1 123IM        2002          5 buyout   europe
#2 123IM        2003          5 buyout   europe
#3 123IM        2004          0 NA       NA    
#4 123IM        2005          5 buyout   europe
#5 123IM        2006          5 buyout   europe
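
To try this without the original dataset, here is a minimal sketch that rebuilds the sample rows from the question. The object names df and completed are only illustrative, and the column types (integer years, numeric deal counts) are assumptions, since the question does not state them.

```r
library(dplyr)
library(tidyr)

# Sample data copied from the question: investor 123IM has no row for 2004.
df <- tibble(
  investor   = "123IM",
  dealyear   = c(2002L, 2003L, 2005L, 2006L),
  dealcounts = 5,
  strategy   = "buyout",
  region     = "europe"
)

completed <- df %>%
  group_by(investor) %>%
  complete(dealyear = min(dealyear):max(dealyear), fill = list(dealcounts = 0)) %>%
  ungroup()

# Only the columns named in `fill` receive a replacement value;
# strategy and region stay NA in the newly created 2004 row.
completed %>% filter(dealyear == 2004L)
```

This is why dealcounts becomes 0 in the new row while strategy and region are left as NA.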

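If you want to replace the NA values in the strategy and region columns, you can use fill. By default it carries the last non-missing value downward, and because the data is still grouped at that point, values are only copied within the same investor.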

df %>%
  group_by(investor) %>%
  complete(dealyear = min(dealyear):max(dealyear),fill = list(dealcounts = 0)) %>%
  fill(strategy,region) %>%
  ungroup

#  investor dealyear dealcounts strategy region
#  <chr>       <int>      <dbl> <chr>    <chr> 
#1 123IM        2002          5 buyout   europe
#2 123IM        2003          5 buyout   europe
#3 123IM        2004          0 buyout   europe
#4 123IM        2005          5 buyout   europe
#5 123IM        2006          5 buyout   europe
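
If a missing value can also appear in an investor's first row, a downward fill alone would leave it as NA. A possible variation, assuming strategy is constant within each investor, is to pass .direction = "downup" to fill. The data below is invented purely for illustration and is not from the question.

```r
library(dplyr)
library(tidyr)

# Invented data: investor B's earliest row has no strategy recorded.
panel <- tibble(
  investor = c("A", "A", "A", "B", "B", "B"),
  dealyear = c(2002L, 2003L, 2004L, 2002L, 2003L, 2004L),
  strategy = c("buyout", NA, "buyout", NA, "venture", NA)
)

filled <- panel %>%
  group_by(investor) %>%
  # Fill downward first, then upward, so leading NAs are covered as well.
  fill(strategy, .direction = "downup") %>%
  ungroup()

filled
```

Because the fill runs per group, investor B never inherits a value from investor A. If a characteristic can change over time within a firm, this shortcut would copy the nearest value rather than a known one, so it is worth checking that assumption first.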
