Problem description
My data.frame_1 has information for every quarter from 2017-01-01 to 2020-10-01, as shown below:
DATE CLINIC_ID NR_INDIVIDUALS REGION_ID TOTAL_NR_INDIVIDUALS AVERAGE_INDEX
2017-01-01 A11 3 A 100 3
2017-01-01 A11 10 B 100 3
2017-01-01 A12 14 C 130 4
2017-01-01 A13 5 D 110 5
....
2017-04-01 A11 2 A 96 4
2017-04-01 A11 9 B 96 4
2017-04-01 A12 13 C 100 4
2017-04-01 A13 5 D 105 7
....
2017-07-01 A11 2 A 89 4
2017-07-01 A11 8 B 89 4
2017-07-01 A12 14 C 105 5
2017-07-01 A13 5 D 90 7
....
2020-10-01 A11 6 A 97 4
2020-10-01 A11 14 B 97 4
2020-10-01 A12 15 C 90 6
2020-10-01 A13 3 D 92 7
My data.frame_2 has information for only two dates (2019-09-01 and 2020-05-01), as shown below:
DATE REGION_ID CONNECTIVITY PERCENTAGE
2019-09-01 A 0<2Mbit/s 3
2019-09-01 A 2<5Mbit/s 4
2019-09-01 A 5<10Mbit/s 13
2019-09-01 A 10<30Mbit/s 60
2019-09-01 A 30<300Mbit/s 10
2019-09-01 A >=300Mbit/s 10
....
2020-05-01 A 0<2Mbit/s 3
2020-05-01 A 2<5Mbit/s 4
2020-05-01 A 5<10Mbit/s 3
2020-05-01 A 10<30Mbit/s 25
2020-05-01 A 30<300Mbit/s 35
2020-05-01 A >=300Mbit/s 30
I am doing an outer join:
data.frame_3 <- merge(x = data.frame_1,y = data.frame_2,by = c("DATE","REGION_ID"),all = TRUE)
Question 1: Naturally, I get NA in CONNECTIVITY and PERCENTAGE for every DATE that comes from data.frame_1. I would like to fill CONNECTIVITY and PERCENTAGE for all months of 2019 with the values from 2019-09-01, and for all months of 2020 with the values from 2020-05-01. How can I do that?
In another case, I have data.frame_4, as shown below:
DATE CLINIC_ID TOTAL_NR_INDIVIDUALS AVERAGE_AGE
2017-01-01 A11 100 40
2017-01-01 A11 100 40
2017-01-01 A12 130 44
2017-01-01 A13 110 43
....
2017-02-01 A11 96 41
2017-02-01 A11 96 41
2017-02-01 A12 100 43
2017-02-01 A13 105 43
....
2017-03-01 A11 89 41
2017-03-01 A11 89 41
2017-03-01 A12 105 42
2017-03-01 A13 90 42
....
2020-10-01 A11 97 42
2020-10-01 A11 97 42
2020-10-01 A12 90 43
2020-10-01 A13 92 43
I am doing an outer join:
data.frame_5 <- merge(x = data.frame_1,y = data.frame_4,by = c("DATE","CLINIC_ID"),all = TRUE)
Question 2: I would like to copy the values in AVERAGE_INDEX (and the other columns of data.frame_1) from 2017-04-01 to the observations under 2017-03-01 and 2017-02-01; from 2017-07-01 to the observations under 2017-06-01 and 2017-05-01, and so on. How can I do that?
Solution
Please provide a reproducible example next time. Here I have created a minimal one for you.
# question1 ---------------------------------------------------------------
library(lubridate)
date <- as_date("2017-01-01")+months(0:35)
values <- c(1:36)
df <- data.frame(date,values)
# question 1: replace all 2019 values with May values
df$newvalue <- ifelse(year(df$date)==2019,df$values[df$date=="2019-05-01"],df$values)
tail(df,10)
#> date values newvalue
#> 27 2019-03-01 27 29
#> 28 2019-04-01 28 29
#> 29 2019-05-01 29 29
#> 30 2019-06-01 30 29
#> 31 2019-07-01 31 29
#> 32 2019-08-01 32 29
#> 33 2019-09-01 33 29
#> 34 2019-10-01 34 29
#> 35 2019-11-01 35 29
#> 36 2019-12-01 36 29
#as you can see the newvalues are correctly using May data for 2019
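The toy example above overwrites a single column. For the actual merge in the question, one possible approach is to join on the calendar year instead of the exact DATE, so that every row of data.frame_1 picks up the connectivity figures recorded for its year. The sketch below is only an illustration: the two small data frames are made-up stand-ins for data.frame_1 and data.frame_2, and the YEAR column and the `lookup` object are helper names introduced here, not part of the original data.

```r
# Hypothetical miniature versions of data.frame_1 and data.frame_2
data.frame_1 <- data.frame(
  DATE = as.Date(c("2019-01-01", "2019-04-01", "2020-01-01", "2020-10-01")),
  REGION_ID = "A",
  NR_INDIVIDUALS = c(3, 2, 6, 5)
)
data.frame_2 <- data.frame(
  DATE = as.Date(c("2019-09-01", "2019-09-01", "2020-05-01", "2020-05-01")),
  REGION_ID = "A",
  CONNECTIVITY = c("0<2Mbit/s", ">=300Mbit/s", "0<2Mbit/s", ">=300Mbit/s"),
  PERCENTAGE = c(3, 10, 3, 30)
)

# Derive the join key: the calendar year of each DATE
data.frame_1$YEAR <- format(data.frame_1$DATE, "%Y")
data.frame_2$YEAR <- format(data.frame_2$DATE, "%Y")

# Drop DATE from data.frame_2 so it does not clash with data.frame_1's DATE
lookup <- data.frame_2[, c("YEAR", "REGION_ID", "CONNECTIVITY", "PERCENTAGE")]

# Left join: each row of data.frame_1 gets every connectivity class of its year
data.frame_3 <- merge(data.frame_1, lookup,
                      by = c("YEAR", "REGION_ID"), all.x = TRUE)
```

Because of `all.x = TRUE`, rows from years with no entry in data.frame_2 (2017 and 2018 in the question) are kept and simply stay NA in CONNECTIVITY and PERCENTAGE.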
# question 2: replacing the values of months 3 and 2 by 4 --------
# define the correct months to replace for each row
df$refdate <- ifelse(month(df$date) %in% c(2,3),(paste(year(df$date),"04","01",sep="-")),as.character(df$date))
df$refdate <- ifelse(month(df$refdate) %in% c(5,6),(paste(year(df$refdate),"07","01",sep="-")),as.character(df$refdate))
df$refdate <- as_date(df$refdate)
df$result <- df$values[match(df$refdate,df$date)]
# > head(df[,c("date","refdate","result")],8)
# date refdate result
# 1 2017-01-01 2017-01-01 1
# 2 2017-02-01 2017-04-01 4
# 3 2017-03-01 2017-04-01 4
# 4 2017-04-01 2017-04-01 4
# 5 2017-05-01 2017-07-01 7
# 6 2017-06-01 2017-07-01 7
# 7 2017-07-01 2017-07-01 7
# 8 2017-08-01 2017-08-01 8
# as you can see here, Feb and March were replaced by April values,
# and May and June were replaced by July values
This way, you can use the very handy function match to avoid any explicit loop. I always try to rely on this function before attempting any kind of loop.
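The example above hard-codes only the April and July cases. If the "and so on" in the question means that every month should take the values of the next quarter start (January, April, July, October), the reference date can be computed arithmetically and then passed to match in the same way. This is a sketch under that assumption, using base R only; `m`, `y`, `offset` and `idx` are helper names introduced here.

```r
# Hypothetical monthly data, mirroring the toy example above
date <- seq(as.Date("2017-01-01"), by = "month", length.out = 36)
df <- data.frame(date = date, values = 1:36)

# Months still to go until the next quarter start (Jan, Apr, Jul, Oct):
# Jan -> 0, Feb -> 2, Mar -> 1, Apr -> 0, ...
m <- as.integer(format(df$date, "%m"))
y <- as.integer(format(df$date, "%Y"))
offset <- (3L - (m - 1L) %% 3L) %% 3L

# Use a running month index so that Nov/Dec roll over into the next year
idx <- y * 12L + (m - 1L) + offset
df$refdate <- as.Date(sprintf("%d-%02d-01", idx %/% 12L, idx %% 12L + 1L))

# Look up the value stored under the reference date;
# the result is NA when that date is not present in the data
df$result <- df$values[match(df$refdate, df$date)]
```

With this toy data, February and March 2017 take the April value, August and September take the October value, and November and December 2019 end up NA because January 2020 is not in the data.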