问题描述
我有这个数据框:
# A tibble: 6 x 5
STATE STATEFP YEAR GW_ratio SW_ratio
<chr> <dbl> <chr> <dbl> <dbl>
1 AL 1 2000 0.0425 0.958
2 AL 1 2005 0.0474 0.953
3 AL 1 2010 0.0469 0.953
4 AL 1 2015 0.0563 0.944
5 AR 5 2000 0.627 0.373
6 AR 5 2005 0.646 0.354
我想将趋势作为每四年期间(2000、2005、2010、2015)每个州的 SW_ratio 和 GW_ratio 之间的关系,您可以使用 lm
执行。然后我想将它存储在以下数据框中:
STATE STATEFP trend
1 AL 1 0
2 AR 5 0
3 AZ 4 0
4 CA 6 0
5 CO 8 0
6 CT 9 0
有谁知道如何对 R 中的行子集进行回归分析?
解决方法
假设您的数据是这样的:
set.seed(111)
df = data.frame(STATE =rep(c("AL","AR"),each=4),STATEFP = rep(c(1,5),YEAR = rep(c(2000,2005,2010,2015),2),GW_ratio = runif(8),SW_ratio = runif(8))
head(df)
STATE STATEFP YEAR GW_ratio SW_ratio
1 AL 1 2000 0.5929813 0.43216062
2 AL 1 2005 0.7264811 0.09368152
3 AL 1 2010 0.3704220 0.55577991
4 AL 1 2015 0.5149238 0.59022849
5 AR 5 2000 0.3776632 0.06714114
6 AR 5 2005 0.4183373 0.04754785
一种方法是将其旋转很长时间,然后按状态和变量分组,然后进行线性回归:
library(dplyr)
library(tidyr)
library(broom)
df %>%
pivot_longer(-c(STATE,YEAR,STATEFP)) %>%
group_by(STATE,STATEFP,name) %>%
do(augment(lm(value ~ YEAR,data=.)))
你会得到这样的结果:
# A tibble: 16 x 11
# Groups: STATE,name [4]
STATE STATEFP name value YEAR .fitted .resid .hat .sigma .cooksd
<fct> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 AL 1 GW_ratio 0.593 2000 0.640 -0.0468 0.7 0.204 0.347
2 AL 1 GW_ratio 0.726 2005 0.581 0.146 0.3 0.137 0.265
3 AL 1 GW_ratio 0.370 2010 0.522 -0.151 0.3 0.128 0.286
4 AL 1 GW_ratio 0.515 2015 0.463 0.0523 0.7 0.200 0.433
5 AL 1 SW_ratio 0.432 2000 0.278 0.155 0.7 0.175 1.69
6 AL 1 SW_ratio 0.0937 2005 0.371 -0.277 0.3 0.0146 0.428
7 AL 1 SW_ratio 0.556 2010 0.465 0.0910 0.3 0.314 0.0460
8 AL 1 SW_ratio 0.590 2015 0.558 0.0318 0.7 0.327 0.0715
剩下的就是再次将其转宽,以下是完整的步骤:
res = df %>%
pivot_longer(-c(STATE,data=.))) %>%
select(c(STATE,name,.fitted)) %>%
ungroup() %>%
pivot_wider(names_from=c(name,YEAR),values_from=.fitted)
data.frame(res)
STATE STATEFP GW_ratio_2000 GW_ratio_2005 GW_ratio_2010 GW_ratio_2015
1 AL 1 0.6397368 0.5807136 0.5216905 0.4626673
2 AR 5 0.3263059 0.3319276 0.3375492 0.3431709
SW_ratio_2000 SW_ratio_2005 SW_ratio_2010 SW_ratio_2015
1 0.277517333 0.3711475 0.4647777 0.5584079
2 -0.007647359 0.1170041 0.2416555 0.3663070
如果您只想保持每个状态下 GW_ratio 与 SW_ratio 的斜率,则为:
df %>% group_by(STATE,STATEFP) %>% do(tidy(lm(GW_ratio ~ SW_ratio,data=.))) %>% filter(term=="SW_ratio")
# A tibble: 2 x 7
# Groups: STATE,STATEFP [2]
STATE STATEFP term estimate std.error statistic p.value
<fct> <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
1 AL 1 SW_ratio -0.567 0.234 -2.43 0.136
2 AR 5 SW_ratio 0.436 0.810 0.539 0.644
,
看起来您的起始数据集和所需输出之间的区别只是形状从长到宽的变化。
假设我们有这些数据(我只是复制了一些 AL 行来完成 AR):
library(tidyverse)
state_info_long <- tibble::tribble(
~STATE,~STATEFP,~YEAR,~GW_ratio,~SW_ratio,"AL",1L,2000L,0.0425,0.958,2005L,0.0474,0.953,2010L,0.0469,2015L,0.0563,0.944,"AR",5L,0.627,0.373,0.646,0.354,0.354
)
我们可以使用 tidyr 旋转更宽以看起来像您的目标输出:
state_info_long %>%
pivot_wider(c(STATE:STATEFP),names_sep = ".",names_from = YEAR,values_from = GW_ratio:SW_ratio)
# A tibble: 2 x 10
STATE STATEFP GW_ratio.2000 GW_ratio.2005 GW_ratio.2010 GW_ratio.2015 SW_ratio.2000 SW_ratio.2005 SW_ratio.2010 SW_ratio.2015
<chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 AL 1 0.0425 0.0474 0.0469 0.0563 0.958 0.953 0.953 0.944
2 AR 5 0.627 0.646 0.627 0.646 0.373 0.354 0.373 0.354