使 lme4 中混合模型的随机部分的语法更短

问题描述

我想知道是否有更简洁的方法来为语法的随机部分编写语法,如下所示?

PS.: 本质上,我试图为我的二元预测变量 lb_wght 的每个级别获得 2 个不相关的随机斜率集(参见随机效应的相关矩阵下面).

library(lme4)

math <- read.csv('https://raw.githubusercontent.com/rnorouzian/v/main/mlgrp1.csv')
math <- within(math,lb_wght <- as.factor(lb_wght))

m <- lmer(
    math ~ 0 + lb_wght + lb_wght:grade + 
    (0 + dummy(lb_wght,"0") + dummy(lb_wght,"0"):grade|id) +  # This line and
    (0 + dummy(lb_wght,"1") + dummy(lb_wght,"1"):grade|id),# This line
    data = math)

enter image description here

解决方法

我犹豫要不要回答这个问题,因为我怀疑底层的统计模型有问题,但这是 StackExchange 的编程部门,所以这里是编程解决方案:

要点:

  • 表示为 0 和 1 的二进制分类变量已经进行了虚拟编码。通过 factor 告诉 R 它代表分类对比当然没有问题,有时也很有用,但 R 只会在拟合该模型之前将其转回 0 和 1。
  • 0 和 1 也可以解释为布尔值。我们可以通过否定交换布尔值。
  • 使用 dummy(lb_wght,"0") 只是提供原始二进制变量的反向编码。同样,dummy(lb_wght,"1") 本质上是一个空操作——它只是返回原始的虚拟对比变量。
  • 我们可以将 dummy(lb_wght,"0") 的反向编码表示为布尔否定。

数据整理热身

> library("lme4")
> 
> math <- read.csv('https://raw.githubusercontent.com/rnorouzian/v/main/mlgrp1.csv')
> # A two-level contrast represented as dummy coding means we have 0 and 1
> # This is the same as FALSE and TRUE and so we can swap them using boolean negation
> math <- within(math,{
+                         lb_wght0 <- as.numeric(!lb_wght) 
+                         lb_wght1 <- lb_wght
+                      })
> head(math) # check what those new vars look like
    id female lb_wght anti_k1 math grade occ age men spring anti nb_wght lb_wght1 lb_wght0
1  201      1       0       0   38     1   2 111   0      1    0       1        0        1
2  201      1       0       0   55     3   3 135   1      1    0       1        0        1
3  303      1       0       1   26     0   2 121   0      1    2       1        0        1
4  303      1       0       1   33     3   3 145   0      1    2       1        0        1
5 2702      0       0       0   56     0   2 100   .      1    0       1        0        1
6 2702      0       0       0   58     2   3 125   .      1    2       1        0        1

RE 中没有相关性

> m <- lmer(
+   math ~ 0 + lb_wght0 + lb_wght0:grade  + lb_wght1 +  lb_wght1:grade + 
+     (0 + lb_wght0 + lb_wght0:grade  + lb_wght1 +  lb_wght1:grade || id),# This line and
+   data = math)
> summary(m)
Linear mixed model fit by REML ['lmerMod']
Formula: math ~ 0 + lb_wght0 + lb_wght0:grade + lb_wght1 + lb_wght1:grade +  
    ((0 + lb_wght0 | id) + (0 + lb_wght1 | id) + (0 + lb_wght0:grade |          id) + (0 + grade:lb_wght1 | id))
   Data: math

REML criterion at convergence: 15932

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-3.06601 -0.52971 -0.00312  0.53661  2.54121 

Random effects:
 Groups   Name           Variance Std.Dev.
 id       lb_wght0       61.9312  7.8696  
 id.1     lb_wght1       84.7006  9.2033  
 id.2     lb_wght0:grade  0.7044  0.8393  
 id.3     grade:lb_wght1  0.6598  0.8123  
 Residual                36.3585  6.0298  
Number of obs: 2221,groups:  id,932

Fixed effects:
               Estimate Std. Error t value
lb_wght0       35.48689    0.36619   96.91
lb_wght1       32.81813    1.35307   24.25
lb_wght0:grade  4.29283    0.09059   47.39
grade:lb_wght1  4.86449    0.32029   15.19

Correlation of Fixed Effects:
            lb_wg0 lb_wg1 lb_w0:
lb_wght1     0.000              
lb_wght0:gr -0.533  0.000       
grd:lb_wgh1  0.000 -0.489  0.000

匹配OP中的相关结构

> m <- lmer(
+   math ~ 0 + lb_wght0 + lb_wght0:grade  + lb_wght1 +  lb_wght1:grade + 
+     (0 + lb_wght0 + lb_wght0:grade | id) + 
+     (0 + lb_wght1 +  lb_wght1:grade | id),+   data = math)
> summary(m)
Linear mixed model fit by REML ['lmerMod']
Formula: math ~ 0 + lb_wght0 + lb_wght0:grade + lb_wght1 + lb_wght1:grade +  
    (0 + lb_wght0 + lb_wght0:grade | id) + (0 + lb_wght1 + lb_wght1:grade |      id)
   Data: math

REML criterion at convergence: 15931.1

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-3.09189 -0.52867 -0.00063  0.53419  2.54169 

Random effects:
 Groups   Name           Variance Std.Dev. Corr 
 id       lb_wght0       61.1969  7.8228        
          lb_wght0:grade  0.6711  0.8192   0.04 
 id.1     lb_wght1       97.4494  9.8716        
          lb_wght1:grade  1.4314  1.1964   -0.35
 Residual                36.2755  6.0229        
Number of obs: 2221,932

Fixed effects:
               Estimate Std. Error t value
lb_wght0        35.4848     0.3646   97.31
lb_wght1        32.8506     1.4214   23.11
lb_wght0:grade   4.2930     0.0902   47.60
grade:lb_wght1   4.8827     0.3414   14.30

Correlation of Fixed Effects:
            lb_wg0 lb_wg1 lb_w0:
lb_wght1     0.000              
lb_wght0:gr -0.527  0.000       
grd:lb_wgh1  0.000 -0.567  0.000

为了比较,OP的模型:

> math <- within(math,lb_wght <- as.factor(lb_wght))
> m <- lmer(
+   math ~ 0 + lb_wght + lb_wght:grade + 
+     (0 + dummy(lb_wght,"0") + dummy(lb_wght,"0"):grade|id) +  # This line and
+     (0 + dummy(lb_wght,"1") + dummy(lb_wght,"1"):grade|id),# This line
+   data = math)
> summary(m)
Linear mixed model fit by REML ['lmerMod']
Formula: math ~ 0 + lb_wght + lb_wght:grade + (0 + dummy(lb_wght,"0") +  
    dummy(lb_wght,"0"):grade | id) + (0 + dummy(lb_wght,"1") +      dummy(lb_wght,"1"):grade | id)
   Data: math

REML criterion at convergence: 15931.1

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-3.09189 -0.52867 -0.00063  0.53419  2.54169 

Random effects:
 Groups   Name                      Variance Std.Dev. Corr 
 id       dummy(lb_wght,"0")       61.1969  7.8228        
          dummy(lb_wght,"0"):grade  0.6711  0.8192   0.04 
 id.1     dummy(lb_wght,"1")       97.4494  9.8716        
          dummy(lb_wght,"1"):grade  1.4314  1.1964   -0.35
 Residual                           36.2755  6.0229        
Number of obs: 2221,932

Fixed effects:
               Estimate Std. Error t value
lb_wght0        35.4848     0.3646   97.31
lb_wght1        32.8506     1.4214   23.11
lb_wght0:grade   4.2930     0.0902   47.60
lb_wght1:grade   4.8827     0.3414   14.30

Correlation of Fixed Effects:
            lb_wg0 lb_wg1 lb_w0:
lb_wght1     0.000              
lb_wght0:gr -0.527  0.000       
lb_wght1:gr  0.000 -0.567  0.000

编辑:2021-02-08 20:57 UTC

注意这不是所问的问题,而是事后评论中的要求。尽管如此,指标变量的程序生成和使用基数 R 的公式可能很有趣。这可以重写为使用 apply .... 但这似乎是针对手头问题的过早优化。我不认为将这种方法推广到两个以上的层次而不真正考虑该模型的含义是一个好主意,但同样,这是对编程问题的回答,而不是对统计数据的认可它实现的程序

在实验变量中做任意数量的水平

这将自动生成所有指标变量和公式。

> # GOOD LUCK WITH THIS
> library("lme4")
> 
> math <- read.csv('https://raw.githubusercontent.com/rnorouzian/v/main/mlgrp1.csv')
> grps <- unique(math$lb_wght)
> 
> form <- "math ~ 0"
> 
> for(g in grps){
+   gname <- paste0("lb_wght",g)
+   math[,gname] <- ifelse(math$lb_wght == g,1,0) 
+   term <- paste("0",gname,paste0(gname,":grade"),sep="+")
+   re <- paste0("(",term,"|id)")
+   form <- paste(form,re,sep="+")
+ }
> 
> form <- as.formula(form)
> 
> m <- lmer(form,data = math)
> summary(m)
Linear mixed model fit by REML ['lmerMod']
Formula: math ~ 0 + 0 + lb_wght0 + lb_wght0:grade + (0 + lb_wght0 + lb_wght0:grade |  
    id) + 0 + lb_wght1 + lb_wght1:grade + (0 + lb_wght1 + lb_wght1:grade |      id)
   Data: math

REML criterion at convergence: 15931.1

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-3.09189 -0.52867 -0.00063  0.53419  2.54169 

Random effects:
 Groups   Name           Variance Std.Dev. Corr 
 id       lb_wght0       61.1969  7.8228        
          lb_wght0:grade  0.6711  0.8192   0.04 
 id.1     lb_wght1       97.4494  9.8716        
          lb_wght1:grade  1.4314  1.1964   -0.35
 Residual                36.2755  6.0229        
Number of obs: 2221,932

Fixed effects:
               Estimate Std. Error t value
lb_wght0        35.4848     0.3646   97.31
lb_wght1        32.8506     1.4214   23.11
lb_wght0:grade   4.2930     0.0902   47.60
grade:lb_wght1   4.8827     0.3414   14.30

Correlation of Fixed Effects:
            lb_wg0 lb_wg1 lb_w0:
lb_wght1     0.000              
lb_wght0:gr -0.527  0.000       
grd:lb_wgh1  0.000 -0.567  0.000