Is there a way to display standard errors with ml_linear_regression in sparklyr?

Problem description

When I run a linear regression with sparklyr, for example:

cached_cars %>%
  ml_linear_regression(mpg ~ .) %>%
  summary()

the output does not include standard errors:

Deviance Residuals:
     Min       1Q   Median       3Q      Max 
-3.47339 -1.37936 -0.06554  1.05105  4.39057 

Coefficients:
(Intercept) cyl_cyl_8.0 cyl_cyl_4.0        disp          hp        drat
16.15953652  3.29774653  1.66030673  0.01391241 -0.04612835  0.02635025
         wt        qsec          vs          am        gear        carb
-3.80624757  0.64695710  1.74738689  2.61726546  0.76402917  0.50935118

R-Squared: 0.8816
Root Mean Squared Error: 2.041

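Is it possible to display standard errors when running this regression?
Is there a way to cluster standard errors in sparklyr?
I have also been trying to run linear models with multiple group fixed effects in sparklyr. In plain R I use felm (from the lfe package) for this. Does anyone have experience doing this in sparklyr?

A solution using SparkR would also be highly appreciated.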

Solution

I received a helpful answer to the first question on community.rstudio.com.

yitaoli's answer is as follows:

library(sparklyr)

# This is the version of Spark I ran this example code with,
# but I think everything that follows should work in all versions of Spark anyways
spark_version <- "2.4.4"

sc <- spark_connect(master = "local", version = spark_version)

cached_cars <- copy_to(sc, mtcars)
model <- cached_cars %>%
  ml_linear_regression(mpg ~ .)

# Call summary() on the underlying Spark model object, then read its
# coefficientStandardErrors from the returned summary
coeff_std_errs <- invoke(model$model$.jobj, "summary") %>%
  invoke("coefficientStandardErrors")

print(coeff_std_errs)
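
The same Spark summary object also exposes t-values and p-values, so a coefficient table can be assembled by hand. The following is only a sketch, assuming a working local Spark installation; `summary_jobj`, `terms` and `coef_table` are names chosen here for illustration. It relies on Spark's documented behaviour that, when an intercept is fitted, the last element of each of these arrays belongs to the intercept, and that these statistics are only available with the "normal" solver.

```r
library(sparklyr)

sc <- spark_connect(master = "local")
cached_cars <- copy_to(sc, mtcars, overwrite = TRUE)

model <- ml_linear_regression(cached_cars, mpg ~ .)

# Handle to the JVM-side training summary of the fitted model
summary_jobj <- invoke(model$model$.jobj, "summary")

std_errs <- unlist(invoke(summary_jobj, "coefficientStandardErrors"))
t_values <- unlist(invoke(summary_jobj, "tValues"))
p_values <- unlist(invoke(summary_jobj, "pValues"))

# sparklyr lists "(Intercept)" first, Spark reports it last:
# reorder the term names to match Spark's ordering (assumption stated above)
terms <- c(setdiff(names(model$coefficients), "(Intercept)"), "(Intercept)")

coef_table <- data.frame(
  term      = terms,
  estimate  = unname(model$coefficients[terms]),
  std_error = std_errs,
  t_value   = t_values,
  p_value   = p_values
)

print(coef_table)
```

If the lengths of the arrays do not match the number of coefficients, the ordering assumption does not hold for that model and the table should not be trusted.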

I think what you are looking for is tidy(). So in your case:

regression1 <- cached_cars %>%
  ml_linear_regression(mpg ~ .)
tidy(regression1)
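
For a small dataset such as mtcars, one way to sanity-check the standard errors reported by Spark is to fit the same model locally with base R's `lm()`. This is just a sketch for comparison, not part of the original answers; `local_fit` and `local_std_errs` are illustrative names. It uses mtcars as copied in the answer above, with every column numeric, so the numbers will not match the output in the question, where cyl was treated as categorical.

```r
# Fit the same formula locally with ordinary least squares
local_fit <- lm(mpg ~ ., data = mtcars)

# summary() on an lm object returns a coefficient matrix with the columns
# "Estimate", "Std. Error", "t value" and "Pr(>|t|)"
local_table <- summary(local_fit)$coefficients
local_std_errs <- local_table[, "Std. Error"]

print(local_std_errs)
```

With Spark's default settings (no regularization), the local estimates and standard errors should be close to the ones Spark reports; large differences would suggest the two models are not specified the same way.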

As for clustered standard errors and fixed effects, I don't know.
