Using dplyr "summarise" with functions that require a formula

Problem description

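I am trying to generate a table of regression slopes using a custom function based on the mblm package (the function in this example is a simplified version). The function takes a formula as an argument, and I want to use dplyr's summarise to apply it to grouped samples in a large data frame with many variables. The output should be the regression slopes for each sample group and response variable, which I can then pass to a heatmap function.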

library (dplyr)

# Example data

test_data <- 
    rbind (
        data.frame(ID=paste0("someName",c(1:9)), Sample_Type="type1", A=seq(1,17,length.out=9), I=0.1^seq(1,1.8,length.out=9), J=1-0.1^seq(1,1.8,length.out=9)),
        data.frame(ID=paste0("someName",c(10:15)), Sample_Type="type2", A=seq(1,7,length.out=6), I=0.1^(1-seq(1,1.5,length.out=6)), J=1-0.1^(1-seq(1,1.5,length.out=6)))
    )

# Define the independent variable and the response variables - I would like to be able to easily test different independent variables
 
idpVar <- "A"
respVar <- test_data %>% .[!names(.) %in% c("ID","Sample_Type",idpVar)] %>% names()

# Custom function generating numeric value of median slopes (simplified from mblm)

medianSlope <-
function (formula,dataframe) 
{
    if (missing(dataframe)) 
        dataframe <- environment(formula)
    term <- as.character(attr(terms(formula),"variables")[-1])
    x = dataframe[[term[2]]]
    y = dataframe[[term[1]]]
    if (length(term) > 2) {
        stop("Only linear models are accepted")
    }
    xx = sort(x)
    yy = y[order(x)]
    n = length(xx)
    smedians = c()
    for (i in 1:n) {
        slopes = c()
        for (j in 1:n) {
            if (xx[j] != xx[i]) {
                slopes = c(slopes, (yy[j] - yy[i])/(xx[j] - xx[i]))
            }
        }
        smedians = c(smedians, median(slopes))
    }
    slope = median(smedians)
    slope
}

# The custom function works with the test data frame and a single named dependent variable, but "group_by" seems to be ignored:

test_data %>% group_by (Sample_Type) %>% medianSlope( formula(paste("J","~",idpVar)),.)
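The grouping is ignored because `group_by()` only records the groups: dplyr verbs such as `summarise()` evaluate their expressions once per group, whereas a function placed directly in the pipe simply receives the whole data frame as its argument. A minimal sketch of the per-group pattern, assuming a vector interface `(x, y)` instead of a formula; `repeated_median_slope()` and the small data frame are hypothetical stand-ins, not code from the question.

```r
library(dplyr)

# Hypothetical stand-in for medianSlope() with a vector interface:
# for each point i, take the median slope to every point with a different x,
# then return the median of those per-point medians (repeated median).
repeated_median_slope <- function(x, y) {
  per_point <- vapply(seq_along(x), function(i) {
    keep <- x != x[i]
    median((y[keep] - y[i]) / (x[keep] - x[i]))
  }, numeric(1))
  median(per_point)
}

# Made-up example data: J = 2 * A in type1 and J = 4 - A in type2
df <- data.frame(
  Sample_Type = rep(c("type1", "type2"), each = 4),
  A = c(1, 2, 3, 4, 1, 2, 3, 4),
  J = c(2, 4, 6, 8, 3, 2, 1, 0)
)

# Inside summarise(), A and J refer to the columns of the current group only
res <- df %>%
  group_by(Sample_Type) %>%
  summarise(slope = repeated_median_slope(A, J))

print(res)
```

Each group now gets its own slope (2 for type1, -1 for type2), because the function is called once per group by `summarise()` rather than once on the whole table.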

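Setting the grouping problem aside for now, I tried to get "summarise" working by generating a list of multiple formulas:

paste(respVar,"~",idpVar)
[1] "B ~ A" "C ~ A" "D ~ A" "E ~ A" "F ~ A" "G ~ A" "H ~ A" "I ~ A" "J ~ A" "K ~ A" "L ~ A"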
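Note that `paste()` returns character strings such as "J ~ A", not formula objects. A minimal sketch of building a named list of real formulas from the variable names with base R's `reformulate()`; the short `respVar` vector is an assumed example.

```r
idpVar  <- "A"
respVar <- c("I", "J")   # assumed example; the real data has more response columns

# reformulate(termlabels, response) returns a formula object, e.g. J ~ A
formulas <- lapply(respVar, function(r) reformulate(idpVar, response = r))
names(formulas) <- respVar

print(class(formulas$J))   # "formula"
print(formulas$J)
```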

but

test_data %>% summarise_at(respVar, medianSlope(paste(respVar, "~", idpVar), .))

Error: $ operator is invalid for atomic vectors
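This error most likely comes from `terms()`: `medianSlope(...)` is evaluated as an ordinary call rather than passed to `summarise_at()` as a function, so it receives the character vector produced by `paste()` as its `formula`. `terms()` then falls back to its default method, which accesses `x$terms` and fails on an atomic vector. A minimal sketch of the difference, using the names J and A:

```r
# paste() builds a character string, not a formula
f_chr <- paste("J", "~", "A")
f_obj <- as.formula(f_chr)

print(class(f_chr))   # "character"
print(class(f_obj))   # "formula"

# terms() on a character vector dispatches to the default method,
# which accesses x$terms and fails for an atomic vector
err <- tryCatch(terms(f_chr), error = function(e) conditionMessage(e))
print(err)

# On a real formula, the same extraction used in medianSlope() works
vars <- as.character(attr(terms(f_obj), "variables")[-1])
print(vars)
```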

test_data %>% summarise_at(respVar, medianSlope(paste(get(respVar), "~", get(idpVar)), .))

Error in get(idpVar): object 'A' not found
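`get()` looks a name up in an environment (by default the one it is called from), not among the columns of a data frame, so it cannot find a column called A. A minimal sketch contrasting `get()` with `[[`, using a made-up data frame:

```r
df <- data.frame(A = c(1, 5, 9), J = c(2, 3, 4))
idpVar <- "A"

# [[ ]] looks the name up among the columns of the data frame
x1 <- df[[idpVar]]

# get() searches an environment; searching the empty environment
# reproduces the "object 'A' not found" error
err <- tryCatch(get(idpVar, envir = emptyenv()),
                error = function(e) conditionMessage(e))
print(err)

# get() finds the column only when told to search an environment built from df
x2 <- get(idpVar, envir = list2env(as.list(df)))
print(identical(x1, x2))   # TRUE
```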

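I am relatively new to R and a bit lost. Can you help?

Thank you, Thomas

Solution

I'm not sure whether this can be done with the summarise_at function. However, we can combine map_dbl, by and a few other data-cleaning functions to perform the calculation: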

library(tidyverse)

# split the data using `by` (acts as a group_by)
# use `map_dbl` to iterate over the variables in respVar
# we use setNames so that the returned vector from map_dbl is named
# then, bind the rows together and convert to a data frame
# finally, convert the row names (groups) to a column
by(test_data, test_data$Sample_Type,
   FUN = function(d) map_dbl(setNames(respVar, respVar),
                             ~ medianSlope(formula(paste(.x, "~", idpVar)), dataframe = d))) %>%
    do.call("rbind",.) %>%
    as.data.frame() %>%
    rownames_to_column(var = "Sample_Type")

  Sample_Type            I            J
1       type1 -0.004623987  0.004623987
2       type2  0.341974269 -0.341974269
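An alternative that stays within dplyr, sketched here assuming dplyr ≥ 1.0 and tidyr ≥ 1.0 are installed, is to reshape the response columns into long format so that each (group, response) pair becomes one group for `summarise()`, then spread the slopes back into a wide table suitable for a heatmap. `repeated_median_slope()` is a hypothetical vector-based rewrite of `medianSlope()`, not code from the thread; the data are the example `test_data` from the question.

```r
library(dplyr)
library(tidyr)

# Hypothetical vector-based equivalent of medianSlope() (repeated median slope)
repeated_median_slope <- function(x, y) {
  per_point <- vapply(seq_along(x), function(i) {
    keep <- x != x[i]
    median((y[keep] - y[i]) / (x[keep] - x[i]))
  }, numeric(1))
  median(per_point)
}

# Example data from the question
test_data <- rbind(
  data.frame(ID = paste0("someName", 1:9), Sample_Type = "type1",
             A = seq(1, 17, length.out = 9),
             I = 0.1^seq(1, 1.8, length.out = 9),
             J = 1 - 0.1^seq(1, 1.8, length.out = 9)),
  data.frame(ID = paste0("someName", 10:15), Sample_Type = "type2",
             A = seq(1, 7, length.out = 6),
             I = 0.1^(1 - seq(1, 1.5, length.out = 6)),
             J = 1 - 0.1^(1 - seq(1, 1.5, length.out = 6)))
)
idpVar <- "A"

slopes <- test_data %>%
  select(-ID) %>%
  # one row per sample and response variable
  pivot_longer(-all_of(c("Sample_Type", idpVar)),
               names_to = "Response", values_to = "value") %>%
  group_by(Sample_Type, Response) %>%
  # .data[[idpVar]] picks the independent variable by name within each group
  summarise(slope = repeated_median_slope(.data[[idpVar]], value),
            .groups = "drop") %>%
  # back to one row per group and one column per response
  pivot_wider(names_from = Response, values_from = slope)

print(slopes)
```

With the example data this should reproduce the I and J slopes printed above for the `by()` version; changing `idpVar` is enough to test a different independent variable.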

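bouncyball, thanks again for your help. It does seem that "summarise" and "mutate" cannot call a function that takes a formula as input, although I have not seen this explained anywhere else. The workaround was instructive, but I followed your other suggestion and rewrote the called function. Still being a learner, I took on the challenge of replacing the "for" loops in the code derived from mblm and of eliminating calculations that seemed redundant (at the cost of higher RAM requirements, but it still runs much faster for the data stored on my PC, and I plan to reuse the dx matrix in the next step of developing the code). Both solutions are below. Cheers, Thomas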

mblm_2_short <-        # code adapted from mblm(y ~ x, repeated = T), for calculation of the repeated median slope only 
function (x, y) 
{
    xx = sort(x)
    yy = y[order(x)]
    n = length(xx)
    smedians = c()

    for (i in 1:n) {
        slopes = c()
        for (j in 1:n) {
            if (xx[j] != xx[i]) {
                slopes = c(slopes, (yy[j] - yy[i])/(xx[j] - xx[i]))
            }
        }
        smedians = c(smedians, median(slopes))
    }
    slope = median(smedians)
    slope    # return the value visibly (an assignment as the last expression returns invisibly)
}

med_slopesMed <-      # repeated median slope, like mblm(y ~ x, repeated = T), slope only                     
function (xx,yy) 
{
  x = sort(xx)
  y = yy[order(xx)]
  n = length(x)
  
  dx = matrix (rep (0,n^2),ncol=n)
  dy = c()
  z  = matrix (rep (0,n^2),ncol=n)
  
  for (i in 1:(n-1)) {                   ### x-axis distances (dx) and slopes (z) between points
    dxi             = x[-(1:i)]-x[i]
    dx [i,(i+1):n] = dxi                # for points 1:n,x-axis distances to all other points
    dyi             = y[-(1:i)]-y[i]
    zi              = dyi/dxi           
    z [i,(i+1):n]  = zi                 # for points 1:n,linear slopes connecting with all other points
  }
  
  z = replace(z,is.infinite(z),NA)           # turns +/-Inf from dx=0 into NA; NaN (0/0) is dropped later by na.rm
  z  = t(z)[,-n] + z[,-1]                    # row i now holds the n-1 slopes from point i to every other point

  median (apply(z,1,median,na.rm=T)) 
}
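For comparison, a minimal sketch that removes the explicit loops entirely by building the full matrix of pairwise differences with base R's `outer()`. This is an alternative formulation, not the poster's code: like `med_slopesMed()` it trades memory (an n × n matrix) for speed, and `repeated_median_outer()` is a hypothetical name.

```r
# Repeated median slope without explicit loops (hypothetical helper)
repeated_median_outer <- function(x, y) {
  dx <- outer(x, x, "-")    # dx[i, j] = x[i] - x[j]
  dy <- outer(y, y, "-")    # dy[i, j] = y[i] - y[j]
  s  <- dy / dx             # slope of the line through points i and j
  s[dx == 0] <- NA          # drop the diagonal and any pairs with tied x
  # median slope per point (row), then the median of those medians
  median(apply(s, 1, median, na.rm = TRUE))
}

# Small check: for y = x^2 at x = 1, 2, 3 the pairwise slopes are 3, 4 and 5,
# the per-point medians are 3.5, 4 and 4.5, and their median is 4
repeated_median_outer(c(1, 2, 3), c(1, 4, 9))
```

Because each row of `s` already contains every slope from one point, there is no need to sort `x` or to combine the upper and lower triangles as `med_slopesMed()` does.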
