Constructing a formula using NSE

Problem description

I am trying to construct a formula using NSE so that I can easily pipe in column names. Here is my desired use case:

df %>% make_formula(col1,col2,col3)

[1] "col1 ~ col2 + col3"

I first wrote this function:

varstring <- function(...) {
 as.character(match.call()[-1])
}

This works well for a single object or for multiple objects:

varstring(col)

[1] "col"

varstring(col1, col2, col3)

[1] "col1" "col2" "col3"

Next, I wrote the function that builds the formula:

formula <- function(df,col,...) {
 group <- varstring(col)
 vars <- varstring(...)

 paste(group,"~",paste(vars,collapse = " + "),sep = " ")
}

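However, calling formula(df, col1, col2, col3) produces [1] "col ~ ..1 + ..2 + ..3"

I understand that the function is literally evaluating varstring(col) and varstring(...), rather than using the objects supplied by the user as I had hoped. But I don't know how to make this work as intended.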
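The behaviour can be reproduced in isolation. This is a minimal sketch using the varstring() function from the question; the anonymous wrapper is a hypothetical stand-in for the outer function and is not part of the original code.

```r
varstring <- function(...) {
  as.character(match.call()[-1])
}

# Called directly, match.call() sees the symbols the user typed
direct <- varstring(col1, col2)

# Called through another function's `...`, match.call() only sees the
# placeholders ..1, ..2 that stand for the forwarded arguments
via_wrapper <- (function(...) varstring(...))(col1, col2)

print(direct)
print(via_wrapper)
```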

Solution

You can use reduce() to join any number of arguments with a binary function:

library(rlang)

make_formula <- function(lhs,...,op = "+") {
  lhs <- ensym(lhs)
  args <- ensyms(...)

  n <- length(args)

  if (n == 0) {
    rhs <- 1
  } else if (n == 1) {
    rhs <- args[[1]]
  } else {
    rhs <- purrr::reduce(args,function(out,new) call(op,out,new))
  }

  # Don't forget to forward the caller environment
  new_formula(lhs,rhs,env = caller_env())
}

make_formula(disp)
#> disp ~ 1

make_formula(disp,cyl)
#> disp ~ cyl

make_formula(disp,cyl,am,drat)
#> disp ~ cyl + am + drat

make_formula(disp, cyl, am, drat, op = "*")
#> disp ~ cyl * am * drat
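To see what the reduce() step builds, here is a minimal base-R sketch. It swaps in Reduce() and as.name() for purrr::reduce() and ensyms(), and eval() of a `~` call for new_formula(); these substitutions are assumptions made only so the snippet runs without extra packages. The variable names are taken from the examples above.

```r
# Symbols for the right-hand side (base-R stand-in for ensyms())
args <- lapply(c("cyl", "am", "drat"), as.name)

# Fold the symbols pairwise into nested calls: (cyl + am) + drat
rhs <- Reduce(function(out, new) call("+", out, new), args)

# Wrap both sides in a `~` call and evaluate it to get a formula object
f <- eval(call("~", as.name("disp"), rhs))

print(rhs)
print(f)
print(class(f))
```

Because each step only combines symbols into calls, no string is ever parsed as code along the way.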

A big advantage of working with expressions is that the approach is robust to Little Bobby Tables (https://xkcd.com/327/):

# User inputs are always interpreted as symbols (variable name)
make_formula(disp,`I(file.remove('~'))`)
#> disp ~ `I(file.remove('~'))`

# With `paste()` + `parse()` user inputs are interpreted as arbitrary code
reformulate(c("foo","I(file.remove('~'))"))
#> ~foo + I(file.remove("~"))
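The difference between the two routes can be inspected without running anything dangerous. The following is a small base-R sketch; it uses as.name() as a stand-in for the symbol capture done by ensyms(), and the variable names as_symbol and as_code are made up for illustration.

```r
# A hostile "column name" supplied by a user
evil <- "I(file.remove('~'))"

# Symbol route: the whole string becomes one variable name, never code
as_symbol <- as.name(evil)

# parse() route: the same string becomes a call to I() wrapping file.remove()
as_code <- parse(text = evil)[[1]]

# Nothing is evaluated here; we only inspect the unevaluated objects
print(is.name(as_symbol))
print(is.call(as_code))
print(all.names(as_code))
```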

I suggest using rlang::enquo and rlang::as_name to do this:

library(rlang)

formula <- function(df,col,...) {
  group <- enquo(col)
  vars <- enquos(...)

  group_str <- rlang::as_name(group)
  vars_str <- lapply(vars,rlang::as_name)
  
  paste(group_str,"~",paste(vars_str,collapse = " + "),sep = " ")
}

formula(mtcars, col, col1, col2, col3)
#> [1] "col ~ col1 + col2 + col3"
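Note that this version returns a character string rather than a formula object. A minimal sketch of converting such a string with as.formula() and passing it to a model follows; the mtcars columns mpg, cyl and disp are chosen here only for illustration.

```r
# A string of the shape produced by the function above
f_str <- "mpg ~ cyl + disp"

# Convert the string to a formula object
f <- as.formula(f_str)

# Use it anywhere a formula is accepted, e.g. a linear model
fit <- lm(f, data = mtcars)

print(class(f))
print(names(coef(fit)))
```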

We can use reformulate:

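formula_fn <- function(dat, col, ...) {
  # Capture the response and the predictors as symbols, then convert to strings
  deparse(reformulate(purrr::map_chr(rlang::ensyms(...), rlang::as_string),
                      response = rlang::as_string(rlang::ensym(col))))
}
formula_fn(mtcars, col, col1, col2, col3)
#[1] "col ~ col1 + col2 + col3"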

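I have taken @LionelHenry's suggestion and created the following function, along with some additional features that were not requested in my original question.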

#' Create a formula
#'
#' Creates a new formula object to be used anywhere formulas are used (e.g., `glm`).
#'
#' @param ... any number of arguments to compose the formula
#' @param lhs a boolean indicating whether the formula has a left-hand side
#' @param op the operator used to combine the arguments on the right-hand side of the formula.
#' @param group an argument to use as a grouping variable to facet by
#'
#' @return a formula
#'
#' @details If `lhs` is `TRUE`, the first argument provided is used as the left-hand side of the formula.
#' The `group` parameter will add `| group` to the end of the formula. This is useful for packages that support faceting by grouping variables for the purposes of tables or graphs.
#'
#' @export
#'
#' @examples
#' make_formula(var1, var2, var3)
#' make_formula(var1, var2, var3, lhs = FALSE)
#' make_formula(var1, lhs = FALSE, group = var4)
#'
make_formula <- function(...,lhs = TRUE,op = "+",group = NULL) {
  args <- rlang::ensyms(...)
  n <- length(args)
  group <- rlang::enexpr(group)

  if (lhs) {
    left <- args[[1]]
    if (n == 1) {
      right <- 1
    } else if (n == 2) {
      right <- args[[2]]
    } else {
      right <- purrr::reduce(args[-1], function(out, new) call(op, out, new))
    }
  } else {
    left <- NULL
    if (n == 1) {
      right <- args[[1]]
    } else {
      right <- purrr::reduce(args, function(out, new) call(op, out, new))
    }
  }

  if (!is.null(group)) {
    # `group` was already captured by enexpr() above; append it as `right | group`
    right <- call("|", right, group)
  }

  rlang::new_formula(left,right,env = rlang::caller_env()) # Forward to the caller environment
}
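The shape of the formula produced when a grouping variable is supplied can be sketched in base R. This is an illustration only: Reduce(), as.name() and eval() of a `~` call stand in for the purrr and rlang calls used in the function, and var1, var2 and var4 are the placeholder names from the roxygen examples.

```r
# Right-hand side terms combined with `+`
rhs <- Reduce(function(out, new) call("+", out, new),
              lapply(c("var1", "var2"), as.name))

# Append the grouping variable as `rhs | group`
grouped <- call("|", rhs, as.name("var4"))

# One-sided formula, as with lhs = FALSE
f <- eval(call("~", grouped))

print(f)
```

Since `|` binds more loosely than `+`, the grouping variable applies to the whole right-hand side without needing parentheses.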
