R function composition for replacing values in a data frame: using lexical closures and generalizing to any predicate function

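Problem description

Given the following reproducible example:

My goal is to replace the original values with NA, row by row, in the adjacent columns of a data frame. I know this question has already been posted (in many variants), but I have not yet found a solution that uses the approach I am after: applying function composition.

In the reproducible example, the column that drives the replacement of original values with NA is column a.

This is what I have done so far.

The last code snippet is a failed attempt at what I am actually looking for...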

#-----------------------------------------------------------
# ifelse approach, it works but...
# it's error prone: i.e. copy and paste for all columns can introduce a lot of trouble

df<-data.frame(a=c(1,2,NA),b=c(3,NA,4),c=c(NA,5,6))
df

df$b<-ifelse(is.na(df$a),NA,df$b)
df$c<-ifelse(is.na(df$a),NA,df$c)

df

#--------------------------------------------------------
# extraction and substitution approach
# same drawbacks as above

df<-data.frame(a=c(1,2,NA),b=c(3,NA,4),c=c(NA,5,6))
df

df$b[is.na(df$a)]<-NA
df$c[is.na(df$a)]<-NA

df

#----------------------------------------------------------
# definition of a function
# it's a bit better, but still error prone because of the copy and paste

df<-data.frame(a=c(1,2,NA),b=c(3,NA,4),c=c(NA,5,6))
df

fix<-function(x,y){
  ifelse(is.na(x),NA,y)
}

df$b<-fix(df$a,df$b)
df$c<-fix(df$a,df$c)

df

#------------------------------------------------------------
# this approach is not working as expected!
# the idea behind it is function composition:
# lapply applies the fix to the columns of the data frame

df<-data.frame(a=c(1,2,NA),b=c(3,NA,4),c=c(NA,5,6))
df

fix2<-function(x){
  x[is.na(x[1])]<-NA
  x
}

df[]<-lapply(df,fix2)

df

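Can anyone help with this particular approach? I am stuck on how to properly design the substitution function passed to lapply.

Thanks

Solution

Try this function: it takes your original dataset as input and returns the cleaned dataset as output:

Input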

df<-data.frame(a=c(1,2,NA),b=c(3,NA,4),c=c(NA,5,6))
> df
   a  b  c
1  1  3 NA
2  2 NA  5
3 NA  4  6

Function

fix<-function(df,var_x,list_y)
{
   df[is.na(df[,var_x]),list_y]<-NA
   return(df)
}

Output

fix(df,"a",c("b","c"))
   a  b  c
1  1  3 NA
2  2 NA  5
3 NA NA NA
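
Note that the function returns a modified copy and leaves df itself untouched, so the result has to be assigned back. A minimal sketch, assuming every column except the driver should follow it; the name fix_na and the use of setdiff are illustrative choices, not part of the answer above:

```r
df <- data.frame(a = c(1, 2, NA), b = c(3, NA, 4), c = c(NA, 5, 6))

# Same logic as the function above, under a hypothetical name so that it
# does not mask utils::fix().
fix_na <- function(df, var_x, list_y) {
  df[is.na(df[[var_x]]), list_y] <- NA
  df
}

# All columns except the driver column "a".
targets <- setdiff(names(df), "a")

# Assign the result to a new object; the original df is not changed in place.
cleaned <- fix_na(df, "a", targets)
cleaned
```

Deriving the target columns from names(df) avoids having to list them by hand when the data frame grows.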

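Using lexical closures

With a lexical closure, you define a function that first generates the function you actually need. You can then use the generated function wherever you like.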

# given a column all other columns' values at that row should become NA
# if the driver column's value at that row is NA

# using the lexical scoping of R function definitions, one can achieve that.

df<-data.frame(a=c(1,2,NA),b=c(3,NA,4),c=c(NA,5,6))
df

# whatever vector given,this vector's value should be changed
# according to first column's value

na_accustomizer <- function(df,driver_col) {
  ## Returns a function which will accustomize any vector/column
  ## to driver column's NAs
  function(vec) {
    vec[is.na(df[,driver_col])] <- NA
    vec
  }
}

df[] <- lapply(df,na_accustomizer(df,"a"))

df
##    a  b  c
## 1  1  3 NA
## 2  2 NA  5
## 3 NA NA NA

# 
# na_accustomizer(df,"a") returns
# 
#   function(vec) {
#     vec[is.na(df[,"a"])] <- NA
#     vec
#   }
# 
# which then can be used like you want:
# df[] <- lapply(df,na_accustomizer(df,"a"))
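
The fix2 attempt in the question fails because lapply hands each column to the function as a plain vector, so x[1] is that column's first element rather than column a. The closure gets around this by carrying the driver column along. A slightly leaner sketch, assuming it is enough to capture the NA mask once instead of the whole data frame (na_masker and mask_by_a are hypothetical names):

```r
df <- data.frame(a = c(1, 2, NA), b = c(3, NA, 4), c = c(NA, 5, 6))

# Hypothetical closure factory: takes the driver vector itself.
na_masker <- function(driver) {
  mask <- is.na(driver)   # evaluated once, when the closure is created
  function(vec) {
    vec[mask] <- NA       # blank out the rows where the driver is NA
    vec
  }
}

mask_by_a <- na_masker(df$a)
df[] <- lapply(df, mask_by_a)
df
```

Because the mask is fixed at creation time, later changes to df do not affect what mask_by_a does.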

Using regular functions

df<-data.frame(a=c(1,2,NA),b=c(3,NA,4),c=c(NA,5,6))
df

# define it for one column
overtake_NA <- function(df,driver_col,target_col) {
  df[,target_col] <- ifelse(is.na(df[,driver_col]),NA,df[,target_col])
  df
}

# define it for all columns of df
overtake_driver_col_NAs <- function(df,driver_col) {
  for (i in seq_len(ncol(df))) {
    df <- overtake_NA(df,driver_col,i)
  }
  df
}

overtake_driver_col_NAs(df,"a")
#    a  b  c
# 1  1  3 NA
# 2  2 NA  5
# 3 NA NA NA
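
For the specific is.na case the loop can be avoided altogether: the driver column is already NA in exactly the affected rows, so setting those whole rows to NA gives the same result. A minimal sketch under that assumption (it does not carry over to other predicates):

```r
df <- data.frame(a = c(1, 2, NA), b = c(3, NA, 4), c = c(NA, 5, 6))

# Rows in which the driver column is NA.
rows <- is.na(df[["a"]])

# Blank out every column in those rows in a single assignment.
df[rows, ] <- NA
df
```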

Generalizing to any predicate function

driver_col_to_other_cols <- function(df,driver_col,pred) {
  ## overtake any value of the driver column to the other columns of df,
  ## whenever the predicate function (pred) is fulfilled.
  # define it for one column
  overtake_ <- function(df,driver_col,target_col,pred) {
    selectors <- do.call(pred,list(df[,driver_col]))
    if (deparse(substitute(pred)) != "is.na") {
      # this is to 'recorrect' NAs which intrude into the selector vector
      # when driver_col has NAs. For sure "is.na" is not the only possible
      # way to check for NA - so this edge case is not covered fully
      selectors[is.na(selectors)] <- FALSE
    }
    df[,target_col] <- ifelse(selectors,df[,driver_col],df[,target_col])
    df
  }
  for (i in seq_len(ncol(df))) {
    df <- overtake_(df,driver_col,i,pred)
  }
  df
}


driver_col_to_other_cols(df,"a",function(x) x == 1)
#    a  b c
# 1  1  1 1
# 2  2 NA 5
# 3 NA  4 6

## if the "is.na" check is not done, then this would give
## (because of the NA in the selector vector):
#    a  b  c
# 1  1  1  1
# 2  2 NA  5
# 3 NA NA NA
## hence, in the case that pred doesn't check for NA in 'a',
## these NA values have to be reverted to the original columns' value.

driver_col_to_other_cols(df,"a",is.na)
#    a  b  c
# 1  1  3 NA
# 2  2 NA  5
# 3 NA NA NA
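
The predicate version can also be combined with the closure idea, so that it plugs directly into lapply as the question asked. A minimal sketch, assuming an NA returned by the predicate should count as "not fulfilled"; driver_overtaker is a hypothetical name, not part of the answer above:

```r
df <- data.frame(a = c(1, 2, NA), b = c(3, NA, 4), c = c(NA, 5, 6))

# Hypothetical closure factory: returns a function that copies the driver's
# values into any vector wherever pred(driver) is TRUE.
driver_overtaker <- function(driver, pred) {
  sel <- pred(driver)
  sel[is.na(sel)] <- FALSE   # treat an NA from the predicate as FALSE
  function(vec) {
    vec[sel] <- driver[sel]
    vec
  }
}

# Predicate x == 1: row 1 takes over the value of column a.
res_eq1 <- df
res_eq1[] <- lapply(df, driver_overtaker(df$a, function(x) x == 1))

# Predicate is.na: row 3 takes over the NA of column a.
res_na <- df
res_na[] <- lapply(df, driver_overtaker(df$a, is.na))
```

Resetting NA selectors unconditionally avoids having to detect which predicate was passed; for is.na the reset is a no-op, since is.na never returns NA.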
