[.data.table如何防止在S3方法分派之前评估i和j？

问题描述

Hello数据表社区，

我正在尝试在S4类中公开data.table API，以便在PharmacoGx R软件包中使用。我已经阅读了data.table的源代码，似乎[运算符是使用[.data.table S3方法实现的。

这里是源代码为https://github.com/bhklab/PharmacoGx/blob/drug_combos/R/class-LongTable.R

的GitHub存储库的链接。

我创建了自己的S3方法[.long.table，并将S3类添加为“ LongTable” S4类的组成部分。

当前实现为：

'[.long.table' <- function(x,i,j) { eval(substitute(subset(x,j)) }

类似于[.data.table中的操作，我使用subset.long.table内部的替代项捕获i和j，以便稍后可以在正确的上下文中对它们进行评估，在这种情况下，它们位于数据内部。 rowData和colData插槽中的表格对象。

当我直接调用该方法时，一切正常：

`[.long.table`(longTable,cellLine1 == 'VCAP')

但是当我尝试使用[作为运算符时，出现错误消息，找不到cellLine1对象：

longTable[cellLine1 == 'VCAP']

经过反复试验，我确定此错误是在调用[.long.table之前引发的。我认为这是在对[通用方法的调用中，该方法试图在选择方法之前评估i和j参数。

data.table的API有效，因此显然可以解决此问题。我希望有人能启发我，以便我可以为新的S4类使用此功能。

其他信息：

我的班级定义如下：

#' Define an S3 Class
#'
#' Allows use of S3 methods with new S4 class. This is required to overcome
#' limitations of the `[` S4 method.
#'
setOldClass('long.table')

#' LongTable class definition
#'
#' Define a private constructor method to be used to build a `LongTable` object.
#'
#' @param drugs [`data.table`]
#' @param cells [`data.table`]
#' @param assays [`list`]
#' @param metadata [`list`]
#'
#'
#' @return [`LongTable`] object containing the assay data from a
#'
#' @import data.table
#' @keywords internal
.LongTable <- setClass("LongTable",slots=list(rowData='data.table',colData='data.table',assays='list',metadata='list',.intern='environment'),contains='long.table')

#' LongTable constructor method
#'
#' @param rowData [`data.table`,`data.frame`,`matrix`] A table like object
#'   coercible to a `data.table` containing the a unique `rowID` column which
#'   is used to key assays,as well as additional row metadata to subset on.
#' @param rowIDs [`character`,`integer`] A vector specifying
#'   the names or integer indexes of the row data identifier columns. These
#'   columns will be pasted together to make up the row.names of the
#'   `LongTable` object.
#' @param colData [`data.table`,`matrix`] A table like object
#'   coercible to a `data.table` containing the a unique `colID` column which
#'   is used to key assays,as well as additional column metadata to subset on.
#' @param colIDs [`character`,`integer`] A vector specifying
#'   the names or integer indexes of the col data identifier columns. These
#'   columns will be pasted together to make up the col.names of the
#'   `LongTable` object.
#' @param assays A [`list`] containing one or more objects coercible to a
#'   `data.table`,and keyed by rowID and colID corresponding to the rowID and
#'   colID columns in colData and rowData.
#' @param metadata A [`list`] of metadata associated with the `LongTable`
#'   object being constructed
#' @param keep.rownames [`logical` or `character`] Logical: whether rownames
#'   should be added as a column if coercing to a `data.table`,default is FALSE.
#'   If TRUE,rownames are added to the column 'rn'. Character: specify a custom
#'   column name to store the rownames in.
#'
#' @return [`LongTable`] object
#'
#' @import data.table
#' @export
LongTable <- function(rowData,rowIDs,colData,colIDs,assays,metadata=list(),keep.rownames=FALSE) {

    ## TODO:: Handle missing parameters

    if (!is(colData,'data.table')) {
        colData <- data.table(colData,keep.rownames=keep.rownames)
    }

    if (!is(rowData,'data.table')) {
        rowData <- data.table(rowData,keep.rownames=keep.rownames)
    }

    if (!all(vapply(assays,FUN=is.data.table,FUN.VALUE=logical(1)))) {
        tryCatch({
            assays <- lapply(assays,FUN=data.table,keep.rownames=keep.rownames)
        },warning = function(w) {
            warning(w)
        },error = function(e,assays) {
            message(e)
            types <- lapply(assays,typeof)
            stop(paste0('List items are types: ',paste0(types,collapse=','),'\nPlease ensure all items in the assays list are
                        coerced to data.tables!'))
        })
    }

    # Initialize the .internals object to store private metadata for a LongTable
    internals <- new.env()

    ## TODO:: Implement error handling
    internals$rowIDs <-
        if (is.numeric(rowIDs) && max(rowIDs) < ncol(rowData))
            rowIDs
        else
            which(colnames(rowData) %in% rowIDs)
    lockBinding('rowIDs',internals)

    internals$colIDs <-
        if (is.numeric(colIDs) && max(colIDs) < ncol(colData))
            colIDs
        else
            which(colnames(colData) %in% colIDs)
    lockBinding('colIDs',internals)

    # Assemble the pseudo row and column names for the LongTable
    .pasteColons <- function(...) paste(...,collapse=':')
    rowData[,`:=`(.rownames=mapply(.pasteColons,transpose(.SD))),.SDcols=internals$rowIDs]
    colData[,`:=`(.colnames=mapply(.pasteColons,.SDcols=internals$colIDs]

    return(.LongTable(rowData=rowData,colData=colData,assays=assays,metadata=metadata,.intern=internals))
}

我有一个有效的subset方法，使用S3和S4实现：

#' Subset method for a LongTable object.
#'
#' Allows use of the colData and rowData `data.table` objects to query based on
#'  rowID and colID,which is then used to subset all value data.tables stored
#'  in the dataList slot.
#'
#' This function is endomorphic,it always returns a LongTable object.
#'
#' @param x [`LongTable`] The object to subset.
#' @param rowQuery [`character`,`numeric`,`logical` or `expression`]
#'  Character: pass in a character vector of drug names,which will subset the
#'      object on all row id columns matching the vector.
#'
#'  Numeric or Logical: these select based on the rowKey from the `rowData`
#'      method for the `LongTable`.
#'
#'  Expression: Accepts valid query statements to the `data.table` i parameter,#'      this can be used to make complex queries using the `data.table` API
#'      for the `rowData` data.table.
#'
#' @param columnQuery [`character`,which will subset the
#'      object on all drug id columns matching the vector.
#'
#'  Numeric or Logical: these select base don the rowID from the `rowData`
#'      method for the `LongTable`.
#'
#'  Expression: Accepts valid query statements to the `data.table` i parameter,#'      this can be used to make complex queries using the `data.table` API
#'      for the `rowData` data.table.
#'
#' @param values [`character`,`numeric` or `logical`] Optional list of value
#'      names to subset. Can be used to subset the dataList column further,#'      returning only the selected items in the new LongTable.
#'
#' @return [`LongTable`] A new `LongTable` object subset based on the specified
#'      parameters.
#'
#' @importMethodsFrom BiocGenerics subset
#' @import data.table
#' @export
subset.long.table <- function(x,rowQuery,columnQuery,assays) {

    longTable <- x
    rm(x)

    if (!missing(rowQuery)) {
        if (tryCatch(is.character(rowQuery),error=function(e) FALSE)) {
            select <- grep('^cellLine[:digit:]*',colnames(rowData(longTable)),value=TRUE)
            rowQueryString <- paste0(paste0(select,' %in% ',.variableToCodeString(rowQuery)),collapse=' | ')
            rowQuery <- str2lang(rowQueryString)
        } else {
            rowQuery <- substitute(rowQuery)
        }
        rowDataSubset <- rowData(longTable)[eval(rowQuery),]
    } else {
        rowDataSubset <- rowData(longTable)
    }

    if (!missing(columnQuery)) {
        if (tryCatch(is.character(columnQuery),error=function(e) FALSE)) {
            select <- grep('^drug[:digit:]*',colnames(colData(longTable)),value=TRUE)
            columnQueryString <- paste0(paste0(select,.variableToCodeString(columnQuery)),collapse=' | ')
            columnQuery <- str2lang(columnQueryString)
        } else {
            columnQuery <- substitute(columnQuery)
        }
        colDataSubset <- colData(longTable)[eval(columnQuery),]
    } else {
        colDataSubset <- colData(longTable)
    }

    rowKeys <- rowDataSubset$rowKey
    colKeys <- colDataSubset$colKey

    if (missing(assays)) { assays <- assayNames(longTable) }
    keepAssays <- assayNames(longTable) %in% assays

    assayData <- lapply(assays(longTable)[keepAssays],FUN=.filterLongDataTable,indexList=list(rowKeys,colKeys))

    return(LongTable(colData=colDataSubset,[email protected]$colIDs,rowData=rowDataSubset,[email protected]$rowIDs,assays=assayData,metadata=metadata(longTable)))
}

#' S4 Method for subset.long.table
#'
#' @export
setMethod('subset','LongTable',subset.long.table)

我的sessionInfo：

> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.1 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C               LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8    
 [5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8    LC_PAPER=en_CA.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] crayon_1.3.4                SummarizedExperiment_1.19.6 DelayedArray_0.15.7         matrixStats_0.56.0         
 [5] Matrix_1.2-18               Biobase_2.49.0              GenomicRanges_1.41.6        GenomeInfoDb_1.25.10       
 [9] IRanges_2.23.10             S4Vectors_0.27.12           BiocGenerics_0.35.4         data.table_1.13.0          

loaded via a namespace (and not attached):
[1] lattice_0.20-41        bitops_1.0-6           grid_4.0.2             zlibbioc_1.35.0        XVector_0.29.3        
[6] tools_4.0.2            RCurl_1.98-1.2         compiler_4.0.2         GenomeInfoDbData_1.2.3

解决方法

暂无找到可以解决该程序问题的有效方法，小编努力寻找整理中！

如果你已经找到好的解决方法，欢迎将解决方案带上本链接一起发送给小编。

小编邮箱:dio#foxmail.com (将#修改为@）

lazy-evaluation s4 subset