Identifying strings in file names to create variable strings

Question

I hope you can help.

I have a list of csv files that follow a naming convention like this: "SubB1V2timecourses_chanHbO_Cond2_202010281527"

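I want to merge all the files into one dataset and add variables such as ID (B1V2), chromophore (HbO in this case, but other files are labeled Hbb), and condition (Cond2 in this case, but it can be Cond1-Cond9).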

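My current function is below. So far I can read in the ID, the time (which is in a separate Excel document), and the data. However, I get NA for condition and chromophore. What am I missing in the string specification?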

Any help is much appreciated.

Take care and stay healthy, Caroline

multmerge <- function(mypath){
  require(stringi)
  require(readxl)
  filenames <- list.files(path=mypath,full.names=TRUE) #path=mypath
  datalist <- lapply(filenames,function(x){
    df <- read.csv(file=x,header= TRUE)
    ID <- unlist(stri_extract_all_regex(toupper(x),"B\\d+"))
    Condition <- unlist(stri_extract_all_regex(tolower(x),"Cond\\d+"))
    Chromophore <- ifelse(stri_detect_regex(toupper(x),"HbO"),"HbO",ifelse(stri_detect_regex(toupper(x),"Hbb"),"Hbb","NA"))
    # ifelse(stri_detect_regex(tolower(x),"nonsocial"),"NonSocial",
    #   ifelse(stri_detect_regex(tolower(x),"social-inverted"),"social_inverted",
    #     ifelse(stri_detect_regex(tolower(x),"social"),"social","NA")))
    # time <- read_excel("time4hz.xlsx")
    df <- data.frame(ID,time,Condition,Chromophore,df)
    return(df)
  }) # end read-in function
  
  Reduce(function(x,y) {merge(x,y,all = TRUE)},datalist)
}

Answer

Perhaps you want something like strcapture? For example, if you have a list of file names like this

filenames <- c(
  "/path/to/SubB1V2timecourses_chanHbO_Cond2_202010281527",
  "/path/to/SubB4V9timecourses_chanHbb_Cond7_202010011527"
)

then

strcapture(
  "Sub([^_]+)timecourses_chan([^_]+)_([^_]+)_\\d+",
  basename(filenames),
  data.frame(ID = character(), chromophore = character(), condition = character())
)

returns

    ID chromophore condition
1 B1V2         HbO     Cond2
2 B4V9         Hbb     Cond7
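
One thing to watch for: strcapture returns NA in every column for a name that does not match the pattern. If the folder also holds other files, you may want to drop them before reading anything. Here is a minimal sketch of that, assuming (this is a guess, not something stated in the question) that the time spreadsheet time4hz.xlsx sits in the same folder as the csv files:

```r
# Hypothetical folder listing: one data file plus the time spreadsheet
filenames <- c(
  "/path/to/SubB1V2timecourses_chanHbO_Cond2_202010281527",
  "/path/to/time4hz.xlsx"  # assumed location, does not follow the naming convention
)

info <- strcapture(
  "Sub([^_]+)timecourses_chan([^_]+)_([^_]+)_\\d+",
  basename(filenames),
  data.frame(ID = character(), chromophore = character(), condition = character())
)

# Names that did not match the pattern have NA in every column
matched <- !is.na(info$ID)

# Keep only the files (and metadata rows) that follow the convention
data_files <- filenames[matched]
info <- info[matched, ]
```

After this, data_files and info line up row for row, so they can be passed on to the reading step.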

Combined with your multmerge:

multmerge <- function(mypath){
  filenames <- list.files(path = mypath, full.names = TRUE)
  # Parse ID, chromophore and condition out of each file name
  metadata <- strcapture(
    "Sub([^_]+)timecourses_chan([^_]+)_([^_]+)_\\d+",
    basename(filenames),
    data.frame(ID = character(), chromophore = character(), condition = character())
  )
  datalist <- lapply(seq_along(filenames), function(i, nms, info) {
    df <- read.csv(file = nms[[i]], header = TRUE)
    # Attach the metadata of file i to its data
    data.frame(info[i, ], df)
  }, filenames, metadata)
  Reduce(function(x, y) {merge(x, y, all = TRUE)}, datalist)
}
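
As for why your version gives NA: as far as I can tell it is a case mismatch. You lower-case the name and then look for "Cond" with a capital C, and you upper-case it and then look for the mixed-case "HbO" and "Hbb", so neither pattern can ever match. A small sketch in base R to illustrate (the pattern "B\\d+V\\d+" for the ID is my assumption about how your IDs look, based on the single example B1V2):

```r
x <- "SubB1V2timecourses_chanHbO_Cond2_202010281527"

# tolower() turns "Cond2" into "cond2", so a pattern with a capital C fails
cond_found <- grepl("Cond\\d+", tolower(x))

# toupper() turns "HbO" into "HBO", so the mixed-case pattern fails
chrom_found <- grepl("HbO", toupper(x))

# Matching against the unmodified name works
condition <- regmatches(x, regexpr("Cond\\d+", x))
chromophore <- regmatches(x, regexpr("Hb[Ob]", x))

# "B\\d+" alone stops at "B1"; the visit part has to be in the pattern too
id_short <- regmatches(x, regexpr("B\\d+", x))
id_full <- regmatches(x, regexpr("B\\d+V\\d+", x))
```

Here cond_found and chrom_found are both FALSE, while condition, chromophore and id_full come out as "Cond2", "HbO" and "B1V2". Note that id_short is only "B1", which may not be the ID you intended.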