Why doesn't lapply pass additional arguments through?

Problem description

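I have a large dataset of tweets in which each row is a unique tweet. I also have a list of keywords that I want to extract from those tweets whenever one or more of them appear in the variable text. The keyword list has been compiled into a single regular expression (stored in the variable search_key) that includes some lookaround assertions and other conditions.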

Extracting the strings works fine with the following code:

data$keyword <- stri_extract_all(str = data$text,regex = search_key)

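However, to optimize and parallelize the code, I want to use a function from the apply family. Whenever I run any of the following lines, I get an error, because the regex argument is not passed on to stri_extract_all: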

data$keyword <- lapply(data$text,FUN = stri_extract_all(),regex = search_key)
data$keyword <- lapply(data$text,regex = get(search_key))
data$keyword <- lapply(data$text,... = "regex=search_key")
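
All three lines go wrong before any argument forwarding happens. lapply(X, FUN, ...) expects FUN to be a function object, and it forwards every extra named argument in ... to each call FUN(X[[i]], ...). In the first line, stri_extract_all() is a call that R evaluates immediately with no arguments, instead of the function stri_extract_all itself. In the second line no FUN is supplied at all, so regex = ... is collected into ... while FUN stays missing (and get(search_key) would look up a variable whose name is the regex text). In the third line, ... cannot be set from a string of code. For the original data the call would be lapply(data$text, stri_extract_all, regex = search_key); each element of the result is then itself a list of length 1, because stri_extract_all always returns a list. The sketch below shows the same forwarding pattern in base R so that it runs without stringi: extract_matches() is a hypothetical helper standing in for stri_extract_all, and the short texts and the simplified pattern are illustrative assumptions, not the original search_key.

```r
# Hypothetical base-R stand-in for stri_extract_all(): returns all regex
# matches found in a single string as a character vector.
extract_matches <- function(str, regex) {
  regmatches(str, gregexpr(regex, str, perl = TRUE))[[1]]
}

# Shortened, illustrative tweet texts and a simplified keyword pattern.
texts <- c("found a matching carrot bag", "Sbagliato io.",
           "a huge trash bag", "Once you silence a person")
pattern <- "\\w*bag\\w*"

# Wrong: FUN = extract_matches() would call the function right away with no arguments.
# Right: pass the function object itself; lapply() forwards regex = pattern through
# its ... argument, so each call is extract_matches(texts[[i]], regex = pattern).
res <- lapply(texts, FUN = extract_matches, regex = pattern)
print(res)
```

The fourth text contains no match and yields character(0), so the result always has one element per tweet.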

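This behavior occurs regardless of the contents of the search_key and text variables, so it can be reproduced with any text column and any valid regex. The following data is a simplified version of mine and can be used for testing: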

data <- structure(list(status_id = c(1112765520644894720,1112938379296104448,1112587129622876160,1113006196259196928,1112840488208531456
),text = c("@LaraFukuro more frilly stuff but i actually found a matching carrot bag which also screamed \"LARA\" inside me xD","@EuroMasochismo @VaeVictis @AlbertoBagnai @Comunardo La selezione fatta a dodici anni favorisce chi è seguito. È come selezionare a 4 anni chi deve giocare a pallone proibendolo a tutti gli altri ...","@SignorErnesto @Cr1st14nM3s14n0 @ggargiulo3 @micheleboldrin Sbagliato io.","@BrownResearchGT On Aconcagua,the permit requires climbers above basecamp to collect their waste and carry it back down where it's taken away by helicopter. They actually weigh the bag! And still,most small rocks had human feces underneath. It's a problem!\r\nHopefully @DenaliNPS will follow suit. ","@Jenn198523 Once you silence a person &amp; cover them with a huge trash bag,beating &amp; killing are not far behind."
)),row.names = c(NA,-5L),class = c("tbl_df","tbl","data.frame"
))

search_key <- "(?<=(^|\\s|\\D))([:alnum:]*|@[:alnum:]*|#[:alnum:]*)bag([:alnum:]*)(?=(\\D|\\s|$))"



What am I doing wrong, and how can I fix it?
Any suggestions for optimizing this kind of task are, of course, also welcome.

Solution

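stri_extract_all is already vectorized over str. You don't need to wrap it in lapply; doing so calls the function once per tweet instead of once for the whole column, which slows the code down considerably.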

data$keyword <- stri_extract_all(data$text,regex = search_key)
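
To illustrate what "vectorized over str" means, the sketch below compares a single call over the whole character vector with a per-element lapply loop and checks that both return the same list, one element per input string (the same shape stri_extract_all returns). It uses base R's regmatches() and gregexpr() as a stand-in for stringi so that it runs without extra packages; the texts and the simplified pattern are illustrative assumptions.

```r
# Illustrative inputs (assumed; not the original dataset or search_key).
texts <- c("found a matching carrot bag", "Sbagliato io.",
           "a huge trash bag", "no match here")
pattern <- "\\w*bag\\w*"

# Vectorized: one call processes the whole character vector and returns
# a list with one character vector of matches per input string.
vec <- regmatches(texts, gregexpr(pattern, texts, perl = TRUE))

# Element-wise: lapply() runs the regex extraction once per string.
loop <- lapply(texts, function(s) {
  regmatches(s, gregexpr(pattern, s, perl = TRUE))[[1]]
})

# Both approaches produce the same structure and contents.
stopifnot(identical(vec, loop))
print(vec)
```

Because the vectorized call already returns a list aligned with the input, it can be assigned directly to a new column, as in the data$keyword line above.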
