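Problem description
Could you help me?
I am trying to modify an R function written by a colleague. The function takes a character vector of scientific names (Latin binomials), like this: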
Name
Cerradomys scotti
Oligoryzomys sp
Philander frenatus
Byrsonima sp
Campomanesia adamantium
Cecropia pachystachya
Cecropia sp
Erythroxylum sp
Ficus sp
Leandra aurea
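It should then abbreviate each scientific name into a short code built from the first three letters of the genus (first term) and the first three letters of the specific epithet (second term). For example, Cerradomys scotti should become Cersco.
Here is the original function: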
Abbreviatednames <- function(vector) {
  abbreviations <- character(length = length(vector))
  splitnames <- strsplit(vector, " ")
  for (i in 1:length(vector)) {
    vector[i] <- if (splitnames[[i]][2] == "^sp") {
      paste(substr(splitnames[[i]][1], 1, 3), splitnames[[i]][2], sep = "")
    }
    else {
      paste(substr(splitnames[[i]][1], 1, 3), substr(splitnames[[i]][2], 1, 3), sep = "")
    }
  }
  vector
}
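With a simple list like this one, the function works perfectly. However, it fails when an entry has a missing term or an extra term: the loop stops at the first row that does not match the two-word pattern. Take this more complex list as an example: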
Name
Cerradomys scotti
Oligoryzomys sp
Philander frenatus
Byrsonima sp
Campomanesia adamantium
Cecropia pachystachya
Cecropia sp
Erythroxylum sp
Ficus sp
Leandra aurea
Morfosp1
Vismia cf brasiliensis
Notice that Morfosp1 has only one word, and that Vismia cf brasiliensis has an additional term (cf) in the middle.
For example, I tried to adapt the function like this:
Abbreviatednames <- function(vector) {
  abbreviations <- character(length = length(vector))
  splitnames <- strsplit(vector, " ")
  for (i in 1:length(vector)) {
    vector[i] <- if (splitnames[[i]][2] == "^sp" & is.na(splitnames[[i]][2])) {
      paste(substr(splitnames[[i]][1], 1, 3), splitnames[[i]][2], sep = "")
    }
  }
  vector
}
However, it does not work. I get this error message:
Error in if (splitnames[[i]][2] == "^sp" & is.na(splitnames[[i]][2])) { :
  missing value where TRUE/FALSE needed
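The error quoted above can be reproduced in isolation. The following is a minimal sketch, assuming the failing element is the one-word entry Morfosp1; the variable names (parts, second, cond, msg, has_second) are illustrative and not part of the original function. Indexing a length-one vector at position 2 returns NA, comparing NA with == returns NA, and if() refuses an NA condition.

```r
# Splitting a one-word name gives a character vector of length 1
parts <- strsplit("Morfosp1", " ")[[1]]
second <- parts[2]   # out-of-range index returns NA instead of failing

# NA == "^sp" is NA, and NA & TRUE is still NA
cond <- second == "^sp" & is.na(second)

# if() stops on an NA condition; tryCatch() captures the error message
# (its exact wording depends on the language of the R session)
msg <- tryCatch(
  if (cond) "sp branch" else "other branch",
  error = function(e) conditionMessage(e)
)

# Checking the number of terms first avoids evaluating an NA comparison
has_second <- length(parts) >= 2
```

Testing length(parts) before touching parts[2] is one way to sidestep the problem, and it is the approach both answers below take.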
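How can I write the function so that it produces the following?
Expected result: Morfosp1 -> Morfosp1 (left unchanged)
Expected result: Vismia cf brasiliensis -> Visbra (the middle term is ignored)
Thank you very much!
Solution
Something like this is fairly concise: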
test <- c("Cerradomys scotti", "Oligoryzomys sp", "Latingstuff", "Latin staff more")
# function to truncate a given name
trunc_str <- function(latin_name) {
  # split it on a space
  name_split <- unlist(strsplit(latin_name, " ", fixed = TRUE))
  # if one name, just return it
  if (length(name_split) == 1) return(name_split)
  # truncate to first 3 letters
  name_trunc <- substr(name_split, 1, 3)
  # paste the first and last term together (skipping any middle ones)
  paste0(head(name_trunc, 1), tail(name_trunc, 1))
}
# iterate over all
vapply(test, trunc_str, "")
# Cerradomys scotti   Oligoryzomys sp       Latingstuff  Latin staff more
#          "Cersco"           "Olisp"     "Latingstuff"          "Latmor"
If you do not want a named vector as output, you can pass USE.NAMES = FALSE to vapply(). Or feel free to use a loop here instead.
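As a small illustration of the USE.NAMES = FALSE option mentioned above, here is a sketch that repeats the trunc_str helper so it runs on its own. The species vector and the codes variable are made up for this example, and character(1) is used in place of "" only to make the expected return type explicit.

```r
# Same helper as in the answer above, repeated so the snippet is self-contained
trunc_str <- function(latin_name) {
  name_split <- unlist(strsplit(latin_name, " ", fixed = TRUE))
  if (length(name_split) == 1) return(name_split)
  name_trunc <- substr(name_split, 1, 3)
  paste0(head(name_trunc, 1), tail(name_trunc, 1))
}

# Illustrative input taken from the names in the question
species <- c("Cerradomys scotti", "Oligoryzomys sp",
             "Morfosp1", "Vismia cf brasiliensis")

# character(1) declares that every call must return exactly one string;
# USE.NAMES = FALSE drops the input strings as element names
codes <- vapply(species, trunc_str, character(1), USE.NAMES = FALSE)
codes
# values: "Cersco" "Olisp" "Morfosp1" "Visbra"
```

The unnamed result is usually more convenient when the codes are assigned straight into a data frame column or used as plot labels.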
AbbreviatedNames <- function(vector) {
  splitnames <- strsplit(vector, " ")
  for (i in seq_along(vector)) {
    parts <- splitnames[[i]]
    if (length(parts) == 1) {
      # One name: keep it unchanged
      vector[i] <- parts[1]
    } else if (length(parts) == 2) {
      # Two names: first three letters of each term
      # ("sp" is shorter than three letters, so it is kept whole)
      vector[i] <- paste(substr(parts[1], 1, 3), substr(parts[2], 1, 3), sep = "")
    } else if (length(parts) == 3) {
      # Three names: first and third term,
      # assuming that the unwanted word is always in the middle
      vector[i] <- paste(substr(parts[1], 1, 3), substr(parts[3], 1, 3), sep = "")
    }
  }
  return(vector)
}
I tested it on the list you provided and it seems to work. Let me know if you need more general code.
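For reference, a more general variant could look like the sketch below. It is not from the thread: abbreviate_names is a hypothetical name, and the code assumes that any terms between the first and the last one (such as cf) can be ignored, that repeated or surrounding whitespace is not meaningful, and that one-word, empty and NA entries should pass through unchanged.

```r
# Hypothetical helper: abbreviate a whole character vector at once
abbreviate_names <- function(x) {
  # Split on runs of whitespace after trimming leading/trailing blanks
  parts <- strsplit(trimws(x), "[[:space:]]+")
  vapply(parts, function(p) {
    if (length(p) == 0) return("")   # empty string in, empty string out
    if (length(p) == 1) return(p)    # one-word entries (and NA) are kept as is
    # First three letters of the first and of the last term
    paste0(substr(p[1], 1, 3), substr(p[length(p)], 1, 3))
  }, character(1))
}

result <- abbreviate_names(c("Cerradomys scotti", "Oligoryzomys sp", "Morfosp1",
                             "Vismia cf brasiliensis", "Ficus  sp"))
result
# values: "Cersco" "Olisp" "Morfosp1" "Visbra" "Ficsp"
```

Because it always takes the last term, a name that ends in a qualifier (for example a two-word entry whose second word is cf) would be abbreviated with that qualifier, so such cases would need an extra rule.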
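Ricardo and Adam, thank you very much for your help! I have made the code available on GitHub for other people who work with interaction networks and need to abbreviate scientific names for use in figures.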