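Problem description
Find the position, counted in space-separated words, of the first number in a string. For example: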
string1 <- "Hello I'd like to extract where the first 1010 is in this string"
#desired_output for string1
9
string2 <- "80111 is in this string"
#desired_output for string2
1
string3 <- "extract where the first 97865 is in this string"
#desired_output for string3
5
Solutions
Here is a base R option that uses only grep and strsplit:
sapply(input,function(x) grep("\\d+",strsplit(x," ")[[1]]))
Hello I'd like to extract where the first 1010 is in this string
9
80111 is in this string
1
extract where the first 97865 is in this string
5
Data:
input <- c("Hello I'd like to extract where the first 1010 is in this string","80111 is in this string","extract where the first 97865 is in this string")
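The grep call above returns every matching word position, so a string with two numbers would yield two values and a string with none would yield an empty result. A minimal sketch that keeps only the first match and returns NA when there is none is shown below; the helper name first_number_pos is hypothetical and not part of the original answer.

```r
# Hypothetical helper (not from the original answer): word index of the
# first token that contains a digit, or NA when no token does.
first_number_pos <- function(strings) {
  tokens <- strsplit(strings, " ", fixed = TRUE)
  vapply(tokens, function(words) {
    hits <- grep("[0-9]", words)  # indices of all tokens containing a digit
    if (length(hits) > 0L) hits[1L] else NA_integer_
  }, integer(1))
}

first_number_pos(c("a 12 b 34", "no digits here", "80111 is in this string"))
# 2 NA 1
```

Using vapply with integer(1) makes the function fail loudly if a callback ever returns something other than a single integer, which sapply would silently accept.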
Here is one way to return the desired output:
library(stringr)
min(which(!is.na(suppressWarnings(as.numeric(str_split(string," ",simplify = TRUE))))))
This is how it works:
str_split(string, " ", simplify = TRUE) # converts your string to a vector/matrix, splitting at each space
as.numeric(...) # tries to convert each element to a number, returning NA when it fails
suppressWarnings(...) # suppresses the warnings generated by as.numeric
!is.na(...) # returns TRUE for the values that are not NA (i.e. the numbers)
which(...) # returns the position of each TRUE value
min(...) # returns the first position
Output:
min(which(!is.na(suppressWarnings(as.numeric(str_split(string1, " ", simplify = TRUE))))))
[1] 9
min(which(!is.na(suppressWarnings(as.numeric(str_split(string2, " ", simplify = TRUE))))))
[1] 1
min(which(!is.na(suppressWarnings(as.numeric(str_split(string3, " ", simplify = TRUE))))))
[1] 5
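The steps described above can be traced one at a time by storing each intermediate result. The sketch below does this for the third example string; it swaps in base R's strsplit for str_split so that it runs without extra packages, and the variable names are illustrative only.

```r
string <- "extract where the first 97865 is in this string"

words     <- strsplit(string, " ", fixed = TRUE)[[1]]  # character vector of 9 words
nums      <- suppressWarnings(as.numeric(words))       # NA for every word except "97865"
is_num    <- !is.na(nums)                              # TRUE only at the numeric word
positions <- which(is_num)                             # 5
first_pos <- min(positions)                            # 5
```

Because this approach relies on as.numeric, a token such as "1010," with trailing punctuation converts to NA and is not counted as a number, whereas a regex such as "\\d+" would still match it.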
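Here is another approach. We can cut off everything after the first digit of the first number; then we only need to find the position of the last remaining word. \\b matches a word boundary, and \\S+ matches one or more non-whitespace characters.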
first_numeric_word <- function(x) {
x <- substr(x,1L,regexpr("\\b\\d+\\b",x))
lengths(gregexpr("\\b\\S+\\b",x))
}
Output
> first_numeric_word(x)
[1] 9 1 5
Data
x <- c(
"Hello I'd like to extract where the first 1010 is in this string",
"80111 is in this string",
"extract where the first 97865 is in this string"
)
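One caveat with the truncation approach: when a string contains no number, regexpr returns -1, the prefix becomes empty, and gregexpr reports "no match" as a single -1, so lengths() yields 1 rather than a missing value. A possible guard is sketched below; the name first_numeric_word_safe is hypothetical, and the regular expressions are the same ones used in the answer.

```r
# Hypothetical variant of first_numeric_word() that returns NA
# for strings without a standalone number.
first_numeric_word_safe <- function(x) {
  start  <- regexpr("\\b\\d+\\b", x)           # offset of the first number, -1 if none
  prefix <- substr(x, 1L, start)               # text up to the number's first digit
  counts <- lengths(gregexpr("\\b\\S+\\b", prefix))  # words in the prefix
  counts[as.integer(start) == -1L] <- NA_integer_    # overwrite the spurious 1
  counts
}

first_numeric_word_safe(c(
  "Hello I'd like to extract where the first 1010 is in this string",
  "no digits here"
))
# 9 NA
```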
Here is a fully tidyverse approach:
library(purrr)
library(stringr)
map_dbl(str_split(strings," "),str_which,"\\d+")
#> [1] 9 1 5
map_dbl(str_split(strings[1]," "),str_which,"\\d+")
#> [1] 9
Note that this works with a single string as well as with several.
Here, strings is:
strings <- c("Hello I'd like to extract where the first 1010 is in this string","80111 is in this string","extract where the first 97865 is in this string")
Try the following:
library(stringr)
position_first_number <- function(string) {
min(which(str_detect(str_split(string,"\\s+",simplify = TRUE),"[0-9]+")))
}
With the example strings:
> string1 <- "Hello I'd like to extract where the first 1010 is in this string"
> position_first_number(string1)
[1] 9
> string2 <- "80111 is in this string"
> position_first_number(string2)
[1] 1
> string3 <- "extract where the first 97865 is in this string"
> position_first_number(string3)
[1] 5
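The pattern "[0-9]+" matches any word that contains at least one digit, not only words made up entirely of digits. The short sketch below illustrates the difference on a made-up sentence, using base R's grepl in place of str_detect so that no package is required; anchoring the pattern with ^ and $ is one possible way to require an all-digit word.

```r
# Made-up example sentence (not from the question).
words <- strsplit("version v2 was released in 2019", " ", fixed = TRUE)[[1]]

any_digit  <- which(grepl("[0-9]+", words))[1]    # 2: "v2" contains a digit
all_digits <- which(grepl("^[0-9]+$", words))[1]  # 6: "2019" is the first all-digit word
```

Which behavior is wanted depends on whether mixed tokens such as "v2" should count as numbers.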
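Here is a base R solution that uses rapply() with grep() to iterate over the result of strsplit(); it works on a vector of strings.
Note: to split the strings on any run of whitespace rather than on a literal space, swap " " and fixed = TRUE for "\\s+" and fixed = FALSE (the default).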
rapply(strsplit(strings," ",fixed = TRUE),function(x) grep("[0-9]+",x))
[1] 9 1 5
Data:
strings = c("Hello I'd like to extract where the first 1010 is in this string","80111 is in this string","extract where the first 97865 is in this string")
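The note about literal spaces versus whitespace matters when words are separated by more than one space. The sketch below, on a made-up string, shows how the two splitting modes produce different word positions; the variable names are illustrative.

```r
# Made-up input: two spaces after "first", three spaces before "42".
s <- "first  number is   42"

literal <- strsplit(s, " ", fixed = TRUE)[[1]]  # consecutive spaces create empty tokens
regex   <- strsplit(s, "\\s+")[[1]]             # each run of whitespace is one separator

pos_literal <- grep("[0-9]+", literal)  # 7, because the empty tokens are counted
pos_regex   <- grep("[0-9]+", regex)    # 4
```

With a literal space the empty strings between consecutive spaces are counted as words, so "\\s+" is usually the safer choice for input whose spacing is not normalized.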