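Find the position of the first number in a string [R]

Problem description

How can I create a function in R that finds the word position of the first number in a string?

For example: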

string1 <- "Hello I'd like to extract where the first 1010 is in this string"
#desired_output for string1
9

string2 <- "80111 is in this string"
#desired_output for string2
1

string3 <- "extract where the first 97865 is in this string"
#desired_output for string3
5

Solutions

Here is a base R option that uses only grep and strsplit:

sapply(input,function(x) grep("\\d+",strsplit(x," ")[[1]]))

Hello I'd like to extract where the first 1010 is in this string
                                                               9
                                         80111 is in this string
                                                               1
                 extract where the first 97865 is in this string
                                                               5

Data:

input <- c("Hello I'd like to extract where the first 1010 is in this string","80111 is in this string","extract where the first 97865 is in this string")
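The sapply/grep call above returns every matching word position, so a string that contains several numbers would yield more than one value per element. A minimal sketch of a wrapper that keeps only the first match and returns NA when a string has no digits is shown here; the function name first_number_word is hypothetical and not part of the original answer.

```r
# Hypothetical helper (name chosen for illustration only).
# Returns the word position of the first token containing a digit,
# or NA when no token contains one.
first_number_word <- function(x) {
  vapply(strsplit(x, " ", fixed = TRUE), function(words) {
    hits <- grep("\\d", words)          # positions of all tokens with a digit
    if (length(hits) > 0L) hits[1L] else NA_integer_
  }, integer(1), USE.NAMES = FALSE)
}

input <- c("Hello I'd like to extract where the first 1010 is in this string",
           "80111 is in this string",
           "extract where the first 97865 is in this string")

first_number_word(input)
first_number_word("no digits here")
```

On the three example strings this should give 9, 1 and 5, and NA for the string without digits.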

Here is one way to return the desired output:

library(stringr)
min(which(!is.na(suppressWarnings(as.numeric(str_split(string," ",simplify = TRUE))))))

Here is how it works:

str_split(string," ",simplify = TRUE) # converts your string to a character matrix, splitting at each space

as.numeric(...) # tries to convert each element to a number, returning NA when it fails

suppressWarnings(...) # suppresses the warnings generated by as.numeric

!is.na(...) # returns TRUE for the values that are not NA (i.e. the numbers)

which(...) # returns the position of each TRUE value

min(...) # returns the first position
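The steps above can be traced one at a time on a single string. This is only a sketch: it uses base R's strsplit as a stand-in for str_split so that it runs without extra packages, and the intermediate variable names are made up for illustration.

```r
string3 <- "extract where the first 97865 is in this string"

# Split into words (base R stand-in for str_split, an assumption of this sketch)
words <- strsplit(string3, " ", fixed = TRUE)[[1]]

# Non-numeric words become NA; the warning this triggers is suppressed
nums <- suppressWarnings(as.numeric(words))

# TRUE only where the conversion succeeded
is_num <- !is.na(nums)

# Positions of the numeric words, then the smallest of them
positions <- which(is_num)
first_pos <- min(positions)
first_pos   # 5
```

Note that this approach only flags words that are entirely numeric; a token such as "abc123" would be converted to NA and skipped, whereas the regex-based answers would match it.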

Output:

min(which(!is.na(suppressWarnings(as.numeric(str_split(string1," ",simplify = TRUE))))))
[1] 9
min(which(!is.na(suppressWarnings(as.numeric(str_split(string2," ",simplify = TRUE))))))
[1] 1
min(which(!is.na(suppressWarnings(as.numeric(str_split(string3," ",simplify = TRUE))))))
[1] 5

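Here is another approach. We can cut off everything after the first digit of the first number, and then simply find the position of the last remaining word. \\b matches a word boundary, and \\S+ matches one or more non-whitespace characters.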

first_numeric_word <- function(x) {
  x <- substr(x,1L,regexpr("\\b\\d+\\b",x))
  lengths(gregexpr("\\b\\S+\\b",x))
}

Output

> first_numeric_word(x)
[1] 9 1 5

Data

x <- c(
  "Hello I'd like to extract where  the first 1010 is in this string",
  "80111 is in this string",
  "extract where the   first  97865 is in this string"
)
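To make the truncate-then-count idea concrete, the two calls inside first_numeric_word can be run separately on one string. This is a sketch only, using the same patterns as the function; the variable names are illustrative.

```r
s <- "extract where the first 97865 is in this string"

# Character position where the first standalone number starts
start <- regexpr("\\b\\d+\\b", s)

# Keep everything up to and including the first digit of that number
truncated <- substr(s, 1L, start)
truncated   # "extract where the first 9"

# Count the words left; the last one is the (truncated) number
n_words <- lengths(gregexpr("\\b\\S+\\b", truncated))
n_words     # 5
```

Because words are counted with \\S+ rather than by splitting on a single space, runs of several spaces between words do not shift the result.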

Here is a full tidyverse approach:

library(purrr)
library(stringr)

map_dbl(str_split(strings," "),str_which,"\\d+")
#> [1] 9 1 5

map_dbl(str_split(strings[1]," "),str_which,"\\d+")
#> [1] 9

Note that it works with both a single string and several strings.

Where strings is:

strings <- c("Hello I'd like to extract where the first 1010 is in this string","80111 is in this string","extract where the first 97865 is in this string")

Try the following:

library(stringr)

position_first_number <- function(string) {
  min(which(str_detect(str_split(string,"\\s+",simplify = TRUE),"[0-9]+")))
}

With the example strings:

> string1 <- "Hello I'd like to extract where the first 1010 is in this string"
> position_first_number(string1)
[1] 9
 
> string2 <- "80111 is in this string"
> position_first_number(string2)
[1] 1
 
> string3 <- "extract where the first 97865 is in this string"
> position_first_number(string3)
[1] 5

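Here is a base R solution that uses rapply() with grep() to iterate over the result of strsplit(); it works on a vector of strings.

Note: if you want to split the strings on any whitespace rather than on a literal space, replace " ", fixed = TRUE (as used here) with "\\s+", fixed = FALSE.

rapply(strsplit(strings," ",fixed = TRUE),function(x) grep("[0-9]+",x))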
[1] 9 1 5

Data

strings = c("Hello I'd like to extract where the first 1010 is in this string","80111 is in this string","extract where the first 97865 is in this string")
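The difference described in the note about splitting on a literal space versus any whitespace shows up as soon as a string contains consecutive spaces. A small sketch, using an example string with irregular spacing:

```r
s <- "extract where the   first  97865 is in this string"

# Literal single-space split: each extra space produces an empty token
literal <- strsplit(s, " ", fixed = TRUE)[[1]]

# Regex split on runs of whitespace: no empty tokens between words
regex <- strsplit(s, "\\s+")[[1]]

grep("[0-9]+", literal)   # 8
grep("[0-9]+", regex)     # 5
```

With the literal split, the empty tokens are counted as words, so the reported position is inflated; the "\\s+" split gives the position a reader would expect.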