How do I add a column to a data.frame in R whose values are extracted from an existing column?

Problem description

I have a data.frame named DF. I want to add a column (call it station_no) that holds the number found after the underscore in the Variables column.

Sample data

library(lubridate)
library(tidyverse)

set.seed(123)

DF <- data.frame(
  Date = seq(as.Date("1979-01-01"), to = as.Date("1979-12-31"), by = "day"),
  Grid_2 = runif(365, 1, 10),
  Grid_20 = runif(365, 5, 15)
) %>%
  pivot_longer(-Date, names_to = "Variables", values_to = "Values")

Solution

An easy option is parse_number, which pulls the number out of each string and returns it as a numeric value

library(dplyr)
DF %>% 
   mutate(Station_no  = readr::parse_number(Variables))
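To see what parse_number does outside of mutate, here is a minimal sketch on a small vector. The name vars is hypothetical and only mirrors the values of the Variables column. The point is that the result is numeric, so Station_no would sort as 2, 20 rather than as text.

```r
library(readr)

# Hypothetical sample values mirroring the Variables column
vars <- c("Grid_2", "Grid_20")

# parse_number() skips the non-numeric prefix and returns a double vector
station_no <- parse_number(vars)

is.numeric(station_no)
station_no
```

Note that parse_number picks up the first number it finds in each string, so it fits best when the names contain only one number.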

Or use str_extract (in case we want to match a specific pattern); note that it returns a character value

library(stringr)
DF %>%
   mutate(Station_no  = str_extract(Variables,"(?<=_)\\d+"))
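The pattern "(?<=_)\\d+" is a lookbehind: it matches one or more digits only when they directly follow an underscore, without including the underscore in the match. A small sketch, assuming a hypothetical vector vars with one entry that has no underscore, shows the behaviour and one way to convert the result to integers:

```r
library(stringr)

# Hypothetical sample values; the last one has no underscore
vars <- c("Grid_2", "Grid_20", "Grid")

# Digits preceded by "_" are extracted; non-matching entries become NA
station_chr <- str_extract(vars, "(?<=_)\\d+")

# str_extract() returns character, so convert if numbers are needed
station_int <- as.integer(station_chr)

station_chr
station_int
```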

Or use base R

DF$Station_no <-  trimws(DF$Variables,whitespace = '\\D+')
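The trimws call works by treating every non-digit (\\D) as "whitespace" and stripping it from both ends of the string. As far as I know, the whitespace argument was only added in R 3.6.0, so on older versions a gsub call is a possible substitute. The sketch below compares the two on a hypothetical vector vars:

```r
# Hypothetical sample values mirroring the Variables column
vars <- c("Grid_2", "Grid_20")

# Strip leading and trailing non-digits (needs the whitespace argument)
a <- trimws(vars, whitespace = "\\D+")

# Alternative: remove every run of non-digits anywhere in the string
b <- gsub("\\D+", "", vars)

identical(a, b)
```

The two differ for names with digits in more than one place: trimws only trims the ends, while gsub removes non-digits everywhere and glues the remaining digits together.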

Another base R solution is:

#Code
DF$Station_no <- sub("^[^_]*_","",DF$Variables)
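The pattern "^[^_]*_" matches everything from the start of the string up to and including the first underscore, and sub replaces that with an empty string. The result is character, and a string without an underscore is returned unchanged. The sketch below, on a hypothetical vector vars, shows this and one cautious way to convert only the all-digit results to integers:

```r
# Hypothetical sample values; the last one has no underscore
vars <- c("Grid_2", "Grid_20", "Grid")

# Drop everything up to and including the first underscore
station_chr <- sub("^[^_]*_", "", vars)

# Convert only the entries that consist solely of digits; leave the rest NA
station_int <- rep(NA_integer_, length(station_chr))
ok <- grepl("^[0-9]+$", station_chr)
station_int[ok] <- as.integer(station_chr[ok])

station_chr
station_int
```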

Output (some rows):

# A tibble: 730 x 4
   Date       Variables Values Station_no
   <date>     <chr>      <dbl> <chr>     
 1 1979-01-01 Grid_2      3.59 2         
 2 1979-01-01 Grid_20    12.8  20        
 3 1979-01-02 Grid_2      8.09 2         
 4 1979-01-02 Grid_20     6.93 20        
 5 1979-01-03 Grid_2      4.68 2         
 6 1979-01-03 Grid_20     5.18 20        
 7 1979-01-04 Grid_2      8.95 2         
 8 1979-01-04 Grid_20     9.07 20        
 9 1979-01-05 Grid_2      9.46 2         
10 1979-01-05 Grid_20     9.83 20        
# ... with 720 more rows