Question
Hi, my dataset df looks like this:
pateint_id  NAME  A                 country
1001        kam   0..8              IND
1002        kam   0..8              IND
1003        kam   1.2.              IND
1004        sat   5.4 ( 6.30 PM )   IND
1005        sat   0.6 {2.00 AM}     IND
1006        sat   1-0               IND
1007        bas   76 MMOL           IND
1008        bas   2.3 (Re-Checked)  IND
1009        bas   72 MMOL \L        IND
1010        bas   <0.3              IND
I want the output to be:
pateint_id  NAME  A     country
1001        kam   0.8   IND
1002        kam   0.8   IND
1003        kam   1.2   IND
1004        sat   5.4   IND
1005        sat   0.6   IND
1006        sat   1     IND
1007        bas   76    IND
1008        bas   2.3   IND
1009        bas   72    IND
1010        bas   0.3   IND
I tried running gsub on that column, but I get NA values:
df$A <- as.numeric(as.character(gsub('[a-zA-Z]',"",df$A)))
Thanks in advance.
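To see why that attempt yields NA, it helps to print what the gsub call leaves behind. Deleting letters removes "PM", "MMOL" and so on, but the doubled dots, brackets, hyphens, backslashes and the "<" all survive, and as.numeric() turns any string that is not a plain number into NA (with an "NAs introduced by coercion" warning). A minimal sketch, using a hand-copied vector of the A values; the variable names a, stripped and converted are only for illustration:

```r
# Values of column A, copied by hand from the table above (hypothetical vector name `a`)
a <- c("0..8", "0..8", "1.2.", "5.4 ( 6.30 PM )", "0.6 {2.00 AM}",
       "1-0", "76 MMOL", "2.3 (Re-Checked)", "72 MMOL \\L", "<0.3")

# The original attempt: delete letters only, everything else is kept
stripped <- gsub("[a-zA-Z]", "", a)
print(stripped)

# Only strings that are now plain numbers (surrounding spaces allowed) convert;
# the rest become NA. suppressWarnings() hides the coercion warning.
converted <- suppressWarnings(as.numeric(stripped))
print(converted)
```

In this data only "76 MMOL" converts, because stripping its letters leaves "76 " and trailing whitespace is ignored by as.numeric().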
Solution
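If you replace the double dots in the first two values with a single dot, you can pass the column directly to parse_number() from readr to get numeric values. parse_number() drops any non-numeric characters before or after the first number in each string, which takes care of the units, the bracketed times and the leading "<".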
# collapse the first run of dots ("0..8" -> "0.8"), then parse the first number in each string
readr::parse_number(sub('\\.{1,}', '.', df$A))
#[1] 0.8 0.8 1.2 5.4 0.6 1.0 76.0 2.3 72.0 0.3
Or use str_extract() from stringr:
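# same dot clean-up, then keep the first match; \\d* (not \\d?) so extra decimal places are kept
as.numeric(stringr::str_extract(sub('\\.{1,}', '.', df$A), '\\d+\\.?\\d*'))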
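If you would rather not depend on readr or stringr, the same two steps can be sketched in base R with gsub(), regexpr() and regmatches(). This is a swapped-in base-R technique, not how either package works internally; the helper name first_number and the pattern [0-9]+(\\.[0-9]+)? are illustrative assumptions, and the pattern does not handle minus signs or exponents (none occur in this data).

```r
# Hypothetical helper: return the first unsigned decimal number in each string, or NA
first_number <- function(x) {
  x <- gsub("\\.+", ".", x)                  # collapse every run of dots: "0..8" -> "0.8"
  m <- regexpr("[0-9]+(\\.[0-9]+)?", x)      # start of the first match, -1 when there is none
  out <- rep(NA_real_, length(x))            # strings without any digits stay NA
  out[m != -1] <- as.numeric(regmatches(x, m))
  out
}

a <- c("0..8", "0..8", "1.2.", "5.4 ( 6.30 PM )", "0.6 {2.00 AM}",
       "1-0", "76 MMOL", "2.3 (Re-Checked)", "72 MMOL \\L", "<0.3")
first_number(a)
#> [1]  0.8  0.8  1.2  5.4  0.6  1.0 76.0  2.3 72.0  0.3
```

To overwrite the column, assign the result back, e.g. df$A <- first_number(df$A).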
Data
df <- structure(list(pateint_id = 1001:1010, NAME = rep(c("kam", "sat", "bas"), c(3, 3, 4)),
  A = c("0..8", "0..8", "1.2.", "5.4 ( 6.30 PM )", "0.6 {2.00 AM}", "1-0", "76 MMOL",
        "2.3 (Re-Checked)", "72 MMOL \\L", "<0.3"), country = rep("IND", 10)),
  class = "data.frame", row.names = c(NA, -10L))