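Problem description
I have some strings that I want to convert into a table. The identifier on each line may contain spaces, and I need to remove those spaces without removing the spaces between the numbers. Can this be done with a regular expression?
For example, the data looks like this: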
A B C 5.65 7.8
DC 5.65 7.8
D AB 7.9 12.2
D AB C 7.9 1.2
A BC 13.88 2.4
AB C 7.9 12.2
I would like to get this:
ABC 5.65 7.8
DC 5.65 7.8
DAB 7.9 12.2
DABC 7.9 1.2
ABC 13.88 2.4
ABC 7.9 12.2
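Edit: as requested, here is an example of the data type and of how I receive it. It has 16 rows, each with 6 columns of data, but the first column is an alphabetic identifier.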
# Data as I receive it.
data <- c("A","a","2.07","2.35","39.00","82.20","8.8","3.80","B","2.26","2.25","40.00","80.80","8.1","1.86","D","Et","2.22","41.00","83.80","3.87","F","2.05","2.15","43.00","8.4","3.11","Bc","2.08","2.12","48.00","82.60","8.3","2.47","Gf","H","I","2.10","46.00","2.90","J","K","1.95","38.00","83.40","8.7","1.63","L","M","1.89","45.00","9.0","1.84","N","2.06","80.60","4.09","O","P","2.04","81.60","8.6","2.60","Qst","R","2.03","44.00","82.80","1.40","S","2.02","81.40","8.2","1.74","T","2.01","81.80","2.30","Unh","1.96","2.00","9.2","2.40","V","W","C","1.98","1.97","82.00","1.15","Yu","1.90","9.6","Z","bi","42.00","84.20","1.69")
# required format
data2 <- c("Aa","DEt","GfHI","JK","LM","OP","QstR","VWC","Zabi","1.69")
df <- data.frame(matrix(data2,ncol=7,byrow=T))
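Answer
One way to do this in R is to collapse the vector into a single string, apply a regular expression substitution to that string, and then split the string back into a vector.
See the details below; hopefully this points you in the right direction.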
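The round trip described above (vector to string, regex substitution, string back to vector) can be sketched on a small sample first. This is only an illustration under stated assumptions: the sample vector and the names x, s, merged and out are hypothetical, and it uses base R gsub with perl = TRUE in place of stringi so that it runs without extra packages.

```r
# Hypothetical small sample in the same shape as the real data:
# identifier fragments followed by numeric fields
x <- c("Gf", "H", "I", "2.10", "46.00", "J", "K", "1.95")

# toString() joins the elements with ", "
s <- toString(x)

# Drop a ", " separator only when neither neighbouring character is a digit,
# i.e. only between two identifier fragments
merged <- gsub("(?<!\\d), (?!\\d)", "", s, perl = TRUE)

# Split on the remaining ", " separators so no commas stay attached
out <- strsplit(merged, ", ", fixed = TRUE)[[1]]
out  # five elements: GfHI, 2.10, 46.00, JK, 1.95
```

Note that this relies on identifiers never starting or ending with a digit; an identifier such as "A1" would not be merged with its neighbour.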
Solution
data <- c("A","a","2.07","2.35","39.00","82.20","8.8","3.80","B","2.26","2.25","40.00","80.80","8.1","1.86","D","Et","2.22","41.00","83.80","3.87","F","2.05","2.15","43.00","8.4","3.11","Bc","2.08","2.12","48.00","82.60","8.3","2.47","Gf","H","I","2.10","46.00","2.90","J","K","1.95","38.00","83.40","8.7","1.63","L","M","1.89","45.00","9.0","1.84","N","2.06","80.60","4.09","O","P","2.04","81.60","8.6","2.60","Qst","R","2.03","44.00","82.80","1.40","S","2.02","81.40","8.2","1.74","T","2.01","81.80","2.30","Unh","1.96","2.00","9.2","2.40","V","W","C","1.98","1.97","82.00","1.15","Yu","1.90","9.6","Z","bi","42.00","84.20","1.69")
# Use the stringi regular expression engine (ICU)
library(stringi)
# Collapse the vector into one comma-separated string so we can manipulate it as text
data1 <- toString(data)
# Now apply the regular expression substitution to the string.
# The pattern is:
#
# (?<!\d) - Negative lookbehind: the previous character must not be a digit.
# ", "    - The literal comma and space that toString() inserts between elements.
# (?!\d)  - Negative lookahead: the next character must not be a digit.
#
# A separator is therefore removed only when it sits between two non-numeric elements.
data3 <- stri_replace_all_regex(str = data1, pattern = '(?<!\\d), (?!\\d)', replacement = '')
# OK, check the string data...
data3
# Now convert the string back into a vector...
newData <- strsplit(data3, ", ", fixed = TRUE)[[1]]
newData
# Now convert to a data frame...
df <- data.frame(matrix(newData, ncol = 7, byrow = TRUE))
df
# Done
Output
> data <- c("A","a","2.07","2.35","39.00","82.20","8.8","3.80","B","2.26","2.25","40.00","80.80","8.1","1.86","D","Et","2.22","41.00","83.80","3.87","F","2.05","2.15","43.00","8.4","3.11","Bc","2.08","2.12","48.00","82.60","8.3","2.47","Gf","H","I","2.10","46.00","2.90","J","K","1.95","38.00","83.40","8.7","1.63","L","M","1.89","45.00","9.0","1.84","N","2.06","80.60","4.09","O","P","2.04","81.60","8.6","2.60","Qst","R","2.03","44.00","82.80","1.40","S","2.02","81.40","8.2","1.74","T","2.01","81.80","2.30","Unh","1.96","2.00","9.2","2.40","V","W","C","1.98","1.97","82.00","1.15","Yu","1.90","9.6","Z","bi","42.00","84.20","1.69")
>
> # Use the stringi regular expression engine (ICU)
> library(stringi)
>
> # Collapse the vector into one comma-separated string so we can manipulate it as text
> data1 <- toString(data)
>
> # Now apply the regular expression substitution to the string.
> # The pattern is:
> #
> # (?<!\d) - Negative lookbehind: the previous character must not be a digit.
> # ", "    - The literal comma and space that toString() inserts between elements.
> # (?!\d)  - Negative lookahead: the next character must not be a digit.
> #
> data3 <- stri_replace_all_regex(str = data1, pattern = '(?<!\\d), (?!\\d)', replacement = '')
> # OK, check the string data...
> data3
[1] "Aa, 2.07, 2.35, 39.00, 82.20, 8.8, 3.80, B, 2.26, 2.25, 40.00, 80.80, 8.1, 1.86, DEt, 2.22, 41.00, 83.80, 3.87, F, 2.05, 2.15, 43.00, 8.4, 3.11, Bc, 2.08, 2.12, 48.00, 82.60, 8.3, 2.47, GfHI, 2.10, 46.00, 2.90, JK, 1.95, 38.00, 83.40, 8.7, 1.63, LM, 1.89, 45.00, 9.0, 1.84, N, 2.06, 80.60, 4.09, OP, 2.04, 81.60, 8.6, 2.60, QstR, 2.03, 44.00, 82.80, 1.40, S, 2.02, 81.40, 8.2, 1.74, T, 2.01, 81.80, 2.30, Unh, 1.96, 2.00, 9.2, 2.40, VWC, 1.98, 1.97, 82.00, 1.15, Yu, 1.90, 9.6, Zabi, 42.00, 84.20, 1.69"
>
> # Now convert the string back into a vector...
> newData <- strsplit(data3, ", ", fixed = TRUE)[[1]]
> newData
[1] "Aa" "2.07" "2.35" "39.00" "82.20" "8.8" "3.80" "B" "2.26" "2.25" "40.00" "80.80"
[13] "8.1" "1.86" "DEt" "2.07" "2.22" "41.00" "83.80" "3.87" "F" "2.05" "2.15"
[25] "43.00" "8.4" "3.11" "Bc" "2.08" "2.12" "48.00" "82.60" "8.3" "2.47" "GfHI"
[37] "2.08" "2.10" "46.00" "8.1" "2.90" "JK" "1.95" "2.08" "38.00" "83.40" "8.7"
[49] "1.63" "LM" "1.89" "2.07" "45.00" "9.0" "1.84" "N" "2.06" "2.05"
[61] "80.60" "4.09" "OP" "1.86" "2.04" "81.60" "8.6" "2.60" "QstR" "1.95"
[73] "2.03" "44.00" "82.80" "1.40" "S" "2.03" "2.02" "81.40" "8.2" "1.74"
[85] "T" "1.95" "2.01" "43.00" "81.80" "2.30" "Unh" "1.96" "2.00"
[97] "9.2" "2.40" "VWC" "1.98" "1.97" "82.00" "1.15" "Yu" "1.90" "1.96"
[109] "41.00" "9.6" "2.08" "Zabi" "1.90" "42.00" "84.20" "1.69"
>
> # Now convert to a data frame...
> df <- data.frame(matrix(newData, ncol = 7, byrow = TRUE))
> df
X1 X2 X3 X4 X5 X6 X7
1 Aa ...
2 B ...
3 DEt ...
4 F ...
5 Bc ...
6 GfHI ...
7 JK ...
8 LM ...
9 N ...
10 OP ...
11 QstR ...
12 S ...
13 T ...
14 Unh ...
15 VWC ...
16 Yu ...
17 Zabi ... 1.69
> # Done
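For the line-oriented strings shown at the top of the question, the same lookaround idea can be applied directly, without the vector round trip. The following is a minimal sketch, assuming each line is an identifier followed by exactly two numbers and that identifiers contain only ASCII letters. It uses base R gsub with perl = TRUE; the names txt, cleaned, parts and tbl and the column names id, x and y are hypothetical.

```r
# The sample lines from the question
txt <- c("A B C 5.65 7.8", "DC 5.65 7.8", "D AB 7.9 12.2",
         "D AB C 7.9 1.2", "A BC 13.88 2.4", "AB C 7.9 12.2")

# Remove a space only when it has a letter on both sides;
# spaces next to a digit are left alone
cleaned <- gsub("(?<=[A-Za-z]) (?=[A-Za-z])", "", txt, perl = TRUE)
cleaned

# Split each cleaned line on the remaining spaces and build a table
parts <- do.call(rbind, strsplit(cleaned, " ", fixed = TRUE))
tbl <- data.frame(id = parts[, 1],
                  x  = as.numeric(parts[, 2]),
                  y  = as.numeric(parts[, 3]))
tbl
```

Converting the numeric columns with as.numeric keeps them usable for arithmetic, whereas a data frame built from a character matrix stores every column as text.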