How can I remove the spaces between letters but not between numbers?

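Problem description

I have some strings that I want to convert into a table. The identifier in each row can contain spaces, and I need to remove those spaces without removing the spaces between the numbers. Can this be done with a regular expression?

For example, the data looks like this: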

A B C 5.65 7.8 
DC 5.65 7.8
D AB   7.9  12.2
D AB C  7.9  1.2
A BC 13.88 2.4
AB C  7.9  12.2

I would like to get this:

ABC 5.65 7.8 
DC 5.65 7.8
DAB   7.9  12.2
DABC  7.9  1.2
ABC 13.88 2.4
ABC  7.9  12.2

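Edit: As requested, here is an example of the data type and of how I receive it. It has 16 rows, each with 6 columns of data, but the first column is an alphabetic identifier.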

 # Data as I receive it.

data <- c("A","a","2.07","2.35","39.00","82.20","8.8","3.80","B","2.26","2.25","40.00","80.80","8.1","1.86","D","Et","2.22","41.00","83.80","3.87","F","2.05","2.15","43.00","8.4","3.11","Bc","2.08","2.12","48.00","82.60","8.3","2.47","Gf","H","I","2.10","46.00","2.90","J","K","1.95","38.00","83.40","8.7","1.63","L","M","1.89","45.00","9.0","1.84","N","2.06","80.60","4.09","O","P","2.04","81.60","8.6","2.60","Qst","R","2.03","44.00","82.80","1.40","S","2.02","81.40","8.2","1.74","T","2.01","81.80","2.30","Unh","1.96","2.00","9.2","2.40","V","W","C","1.98","1.97","82.00","1.15","Yu","1.90","9.6","Z","bi","42.00","84.20","1.69")
    
# required format

data2 <- c("Aa","DEt","GfHI","JK","LM","OP","QstR","VWC","Zabi","1.69")

df <- data.frame(matrix(data2,ncol=7,byrow=T))

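Answer

One way to do this in R is to convert the vector to a single string, apply a regular expression substitution to that string, and then convert the string back to a vector.

See the details below; I hope this points you in the right direction.

Solution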

data <- c("A","a","2.07","2.35","39.00","82.20","8.8","3.80","B","2.26","2.25","40.00","80.80","8.1","1.86","D","Et","2.22","41.00","83.80","3.87","F","2.05","2.15","43.00","8.4","3.11","Bc","2.08","2.12","48.00","82.60","8.3","2.47","Gf","H","I","2.10","46.00","2.90","J","K","1.95","38.00","83.40","8.7","1.63","L","M","1.89","45.00","9.0","1.84","N","2.06","80.60","4.09","O","P","2.04","81.60","8.6","2.60","Qst","R","2.03","44.00","82.80","1.40","S","2.02","81.40","8.2","1.74","T","2.01","81.80","2.30","Unh","1.96","2.00","9.2","2.40","V","W","C","1.98","1.97","82.00","1.15","Yu","1.90","9.6","Z","bi","42.00","84.20","1.69")

# Use the stringi regular expression engine
require(stringi)

# Convert the vector data to be a string sequence - so we can manipulate as text
data1 <- toString(data)

# Now we can apply the regular expression substitution to the data (formatted as a string)...
# Here we do a:
#
#     (?<!\d) - Negative look behind to prevent a digit.
#     ", "    - A literal comma and space, the separator that toString() puts between elements.
#     (?!\d)  - Negative look ahead to prevent a digit.
#
data3 = stri_replace_all_regex(str = data1,pattern = '(?<!\\d), (?!\\d)',replacement = '')
# OK, check the string data...
data3

# Now we convert the string back to be a vector...
newData = strsplit(data3," ")[[1]]
newData 

# Now we convert to a dataframe...
df <- data.frame(matrix(newData,ncol=7,byrow=T))
df
# Done
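
Splitting on a single space, as above, leaves a comma attached to every element except the last. A minimal sketch of a variant that splits on the full ", " separator instead is shown here. It is not part of the original answer: it swaps stringi for base R's gsub() with perl = TRUE, uses a short sample taken from the question's values, and uses ncol = 3 only because the sample is small (the question's data would use ncol = 7).

```r
# Small sample in the same shape as the question's data (assumption: 3 columns).
data <- c("A", "a", "2.07", "2.35", "B", "2.26", "2.25", "D", "Et", "2.22", "41.00")

# toString() joins the elements with ", " (comma + space).
joined <- toString(data)

# Remove a ", " separator only when there is no digit directly before
# or after it, i.e. when it sits between two identifier fragments.
merged <- gsub("(?<!\\d), (?!\\d)", "", joined, perl = TRUE)

# Split on the full ", " separator so no comma stays attached to an element.
newData <- strsplit(merged, ", ", fixed = TRUE)[[1]]

# Rebuild the table row by row.
df <- data.frame(matrix(newData, ncol = 3, byrow = TRUE), stringsAsFactors = FALSE)

# Turn the value columns into numbers, leaving the identifier column as text.
df[-1] <- lapply(df[-1], as.numeric)
print(df)
```

Like the pattern above, this relies on identifiers never starting or ending with a digit.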

Output

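> data <- c("A","a","2.07","2.35","39.00","82.20","8.8","3.80","B","2.26","2.25","40.00","80.80","8.1","1.86","D","Et","2.22","41.00","83.80","3.87","F","2.05","2.15","43.00","8.4","3.11","Bc","2.08","2.12","48.00","82.60","8.3","2.47","Gf","H","I","2.10","46.00","2.90","J","K","1.95","38.00","83.40","8.7","1.63","L","M","1.89","45.00","9.0","1.84","N","2.06","80.60","4.09","O","P","2.04","81.60","8.6","2.60","Qst","R","2.03","44.00","82.80","1.40","S","2.02","81.40","8.2","1.74","T","2.01","81.80","2.30","Unh","1.96","2.00","9.2","2.40","V","W","C","1.98","1.97","82.00","1.15","Yu","1.90","9.6","Z","bi","42.00","84.20","1.69")
> 
> # Use the stringi regular expression engine
> require(stringi)
> 
> # Convert the vector data to be a string sequence - so we can manipulate as text
> data1 <- toString(data)
> 
> # Now we can apply the regular expression substitution to the data (formatted as a string)...
> # Here we do a:
> #
> #     (?<!\d) - Negative look behind to prevent a digit.
> #     ", "    - A literal comma and space, the separator that toString() puts between elements.
> #     (?!\d)  - Negative look ahead to prevent a digit.
> #
> data3 = stri_replace_all_regex(str = data1,pattern = '(?<!\\d), (?!\\d)',replacement = '')
> # OK, check the string data...
> data3
[1] "Aa, 2.07, 2.35, 39.00, 82.20, 8.8, 3.80, B, 2.26, 2.25, 40.00, 80.80, 8.1, 1.86, DEt, 2.22, 41.00, 83.80, 3.87, F, 2.05, 2.15, 43.00, 8.4, 3.11, Bc, 2.08, 2.12, 48.00, 82.60, 8.3, 2.47, GfHI, 2.10, 46.00, 2.90, JK, 1.95, 38.00, 83.40, 8.7, 1.63, LM, 1.89, 45.00, 9.0, 1.84, N, 2.06, 80.60, 4.09, OP, 2.04, 81.60, 8.6, 2.60, QstR, 2.03, 44.00, 82.80, 1.40, S, 2.02, 81.40, 8.2, 1.74, T, 2.01, 81.80, 2.30, Unh, 1.96, 2.00, 9.2, 2.40, VWC, 1.98, 1.97, 82.00, 1.15, Yu, 1.90, 9.6, Zabi, 42.00, 84.20, 1.69"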
> 
> # Now we convert the string back to be a vector...
> newData = strsplit(data3," ")[[1]]
> newData 
  [1] "Aa,"    "2.07,"  "2.35,"  "39.00," "82.20," "8.8,"   "3.80,"  "B,"     "2.26,"  "2.25,"  "40.00," "80.80,"
 [13] "8.1,"   "1.86,"  "DEt,"   "2.07,"  "2.22,"  "41.00," "83.80,"   "3.87,"  "F,"     "2.05,"  "2.15," 
 [25] "43.00," "8.4,"   "3.11,"  "Bc,"    "2.08,"  "2.12,"  "48.00," "82.60," "8.3,"   "2.47,"  "GfHI," 
 [37] "2.08,"  "2.10,"  "46.00," "8.1,"   "2.90,"  "JK,"    "1.95,"  "2.08,"  "38.00," "83.40," "8.7,"  
 [49] "1.63,"  "LM,"    "1.89,"  "2.07,"  "45.00," "9.0,"   "1.84,"  "N,"     "2.06,"  "2.05,"
 [61] "80.60,"   "4.09,"  "OP,"    "1.86,"  "2.04," "81.60," "8.6,"   "2.60,"  "QstR,"  "1.95," 
 [73] "2.03,"  "44.00," "82.80,"   "1.40,"  "S,"     "2.03,"  "2.02," "81.40," "8.2,"   "1.74," 
 [85] "T,"     "1.95,"  "2.01,"  "43.00," "81.80,"   "2.30,"  "Unh,"   "1.96,"  "2.00,"
 [97] "9.2,"   "2.40,"  "VWC,"   "1.98,"  "1.97," "82.00,"   "1.15,"  "Yu,"    "1.90,"  "1.96," 
[109] "41.00," "9.6,"   "2.08,"  "Zabi,"  "1.90,"  "42.00," "84.20,"   "1.69"  
> 
> # Now we convert to a dataframe...
> df <- data.frame(matrix(newData,ncol=7,byrow=T))
> df
      X1    X2    X3     X4     X5   X6    X7
1    Aa,2     B,3   DEt,4     F,5    Bc,6  GfHI,7    JK,8    LM,9     N,10   OP,11 QstR,12    S,13    T,14  Unh,15  VWC,16   Yu,17 Zabi,1.69
> # Done
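
If the data arrives as whole text lines, as in the example at the top of the question, the same lookaround idea can be applied to each line directly. The following is only a sketch under that assumption, using base R; the column names id, v1 and v2 are made up for illustration and do not come from the question.

```r
# The example lines from the question.
lines <- c("A B C 5.65 7.8",
           "DC 5.65 7.8",
           "D AB   7.9  12.2",
           "D AB C  7.9  1.2",
           "A BC 13.88 2.4",
           "AB C  7.9  12.2")

# Delete whitespace that has a letter immediately before and after it.
# Lookarounds match positions only, so the letters themselves are kept.
cleaned <- gsub("(?<=[A-Za-z])\\s+(?=[A-Za-z])", "", lines, perl = TRUE)
print(cleaned)

# With the identifier now a single token, each line splits on whitespace.
# Column names are hypothetical.
df <- read.table(text = cleaned, col.names = c("id", "v1", "v2"),
                 stringsAsFactors = FALSE)
print(df)
```

Whitespace next to a digit is left alone, so the spacing between the numeric columns is unchanged.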
