Problem description
I can't extract key-value pairs with a regular expression.
My code so far:
String raw = '''
MA1
D. Mueller Gießer
MA2 Peter
Mustermann 2. Mann
MA3 Ulrike Mastorius Schmelzer
MA4 Heiner Becker
s 3.Mann
MA5 Rudolf Peters
Gießer
'''
Map map = [:]
ArrayList<String> split = raw.findAll("(MA\\d)+(.*)"){ full,name,value -> map[name] = value }
println map
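The output is: [MA1:, MA2: Peter, MA3: Ulrike Mastorius Schmelzer, MA4: Heiner Becker, MA5: Rudolf Peters]
In my case the keys are MA1, MA2, MA3, and so on, i.e. MA\d (MA followed by any single digit).
The value is everything up to the next key (including line breaks, tabs, spaces, etc.).
Does anyone know how to do this?
Thanks in advance, Sebastian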
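Solution
You can capture, in the second group, everything after the key on the same line plus all following lines that do not start with a key: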
^(MA\d+)(.*(?:\R(?!MA\d).*)*)
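Pattern explanation
- ^ Start of the string (start of each line when the multiline flag is enabled)
- (MA\d+) Capture group 1: match MA followed by 1 or more digits
- ( Open capture group 2
  - .* Match the rest of the line
  - (?:\R(?!MA\d).*)* Match every following line that does not start with MA followed by a digit; \R matches any Unicode newline sequence
- ) Close group 2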
In Java, with the backslashes doubled:
final String regex = "^(MA\\d+)(.*(?:\\R(?!MA\\d).*)*)";
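A minimal sketch of how this pattern could be applied in plain Java to build the map. The class name KeyValueLines and the method parse are made up for illustration. It assumes Pattern.MULTILINE is needed so that ^ matches at the start of every line, and that trimming the surrounding whitespace of each value is acceptable.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
final class KeyValueLines {
    // MULTILINE makes ^ match at the start of every line, not only at the start of the input.
    private static final Pattern ENTRY =
            Pattern.compile("^(MA\\d+)(.*(?:\\R(?!MA\\d).*)*)", Pattern.MULTILINE);

    static Map<String, String> parse(String raw) {
        Map<String, String> map = new LinkedHashMap<>(); // keeps the keys in input order
        Matcher m = ENTRY.matcher(raw);
        while (m.find()) {
            // Group 2 keeps the inner line breaks; trim() only drops leading/trailing whitespace.
            map.put(m.group(1), m.group(2).trim());
        }
        return map;
    }

    public static void main(String[] args) {
        // Sample input from the question; the sharp s is written as a Unicode escape
        // so that the source file stays ASCII-only.
        String raw = "\nMA1\nD. Mueller Gie\u00dfer\nMA2 Peter\nMustermann 2. Mann\n"
                + "MA3 Ulrike Mastorius Schmelzer\nMA4 Heiner Becker\ns 3.Mann\n"
                + "MA5 Rudolf Peters\nGie\u00dfer\n";
        // Print line breaks inside values as a visible \n to make the result easier to read.
        parse(raw).forEach((k, v) -> System.out.println(k + " -> " + v.replace("\n", "\\n")));
    }
}
```

With the sample input, MA2 should map to the two-line value "Peter" followed by "Mustermann 2. Mann". Drop the trim() call if the leading space or line break after the key has to be preserved.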
Alternatively, use
(?ms)^(MA\d+)(.*?)(?=\nMA\d|\z)
See proof.
Explanation
NODE                EXPLANATION
--------------------------------------------------------------------------------
(?ms)               set flags for this block (with ^ and $ matching start and
                    end of line) (with . matching \n) (case-sensitive)
                    (matching whitespace and # normally)
--------------------------------------------------------------------------------
^                   the beginning of a "line"
--------------------------------------------------------------------------------
(                   group and capture to \1:
--------------------------------------------------------------------------------
  MA                'MA'
--------------------------------------------------------------------------------
  \d+               digits (0-9) (1 or more times (matching the most amount
                    possible))
--------------------------------------------------------------------------------
)                   end of \1
--------------------------------------------------------------------------------
(                   group and capture to \2:
--------------------------------------------------------------------------------
  .*?               any character (0 or more times (matching the least amount
                    possible))
--------------------------------------------------------------------------------
)                   end of \2
--------------------------------------------------------------------------------
(?=                 look ahead to see if there is:
--------------------------------------------------------------------------------
  \n                '\n' (newline)
--------------------------------------------------------------------------------
  MA                'MA'
--------------------------------------------------------------------------------
  \d                a digit (0-9)
--------------------------------------------------------------------------------
 |                  OR
--------------------------------------------------------------------------------
  \z                the end of the string
--------------------------------------------------------------------------------
)                   end of look-ahead
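The lazy variant can be tried the same way. The sketch below is only an illustration: LazyKeyValue and parse are hypothetical names, and it assumes the input uses \n line breaks, since the look-ahead in this pattern checks for \n only.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
final class LazyKeyValue {
    // (?m): ^ matches at every line start; (?s): . also matches line terminators.
    private static final Pattern ENTRY =
            Pattern.compile("(?ms)^(MA\\d+)(.*?)(?=\\nMA\\d|\\z)");

    static Map<String, String> parse(String raw) {
        Map<String, String> map = new LinkedHashMap<>(); // keeps the keys in input order
        Matcher m = ENTRY.matcher(raw);
        while (m.find()) {
            // The lazy .*? stops right before the next "\nMA<digit>" or at the end of the input.
            map.put(m.group(1), m.group(2).trim());
        }
        return map;
    }

    public static void main(String[] args) {
        String raw = "MA2 Peter\nMustermann 2. Mann\nMA3 Ulrike Mastorius Schmelzer\n"
                + "MA4 Heiner Becker\ns 3.Mann\n";
        // Print line breaks inside values as a visible \n to make the result easier to read.
        parse(raw).forEach((k, v) -> System.out.println(k + " -> " + v.replace("\n", "\\n")));
    }
}
```

Both patterns treat any line that begins with MA and a digit as the start of a new key, so a value line that happens to start that way would be split off into its own entry.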