Problem description
I have a file with 16k lines of text.
I'm using Notepad++.
Every few lines there is a block like this:
a few rows of text before
Unique Identifier
628612012-078
Title
another few rows of text until the next Unique Identifier row
A few more lines, then the same pattern repeats with a different number:
a few rows of text before
Unique Identifier
1991-18613-001
Title
another few rows of text until the next Unique Identifier row
(Picture of the data)
What would a regex look like that gets (copies/saves) every ID number sitting between each Unique Identifier line and the Title line that follows it?
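I don't mind whether it deletes the rest of the text in the file, saves the output as another file, or anything else. Ideally, I just need a list of these numbers in order.
I tried this, and tweaking this: Find and copy text between { and } in Notepad++, but couldn't get it to work.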
Solution
If the file has Unix (LF) line endings, the regex is
(?<=Unique Identifier\n).+(?=\nTitle)
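Then, on the Mark tab with Search Mode set to Regular expression, use Mark All followed by Copy Marked Text to copy all the matches to the clipboard.
Alternatively, to strip everything except the IDs from the file itself (\R matches any line ending, so this also works on Windows CRLF files), use Replace with Search Mode set to Regular expression and find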
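To check the lookbehind pattern outside Notepad++, here is a minimal Python sketch, assuming the text is already in a string with LF line endings. Python's re module requires a fixed-width lookbehind, which Unique Identifier\n is. The function name extract_ids and the sample text are illustrative only, not part of the original answer.

```python
import re

# Same pattern as in the answer: the ID line is preceded by
# "Unique Identifier\n" and followed by "\nTitle".
# Only LF files match: with CRLF, the "\r" before "\n" breaks the lookbehind.
ID_RE = re.compile(r"(?<=Unique Identifier\n).+(?=\nTitle)")


def extract_ids(text: str) -> list[str]:
    """Return every line sitting between a 'Unique Identifier' line and a 'Title' line."""
    return ID_RE.findall(text)


if __name__ == "__main__":
    sample = (
        "a few rows of text before\n"
        "Unique Identifier\n"
        "628612012-078\n"
        "Title\n"
        "another few rows of text until the next Unique Identifier row\n"
        "a few rows of text before\n"
        "Unique Identifier\n"
        "1991-18613-001\n"
        "Title\n"
    )
    print(extract_ids(sample))  # ['628612012-078', '1991-18613-001']
```

Running it on a CRLF copy of the same text returns an empty list, which is why this pattern is limited to LF files.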
^Unique Identifier\R(\d+(?:-\d+)*)(?=\RTitle$)|^(?!Unique Identifier$).*\R?
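Replace with: (?1$1\n:)
That is: if group 1 matched, output its value followed by a newline; otherwise output nothing. Then click Replace All.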
Explanation
--------------------------------------------------------------------------------
^ the beginning of the line
--------------------------------------------------------------------------------
Unique Identifier 'Unique Identifier'
--------------------------------------------------------------------------------
\R any line ending sequence
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
\d+ digits (0-9) (1 or more times (matching
the most amount possible))
--------------------------------------------------------------------------------
(?: group, but do not capture (0 or more
times (matching the most amount
possible)):
--------------------------------------------------------------------------------
- '-'
--------------------------------------------------------------------------------
\d+ digits (0-9) (1 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
)* end of grouping
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
(?= look ahead to see if there is:
--------------------------------------------------------------------------------
\R any line ending sequence
--------------------------------------------------------------------------------
Title 'Title'
--------------------------------------------------------------------------------
$ the end of the line
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
^ the beginning of the line
--------------------------------------------------------------------------------
(?! look ahead to see if there is not:
--------------------------------------------------------------------------------
Unique Identifier 'Unique Identifier'
--------------------------------------------------------------------------------
$ the end of the line
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
.* any character except a line break (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
\R? any line ending sequence (optional)
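A Python sketch of the same two-alternative find/replace can help test the pattern on a sample before running Replace All on the whole file. Python's re has no \R, so this sketch assumes it is acceptable to normalize all line endings to \n first, and a small function stands in for the conditional replacement (?1$1\n:). The names keep_only_ids and conditional_replacement are hypothetical. In this emulation the first alternative stops before the line break that follows the ID, so that break survives and leaves an empty line after each ID; the sketch filters those out.

```python
import re

# Python's re has no \R, so line endings are normalized to "\n" first and
# each \R from the Notepad++ pattern becomes "\n". re.MULTILINE makes
# ^ and $ match at the start/end of every line.
PATTERN = re.compile(
    r"^Unique Identifier\n(\d+(?:-\d+)*)(?=\nTitle$)"  # alternative 1: capture the ID
    r"|^(?!Unique Identifier$).*\n?",                  # alternative 2: any other line
    re.MULTILINE,
)


def conditional_replacement(m: re.Match) -> str:
    # Stand-in for (?1$1\n:): group 1 plus a newline if it matched, else nothing.
    return m.group(1) + "\n" if m.group(1) is not None else ""


def keep_only_ids(text: str) -> list[str]:
    """Apply the find/replace and return the surviving IDs in file order."""
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    result = PATTERN.sub(conditional_replacement, text)
    # The line break after each ID is not consumed by alternative 1,
    # so drop the empty lines it leaves behind.
    return [line for line in result.split("\n") if line]
```

For example, keep_only_ids(open("input.txt").read()) would return the IDs as a list, which could then be written to another file with one ID per line (input.txt is a placeholder name).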