Notepad++ regex to get every unique number located between two repeating tags

Problem description

I have a file with 16k lines of text, opened in Notepad++.

Every few lines, a block like this appears:

a few rows of text before
Unique Identifier
628612012-078
Title
another few rows of text until the next Unique Identifier row

A few more lines follow, and then the block repeats with a different number:

a few rows of text before
Unique Identifier
1991-18613-001
Title
another few rows of text until the next Unique Identifier row


What would a regex look like that grabs (copies/saves) every ID number located between each Unique Identifier line and the Title line that follows it?

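I don't mind if it deletes the rest of the text in the file, saves the output as another file, or anything else. Ideally, I just need the numbers listed in order.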

I tried adapting the answer from "Find and copy text between { and } in Notepad++" but couldn't get it to work.

Solution

If the file has Unix (LF) line endings, the regex is

(?<=Unique Identifier\n).+(?=\nTitle)

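Then use Mark All and Copy Marked Text to copy all the matches to the clipboard. Leave ". matches newline" unchecked so that .+ stays on a single line. For a file with Windows (CRLF) line endings, use \r\n in place of \n.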
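Outside Notepad++, the same lookaround pattern can be tried in Python, whose re module accepts this fixed-width lookbehind. This is a minimal sketch, assuming LF line endings and no leading whitespace on the marker lines; find_ids and the sample text are made up for illustration.

```python
import re

# Same pattern as above: lookbehind for the marker line, lookahead for "Title".
# By default "." does not match a newline, so ".+" stays on one line.
LOOKAROUND = re.compile(r"(?<=Unique Identifier\n).+(?=\nTitle)")


def find_ids(text: str) -> list:
    """Return every line sitting between 'Unique Identifier' and 'Title'."""
    return LOOKAROUND.findall(text)


# Hypothetical sample modelled on the blocks in the question (LF endings).
sample = (
    "a few rows of text before\n"
    "Unique Identifier\n"
    "628612012-078\n"
    "Title\n"
    "another few rows of text\n"
    "Unique Identifier\n"
    "1991-18613-001\n"
    "Title\n"
    "trailing text\n"
)

ids = find_ids(sample)
print("\n".join(ids))
```

Because the pattern has no capturing group, findall returns the full matches, which are the ID lines themselves.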

Alternatively, use

^Unique Identifier\R(\d+(?:-\d+)*)(?=\RTitle$)|^(?!Unique Identifier$).*\R?

Replace with: (?1$1\n:) - this inserts the group 1 value followed by a newline if group 1 matched, and nothing otherwise.
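Python's re module supports neither \R nor the (?1...:...) conditional replacement, so a rough equivalent has to work differently. The sketch below normalizes line endings to \n and collects group 1 with findall instead of deleting the other lines; extract_ids and the sample text are hypothetical names used only for illustration.

```python
import re

# First alternative of the Notepad++ pattern, with \R replaced by \n.
# re.MULTILINE makes ^ and $ match at line boundaries.
ID_PATTERN = re.compile(
    r"^Unique Identifier\n(\d+(?:-\d+)*)(?=\nTitle$)",
    re.MULTILINE,
)


def extract_ids(text: str) -> list:
    """Return the IDs found between 'Unique Identifier' and 'Title' lines."""
    # Assumption: converting CRLF and lone CR to LF stands in for \R.
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    return ID_PATTERN.findall(normalized)


# Hypothetical sample with Windows (CRLF) line endings.
sample = (
    "a few rows of text before\r\n"
    "Unique Identifier\r\n"
    "628612012-078\r\n"
    "Title\r\n"
    "another few rows of text\r\n"
    "Unique Identifier\r\n"
    "1991-18613-001\r\n"
    "Title\r\n"
    "trailing text\r\n"
)

print("\n".join(extract_ids(sample)))
```

Collecting the captures directly avoids the second alternative of the pattern, whose only job is to delete the lines that are not wanted.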

Explanation

--------------------------------------------------------------------------------
  ^                        the beginning of the line
--------------------------------------------------------------------------------
  Unique Identifier        'Unique Identifier'
--------------------------------------------------------------------------------
  \R                       any line ending sequence
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    \d+                      digits (0-9) (1 or more times (matching
                             the most amount possible))
--------------------------------------------------------------------------------
    (?:                      group, but do not capture (0 or more
                             times (matching the most amount
                             possible)):
--------------------------------------------------------------------------------
      -                        '-'
--------------------------------------------------------------------------------
      \d+                      digits (0-9) (1 or more times
                               (matching the most amount possible))
--------------------------------------------------------------------------------
    )*                       end of grouping
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
    \R                       any line ending sequence
--------------------------------------------------------------------------------
    Title                    'Title'
--------------------------------------------------------------------------------
    $                        the end of the line
--------------------------------------------------------------------------------
  )                        end of look-ahead
--------------------------------------------------------------------------------
 |                         OR
--------------------------------------------------------------------------------
  ^                        the beginning of the line
--------------------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
    Unique Identifier        'Unique Identifier'
--------------------------------------------------------------------------------
    $                        the end of the line
--------------------------------------------------------------------------------
  )                        end of look-ahead
--------------------------------------------------------------------------------
  .*                       any character except \n (0 or more times
                           (matching the most amount possible))
--------------------------------------------------------------------------------
  \R?                      any line ending sequence (optional)