Split a string between two words

Problem description

The following string

text = 'FortyGigE1/0/53\r\nCurrent state: DOWN\r\nLine protocol state: DOWN\r\n\r\nFortyGigE1/0/54\r\nCurrent state: DOWN\r\nLine protocol state: DOWN\r\n\r\n'

should be split into the following:

output = [
    'FortyGigE1/0/53\r\nCurrent state: DOWN\r\nLine protocol state: DOWN\r\n\r\n','FortyGigE1/0/54\r\nCurrent state: DOWN\r\nLine protocol state: DOWN\r\n\r\n'
]

The delimiters should not be removed by the split.

delimiters = r'(GigabitEthernet\d*/\d*/\d*\s.*|FortyGigE\d*/\d*/\d*\s.*)'

I tried this:

output = re.split(delimiters,text)

But my output looks like this, with more splits than I expected:

['','FortyGigE1/0/53\r','\nCurrent state: DOWN\r\nLine protocol state: DOWN\r\n\r\n','FortyGigE1/0/54\r','\nCurrent state: DOWN\r\nLine protocol state: DOWN\r\n\r\n']

Solution

At least for your example, you can do the following:

>>> re.split(r'(?<=DOWN\r\n\r\n)(?=FortyGigE)',text)
['FortyGigE1/0/53\r\nCurrent state: DOWN\r\nLine protocol state: DOWN\r\n\r\n','FortyGigE1/0/54\r\nCurrent state: DOWN\r\nLine protocol state: DOWN\r\n\r\n']

Compared with your stated desired output:

>>> output==re.split(r'(?<=DOWN\r\n\r\n)(?=FortyGigE)',text)
True

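This works by using a zero-width lookbehind (?<=DOWN\r\n\r\n) and a zero-width lookahead (?=FortyGigE) as the split point. Neither assertion consumes any characters, so nothing is removed from the pieces. Note that re.split only splits on empty matches in Python 3.7 and later.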
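If the text before the split point is not always DOWN, a lookahead on its own may be enough. The following is a minimal sketch, assuming every block begins at the start of a line with one of the two interface prefixes from the question; the names `pattern` and `blocks` are only illustrative. Because the lookahead also matches at position 0, `re.split` returns an empty first item, which the list comprehension drops.

```python
import re

text = ('FortyGigE1/0/53\r\nCurrent state: DOWN\r\nLine protocol state: DOWN\r\n\r\n'
        'FortyGigE1/0/54\r\nCurrent state: DOWN\r\nLine protocol state: DOWN\r\n\r\n')

# Zero-width lookahead: split right before any line that starts with an
# interface name. re.MULTILINE lets ^ match after every '\n'.
pattern = re.compile(r'(?=^(?:GigabitEthernet|FortyGigE)\d+/\d+/\d+)', re.MULTILINE)

# The match at position 0 yields a leading '' that is filtered out here.
blocks = [block for block in pattern.split(text) if block]

for block in blocks:
    print(repr(block))
```

This version does not depend on the state lines, only on the interface name at the start of a line.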

Here is a regex101 demo; the \r characters were removed because that platform does not support them.
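As for why the attempt in the question produced extra items: when the pattern contains a capturing group, `re.split` returns the captured delimiter as its own list element, and `.` does not match `\n`, so each match stops after the `\r` on the interface line. A rough sketch of an alternative that keeps the original pattern and glues each delimiter back onto the text that follows it (the names `parts` and `blocks` are illustrative):

```python
import re

text = ('FortyGigE1/0/53\r\nCurrent state: DOWN\r\nLine protocol state: DOWN\r\n\r\n'
        'FortyGigE1/0/54\r\nCurrent state: DOWN\r\nLine protocol state: DOWN\r\n\r\n')
delimiters = r'(GigabitEthernet\d*/\d*/\d*\s.*|FortyGigE\d*/\d*/\d*\s.*)'

parts = re.split(delimiters, text)
# parts[0] is whatever precedes the first delimiter (empty here); after that
# the list alternates: delimiter, remainder, delimiter, remainder, ...
blocks = [head + body for head, body in zip(parts[1::2], parts[2::2])]

for block in blocks:
    print(repr(block))
```

Any text that appears before the first interface name ends up in `parts[0]` and is ignored by this sketch.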

Your tip solved my problem. Here is an excerpt from my script:

import re

# `file` holds the path to the saved device output (defined elsewhere in the script)
with open(file, "r") as f:
    content = f.read()

# This delimiter is only an example. The interface names are much longer
deliminators = r'(?=\nBridge-Aggregation|\nHundredGigE|\nFortyGigE|\nTen-GigabitEthernet)'

dev_interfaces = re.split(deliminators, content)
max_interfaces = len(dev_interfaces)
# Delete the leading linefeed (\n) of each interface
for index in range(max_interfaces):
    dev_interfaces[index] = dev_interfaces[index].lstrip('\n')
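To try the excerpt without a device file, here is a small self-contained sketch. The `content` string is a made-up sample standing in for the file contents, and it assumes the file was read in text mode so that line endings arrive as plain `\n`.

```python
import re

# Hypothetical sample standing in for f.read(); line endings are plain '\n'.
content = (
    "FortyGigE1/0/53\nCurrent state: DOWN\nLine protocol state: DOWN\n"
    "\nFortyGigE1/0/54\nCurrent state: DOWN\nLine protocol state: DOWN\n"
)

deliminators = r'(?=\nBridge-Aggregation|\nHundredGigE|\nFortyGigE|\nTen-GigabitEthernet)'

# Split before the '\n' that precedes each interface name, then drop that
# leading linefeed from every chunk.
dev_interfaces = [chunk.lstrip('\n') for chunk in re.split(deliminators, content)]

for interface in dev_interfaces:
    print(repr(interface))
```

Because the lookahead requires a preceding `\n`, an interface name on the very first line of the file does not trigger a split, so no empty leading chunk appears in this sample.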
