Problem description
The following string
text = 'FortyGigE1/0/53\r\nCurrent state: DOWN\r\nLine protocol state: DOWN\r\n\r\nFortyGigE1/0/54\r\nCurrent state: DOWN\r\nLine protocol state: DOWN\r\n\r\n'
should be split into the following:
output = [
'FortyGigE1/0/53\r\nCurrent state: DOWN\r\nLine protocol state: DOWN\r\n\r\n','FortyGigE1/0/54\r\nCurrent state: DOWN\r\nLine protocol state: DOWN\r\n\r\n'
]
The delimiters must not be removed by the split.
delimiters = r'(GigabitEthernet\d*/\d*/\d*\s.*|FortyGigE\d*/\d*/\d*\s.*)'
I tried this:
output = re.split(delimiters,text)
but it returns each delimiter as a separate item, cut off after the \r:
['','FortyGigE1/0/53\r','\nCurrent state: DOWN\r\nLine protocol state: DOWN\r\n\r\n','FortyGigE1/0/54\r','\nCurrent state: DOWN\r\nLine protocol state: DOWN\r\n\r\n']
Solution
At least for your example, you can do the following:
>>> re.split(r'(?<=DOWN\r\n\r\n)(?=FortyGigE)',text)
['FortyGigE1/0/53\r\nCurrent state: DOWN\r\nLine protocol state: DOWN\r\n\r\n','FortyGigE1/0/54\r\nCurrent state: DOWN\r\nLine protocol state: DOWN\r\n\r\n']
Compared with your stated desired output:
>>> output==re.split(r'(?<=DOWN\r\n\r\n)(?=FortyGigE)',text)
True
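This works by using a zero-width lookbehind (?<=DOWN\r\n\r\n) and a zero-width lookahead (?=FortyGigE) together as the split point. Neither assertion consumes any characters, so nothing is removed from the text. Splitting on an empty match like this requires Python 3.7 or later.
Here is a regex101 demo; the \r characters were removed there because that platform does not support them.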
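The pattern above is tied to the literal DOWN and FortyGigE in the example. As a rough sketch of a more general variant, assuming every interface block starts at the beginning of a line with one of the two prefixes from the question, a lookahead on its own may be enough. The name boundary is made up for illustration.

```python
import re

text = (
    'FortyGigE1/0/53\r\nCurrent state: DOWN\r\nLine protocol state: DOWN\r\n\r\n'
    'FortyGigE1/0/54\r\nCurrent state: DOWN\r\nLine protocol state: DOWN\r\n\r\n'
)

# Assumption: each block begins at the start of a line with one of these prefixes.
# (?m) makes ^ match after every newline; the lookahead consumes nothing,
# so the interface name stays in the block that follows the split point.
boundary = re.compile(r'(?m)^(?=(?:GigabitEthernet|FortyGigE)\d+/\d+/\d+)')

# The split point at position 0 yields a leading empty string; drop it.
blocks = [block for block in boundary.split(text) if block]

for block in blocks:
    print(repr(block))
```

Because nothing is consumed, joining the blocks again gives back the original text unchanged.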
Your tip solved my problem. Here is an excerpt from my script:
import re

with open(file, "r") as f:
    content = f.read()
#
# This delimiter is only an example. The interface names are much longer
deliminators = r'(?=\nBridge-Aggregation|\nHundredGigE|\nFortyGigE|\nTen-GigabitEthernet)'
#
dev_interfaces = re.split(deliminators, content)
max_interfaces = len(dev_interfaces)
# Delete the leading linefeed (\n) of each interface
for index in range(max_interfaces):
    dev_interfaces[index] = dev_interfaces[index].lstrip('\n')
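To try the excerpt without a device file, here is a minimal self-contained sketch. The sample content string is invented for illustration and uses plain \n line endings; real output will differ, and the delimiter list is the shortened example from the excerpt.

```python
import re

# Hypothetical sample standing in for the file contents read in the excerpt.
content = (
    'Bridge-Aggregation1\nCurrent state: UP\n'
    '\nFortyGigE1/0/53\nCurrent state: DOWN\n'
    '\nTen-GigabitEthernet1/0/1\nCurrent state: UP\n'
)

# Lookahead only: split just before the newline that precedes an interface name.
deliminators = r'(?=\nBridge-Aggregation|\nHundredGigE|\nFortyGigE|\nTen-GigabitEthernet)'

# Each chunk after the first starts with that newline, so strip it while building the list.
dev_interfaces = [chunk.lstrip('\n') for chunk in re.split(deliminators, content)]

for interface in dev_interfaces:
    print(repr(interface))
```

The first block is found only because no newline is required before it; a file that begins with a blank line would leave a leading chunk that still needs the lstrip step.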