Answer
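You can simplify this to a single regular expression by using re.S, the DOTALL flag, which lets "." match newline characters as well.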
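import re

def GetTheSentences(infile):
    # Read the whole file at once so the pattern can match across line breaks.
    with open(infile) as fp:
        for result in re.findall('DELIMITER1(.*?)DELIMITER2', fp.read(), re.S):
            print(result)

# Output for the sample file:
# extract me
# extract me
# extract me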
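This also uses the non-greedy operator .*?, so each match stops at the nearest DELIMITER2 and multiple non-overlapping DELIMITER1-DELIMITER2 blocks are found.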
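To make the effect of both choices concrete, here is a small sketch written for Python 3. The sample string and the variable names are made up for illustration and do not come from the original file. It compares the pattern without re.S, with re.S and the non-greedy .*?, and with re.S and a greedy .* instead.

```python
import re

# Hypothetical sample text with two delimited blocks (illustration only).
SAMPLE = (
    "DELIMITER1\n"
    "first block\n"
    "DELIMITER2\n"
    "ignored line\n"
    "DELIMITER1\n"
    "second block\n"
    "DELIMITER2\n"
)

PATTERN_LAZY = "DELIMITER1(.*?)DELIMITER2"
PATTERN_GREEDY = "DELIMITER1(.*)DELIMITER2"

# Without re.S, "." does not match "\n", so no multi-line block can match.
no_dotall = re.findall(PATTERN_LAZY, SAMPLE)

# With re.S and the lazy quantifier, each delimiter pair yields its own block.
# strip() removes the newlines captured right after DELIMITER1 and before DELIMITER2.
lazy = [block.strip() for block in re.findall(PATTERN_LAZY, SAMPLE, re.S)]

# With re.S and a greedy quantifier, a single match runs from the first
# DELIMITER1 to the last DELIMITER2 and swallows everything in between.
greedy = re.findall(PATTERN_GREEDY, SAMPLE, re.S)

print(no_dotall)
print(lazy)
print(len(greedy))
```

Because fp.read() loads the entire file into memory, this approach is probably best suited to files of moderate size.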
Question
I have a text file in the following format:
DELIMITER1
extract me
extract me
extract me
DELIMITER2
I want to extract every block of "extract me" lines between DELIMITER1 and DELIMITER2 in the .txt file.
This is my current, non-working code:
import re

def GetTheSentences(file):
    fileContents = open(file)
    start_rx = re.compile('DELIMITER')
    end_rx = re.compile('DELIMITER2')
    line_iterator = iter(fileContents)
    start = False
    for line in line_iterator:
        if re.findall(start_rx,line):
            start = True
            break
    while start:
        next_line = next(line_iterator)
        if re.findall(end_rx,next_line):
            break
        print next_line
        continue
    line_iterator.next()
Any ideas?