Repeatedly extracting the lines between two delimiters in a text file with Python

Problem description

You can simplify this to a single regular expression by using the re.S (DOTALL) flag, which lets . match newlines as well:

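import re

def GetTheSentences(infile):
    # Read the whole file at once so the pattern can span line breaks.
    with open(infile) as fp:
        # re.S makes '.' match newlines; '.*?' stops at the nearest DELIMITER2.
        for result in re.findall('DELIMITER1(.*?)DELIMITER2', fp.read(), re.S):
            print(result)
# extract me
# extract me
# extract me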

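This also relies on the non-greedy operator .*?, so each match ends at the nearest DELIMITER2 and every non-overlapping DELIMITER1-DELIMITER2 block in the file is found.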
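To make the greedy versus non-greedy difference concrete, here is a small sketch that runs on an in-memory string rather than a file. The sample text and the helper names extract_blocks and extract_greedy are made up for illustration; only re.compile, findall and re.S come from the standard library.

```python
import re

# Hypothetical sample text containing two delimited blocks.
SAMPLE = """DELIMITER1
extract me
DELIMITER2
skip me
DELIMITER1
extract me too
DELIMITER2
"""

def extract_blocks(text):
    """Non-greedy: one result per DELIMITER1/DELIMITER2 pair."""
    pattern = re.compile(r'DELIMITER1(.*?)DELIMITER2', re.S)
    return [block.strip() for block in pattern.findall(text)]

def extract_greedy(text):
    """Greedy: a single match from the first DELIMITER1 to the last DELIMITER2."""
    pattern = re.compile(r'DELIMITER1(.*)DELIMITER2', re.S)
    return [block.strip() for block in pattern.findall(text)]

print(extract_blocks(SAMPLE))       # two separate blocks
print(len(extract_greedy(SAMPLE)))  # everything merged into one match
```

With the greedy pattern, the "skip me" line and the inner delimiters end up inside the single captured block, which is usually not what you want here.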

Solution

I have a text file in the following format:

DELIMITER1
extract me
extract me
extract me
DELIMITER2

I want to extract every block of "extract me" lines that sits between DELIMITER1 and DELIMITER2 in the .txt file.

This is my current code, which does not work:

import re
def GetTheSentences(file):
     fileContents =  open(file)
     start_rx = re.compile('DELIMITER')
     end_rx = re.compile('DELIMITER2')

     line_iterator = iter(fileContents)
     start = False
     for line in line_iterator:
           if re.findall(start_rx,line):

                start = True
                break
      while start:
           next_line = next(line_iterator)
           if re.findall(end_rx,next_line):
                break

           print next_line

           continue
      line_iterator.next()

Any ideas?
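If you would rather keep the line-by-line approach from the code above, one possible shape is a small generator that tracks whether it is currently inside a block. This is only a sketch: it assumes each delimiter sits alone on its own line, and the name iter_blocks and its parameters are hypothetical.

```python
def iter_blocks(lines, start='DELIMITER1', end='DELIMITER2'):
    """Yield each block of lines found between a start and an end marker."""
    block = None  # None means we are currently outside any block
    for line in lines:
        stripped = line.rstrip('\n')
        if block is None:
            if stripped == start:
                block = []          # entering a new block
        elif stripped == end:
            yield block             # block finished, hand it to the caller
            block = None
        else:
            block.append(stripped)  # ordinary line inside a block

# Hypothetical sample input; an open file object can be passed the same way.
sample_lines = [
    'DELIMITER1\n', 'extract me\n', 'extract me\n', 'DELIMITER2\n',
    'skip me\n',
    'DELIMITER1\n', 'extract me\n', 'DELIMITER2\n',
]

for found in iter_blocks(sample_lines):
    print(found)
```

Unlike the regular expression version, this never holds the whole file in memory, which may matter for large inputs. A block that is opened but never closed is silently dropped.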