Python: search a text file and write blocks of lines, including the previous line, to another file

Problem description

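I am searching a text file and want to copy the block of lines associated with each match into another text file. Once the search criteria are found, I want to write the preceding line and the following 9 lines (10 lines total) to the file for each match.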

Sample input file to search

Line 1: File sent to xyz blah blah:
                             Line 2: Search Criteria here
                             Line 3
                             Line 4
                             Line 5
                             Line 6
                             Line 7
                             Line 8
                             Line 9
                             Line 10

Line 1: File sent to xyz blah blah:
                             Line 2: Search Criteria here
                             Line 3
                             Line 4
                             Line 5
                             Line 6
                             Line 7
                             Line 8
                             Line 9
                             Line 10

The code I have so far

searchList = []
searchStr = "Search Criteria here"

with open('','rt') as fInput:
    previous = next(fInput)
    for line in fInput:
        if line.find(searchStr) != -1:
            searchList.append(previous)
            searchList.append(line.lstrip('\n'))


with open('Output.txt','a') as fOutput:
    fOutput.write("\n".join(searchList))

The code above saves the following to the file, with a blank line between the first and second lines:

mm/dd/yyy  hh:mm:ss.MMM File sent to xyz:

                             Line 2: Search Criteria here

mm/dd/yyy  hh:mm:ss.MMM File sent to xyz:

                             Line 2: Search Criteria here

I want to save all 10 lines exactly as they appear in the input file.

Solution

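First, read the file and find the line numbers of the matches. Keep track of those line numbers for later use. (searchStr is the search string defined in the question.)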

all_lines = []
match_lines = []

with open('in_file.txt','r') as fInput:
    for number,line in enumerate(fInput):
        all_lines.append(line)
        if searchStr in line:
            match_lines.append(number)

Then loop over the match_lines list and write out the lines you care about from all_lines:

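num_lines_before = 1
num_lines_after = 8  # 1 before + the matching line + 8 after = 10 lines per match
with open('out_file.txt','w') as fOutput:
    for line_number in match_lines:
        # Clamp the start at 0 so a match near the top of the file
        # does not turn into a negative (wrap-around) index
        start = max(0, line_number - num_lines_before)
        # Get a slice containing the lines to write out
        output_lines = all_lines[start:line_number+num_lines_after+1]
        fOutput.writelines(output_lines)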
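
As a side note on why the question's code produced blank lines: lines read from a file keep their trailing newline, so joining them with "\n" adds a second one, while writelines writes each string unchanged. Here is a minimal sketch of the difference, using two made-up sample lines and an in-memory buffer instead of a real file:

```python
import io

# Hypothetical sample lines, as they would come from iterating over a file:
# each one still ends with its own newline.
lines = ["Line 1: File sent to xyz\n", "    Line 2: Search Criteria here\n"]

# "\n".join() inserts an extra newline between items, which shows up
# as a blank line between them.
joined = "\n".join(lines)

# writelines() writes each string as-is and adds nothing.
buf = io.StringIO()
buf.writelines(lines)
written = buf.getvalue()

print(repr(joined))
print(repr(written))
```

This is why the solution collects the lines unmodified and passes them to writelines rather than joining them.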

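To test the two-step approach, I'll create an io.StringIO object so that strings can be read and written as if they were files, and ask for one line before and two lines after each match: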

import io

strIn = """This is some text
12345
2 searchforthis
34567
45678
5 searchforthis
63r23tf
7pr9e2380
89spver894
949erc8m9
100948rm42"""

all_lines = []
match_lines = []
searchStr = "searchforthis"

# with open('in_file.txt','r') as fInput:
with io.StringIO(strIn) as fInput:
    for number,line in enumerate(fInput):
        all_lines.append(line)
        if searchStr in line:
            match_lines.append(number)

num_lines_before = 1
num_lines_after = 2

# with open('out_file.txt','w') as fOutput:
with io.StringIO("") as fOutput:
    for line_number in match_lines:
        # Clamp the start at 0 so a match on the first line does not wrap around
        start = max(0, line_number - num_lines_before)
        # Get a slice containing the lines to write out
        output_lines = all_lines[start:line_number+num_lines_after+1]
        fOutput.writelines(output_lines)
        fOutput.write("----------\n") # Just to distinguish matches when we test

    fOutput.seek(0)
    print(fOutput.read())

This gives the following output:

12345
2 searchforthis
34567
45678
----------
45678
5 searchforthis
63r23tf
7pr9e2380
----------
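
If it helps, the two steps can also be packaged as one reusable function. The following is only a sketch under a few assumptions: the name extract_blocks and its parameters are mine, the start index is clamped at 0 so a match on the first line cannot wrap around to the end of the list, and the sample input is made up to resemble the file in the question. It uses 8 lines after the match so that each block is 10 lines long.

```python
def extract_blocks(lines, search_str, num_before=1, num_after=8):
    """Return one list of lines per match: the match plus its context lines."""
    blocks = []
    for number, line in enumerate(lines):
        if search_str in line:
            start = max(0, number - num_before)  # never go below index 0
            stop = number + num_after + 1        # slicing past the end is safe
            blocks.append(lines[start:stop])
    return blocks


# Made-up input shaped like the question's sample: two 10-line records
# separated by a blank line.
record = ["Line 1: File sent to xyz blah blah:\n",
          "    Line 2: Search Criteria here\n"]
record += [f"    Line {n}\n" for n in range(3, 11)]
sample = record + ["\n"] + record

blocks = extract_blocks(sample, "Search Criteria here")
for block in blocks:
    print("".join(block), end="")
    print("----------")
```

To write to a real file instead of printing, each block can be passed to fOutput.writelines(block) inside the loop. Note that, like the original solution, this keeps every line in memory, and blocks from matches that are close together will overlap.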