Regex returning a single row on pattern match in specific text files

Problem description

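I have multiple text files. When a specific pattern matches, I want to extract the matching string and append it, together with the file name, to a data frame. In my case, the same patterns occur several times within these text files.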

sample.txt:
"government high school
Govt high school physics department
Employee Designation School Assistant"

What I am getting:

file       | Org                    | Org2
sample.txt | government high school | Govt high school physics department
sample.txt | government high school | Employee Designation School Assistant

What I am looking for:

file       | Org                    | Org2
sample.txt | government high school | Govt high school physics department

This is the code I am using:

import os
import re

prs_path = "C://Users//subhr//scope_txt//"

df3 = []
for file in os.listdir(prs_path):
    Name = None
    with open(prs_path + file) as fd:
        for line in fd:
            line = line.lower()
            # r"..." is a raw string (the r goes before the quotes);
            # re.search takes (pattern, string, flags)
            match = re.search(r"(^.*government.*$)", line, re.I)
            Org = ""
            if match:
                Org = match.group()
                df3.append([file, Org])
            Org2 = ""
            Org3 = ""
            Org = ""
            if match is None:
                # only lines without "government" are checked for school/college
                match2 = re.search(r"(^.*school.*$)|(^.*college.*$)", line, re.I)
                if match2:
                    Org2 = match2.group()
                    df3.append([file, Org, Org2])
                if match2 is None:
                    match3 = re.search(r"(^.*power.*$)", line, re.I)
                    if match3:
                        Org3 = match3.group()
                        df3.append([file, Org2, Org3])
                    if match3 is None:
                        continue

Where am I going wrong?

Solution

Try using this pattern: r"^(.*?):$\n\"(.*?) (.*?)$\n(.*?) (.*? .*?) (.*?)$"
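In Python, the `^` and `$` anchors in this pattern only match at line boundaries when `re.MULTILINE` is set. A minimal sketch of applying it, assuming the file content is exactly the sample shown in the question, including the `sample.txt:` label line and the opening double quote (the variable names `PATTERN` and `text` are just for illustration):

```python
import re

# Pattern from the answer. With re.MULTILINE, ^ and $ match at the start
# and end of every line, not only at the start and end of the whole string.
PATTERN = re.compile(
    r"^(.*?):$\n\"(.*?) (.*?)$\n(.*?) (.*? .*?) (.*?)$",
    re.MULTILINE,
)

# Assumed input: the sample exactly as shown in the question.
text = (
    "sample.txt:\n"
    '"government high school\n'
    "Govt high school physics department\n"
    'Employee Designation School Assistant"\n'
)

m = PATTERN.search(text)
if m:
    # Print each captured group with its 1-based index.
    for i, group in enumerate(m.groups(), start=1):
        print(i, repr(group))
```

On this input the groups are `'sample.txt'`, `'government'`, `'high school'`, `'Govt'`, `'high school'` and `'physics department'`. Note that the pattern splits lines on spaces rather than capturing whole lines, and the third line ("Employee Designation School Assistant") is never matched, so it relies on this exact layout.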

With this pattern, your input is split into 6 groups; you can test it here:

https://regex101.com/r/UN9cjZ/1
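A different route to the original goal of one row per file: the loop in the question calls `df3.append` every time a line matches, so a file with two "school" lines produces two rows. A minimal sketch that scans the whole file first, keeps only the first line matching each pattern, and appends once per file. The helper names `first_matches` and `collect_rows` are hypothetical, and the column order (`file`, `Org`, `Org2`) is assumed from the tables in the question:

```python
import os
import re

# Patterns from the question; re.I makes lowercasing the lines unnecessary,
# so the extracted text keeps its original case.
ORG_RE = re.compile(r"government", re.I)
ORG2_RE = re.compile(r"school|college", re.I)


def first_matches(lines):
    """Return (org, org2): the first line matching each pattern, or "".

    A line that matches ORG_RE is not also considered for ORG2_RE,
    mirroring the if/else order in the question's code.
    """
    org, org2 = "", ""
    for raw in lines:
        line = raw.strip()
        if ORG_RE.search(line):
            if not org:
                org = line
        elif ORG2_RE.search(line):
            if not org2:
                org2 = line
        if org and org2:
            break  # both columns filled; later matches are ignored
    return org, org2


def collect_rows(folder):
    """Build exactly one [file, Org, Org2] row per regular file in folder."""
    rows = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue
        with open(path, encoding="utf-8") as fd:
            org, org2 = first_matches(fd)
        rows.append([name, org, org2])
    return rows
```

With pandas installed, the rows can be turned into the data frame from the question with `pd.DataFrame(collect_rows(prs_path), columns=["file", "Org", "Org2"])`. A third pattern such as `power` can be added as another compiled regex and column in the same way.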