How to show the full result, rather than only the matched text, from a regex search in Python

Problem description

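I am writing a script that searches file names by keyword. My output should be the entire observation (the full file name), not just the matched text, but I found that .group does not do this.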

import os
import re

import pandas as pd

pers_info = pd.read_csv(r".....StateWorkforceMailingList_2-7-19a.csv", encoding='utf-8')

# pers_info['State'] holds state names: Texas, Florida, etc.

files = os.listdir(r"....\State Files")

# files is a list such as WORKFORCE_2017_ALABAMA_FILE.xlsx, ..., n

matches = re.findall(pers_info.State[4], files.replace("_", " "), re.I)
print(matches)

My expected output is WORKFORCE_2017_ALABAMA_FILE.xlsx; instead, I get "Alabama".

Should I try a boolean mask?

Solution

I assume your Pers_info looks like this:

Pers_info = {"state": ["Texas","Alabama","Florida"],"somethingelse": "stuff"}

and your files look like this:

files = ["WORKFORCE_2017_ALABAMA_FILE.xlsx","WORKFORCE_2017_TEXAS_FILE.xlsx","SOMETHING.xlsx"]

(You don't need a regular expression.)

# Compare in lowercase, but keep the original file names for the output.
states = [state.lower() for state in Pers_info['state']]
result = []

for file in files:
    # Keep the whole file name if any state name occurs in it.
    if any(state in file.lower() for state in states):
        result.append(file)

print(result)

Use

>>> import re
>>> import pandas as pd
>>> Pers_info = pd.DataFrame({'State':['Texas','Alabama','Florida']})
>>> Files = ['WORKFORCE_2017_ALABAMA_FILE.xlsx','WORKFORCE_2017_FILE.xlsx']
>>> pattern = re.compile(rf'(?<![^\W_])(?:{"|".join(Pers_info["State"].to_list())})(?![^\W_])',re.I)
>>> list(filter(pattern.search,Files))
['WORKFORCE_2017_ALABAMA_FILE.xlsx']
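
The session above returns whole file names because filter keeps each original string for which pattern.search finds a match, whereas .group() on a match object returns only the matched text. The following is a minimal sketch of that difference, not part of the original answer. It assumes the state names are available as a plain list (the `states` and `files` values are sample data taken from this thread) and adds re.escape as a precaution in case a name ever contains a regex metacharacter.

```python
import re

# Sample data mirroring the names used in this thread.
states = ["Texas", "Alabama", "Florida"]
files = ["WORKFORCE_2017_ALABAMA_FILE.xlsx", "WORKFORCE_2017_FILE.xlsx"]

# re.escape is an added precaution; the original answer joins the names as-is.
alternation = "|".join(re.escape(name) for name in states)
pattern = re.compile(rf"(?<![^\W_])(?:{alternation})(?![^\W_])", re.I)

hits = []
for name in files:
    m = pattern.search(name)
    if m:
        # name is the full file name; m.group() is only the matched text.
        hits.append((name, m.group()))

print(hits)
```

Pairing the file name with m.group() makes it easy to see which state matched each file while still reporting the full name.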


Explanation

--------------------------------------------------------------------------------
  (?<!                     look behind to see if there is not:
--------------------------------------------------------------------------------
    [^\W_]                   any character except: non-word
                             characters (all but a-z, A-Z, 0-9, _), '_'
--------------------------------------------------------------------------------
  )                        end of look-behind
--------------------------------------------------------------------------------
  (?:                      group, but do not capture:
--------------------------------------------------------------------------------
    Texas                    'Texas'
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    Alabama                  'Alabama'
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    Florida                  'Florida'
--------------------------------------------------------------------------------
  )                        end of grouping
--------------------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
    [^\W_]                   any character except: non-word
                             characters (all but a-z, A-Z, 0-9, _), '_'
--------------------------------------------------------------------------------
  )                        end of look-ahead
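
The lookarounds are presumably used instead of \b because \b treats '_' as a word character, so it finds no boundary around a state name that is delimited by underscores. The sketch below illustrates this with the file name from the question; the second file name, ALABAMAN_FILE.xlsx, is a made-up example to show that a longer word containing the state name is still rejected.

```python
import re

name = "WORKFORCE_2017_ALABAMA_FILE.xlsx"

# \b sees '_' as a word character, so there is no boundary around ALABAMA.
with_b = re.search(r"\bAlabama\b", name, re.I)

# The lookarounds only reject neighbouring letters and digits, not '_'.
with_lookarounds = re.search(r"(?<![^\W_])Alabama(?![^\W_])", name, re.I)

# Made-up file name: the state name is followed directly by a letter.
inside_word = re.search(r"(?<![^\W_])Alabama(?![^\W_])", "ALABAMAN_FILE.xlsx", re.I)

print(with_b)                    # no match with \b
print(with_lookarounds.group())  # matched text from the file name
print(inside_word)               # no match inside a longer word
```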