Regex pattern matching: substrings of words in a CSV file

Problem description

'Neighborhood,eattend10,eattend11,eattend12,eattend13,mattend10,mattend11,mattend12,mattend13,hsattend10,hsattend11,hsattend12,hsattend13,eenrol11,eenrol12,eenrol13,menrol11,menrol12,menrol13,hsenrol11,hsenrol12,hsenrol13,aastud10,aastud11,aastud12,aastud13,wstud10,wstud11,wstud12,wstud13,hstud10,hstud11,hstud12,hstud13,abse10,abse11,abse12,abse13,absmd10,absmd11,absmd12,absmd13,abshs10,abshs11,abshs12,abshs13,susp10,susp11,susp12,susp13,farms10,farms11,farms12,farms13,sped10,sped11,sped12,sped13,ready11,ready12,ready13,math310,math311,math312,math313,read310,read311,read312,read313,math510,math511,math512,math513,read510,read511,read512,read513,math810,math811,math812,math813,read810,read811,read812,read813,hSAEng10,hSAEng11,hSAEng12,hSAEng13,hsabio10,hsabio11,hsabio12,hsabio13,hsagov10,hsagov11,hsagov13,hsaalg10,hsaalg11,hsaalg12,hsaalg13,drop10,drop11,drop12,drop13,compl10,compl11,compl12,compl13,sclsw11,sclsw12,sclsw13,sclemp13\

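I have this dataset. I need to find out how many words contain drop and print them.

Or, similarly, print them for any other word such as mattend.

How can I do this with a regex?

I tried using findall, but I don't think that is correct.

I think we could use re.search or re.match.

Thanks in advance.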

Solution

You can call len() on the result of re.findall() to get the length of the returned list:

import re
# Read the whole file into a single string.
with open('example.csv') as f:
    data = f.read().strip()
# Count every non-overlapping occurrence of 'drop'.
print(len(re.findall('drop', data)))
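If only the column names should be counted, one option is to read just the header row and test each name on its own. The following is a minimal sketch, not the original answer's method: it assumes the first row of the file is the header, uses io.StringIO with a shortened sample as a stand-in for open('example.csv'), and assumes the columns of interest look like drop followed by digits.

```python
import csv
import io
import re

# Stand-in for open('example.csv'); the content is a shortened, assumed sample
# of the header row from the question.
f = io.StringIO("Neighborhood,susp10,drop10,drop11,drop12,drop13,compl10\n")

# csv.reader yields one list of fields per row; take only the first row.
header = next(csv.reader(f))

# re.fullmatch requires the whole column name to match the pattern.
drop_cols = [name for name in header if re.fullmatch(r"drop\d+", name)]

print(drop_cols)
print(len(drop_cols))
```

Checking names one by one avoids counting the text drop if it happens to appear in the data rows as well.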

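I think re.findall is the right choice. From the Python re module documentation:

search:

Scan through string looking for the first location where this regular expression produces a match, and return a corresponding match object.

match:

If zero or more characters at the beginning of string match this regular expression, return a corresponding match object.

findall:

Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result.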
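To make the difference between the three functions concrete, here is a small sketch run on a shortened version of the header from the question (the shortened string is an assumption made for brevity):

```python
import re

# Shortened sample of the header row from the question.
header = "Neighborhood,eattend10,drop10,drop11,drop12,drop13,compl10"

# re.search: the first match anywhere in the string.
first = re.search(r"drop\d*", header)
print(first.group())      # drop10

# re.match: only tries at the very beginning of the string,
# and the header starts with "Neighborhood", so there is no match.
at_start = re.match(r"drop\d*", header)
print(at_start)           # None

# re.findall: every non-overlapping match, in order.
all_drops = re.findall(r"drop\d*", header)
print(all_drops)          # ['drop10', 'drop11', 'drop12', 'drop13']
print(len(all_drops))     # 4
```

Since search and match each return at most one match object, findall is the one that fits a counting task.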

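I tried it on your example (with the header line stored in str) and it worked for me: re.findall("drop", str)

If you also want to see the digits that follow, you can try a raw-string pattern: re.findall(r"drop\d*", str)

If you want to count the words, you can use: len(re.findall(r"drop\d*", str))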
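The same idea can be generalized to any word, such as mattend. The sketch below is one possible way to do it; the helper name find_columns and the shortened header are illustrative assumptions, not part of the original answer. It adds \b word boundaries so that a prefix only matches whole column names.

```python
import re

def find_columns(header: str, prefix: str) -> list[str]:
    """Return column names that are `prefix` followed by optional digits."""
    # re.escape guards against prefixes containing regex metacharacters;
    # \b anchors the match to whole names between the commas.
    pattern = r"\b" + re.escape(prefix) + r"\d*\b"
    return re.findall(pattern, header)

# Shortened sample of the header row from the question.
header = ("Neighborhood,eattend10,eattend11,mattend10,mattend11,"
          "mattend12,mattend13,hsattend10,drop10,drop11,drop12,drop13")

for word in ("drop", "mattend", "attend"):
    found = find_columns(header, word)
    print(word, len(found), found)
```

With the boundaries in place, attend on its own finds nothing, because no column name starts with it. A plain re.findall("attend", header) would instead count it inside eattend, mattend and hsattend, giving 7 hits on this sample. Which behavior is wanted depends on whether substrings or whole names are being counted.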

pattern-matching python-3.x regex