Problem description
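I want to write a keyword-in-context script in which I first read a text file as an enumerated list and then return a given keyword together with the five words that follow it.
I saw a similar question asked about C#, and I found solutions in Python that use the enum module, but I would like a solution that uses only the enumerate() function.
This is what I have so far: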
# Find keywords in context
import string
# open input txt file from local path
with open('C:\\Users\\somefile.txt','r',encoding='utf-8',errors='ignore') as f: # open file
    data1=f.read() # read content of file as string
data2=data1.translate(str.maketrans('','',string.punctuation)).lower() # remove punctuation
data3=" ".join(data2.split()) # remove additional whitespace from text
indata=list(data3.split()) # convert string to list
print(indata[:4])
searchterms=["text","book","history"]
def wordsafter(keyword,source):
    for i,val in enumerate(source):
        if val == keyword: # cannot access the enumeration value here
            return str(source[i+5]) # intend to show searchterm and subsequent five words
        else:
            continue
for s in searchterms: # iterate through searchterms
    print(s)
    wordsafter(s,indata)
print("done")
I was hoping I could simply access the enumerated values as is done here, but that does not seem to be the case.
Solution
Thanks to @jasonharper, here is your improved code:
import string
# wordsafter() for the first instance only
def wordsafter(keyword,source):
    for i,val in enumerate(source):
        if val == keyword:
            return ' '.join(source[i:i + 6]) # show searchterm and subsequent five words
# wordsafter() for all instances (this definition replaces the one above)
def wordsafter(keyword,source):
    instances = []
    for i,val in enumerate(source):
        if val == keyword:
            instances.append(' '.join(source[i:i + 6])) # searchterm plus the next five words
    return instances
# open input txt file from local path
with open('README.md','r',encoding='utf-8',errors='ignore') as f: # open file
    data1 = f.read() # read content of file as string
data2 = data1.translate(str.maketrans('','',string.punctuation)).lower() # remove punctuation
data3 = " ".join(data2.split()) # remove additional whitespace from text
indata = list(data3.split()) # convert string to list
searchterms = ["this","book","history"]
for term in searchterms: # iterate through searchterms ("term" avoids shadowing the string module)
    result = wordsafter(term,indata)
    if result:
        print(result)
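
To see the two changes in isolation, here is a minimal sketch that runs without any input file. The sample sentence, the function name `words_after`, and the parameter `n` are made up for illustration and do not come from the original post.

```python
def words_after(keyword, source, n=5):
    """Return every occurrence of keyword followed by up to n words."""
    hits = []
    for i, val in enumerate(source):  # i is the position, val is the word
        if val == keyword:
            # source[i + n] would be a single word and can raise IndexError
            # near the end of the list; a slice returns the whole window and
            # is simply shorter when it runs past the end.
            hits.append(' '.join(source[i:i + n + 1]))
    return hits

# Hypothetical sample text, already lowercased and free of punctuation
words = "this book is a short history of the book trade".split()

for term in ["book", "history", "text"]:
    # The return value has to be printed (or stored); calling the
    # function on its own displays nothing.
    print(term, words_after(term, words))
```

The value from enumerate() was accessible in the question's code all along. Nothing appeared because the result of wordsafter(s,indata) was never printed, and source[i+5] picks one word rather than a range.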