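Problem description
I'm working on an exercise in which I've been asked to write a function that loops over 50 text files and counts how often a chosen word occurs in each one. So far, my code looks like this: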
import io
import os
import re

def count(term):
    frequencies = 0
    work_dir = "C:/my_work_directory"
    for i in range(1,51):
        name = "chapter-{i}.txt".format(i=i)
        path = os.path.join(work_dir,name)
        with io.open(path,"r") as fd:
            content = fd.read()
        chapter = io.StringIO(content)
        line = chapter.readline()
        print(chapter)
        while line:
            lower = line.lower()
            cleaned = re.sub('[^a-z ]','',lower)
            words = cleaned.strip().split(' ')
            for word in words:
                if word == term:
                    frequencies += 1
            line = chapter.readline()
        print(frequencies)
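The output I want is that, if I call count("Man"), I get 50 different frequencies, one for how often the word "Man" occurs in each text file. However, all I'm getting right now is 50 zeros. I'm fairly sure this is because I initialize the variable frequencies to 0 and then never do anything with it. Can anyone help me fix this or tell me where I'm going wrong? Any help would be much appreciated, thanks.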
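Solution
Well, your "Man" has a capital letter, while all the words you compare it against have been lowercased, so the comparison never matches. The first thing to do is therefore to call lower() on the term variable. The second problem, which you would notice later, is that you are keeping a running count rather than a per-file count. To fix that, move the initialization of the frequencies variable inside the for loop. It should look like this: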
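import io
import os
import re

def count(term):
    # The text is lowercased below, so lowercase the search term too
    term = term.lower()
    work_dir = "C:/my_work_directory"
    for i in range(1,51):
        # Reset the counter for every file to get a per-file count
        frequencies = 0
        name = "chapter-{i}.txt".format(i=i)
        path = os.path.join(work_dir,name)
        with io.open(path,"r") as fd:
            content = fd.read()
        chapter = io.StringIO(content)
        line = chapter.readline()
        # Debug output: prints the StringIO object, not the chapter text
        print(chapter)
        while line:
            lower = line.lower()
            # Keep only the letters a-z and spaces
            cleaned = re.sub('[^a-z ]','',lower)
            words = cleaned.strip().split(' ')
            for word in words:
                if word == term:
                    frequencies += 1
            line = chapter.readline()
        print(frequencies)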
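To see the case mismatch in isolation, here is a minimal sketch that applies the same lowercasing and cleaning to a single made-up sentence. The helper name normalize and the sample sentence are assumptions introduced only for illustration; they are not part of the original code.

```python
import re

def normalize(line):
    """Hypothetical helper: lowercase a line, keep only a-z and spaces, split on spaces."""
    cleaned = re.sub('[^a-z ]', '', line.lower())
    return cleaned.strip().split(' ')

# Made-up sample text, used only to demonstrate the comparison
words = normalize("The Man saw a man.")

print(words.count("Man"))          # 0: no lowercased word equals "Man"
print(words.count("Man".lower()))  # 2: both occurrences match once the term is lowercased
```

Because every word has already been lowercased, a term containing a capital letter can never match, which is consistent with the 50 zeros in the question.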
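I ran it, and it works once I change work_dir = "" (so that it looks in the local directory). So I think you should check your working directory path, or assert that the term argument is what you expect.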
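One way to make the directory easier to check is to pass it in as an argument and return the counts instead of printing them. The following is only a sketch under stated assumptions: the function name count_per_file, its parameters, and the UTF-8 encoding are not from the original code, and it reads each file as a whole rather than line by line, treating any whitespace as a word separator.

```python
import re
from pathlib import Path

def count_per_file(term, work_dir, pattern="chapter-{i}.txt", first=1, last=50):
    """Return a list with the number of occurrences of term in each chapter file.

    All parameter names here are hypothetical; the file naming follows the question.
    """
    term = term.lower()
    directory = Path(work_dir)
    # Fail early with a clear message if the directory path is wrong
    if not directory.is_dir():
        raise FileNotFoundError("work_dir does not exist: {}".format(directory))

    counts = []
    for i in range(first, last + 1):
        path = directory / pattern.format(i=i)
        # Assumption: the files are UTF-8 encoded
        text = path.read_text(encoding="utf-8").lower()
        # Keep only a-z and whitespace, then split on any whitespace
        words = re.sub(r'[^a-z\s]', '', text).split()
        counts.append(words.count(term))
    return counts
```

With the layout from the question, a call would look like count_per_file("Man", "C:/my_work_directory"). A wrong directory path then raises an error immediately instead of failing somewhere inside the loop.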