Iterating over multiple txt files and counting the frequency of a chosen word in Python

Problem description

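I'm working on an exercise that asks me to write a function that loops over 50 text files and counts how often a chosen word occurs in each file. My code currently looks like this: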

import io
import os
import re

def count(term):
    frequencies = 0
    
    work_dir = "C:/my_work_directory"
    for i in range(1,51):
        name = "chapter-{i}.txt".format(i=i)
        path = os.path.join(work_dir,name)
        with io.open(path,"r") as fd:
            content = fd.read()
    
        chapter = io.StringIO(content)
        line = chapter.readline()
        print(chapter)
        while line:
            lower = line.lower()
            cleaned = re.sub('[^a-z ]','',lower)
            words = cleaned.strip().split(' ')
            for word in words:
                if word == term:
                    frequencies += 1
            line = chapter.readline()
        
        print(frequencies)

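The output I want is this: if I call count("Man"), I should get 50 numbers, one per text file, each giving how often the word "Man" appears in that file. However, all I get right now is 50 zeros. I'm fairly sure this is because I initialize the variable frequencies to 0 and then never do anything with it. Can anyone help me fix this or tell me where I'm going wrong? Any help would be much appreciated, thanks.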
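A quick check on one made-up input line (not taken from the chapter files) suggests why the count stays at zero. The helper name clean_words is hypothetical; it just repeats the question's cleaning steps. Those steps lowercase every word, but the search term "Man" keeps its capital M, so the comparison word == term never succeeds.

```python
import re

def clean_words(line):
    # Same steps as the question's loop: lowercase, drop everything
    # except a-z and spaces, then split on single spaces.
    lower = line.lower()
    cleaned = re.sub('[^a-z ]', '', lower)
    return cleaned.strip().split(' ')

words = clean_words("The Man went home.")
print(words)                   # ['the', 'man', 'went', 'home']
print("Man" in words)          # False: the term still has a capital M
print("Man".lower() in words)  # True once the term is lowercased too
```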

Solution

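Well, your "Man" has a capital letter, while all the words you compare it against are lowercase. So the first fix is to call lower() on the term variable. The second problem, which you would have noticed later, is that you are keeping a running total across all files rather than a count per file. So move the initialization of the frequencies variable inside the for loop. It should look like this: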

import io
import os
import re

def count(term):
    # The text is lowercased before matching, so lowercase the term too.
    term = term.lower()

    work_dir = "C:/my_work_directory"
    for i in range(1, 51):
        # Reset the counter for every file to get a per-file count.
        frequencies = 0

        name = "chapter-{i}.txt".format(i=i)
        path = os.path.join(work_dir, name)
        with io.open(path, "r") as fd:
            content = fd.read()

        chapter = io.StringIO(content)
        line = chapter.readline()
        while line:
            lower = line.lower()
            # Keep only lowercase letters and spaces, then split into words.
            cleaned = re.sub('[^a-z ]', '', lower)
            words = cleaned.strip().split(' ')
            for word in words:
                if word == term:
                    frequencies += 1
            line = chapter.readline()

        print(frequencies)

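I ran it, and it works correctly after changing work_dir = "" (so that it looks for the files in the current working directory). So I think you should check your working-directory path, or assert that the term argument is what you expect.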
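One way to act on that advice is a small pre-flight check that runs before the loop. This is a minimal sketch, and check_inputs is a hypothetical helper, not part of the answer. It raises an error if any chapter file is missing. It also rejects a term that the cleaning step could never match, such as " Man" with a leading space. It raises ValueError instead of using a bare assert because asserts are skipped under python -O.

```python
import os
import re

def check_inputs(term, work_dir, n_chapters=50):
    """Hypothetical pre-flight check for count(); returns the lowercased term."""
    # Every chapter file must exist, otherwise io.open fails partway through.
    missing = [
        i for i in range(1, n_chapters + 1)
        if not os.path.isfile(os.path.join(work_dir, "chapter-{i}.txt".format(i=i)))
    ]
    if missing:
        raise FileNotFoundError(
            "chapters missing from {!r}: {}".format(work_dir, missing))

    # The cleaning regex keeps only a-z and spaces, and lines are split on
    # spaces, so a term that is not a single run of a-z letters never matches.
    normalized = term.lower()
    if not re.fullmatch('[a-z]+', normalized):
        raise ValueError("term can never match: {!r}".format(term))
    return normalized
```

For example, term = check_inputs("Man", "C:/my_work_directory") at the top of count() fails fast if the directory or a chapter file is missing. With work_dir = "", os.path.join returns just the file name, so the check looks in the current working directory, as the answer describes.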

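If the per-file counts are needed for later processing, a variant of the corrected count() could return them instead of printing them. The sketch below makes a few assumptions. count_per_file is a hypothetical name. The files are assumed to be UTF-8, whereas the original io.open call uses the platform's default encoding. The file is iterated line by line instead of being read into a StringIO. It also uses split() without an argument, so runs of several spaces do not produce empty "words". The file naming and cleaning rules are the same as in the answer.

```python
import os
import re

def count_per_file(term, work_dir, n_chapters=50):
    """Return {filename: occurrences of term} for chapter-1.txt .. chapter-N.txt."""
    term = term.lower()                      # match the lowercased text
    results = {}
    for i in range(1, n_chapters + 1):
        name = "chapter-{i}.txt".format(i=i)
        path = os.path.join(work_dir, name)
        frequency = 0                        # reset for every file
        # Assumption: the chapter files are UTF-8 encoded.
        with open(path, "r", encoding="utf-8") as fd:
            for line in fd:                  # iterate the file line by line
                cleaned = re.sub('[^a-z ]', '', line.lower())
                frequency += cleaned.split().count(term)
        results[name] = frequency
    return results
```

Usage: for name, n in count_per_file("Man", "C:/my_work_directory").items(): print(name, n)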
python text-files word-frequency
