I'm trying to get a word counter and word occurrence counts from a CSV in Python, but I'm getting an error

Problem description

I'm new to Python and trying to learn how to search for words and count their occurrences.

I have a CSV file named SwitchedProviders.csv, and I want to look for several words in it and list how many times each word appears in the CSV.

The dataframe for the CSV looks like this:

data.head()
Out[71]: 
                                          transcript
0  thank you for calling health in...
1                     hi Chris this is Carol number 
2                                                my nation 
3  you have a pen and I'll give you my account nu...
4  I actually already have it pulled up from your...

I tried:

    import csv
    import collections
    
    words = collections.Counter()
    with open('SwitchedProviders_TopicModel.csv') as input_file:
        for row in csv.reader(input_file,delimiter=';'):
            words[row[1]] += 1
    
    print ('Number of times each word: %s' % word['Nation','Embrace','Companion','Health'])

However, I get this error:

      File "<ipython-input-70-4c7724bbede8>", line 7, in <module>
        words[row[1]] += 1
    
    IndexError: list index out of range

I just want my final output to look like a df with:

    Word   Count
    Nation   20
    Embrace  21
    Health    3

What am I doing wrong?

Solution

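Your error is straightforward: don't assume a list has a particular length. `words[row[1]]` assumes every line in your file has at least two tokens, so it raises IndexError on any line with fewer. The sample data I generate below has 0 or 1 tokens per line.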
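To see why `row[1]` fails, here is a minimal sketch that feeds a few lines to `csv.reader` with the same `;` delimiter. The `sample` string is made up for illustration and is only an assumption about what the real file looks like (a single column with no `;` in it, plus a blank line).

```python
import csv
import io

# Hypothetical sample: a single column, no ';' anywhere, and one blank line.
sample = "transcript\nthank you for calling\n\nmy nation\n"

# With delimiter=';' and no ';' in the data, each non-empty line becomes
# a one-element list, and the blank line becomes an empty list.
rows = list(csv.reader(io.StringIO(sample), delimiter=';'))
lengths = [len(row) for row in rows]

print(rows)
print(lengths)
```

None of these rows has an index 1, so `row[1]` cannot work on data shaped like this; `row[0]` works only after checking that the row is not empty.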

    from lorem_text import lorem
    import numpy as np

    # Generate some sample data: lorem ipsum tokens written one per line,
    # with some of them replaced at random by the words of interest.
    cwords = ['Nation', 'Embrace', 'Companion', 'Health']
    text = lorem.paragraphs(5).split(" ")
    for n in np.random.randint(0, len(text), 35):
        text[n] = np.random.choice(cwords)
    text = "\n".join(text)
    with open('SwitchedProviders_TopicModel.csv', "w") as f:
        f.write(text)

    # OP question
    import csv
    import collections

    words = collections.Counter()
    with open('SwitchedProviders_TopicModel.csv') as input_file:
        for row in csv.reader(input_file, delimiter=';'):
            # There is no guarantee that the row has any items, so guard against it.
            if len(row) > 0:
                words[row[0]] += 1

    # Re-wrote the print to work a bit better.
    print('Number of times each word: %s' % "\n".join(
        [f"{k}\t{v}" for k, v in words.items() if k in ['Nation', 'Health']]))
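The code above counts whole cells, which works for the generated data because each line holds a single word. If the real file holds full sentences in a `transcript` column, as `data.head()` suggests, the words have to be split out of each cell first. The following is a rough sketch of that idea, not a definitive implementation. It assumes a header row, a single text column, and case-insensitive matching (so `health` counts toward `Health`); the `sample` string and the names `targets`, `lookup`, and `table` are hypothetical.

```python
import collections
import csv
import io
import re

# Hypothetical stand-in for the real file; with a file on disk you would
# pass open('SwitchedProviders.csv', newline='') to csv.reader instead.
sample = (
    "transcript\n"
    "thank you for calling health in\n"
    "hi Chris this is Carol number\n"
    "\n"
    "my nation\n"
    "Health and nation\n"
)

targets = ["Nation", "Embrace", "Companion", "Health"]
# Map lowercase forms back to the display form (assumes case-insensitive matching).
lookup = {t.lower(): t for t in targets}

counts = collections.Counter()
reader = csv.reader(io.StringIO(sample))
next(reader, None)  # skip the header row (assumed to be present)
for row in reader:
    if not row:  # blank lines yield an empty list, so skip them
        continue
    # Split the first cell into words and count only the target words.
    for token in re.findall(r"[A-Za-z']+", row[0]):
        key = token.lower()
        if key in lookup:
            counts[lookup[key]] += 1

# One (word, count) pair per target, including words that never appear.
table = [(word, counts[word]) for word in targets]

print("Word\tCount")
for word, count in table:
    print(f"{word}\t{count}")
```

If pandas is available, `pd.DataFrame(table, columns=["Word", "Count"])` turns the same list of pairs into the df layout asked for in the question.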