Finding the 10 most common nouns in a text document with NLTK

Problem description

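Does anyone know how to use NLTK to find the most common nouns or adjectives in a text document? I know how to find the most frequent words overall, but not how to restrict the count to nouns or adjectives. I also need to remove stopwords from the document.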

import csv
import re

import nltk
import pandas as pd
import matplotlib.pyplot as plt
from nltk import FreqDist, word_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.stem.porter import PorterStemmer
%matplotlib inline

# Read Pizza.txt row by row; csv.reader yields each line as a list of fields.
Pizza = []
with open('Pizza.txt', 'r') as openfile:
    r = csv.reader(openfile)
    for i in r:
        Pizza.append(i)
print(Pizza)


def text_processing(Pizza):
    # Flatten the list of rows into one lowercase string.
    tokens = str(Pizza)
    tokens = tokens.lower()
    # Replace every non-alphanumeric character with a space.
    tokens = re.sub("[^a-zA-Z0-9]", " ", tokens)
    tokens = word_tokenize(tokens)
    wordnet_lemmatizer = WordNetLemmatizer()
    tokens = (wordnet_lemmatizer.lemmatize(word) for word in tokens)
    # Remove NLTK's English stopwords plus a few custom ones.
    more_stopwords = set(('cant', 'aint', 'today'))
    extra_stoplist = set(stopwords.words('english')) | more_stopwords
    tokens = (word for word in tokens if word not in extra_stoplist)
    # Keep purely alphabetic tokens of at least three characters.
    tokens = (word for word in tokens if word.isalpha())
    tokens = (word for word in tokens if len(word) >= 3)
    return tokens

Solution

No verified solution has been posted for this question yet.
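
One possible approach, offered as an unverified sketch rather than an accepted answer: tag each token with its part of speech using `nltk.pos_tag`, keep only the tokens whose Penn Treebank tag starts with `NN` (nouns) or `JJ` (adjectives), and count what is left. The helper names `most_common_by_pos` and `tag_text` are hypothetical and do not appear in the question. The sketch assumes the NLTK tokenizer and tagger data have already been fetched with `nltk.download`; the exact resource names depend on the installed NLTK version.

```python
from collections import Counter


def most_common_by_pos(tagged_tokens, tag_prefix, stop_words=frozenset(), n=10):
    """Return the n most frequent words whose POS tag starts with tag_prefix.

    tagged_tokens: iterable of (word, tag) pairs, e.g. the output of nltk.pos_tag.
    tag_prefix: "NN" selects nouns, "JJ" selects adjectives (Penn Treebank tags).
    stop_words: lowercase words to exclude from the count.
    """
    counts = Counter(
        word.lower()
        for word, tag in tagged_tokens
        if tag.startswith(tag_prefix)
        and word.isalpha()
        and word.lower() not in stop_words
    )
    return counts.most_common(n)


def tag_text(text):
    """Tokenize and POS-tag raw text sentence by sentence with NLTK.

    Tagging runs on the original text, before lowercasing or stopword
    removal, because the tagger relies on sentence context.
    """
    import nltk  # imported here so the counting helper works without NLTK

    tagged = []
    for sentence in nltk.sent_tokenize(text):
        tagged.extend(nltk.pos_tag(nltk.word_tokenize(sentence)))
    return tagged
```

To use it on the file from the question, read `Pizza.txt` into a single string, pass it to `tag_text`, and then call `most_common_by_pos(tagged, "NN", set(stopwords.words('english')))` for nouns, or the same call with `"JJ"` for adjectives. Stopwords are filtered after tagging here; removing them first, as `text_processing` does, would take away context the tagger depends on and is likely to reduce tagging accuracy.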


jupyter-notebook nltk python word-count
