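Problem description
Does anyone know how to use NLTK to find the most common nouns or adjectives in a text document? I know how to find the most common words, but not the most common nouns or adjectives. I also need to remove stop words from the document.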
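import csv
import re

import nltk
from nltk import FreqDist, word_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.stem.porter import PorterStemmer
import pandas as pd
import matplotlib.pyplot as plt
# Jupyter-only magic; remove this line when running as a plain .py script
%matplotlib inline

# Needs NLTK data, e.g. nltk.download('stopwords') and nltk.download('wordnet');
# word_tokenize also needs the punkt tokenizer models.

# Read every row of Pizza.txt; csv.reader yields each row as a list of strings
Pizza = []
with open('Pizza.txt', 'r', newline='') as openfile:
    r = csv.reader(openfile)
    for i in r:
        Pizza.append(i)
print(Pizza)

def text_processing(Pizza):
    # Join all rows into one string (str(Pizza) would also include brackets and quotes)
    tokens = " ".join(" ".join(row) for row in Pizza)
    tokens = tokens.lower()
    # Replace everything except letters and digits with spaces
    tokens = re.sub("[^a-zA-Z0-9]", " ", tokens)
    tokens = word_tokenize(tokens)
    wordnet_lemmatizer = WordNetLemmatizer()
    tokens = (wordnet_lemmatizer.lemmatize(word) for word in tokens)
    more_stopwords = set(('cant', 'aint', 'today'))
    extra_stoplist = set(stopwords.words('english')) | more_stopwords
    tokens = (word for word in tokens if word not in extra_stoplist)
    tokens = (word for word in tokens if word.isalpha())
    tokens = (word for word in tokens if len(word) >= 3)
    # Return a list: a generator can only be consumed once
    return list(tokens)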
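One common approach, offered as a sketch rather than a definitive answer: part-of-speech tag the tokens with `nltk.pos_tag`, which by default returns Penn Treebank tags, and count only the words whose tag starts with `NN` (nouns) or `JJ` (adjectives). The helper below does the counting with `collections.Counter`; `nltk.FreqDist` is a `Counter` subclass, so it could be used the same way. The name `most_common_by_pos` and its parameters are hypothetical. Because the tagger uses neighbouring words as context, it is generally more accurate to tag the tokenized text first and only then drop stop words, rather than tagging the already filtered and lemmatized output of `text_processing`.

```python
from collections import Counter

# Penn Treebank tag prefixes (the default tagset of nltk.pos_tag):
# NN, NNS, NNP, NNPS are nouns; JJ, JJR, JJS are adjectives.
NOUN_PREFIX = "NN"
ADJ_PREFIX = "JJ"


def most_common_by_pos(tagged_tokens, tag_prefix, stoplist=(), n=10):
    """Return the n most frequent words whose POS tag starts with tag_prefix.

    Hypothetical helper written for this example.
    tagged_tokens: (word, tag) pairs, e.g. nltk.pos_tag(word_tokenize(text)).
    stoplist: words to skip, compared in lowercase (e.g. NLTK's English stop words).
    """
    stop = {w.lower() for w in stoplist}
    counts = Counter(
        word.lower()
        for word, tag in tagged_tokens
        # isalpha() mirrors the question's filter and drops punctuation tokens
        if tag.startswith(tag_prefix) and word.isalpha() and word.lower() not in stop
    )
    return counts.most_common(n)


if __name__ == "__main__":
    # Hand-tagged sample standing in for nltk.pos_tag output (no NLTK needed here)
    tagged = [
        ("The", "DT"), ("pizza", "NN"), ("was", "VBD"), ("hot", "JJ"),
        ("and", "CC"), ("the", "DT"), ("crust", "NN"), ("was", "VBD"),
        ("crispy", "JJ"), ("but", "CC"), ("the", "DT"), ("pizza", "NN"),
        ("was", "VBD"), ("too", "RB"), ("hot", "JJ"),
    ]
    print(most_common_by_pos(tagged, NOUN_PREFIX))  # [('pizza', 2), ('crust', 1)]
    print(most_common_by_pos(tagged, ADJ_PREFIX))   # [('hot', 2), ('crispy', 1)]
```

With the NLTK tagger data installed, the input would typically come from `tagged = nltk.pos_tag(word_tokenize(text))`, followed by `most_common_by_pos(tagged, NOUN_PREFIX, stopwords.words('english'))`. Passing `pos_tag` a list rather than a generator is the safest choice.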