Text preprocessing with Jupyter Notebook

Problem description

I'm running into a problem with text mining. Please help me.

Here is my code:

import re

import spacy
from spacy import displacy

file = open('c:/Users/Ramin/Desktop/Nixon.txt', 'r+')
text = file.read()
text = re.sub(r'\n', "", text)  # remove extra newlines
nlp = spacy.load('en')  # this is the call that raises the error
text_nlp = nlp(text)
# print the named-entity type of each token in the article
ner_tagged = [(word.text, word.ent_type_) for word in text_nlp]
print(ner_tagged)
# visualize named entities
displacy.render(text_nlp, style='ent', jupyter=True)

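I searched for this error and found a few things, but none of them helped.

I get this error:

[E050] Can't find model 'en'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.

Here is my text: https://github.com/raminbazr/Nixon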

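Solution

The error means the English model is not installed in the environment the notebook is running in. Try installing it: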

python -m spacy download en
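
If the command above does not resolve the error, the installed spaCy version may be the cause: as far as I know, spaCy 3.x dropped the 'en' shortcut, so the model has to be downloaded and loaded by its full package name. The sketch below assumes the small English pipeline en_core_web_sm (my choice, not something from the question) and checks whether it is installed before loading it. The helper names model_is_installed and download_command are made up for illustration.

```python
import importlib.util


def model_is_installed(package_name: str) -> bool:
    """Return True if a pip-installed package with this name can be found."""
    return importlib.util.find_spec(package_name) is not None


def download_command(package_name: str) -> str:
    """Build the spaCy CLI command that downloads the given model package."""
    return f"python -m spacy download {package_name}"


# Assumption: the small English pipeline; other English models would work the same way.
MODEL = "en_core_web_sm"

if model_is_installed(MODEL):
    import spacy

    nlp = spacy.load(MODEL)  # load by full package name instead of 'en'
    doc = nlp("Richard Nixon was born in California.")
    # doc.ents holds only the recognized entity spans, not every token
    print([(ent.text, ent.label_) for ent in doc.ents])
else:
    print(f"Model not found. Run: {download_command(MODEL)}")
```

In a notebook, the download can be run from a cell by prefixing the command with "!". Restarting the kernel afterwards is usually needed so that the newly installed package is picked up.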
