Problem description
from __future__ import unicode_literals,print_function
from spacy.lang.en import English
nlp = English()
sentencizer = nlp.create_pipe("sentencizer")
nlp.add_pipe(sentencizer)
assert len(list(doc.sents)) == 2
Here is the traceback:
AttributeError Traceback (most recent call last)
<ipython-input-81-0459326012bf> in <module>
5 sentencizer = nlp.create_pipe("sentencizer")
6 nlp.add_pipe(sentencizer)
----> 7 assert len(list(doc.sents)) == 2
AttributeError: 'list' object has no attribute 'sents'
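Solution
The traceback shows that doc is a plain Python list, not a spaCy Doc. The .sents attribute only exists on the Doc object returned by calling nlp on a string. If your goal is to split text into sentences, here is a code example using spaCy.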
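import spacy

# Requires the model to be installed: python -m spacy download en_core_web_lg
nlp = spacy.load('en_core_web_lg')
raw_text = 'Hello,world. Here are two sentences.'
doc = nlp(raw_text)  # calling nlp on a string returns a Doc
# sent.text is used instead of sent.string, which newer spaCy versions no longer provide
sentences = [sent.text.strip() for sent in doc.sents]
assert len(sentences) == 2
print(sentences)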
Output:
['Hello,world.', 'Here are two sentences.']
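If you would rather not download a statistical model, the rule-based sentencizer from the question should also work once doc is created by calling nlp on a string. The following is a minimal sketch, assuming spaCy v3, where add_pipe takes the component name as a string. In spaCy v2 the question's create_pipe form would be used instead.

```python
from spacy.lang.en import English

# Blank English pipeline: tokenizer only, no trained model required.
nlp = English()

# Assumption: spaCy v3 API, where built-in components are added by name.
nlp.add_pipe("sentencizer")

raw_text = "Hello,world. Here are two sentences."
doc = nlp(raw_text)  # a Doc, not a list, so doc.sents is available

sentences = [sent.text.strip() for sent in doc.sents]
print(sentences)
```

The sentencizer splits on sentence-final punctuation only, so it is fast but less robust than the parser-based boundaries of a trained model on unusual text.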