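Problem description
I am using Jupyter Notebook here.
This code comes from a YouTube video. It works on the YouTuber's computer, but on mine it raises a StopIteration error.
Here I am trying to get all the titles (questions from the CSV) related to the "Go" language.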
import pandas as pd
df = pd.read_csv("Questions.csv",encoding = "ISO-8859-1",usecols = ["Title","Id"])
titles = [_ for _ in df.loc[lambda d: d['Title'].str.lower().str.contains(" go "," golang ")]['Title']]
# new cell
import spacy
nlp = spacy.load("en_core_web_sm",disable= ["ner"])
# new cell
def has_golang(text):
    doc = nlp(text)
    for t in doc:
        if t.lower_ in [' go ','golang']:
            if t.pos_ != 'VERB':
                if t.dep_ == 'pobj':
                    return True
    return False
g = (title for title in titles if has_golang(title))
[next(g) for i in range(10)]
# This is the error
StopIteration                             Traceback (most recent call last)
<ipython-input-56-862339d10dde> in <module>
      9
     10 g = (title for title in titles if has_golang(title))
---> 11 [next(g) for i in range(10)]

<ipython-input-56-862339d10dde> in <listcomp>(.0)
      9
     10 g = (title for title in titles if has_golang(title))
---> 11 [next(g) for i in range(10)]

StopIteration:
Solution
StopIteration is the result of calling next() on an exhausted iterator, i.e. g yields fewer than 10 results. You can get this information from the help() function.
help(next)
Help on built-in function next in module builtins:

next(...)
    next(iterator[, default])

    Return the next item from the iterator. If default is given and the iterator
    is exhausted, it is returned instead of raising StopIteration.
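As a minimal sketch of how to avoid the exception when fewer than the requested number of titles match, the snippet below uses the default argument described in the help text and, alternatively, itertools.islice. The sample titles and the has_keyword predicate are hypothetical stand-ins for the real CSV data and for has_golang.

```python
from itertools import islice

# Hypothetical sample data standing in for the titles read from Questions.csv.
sample_titles = ["Go channels", "Let it go", "Golang maps", "Python lists"]


def has_keyword(title):
    """Stand-in predicate (not the spaCy-based has_golang)."""
    return title.lower().startswith("go")


# Option 1: pass a default to next() so an exhausted generator yields None
# instead of raising StopIteration.
g = (t for t in sample_titles if has_keyword(t))
first_three = [next(g, None) for _ in range(3)]

# Option 2: islice simply stops once the generator runs out of items.
g = (t for t in sample_titles if has_keyword(t))
at_most_three = list(islice(g, 3))

print(first_three)    # ['Go channels', 'Golang maps', None]
print(at_most_three)  # ['Go channels', 'Golang maps']
```

The islice variant is usually the more convenient of the two, since the result contains no placeholder values to filter out afterwards.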
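Edit
Your has_golang is incorrect. The first test is always False because nlp tokenizes the text into words, i.e. the tokens carry no leading or trailing whitespace, so ' go ' can never match. Try this: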
def has_golang(text):
    doc = nlp(text)
    for t in doc:
        if t.lower_ in ['go','golang']:
            if t.pos_ != 'VERB':
                if t.dep_ == 'pobj':
                    return True
    return False
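The effect of the padded string can be illustrated without spaCy. The sketch below uses str.split as a rough stand-in for the tokenizer; this is an assumption made purely for illustration, since spaCy's tokenizer is considerably more sophisticated, but the point about word tokens carrying no surrounding spaces is the same.

```python
# Stand-in tokenizer: split on whitespace (spaCy is NOT used here).
title = "Making a Simple FileServer with Go and Localhost Refused to Connect"
tokens = [word.lower() for word in title.split()]

# The padded ' go ' can never equal a token, because tokens have no spaces.
padded_match = any(tok in [' go ', 'golang'] for tok in tokens)
plain_match = any(tok in ['go', 'golang'] for tok in tokens)

print(padded_match)  # False
print(plain_match)   # True
```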
I worked this out by finding a title that should make has_golang return True. Then I ran the following code:
doc = nlp("Making a Simple FileServer with Go and Localhost Refused to Connect")
print("\n".join(str((t.lower_,t.pos_,t.dep_)) for t in doc))
('making', 'VERB', 'csubj')
('a', 'DET', 'det')
('simple', 'PROPN', 'compound')
('fileserver', 'dobj')
('with', 'ADP', 'prep')
('go', 'PROPN', 'pobj')
('and', 'CCONJ', 'cc')
('localhost', 'conj')
('refused', 'ROOT')
('to', 'PART', 'aux')
('connect', 'xcomp')
Looking at ('go', 'PROPN', 'pobj'), it is clear that PROPN is not VERB and pobj is pobj, so the problem has to be in the token text itself: the token is "go", not " go ".
Original answer
If you only want the titles that satisfy the 3 if conditions, skip the generator:
g = list(filter(has_golang,titles))
If you need the generator but also want a list:
g = (title for title in titles if has_golang(title))
list(g)
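The two forms above produce the same list. A small sketch, using hypothetical sample titles and a stand-in predicate named is_short in place of has_golang, also shows that a generator can only be consumed once:

```python
# Hypothetical sample data; is_short is a stand-in for has_golang.
sample_titles = ["Go channels", "Golang maps and slices", "Go modules"]


def is_short(title):
    """Stand-in predicate: keep titles shorter than 12 characters."""
    return len(title) < 12


via_filter = list(filter(is_short, sample_titles))

g = (t for t in sample_titles if is_short(t))
via_generator = list(g)

# The generator is now exhausted, so a second pass yields nothing.
leftover = list(g)

print(via_filter)     # ['Go channels', 'Go modules']
print(via_generator)  # ['Go channels', 'Go modules']
print(leftover)       # []
```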