Problem description
Suppose the following is my training data:
TRAIN_DATA = [
    ("PRICE INCREASED EVERY 6 HOURS", {"entities": [(16, 29, "TIME"), (22, "TIME")]}),
    ("DISCOUNT SALES ANNOUNCED FOR 2 TIMES DAILY", {"entities": [(30, 43, "ORG"), (38, "ORG")]}),
]
How can I use spacy.util.filter_spans() to correct my training data so that, where entities overlap, only the longer span is kept?
Solution
There may be a better way (I'm a spaCy noob too), but I run the matchers like this:
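from spacy.tokens import Span
from spacy.util import filter_spans

# Collect (match_id, start, end) tuples from every matcher
matches = []
for matcher in matchers:
    matches += matcher(doc)
# Build spans, then drop overlaps (the longest span wins)
spans = [Span(doc, start, end, label=match_id) for match_id, start, end in matches]
spans = filter_spans(spans)
for span in spans:
    pass  # Do stuff here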
This should help, but I'm curious whether anyone else has a better solution.
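For annotations stored as character offsets, as in TRAIN_DATA, the same "longest span wins" rule can be sketched in plain Python without building Span objects first. This is only an illustration of the rule that filter_spans documents, not spaCy's own code: the helper name filter_overlapping_offsets is hypothetical, and the example assumes well-formed (start, end, label) tuples, with the end offset 29 of the shorter span being an assumption.

```python
# Hypothetical helper (not part of spaCy): keeps the longest of any
# overlapping (start, end, label) character-offset annotations, mirroring
# the longest-first rule that spacy.util.filter_spans applies to Span objects.
def filter_overlapping_offsets(entities):
    # Longest annotation first; ties go to the earlier start offset.
    ordered = sorted(
        entities,
        key=lambda ent: (ent[1] - ent[0], -ent[0]),
        reverse=True,
    )
    kept = []
    taken = set()  # character positions already covered by a kept annotation
    for start, end, label in ordered:
        if not any(pos in taken for pos in range(start, end)):
            kept.append((start, end, label))
            taken.update(range(start, end))
    # Return the survivors in document order.
    return sorted(kept)


# Assumed, well-formed example data: the end offset 29 of the second
# tuple is an assumption made for this sketch.
text = "PRICE INCREASED EVERY 6 HOURS"
entities = [(16, 29, "TIME"), (22, 29, "TIME")]

cleaned = filter_overlapping_offsets(entities)
print([(text[start:end], label) for start, end, label in cleaned])
```

Running the helper over each example's "entities" list would leave one non-overlapping annotation per region. If you would rather rely on spaCy itself, the offsets can be turned into Span objects with doc.char_span() and passed to filter_spans(), as in the matcher code above.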