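Problem description
I'm working on a project and am stuck on one particular part of the stack, mostly because I'm not sure I'm phrasing my search queries correctly. Hoping Stack can help!
I've scraped a large batch of documents, done all of the NLP text cleaning (normalization, stemming, lemmatization), and applied spaCy named entity recognition, so my original dataset is now enriched with new tags. My data objects look like this: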
{
  "doc_id": "123",
  "content": "Facebook is to build its own “village” of 1,500 homes for workers struggling to pay soaring rents as the housing crisis in Silicon Valley deepens. The social networking company has submitted plans to the local council to create a new neighbourhood of homes, shops and a public plaza across the street from its global headquarters. Mark Zuckerberg’s company said it was being forced to build the “mixed-use village” Menlo Park, about 30 miles south of San Francisco, because the regional government’s “failure” to invest in infrastructure has led to sky-high rents and hours-long commutes to work.",
  "url": "https://www.theguardian.com/technology/2017/jul/09/facebook-addresses-silicon-valleys-affordable-housing-crisis",
  "entities": {
    "entity": [
      { "id": "1001", "type": "PERSON", "ent": "Mark Zuckerberg" },
      { "id": "1002", "type": "LOC", "ent": "Silicon Valley" },
      { "id": "1003", "type": "ORG", "ent": "Facebook" }
    ]
  }
}
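For illustration, here is a minimal Python sketch of how the tags could be pulled out of an object shaped like the one above. The shortened `content` value, the `RAW_DOC` name, and the `entity_keys` helper are all hypothetical, and lowercasing the entity name is just one assumed way to normalize it.

```python
import json

# Hypothetical, shortened copy of the data object (assumes it is stored as valid JSON).
RAW_DOC = """
{
  "doc_id": "123",
  "content": "Facebook is to build its own village of 1,500 homes for workers.",
  "url": "https://www.theguardian.com/technology/2017/jul/09/facebook-addresses-silicon-valleys-affordable-housing-crisis",
  "entities": {
    "entity": [
      {"id": "1001", "type": "PERSON", "ent": "Mark Zuckerberg"},
      {"id": "1002", "type": "LOC", "ent": "Silicon Valley"},
      {"id": "1003", "type": "ORG", "ent": "Facebook"}
    ]
  }
}
"""


def entity_keys(doc):
    """Return the set of (type, normalized name) pairs tagged on one document."""
    return {
        (e["type"], e["ent"].strip().lower())
        for e in doc.get("entities", {}).get("entity", [])
    }


doc = json.loads(RAW_DOC)
keys = entity_keys(doc)
print(sorted(keys))
```

Reducing each document to a set of (type, name) keys like this makes it cheap to compare a new document against whatever entities users care about.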
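Now I'd like to let one of my users "subscribe" to an entity, for example the ORG "Facebook". The next time my pipeline (web scraper > text processor > spaCy named entity recognition) "hears" that new content related to the existing entity "Facebook" has been upserted into my database, it should generate a webhook with an "Entity-match" payload or something similar.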
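To make the behavior I have in mind concrete, here is a rough Python sketch. Everything in it is hypothetical: the `EntitySubscriptions` class, the `entity-match` event name, the payload fields, and the `example.com` webhook URL. It only builds the payloads; actually POSTing them to the subscriber is left out.

```python
import json
from collections import defaultdict


class EntitySubscriptions:
    """In-memory registry mapping (type, name) entity keys to webhook URLs."""

    def __init__(self):
        self._subs = defaultdict(set)

    def subscribe(self, ent_type, ent_name, webhook_url):
        # Normalize the name so "Facebook" and "facebook" hit the same key.
        self._subs[(ent_type, ent_name.strip().lower())].add(webhook_url)

    def match(self, doc):
        """Return (webhook_url, payload) pairs for one newly upserted document."""
        deliveries = []
        for ent in doc.get("entities", {}).get("entity", []):
            key = (ent["type"], ent["ent"].strip().lower())
            for url in sorted(self._subs.get(key, ())):
                payload = {
                    "event": "entity-match",  # assumed event name
                    "doc_id": doc["doc_id"],
                    "url": doc.get("url"),
                    "entity": ent,
                }
                deliveries.append((url, payload))
        return deliveries


subs = EntitySubscriptions()
subs.subscribe("ORG", "Facebook", "https://example.com/hooks/user-42")

# A document as it might look right after the NER step (shortened, hypothetical).
new_doc = {
    "doc_id": "124",
    "url": "https://example.com/some-article",
    "entities": {
        "entity": [
            {"id": "1003", "type": "ORG", "ent": "Facebook"},
            {"id": "1002", "type": "LOC", "ent": "Silicon Valley"},
        ]
    },
}

deliveries = subs.match(new_doc)
for url, payload in deliveries:
    print(url, json.dumps(payload))
```

In this sketch `match` would be called once per upsert, right after the NER step, so the matching cost depends on the number of entities in the new document rather than on the number of subscriptions.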
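I feel like a service for this must already exist, but all of my Google results turn up log monitoring services such as NewRelic rather than a "new data ingestion matching rule" type of service. What is this even called?
Thanks for educating me!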
Solution
No working solution to this problem has been found yet.