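How to extract words related to specific categories in NLP

Problem description

What I want to do

I am trying to extract words that belong to certain specific categories, as shown below.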

# sample sentences
["the lion and puma,ostrich and rhea.","She touched the water of the lake and groaned.","I like to see this mountain from my room","The pile of mail she left on her desk.","Could we make a car that can go 300 mph?","Shop for airplane tickets to New York City."]

# preferred output
# animals
["lion","puma","ostrich","rhea"]

# geography
["lake","mountain"]

# furniture/housing
["room","desk"]

# vehicles
["car","airplane"]

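Some of the example sentences are from https://sentence.yourdictionary.co

Approach and problem

I tried using WordNet to find broader terms and concepts, which WordNet calls "hypernyms", because I first need to know which broader terms are available in WordNet.

However, I can only find words one level up, and I have no overview of WordNet's overall hierarchy.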
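One way past the "one level up" limit is to keep calling hypernyms() on each result until nothing new turns up. The following is a minimal sketch of that idea, not a definitive implementation: hypernym_levels and demo are hypothetical helper names, and the code only assumes objects that expose .name() and .hypernyms(), as NLTK's WordNet synsets do. NLTK synsets also provide hypernym_paths() and closure(), which cover similar ground.

```python
from collections import deque


def hypernym_levels(synset):
    """Map each ancestor's name to its distance (in hypernym steps) from `synset`.

    Works on any object exposing .name() and .hypernyms(), which NLTK's
    WordNet Synset objects do.
    """
    levels = {}
    queue = deque([(synset, 0)])
    while queue:
        current, depth = queue.popleft()
        for parent in current.hypernyms():
            if parent.name() not in levels:  # breadth-first, so the first hit is the shortest distance
                levels[parent.name()] = depth + 1
                queue.append((parent, depth + 1))
    return levels


def demo():
    # Requires NLTK plus its WordNet corpus (nltk.download('wordnet')).
    from nltk.corpus import wordnet as wn

    levels = hypernym_levels(wn.synset('car.n.01'))
    for name, depth in sorted(levels.items(), key=lambda item: item[1]):
        print(depth, name)
```

Calling demo() prints every ancestor of car.n.01 with its level, which gives a rough picture of how far up each broader term sits.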

Is there a visualization tool or web page for the WordNet tree structure? Or is there another way to achieve what I want to do?
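As for another way to get the preferred output, one possible approach is to pick a broad "anchor" synset per category by hand and keep every word that has a sense lying underneath it. This is only a sketch under stated assumptions: reaches, categorize and demo are hypothetical names, the anchors animal.n.01 and vehicle.n.01 are illustrative choices rather than a vetted mapping, and the lookup function is passed in so the logic can be tried without WordNet.

```python
from collections import deque


def reaches(synset, anchor_name):
    """True if `anchor_name` is `synset` itself or one of its transitive hypernyms."""
    seen = {synset.name()}
    queue = deque([synset])
    while queue:
        current = queue.popleft()
        if current.name() == anchor_name:
            return True
        for parent in current.hypernyms():
            if parent.name() not in seen:
                seen.add(parent.name())
                queue.append(parent)
    return False


def categorize(words, anchors, lookup):
    """Group words by category.

    anchors: {category label: synset name}, chosen by hand (hypothetical here).
    lookup:  callable returning the candidate synsets for a word, for example
             lambda w: wn.synsets(w, pos=wn.NOUN) with NLTK's WordNet.
    """
    result = {label: [] for label in anchors}
    for word in words:
        for label, anchor_name in anchors.items():
            if any(reaches(s, anchor_name) for s in lookup(word)):
                result[label].append(word)
    return result


def demo():
    # Requires NLTK plus its WordNet corpus (nltk.download('wordnet')).
    from nltk.corpus import wordnet as wn

    # Hypothetical anchors; inspect WordNet to pick ones that fit your categories.
    anchors = {'animals': 'animal.n.01', 'vehicles': 'vehicle.n.01'}
    words = ['lion', 'puma', 'lake', 'desk', 'car', 'airplane']
    print(categorize(words, anchors, lambda w: wn.synsets(w, pos=wn.NOUN)))
```

Because a word counts as a match if any of its senses falls under the anchor, polysemous words can land in unexpected categories. Adding word-sense disambiguation or restricting to the first sense would be a natural refinement.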

Code

I tried to find the synsets (concept clusters) of a word and then get their broader terms.

from nltk.corpus import wordnet as wn

# List every synset (sense) of 'car' together with its lemma names
for syn in wn.synsets('car'):
    print(syn.name(), syn.lemma_names())

# Pick one sense and look up its direct hypernyms (one level up only)
key = wn.synset('car.n.01')  # or 'airplane.n.01'
upperwords = key.hypernyms()
print('hypernyms:')
print(sorted([lemma.name() for synset in upperwords for lemma in synset.lemmas()]))

Output (car and airplane have no hypernym in common at this level)

car.n.01 ['car','auto','automobile','machine','motorcar']
car.n.02 ['car','railcar','railway_car','railroad_car']
car.n.03 ['car','gondola']
car.n.04 ['car','elevator_car']
cable_car.n.01 ['cable_car','car']
hypernyms:
['automotive_vehicle','motor_vehicle']


airplane.n.01 ['airplane','aeroplane','plane']
hypernyms:
['heavier-than-air_craft']
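The direct hypernyms of the two words differ, but a shared ancestor may still exist further up the hierarchy. Below is a minimal sketch of how one might search for it. distances_up, nearest_shared_hypernyms and demo are hypothetical names, and "nearest" here means the smallest combined number of hypernym steps, which is a simplification of my own. NLTK synsets offer a built-in lowest_common_hypernyms() method, shown in demo() for comparison.

```python
from collections import deque


def distances_up(synset):
    """Return {synset name: hypernym steps from `synset`}, with `synset` itself at 0."""
    dist = {synset.name(): 0}
    queue = deque([synset])
    while queue:
        current = queue.popleft()
        for parent in current.hypernyms():
            if parent.name() not in dist:
                dist[parent.name()] = dist[current.name()] + 1
                queue.append(parent)
    return dist


def nearest_shared_hypernyms(a, b):
    """Shared ancestors of a and b with the smallest combined distance."""
    da, db = distances_up(a), distances_up(b)
    shared = da.keys() & db.keys()
    if not shared:
        return []
    best = min(da[n] + db[n] for n in shared)
    return sorted(n for n in shared if da[n] + db[n] == best)


def demo():
    # Requires NLTK plus its WordNet corpus (nltk.download('wordnet')).
    from nltk.corpus import wordnet as wn

    car, plane = wn.synset('car.n.01'), wn.synset('airplane.n.01')
    print(nearest_shared_hypernyms(car, plane))
    # NLTK's built-in counterpart, which picks the deepest shared ancestors:
    print(car.lowest_common_hypernyms(plane))
```

Whatever synset this returns for a pair of example words is a reasonable candidate for the category's broader term.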

Code reference: WordNet Interface

What I have found on Google so far

WordNet Search - 3.1

A web interface for searching words in WordNet
