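Problem description
I am trying to get a list of synonyms for each word in my column names. However, wordnet.synsets() only works for column names that consist of a single word. How can I run it on multi-word names and produce output like the desired output below? Also, is there a way to show only the first 4 results to improve readability?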
from nltk.stem import WordNetLemmatizer
from nltk.corpus import wordnet
import pandas as pd

df = ['Unnamed 0', 'business id', 'name', 'postal code']
syns = {w: [] for w in df}
for k, v in syns.items():
    for synset in wordnet.synsets(k):
        for lemma in synset.lemmas():
            if lemma.name() not in syns:
                v.append(lemma.name())
pd.DataFrame([syns], columns=syns.keys())
Current output:
Unnamed 0 business id name postal code
[] [] [gens,figure,public_figure,epithet,call,i... []
Desired output:
Unnamed 0             business id            name                  postal code
Unnamed[definitions]  business[definitions]  [gens,public_figure]  postal[definitions]
0[definitions]        id[definitions]                              code[definitions]
Solution
A simple approach:
from nltk.stem import WordNetLemmatizer
from nltk.corpus import wordnet
import nltk
import numpy as np
import pandas as pd

# Assumes the NLTK WordNet corpus and tokenizer data are already downloaded.
df = ['Unnamed 0', 'business id', 'name', 'postal code']
df = pd.DataFrame(
    # one column per (column name, word) pair; rows are the unique single-word lemmas
    {(k, t): pd.Series(np.unique([l.name()
                                  for s in wordnet.synsets(t)
                                  for l in s.lemmas() if "_" not in l.name()])).to_dict()
     for k in df
     for t in nltk.word_tokenize(k)
     }).fillna("")
df.columns.names = ["sentence", "word"]
df.loc[:4]  # label slice is inclusive: the first 5 matches (use df.loc[:3] for 4)
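The two requests in the question (look up each word of a column name separately, and keep only the first 4 results) can also be sketched without pandas. The block below is a minimal illustration, not the answer's method: `synonyms_per_word`, `fake_lookup`, and `FAKE_SYNONYMS` are hypothetical names, and the stub lookup stands in for WordNet so the sketch runs without any NLTK data.

```python
from typing import Callable, Dict, Iterable, List


def synonyms_per_word(
    columns: Iterable[str],
    lookup: Callable[[str], List[str]],
    limit: int = 4,
) -> Dict[str, Dict[str, List[str]]]:
    """Map each column name to {word: first `limit` unique synonyms}."""
    result: Dict[str, Dict[str, List[str]]] = {}
    for column in columns:
        per_word: Dict[str, List[str]] = {}
        for word in column.split():
            # dict.fromkeys drops duplicates while keeping first-seen order
            unique = list(dict.fromkeys(lookup(word)))
            per_word[word] = unique[:limit]
        result[column] = per_word
    return result


# Hypothetical stand-in for a WordNet lookup (assumption: not real WordNet data).
FAKE_SYNONYMS = {
    "postal": ["postal"],
    "code": ["code", "codification", "code", "cipher", "cypher", "encipher"],
}


def fake_lookup(word: str) -> List[str]:
    return FAKE_SYNONYMS.get(word, [])


table = synonyms_per_word(["postal code", "name"], fake_lookup)
print(table["postal code"]["code"])
```

With NLTK installed and the WordNet corpus downloaded, `lookup` could instead be a small function that returns `[l.name() for s in wordnet.synsets(word) for l in s.lemmas()]`, which is the same expression the code above uses.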
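Alternatively, just change the list/dict comprehension so that it produces the dict-of-lists format that pandas expects: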
{"colA":[1,2],"colB":[3,4]}
from nltk.stem import WordNetLemmatizer
from nltk.corpus import wordnet
import nltk
import numpy as np
import pandas as pd

df = ['Unnamed 0', 'business id', 'name', 'postal code']
mr = max([len(k.split(" ")) for k in df])  # most words in any column name
pd.DataFrame(
    # one column for each requested space-delimited column name
    # use an f-string to format each cell as requested
    {k: [f"{v}:{np.unique([l.name() for s in wordnet.synsets(v) for l in s.lemmas()]).tolist()}"
         # pad names with fewer tokens so every column has the same length, as pandas requires
         for v in f"{k}{(mr - len(k.split(' '))) * ' '}".split(" ")]
     for k in df}).replace({":[]": ""})
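The padding step matters because pandas rejects a dict of lists whose lists differ in length. The sketch below isolates that step using only the standard library. `pad_columns` is a hypothetical helper name, and the empty string as fill value is an assumption chosen to match the blank cell in the output.

```python
from typing import Dict, List


def pad_columns(columns: Dict[str, List[str]], fill: str = "") -> Dict[str, List[str]]:
    """Return a copy in which every list is padded with `fill` to the longest length."""
    longest = max((len(values) for values in columns.values()), default=0)
    return {
        name: values + [fill] * (longest - len(values))
        for name, values in columns.items()
    }


# Column names split into words; "name" has one word fewer than the others.
words = {name: name.split() for name in ["Unnamed 0", "business id", "name"]}
padded = pad_columns(words)
print(padded)
```

The padded dict can then be passed to `pd.DataFrame` directly, since every column now has the same number of rows.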
Output:
Unnamed 0 business id name postal code
0 Unnamed:['nameless','unidentified','unknown'... business:['business','business_concern','bus... name:['advert','appoint','bring_up','call',... postal:['postal']
1 0:['0','cipher','cypher','nought','zero'] id:['Gem_State','I.D.','ID','Idaho','id'] code:['cipher','code','codification','compu...