Extracting names from a text column with NLTK

Problem description

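I want to extract names from a text column of a DataFrame.

The column is already tokenized. My code works well for a single cell, but I want it to iterate over the whole column so that I end up with the names mentioned anywhere in the text column.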

data['text6']
0     [Walking,After,I,finished,my,class,...
1     [Long,day,Long,but,it,was,not,bad,...
2     [Travelling,work,today,had,a,...
3     [Exam,Day,long,working,w...
4     [lovey,Friday,It,lovely,.,...
5     [Highway,waked,up,early,in,the,mornin...
6     [Work,Quiet,at,found,so,...

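import nltk
import pandas as pd

# Tag and chunk the tokens of the first cell only
pos_tags = nltk.pos_tag(data['text6'][0])
chunks = nltk.ne_chunk(pos_tags, binary=False)  # binary=False keeps typed labels (PERSON, GPE, ...)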

for chunk in chunks:
    print(chunk)

entities = []
labels = []
for chunk in chunks:
    # Named-entity subtrees have a label(); plain (word, tag) tuples do not
    if hasattr(chunk, 'label'):
        entities.append(' '.join(c[0] for c in chunk))
        labels.append(chunk.label())

# Deduplicate the (entity, label) pairs and keep only the PERSON rows
entities_labels = list(set(zip(entities, labels)))
entities_df = pd.DataFrame(entities_labels, columns=["Entities", "Labels"])
entities_df = entities_df[entities_df['Labels'] == 'PERSON']
# No 'GPE' labels remain after the filter above
entities_df = entities_df.replace({'GPE': 'Person'})
entities_df

[Image: output currently shown for only one cell]

Solution

No working solution to this problem has been found yet.
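
One possible approach, offered as a minimal sketch rather than a confirmed fix: wrap the single-cell logic from the question in a function and apply it to every row of the column. The names `extract_person_names` and `names_for_column` are hypothetical helpers introduced here, and the sketch assumes each cell of `data['text6']` is a list of string tokens and that the NLTK tagger and chunker models used in the question are already installed.

```python
def extract_person_names(tokens, tag=None, chunk=None):
    """Return the sorted, unique PERSON entities found in one list of tokens.

    `tag` and `chunk` default to nltk.pos_tag and nltk.ne_chunk. They are
    parameters (an assumption of this sketch) so other callables can be
    swapped in, for example when testing.
    """
    if tag is None or chunk is None:
        # Imported here so the helper also works with stand-in callables
        import nltk
        tag = tag or nltk.pos_tag
        chunk = chunk or nltk.ne_chunk

    tree = chunk(tag(list(tokens)))
    names = set()
    for node in tree:
        # Named-entity subtrees have a label(); plain (word, tag) tuples do not
        if hasattr(node, "label") and node.label() == "PERSON":
            names.add(" ".join(word for word, _ in node))
    return sorted(names)


def names_for_column(series, tag=None, chunk=None):
    """Apply extract_person_names to every row of a pandas Series of token lists."""
    return series.apply(lambda tokens: extract_person_names(tokens, tag, chunk))
```

With the DataFrame from the question, `data['names'] = names_for_column(data['text6'])` would store one list of names per row, and `data['names'].explode().dropna().unique()` would flatten those lists into a single array of distinct names. Rows where the chunker finds no PERSON entity simply get an empty list.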
