How do I get the tagset from nltk pos_tag?

Problem description

I'm trying to get the full tag names from nltk pos_tag, but I can't find a simple way to do it with nltk. For example, using lambda

tagsets='universal'

Solution

I ran into the same problem while doing NLP analysis for a paper I was writing. I had to use a mapping function like this:

import nltk
from nltk.tokenize import word_tokenize

def get_full_tag_pos(pos_tag):
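    # map the first letter of a Penn Treebank tag to a coarse tag name
    tag_dict = {"J": "ADJ", "N": "NOUN", "V": "VERB", "R": "ADV"}
    # assumes pos_tag comes in as capital letters, e.g. 'JJR' or 'NN';
    # any tag whose first letter is not in tag_dict falls back to 'NOUN'
    return tag_dict.get(pos_tag[0], 'NOUN')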

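# example (the sample sentence is a placeholder so the snippet runs on its own)
text = "The quick brown fox jumps over the lazy dog."
words = word_tokenize(text)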
words_pos = nltk.pos_tag(words)
full_tag_words_pos = [word + "/" + get_full_tag_pos(tag) for word, tag in words_pos]
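
One thing to watch with the first-letter approach: every tag outside J/N/V/R (determiners, prepositions, punctuation and so on) gets labelled NOUN. Below is a minimal sketch of a slightly stricter variant, run on hand-written (word, tag) pairs in the same shape that nltk.pos_tag returns, so it does not need any NLTK data to try out. The names EXACT_TAGS, PREFIX_TAGS, to_coarse_tag and format_tagged are hypothetical, and the small exact-match table is an illustrative assumption, not NLTK's own mapping.

```python
# Illustrative, hand-written table (an assumption, not NLTK's own mapping).
EXACT_TAGS = {"DT": "DET", "IN": "ADP", "PRP": "PRON", "CD": "NUM", "CC": "CONJ"}
# Same first-letter mapping as in the answer above.
PREFIX_TAGS = {"J": "ADJ", "N": "NOUN", "V": "VERB", "R": "ADV"}


def to_coarse_tag(penn_tag, default="X"):
    """Return a coarse tag for a Penn Treebank tag, or `default` if unknown."""
    if penn_tag in EXACT_TAGS:
        return EXACT_TAGS[penn_tag]
    # penn_tag[:1] is "" for an empty tag, so this never raises IndexError
    return PREFIX_TAGS.get(penn_tag[:1], default)


def format_tagged(pairs):
    """Turn (word, tag) pairs into 'word/COARSE_TAG' strings."""
    return [f"{word}/{to_coarse_tag(tag)}" for word, tag in pairs]


# Hand-written pairs in the shape nltk.pos_tag returns
sample_pairs = [("The", "DT"), ("cats", "NNS"), ("run", "VBP"),
                ("quickly", "RB"), (".", ".")]
print(format_tagged(sample_pairs))
# ['The/DET', 'cats/NOUN', 'run/VERB', 'quickly/ADV', './X']
```

If the built-in coarse tags are enough, nltk.pos_tag also accepts a tagset argument, as in nltk.pos_tag(words, tagset='universal'), which needs the universal_tagset resource to be downloaded first.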