Extracting abstract nouns and adjectives from a string in Python

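Problem description

I am working with a data frame in which one column contains feedback text. The text has already been cleaned. All I need to know is how to extract the abstract nouns and adjectives from a string.

Here is a sample of the cleaned text I have:

The output must contain only the abstract nouns and adjectives from each feedback entry.

For example, if the feedback is: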

"smells good also nice taste in love with it"

the output should be:

good nice love

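I tried the NLTK POS tagger and the TextBlob lexicon. With TextBlob I can extract all the adjectives, but for nouns it tags every noun. I cannot separate out only the abstract nouns, such as "love" in the example above.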

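Solution

The sentence is grammatically incorrect (it is missing the initial subject "it"). NLP tools are generally not well suited to parsing ungrammatical sentences. Even so, NLTK parses the original sentence almost correctly; only "smells" is mistagged, as a plural noun (NNS) rather than a verb: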

import nltk

s = "smells good also nice taste in love with it"
nltk.pos_tag(nltk.word_tokenize(s))
#[('smells','NNS'),('good','JJ'),('also','RB'),('nice','JJ'),
# ('taste','NN'),('in','IN'),('love','NN'),('with','IN'),('it','PRP')]

If you correct the grammar, the result is 100% correct:

s = "it smells good also nice taste in love with it"
nltk.pos_tag(nltk.word_tokenize(s))
#[('it','PRP'),('smells','VBZ'),('good','JJ'),('also','RB'),
# ('nice','JJ'),('taste','NN'),('in','IN'),('love','NN'),
# ('with','IN'),('it','PRP')]

And the output you want (this filter keeps every noun and adjective, so "taste" is included too):

[word for word,tag in nltk.pos_tag(nltk.word_tokenize(s)) if tag[0] in "NJ"]
#['good','nice','taste','love']
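
The list comprehension above can be wrapped in a small function so that it can be applied to each feedback string in turn. What follows is only a minimal sketch, not part of the original answer: the function name `keep_nouns_and_adjectives` is hypothetical, and the hard-coded `tagged` list is an assumption about what `nltk.pos_tag` returns for the corrected sentence. The function takes already-tagged tokens, so it runs even without NLTK installed.

```python
def keep_nouns_and_adjectives(tagged_tokens):
    """Return the words whose tag starts with N (noun) or J (adjective).

    tagged_tokens is a list of (word, tag) pairs, in the shape
    produced by nltk.pos_tag.
    """
    # str.startswith accepts a tuple of prefixes and, unlike tag[0],
    # does not raise on an empty tag string.
    return [word for word, tag in tagged_tokens if tag.startswith(("N", "J"))]


# Assumed tagger output for "it smells good also nice taste in love with it".
tagged = [
    ("it", "PRP"), ("smells", "VBZ"), ("good", "JJ"), ("also", "RB"),
    ("nice", "JJ"), ("taste", "NN"), ("in", "IN"), ("love", "NN"),
    ("with", "IN"), ("it", "PRP"),
]

result = keep_nouns_and_adjectives(tagged)
print(result)  # ['good', 'nice', 'taste', 'love']
```

With NLTK available, the same function could be called as `keep_nouns_and_adjectives(nltk.pos_tag(nltk.word_tokenize(text)))`. The result still contains "taste": part-of-speech tags do not distinguish abstract nouns from concrete ones, so narrowing the list to abstract nouns only would need some additional lexical resource.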
