Lemmatize an entire column with a lambda function

Problem description

I have tested this code on a single sentence, and I would like to adapt it so that I can lemmatize an entire column, where each row contains words with no punctuation, for example: deportivas calcetin hombres deportivas shoes

import nltk
nltk.download('wordnet')                      # WordNet data for the lemmatizer
nltk.download('punkt')                        # tokenizer models for word_tokenize
nltk.download('averaged_perceptron_tagger')   # tagger models for pos_tag
from nltk.stem import WordNetLemmatizer
from nltk.corpus import wordnet
import pandas as pd

df = pd.read_excel(r'C:\Test2\test.xlsx')
# Init the Wordnet Lemmatizer
lemmatizer = WordNetLemmatizer()
sentence = 'FINAL_KEYWORDS'
def get_wordnet_pos(word):
    """Map POS tag to first character lemmatize() accepts"""
    tag = nltk.pos_tag([word])[0][1][0].upper()
    tag_dict = {"J": wordnet.ADJ,"N": wordnet.NOUN,"V": wordnet.VERB,"R": wordnet.ADV}

    return tag_dict.get(tag,wordnet.NOUN)



#Lemmatize a Sentence with the appropriate POS tag
sentence = "The striped bats are hanging on their feet for best"
print([lemmatizer.lemmatize(w,get_wordnet_pos(w)) for w in nltk.word_tokenize(sentence)])

Let's assume the column is named df['keywords']. Could you help me lemmatize the whole column using a lambda function?

Many thanks

Solution

Here you go:

  1. Use apply on the column so that every row's sentence gets processed
  2. Use a lambda that takes the sentence as input and applies the function you wrote, the same way it is used in the print statement

As lemmatized keywords (a list of tokens per row):

# Lemmatize a Sentence with the appropriate POS tag
df['keywords'] =  df['keywords'].apply(lambda sentence: [lemmatizer.lemmatize(w,get_wordnet_pos(w)) for w in nltk.word_tokenize(sentence)])
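
Note that this variant leaves a Python list of lemmas in each cell, which is handy for further token-level processing but will not write back to Excel as a readable string.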

As a lemmatized sentence (joining the keywords with ' '):

# Lemmatize a Sentence with the appropriate POS tag
df['keywords'] =  df['keywords'].apply(lambda sentence: ' '.join([lemmatizer.lemmatize(w,get_wordnet_pos(w)) for w in nltk.word_tokenize(sentence)]))
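
For completeness, here is a minimal end-to-end sketch of the approach. The inline sample DataFrame and its two rows are only placeholders for the Excel file from the question, and the column name 'keywords' is assumed as above; everything else follows the code already shown.

# Minimal self-contained sketch: lemmatize every row of a 'keywords' column
import nltk
import pandas as pd
from nltk.stem import WordNetLemmatizer
from nltk.corpus import wordnet

nltk.download('wordnet')
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

lemmatizer = WordNetLemmatizer()

def get_wordnet_pos(word):
    """Map POS tag to the first character lemmatize() accepts."""
    tag = nltk.pos_tag([word])[0][1][0].upper()
    tag_dict = {"J": wordnet.ADJ, "N": wordnet.NOUN, "V": wordnet.VERB, "R": wordnet.ADV}
    return tag_dict.get(tag, wordnet.NOUN)

# Sample data standing in for pd.read_excel(r'C:\Test2\test.xlsx')
df = pd.DataFrame({'keywords': [
    "The striped bats are hanging on their feet for best",
    "deportivas calcetin hombres deportivas shoes",
]})

# Lemmatize every row and join the tokens back into one string per row
df['keywords'] = df['keywords'].apply(
    lambda sentence: ' '.join(lemmatizer.lemmatize(w, get_wordnet_pos(w))
                              for w in nltk.word_tokenize(sentence)))

print(df['keywords'].tolist())   # one lemmatized string per row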