Is it possible to apply sklearn.preprocessing.LabelEncoder() to a 2D list?

Question

Suppose I have a list like this:

l = [
       ['PER','O','GEO'],['ORG','O'],['O','PER','O']
    ]

I want to encode this 2D list with LabelEncoder().

The result should look something like this:

l = [
       [1,0,2],[3,0],[0,1,0]
    ]

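Is this possible? If not, is there a workaround?

Thanks in advance!

Answer

LabelEncoder expects a 1D sequence of labels, so it cannot be fitted on the nested list directly. You can flatten the list, fit the encoder on all the values that occur, and then use the fitted encoder to transform each sublist: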

from sklearn.preprocessing import LabelEncoder

l = [
       ['PER','O','GEO'],['ORG','O'],['O','PER','O']
    ]

flattened_l = [e for sublist in l for e in sublist]

# flattened_l is ['PER','O','GEO','ORG','O','O','PER','O']
# (duplicates are fine; the encoder only keeps the unique labels)

le = LabelEncoder().fit(flattened_l)

# See the mapping generated by the encoder:
list(enumerate(le.classes_))
# [(0,'GEO'),(1,'O'),(2,'ORG'),(3,'PER')]

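# And, finally, transform each sublist
# (.tolist() converts the NumPy array to plain Python ints):
res = [le.transform(sublist).tolist() for sublist in l]
res

# The result keeps the nested shape. The codes follow the sorted
# order of le.classes_, so they differ from the numbering in the question:
# [[3,1,0],[2,1],[1,3,1]]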
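
LabelEncoder assigns codes according to the sorted order of the labels, so it cannot be told to map 'O' to 0 and 'PER' to 1 as in the question. If the exact integers matter, a plain dictionary is one possible alternative. The following is only a sketch that does not use LabelEncoder at all; the names label_order, label_to_id and id_to_label are made up for illustration, and the order itself is an assumption inferred from the question's expected output.

```python
# Hypothetical label order, chosen so that O -> 0, PER -> 1, GEO -> 2, ORG -> 3.
label_order = ['O', 'PER', 'GEO', 'ORG']

# Forward and reverse lookup tables built from that order.
label_to_id = {label: i for i, label in enumerate(label_order)}
id_to_label = {i: label for label, i in label_to_id.items()}

l = [['PER', 'O', 'GEO'], ['ORG', 'O'], ['O', 'PER', 'O']]

# Encode each sublist while keeping the nested shape.
encoded = [[label_to_id[label] for label in sublist] for sublist in l]

# Decode again to check the round trip.
decoded = [[id_to_label[i] for i in sublist] for sublist in encoded]
```

Unlike the fitted encoder, this mapping raises a KeyError for any label missing from label_order, so the list has to cover every label in the data.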

label-encoding python