Is it possible to apply sklearn.preprocessing.LabelEncoder() to a 2D list?

Question

Suppose I have a list like the following:

l = [
       ['PER','O','GEO'],['ORG','O'],['O','PER','O']
    ]

I want to encode this 2D list with LabelEncoder().

The result should look something like:

l = [
       [1,0,2],[3,0],[0,1,0]
    ]

Is this possible? If not, is there a workaround?

Thanks in advance!

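Solution

LabelEncoder expects 1D input, so it cannot be fitted on the nested list directly. You can flatten the list, fit the encoder on every label that occurs, and then use the fitted encoder to transform each sublist separately: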

from sklearn.preprocessing import LabelEncoder

l = [
       ['PER','O','GEO'],['ORG','O'],['O','PER','O']
    ]

flattened_l = [e for sublist in l for e in sublist]

# flattened_l is ['PER','O','GEO','ORG','O','O','PER','O']
# (duplicates are fine; fit() keeps only the unique labels)

le = LabelEncoder().fit(flattened_l)

# See the mapping generated by the encoder:
list(enumerate(le.classes_))
# [(0,'GEO'),(1,'O'),(2,'ORG'),(3,'PER')]

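# And, finally, transform each sublist
# (.tolist() converts the NumPy integers to plain Python ints):
res = [le.transform(sublist).tolist() for sublist in l]
res

# The result keeps the nested shape; the integers follow the sorted class order shown above:
# [[3,1,0],[2,1],[1,3,1]]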
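
LabelEncoder assigns integers according to the sorted order of the classes, so it does not let you choose which label gets which integer. If a particular mapping matters (for example O=0, PER=1, GEO=2, ORG=3), a plain dictionary may be enough. The following is a minimal sketch without scikit-learn; the names label_order, label_to_id and nested are made up for illustration, and the label order is an assumption chosen by hand:

```python
# Hypothetical label order, chosen by hand (an assumption, not something LabelEncoder produces).
label_order = ["O", "PER", "GEO", "ORG"]
label_to_id = {label: i for i, label in enumerate(label_order)}

nested = [["PER", "O", "GEO"], ["ORG", "O"], ["O", "PER", "O"]]

# Encode each sublist with a dictionary lookup; an unknown label raises KeyError.
encoded = [[label_to_id[label] for label in sublist] for sublist in nested]
print(encoded)  # [[1, 0, 2], [3, 0], [0, 1, 0]]

# Decode by indexing back into the ordered label list.
decoded = [[label_order[i] for i in sublist] for sublist in encoded]
print(decoded == nested)  # True
```

The trade-off is that you maintain the label list yourself, whereas LabelEncoder discovers the classes from the data.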