Problem description
l = [
['PER','O','GEO'],['ORG','O'],['O','PER','O']
]
I want to encode this 2D list with LabelEncoder().
The result should look like this:
l = [
[1,0,2],[3,0],[0,1,0]
]
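Is this possible? If not, is there a workaround?
Thanks in advance!
Solution
You can flatten the list, fit the encoder on all possible values, and then use the encoder to transform each sublist, as shown below: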
from sklearn.preprocessing import LabelEncoder
l = [
['PER','O','GEO'],['ORG','O'],['O','PER','O']
]
flattened_l = [e for sublist in l for e in sublist]
# flattened_l is ['PER','O','GEO','ORG','O','O','PER','O']
le = LabelEncoder().fit(flattened_l)
# See the mapping generated by the encoder:
list(enumerate(le.classes_))
# [(0,'GEO'),(1,'O'),(2,'ORG'),(3,'PER')]
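# Finally, transform each sublist (tolist() yields plain Python ints):
res = [le.transform(sublist).tolist() for sublist in l]
res
# The encoded result. LabelEncoder numbers classes in sorted order,
# so the integers differ from those in the question:
# [[3, 1, 0], [2, 1], [1, 3, 1]]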
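If the exact integers from the question matter (O=0, PER=1, ORG=3, and presumably GEO=2), LabelEncoder cannot produce them, because it always numbers classes in sorted order. A minimal sketch of one alternative is a plain dictionary with an explicit order. The names `label_order`, `label_to_id`, and `encode` are illustrative and not part of scikit-learn, and the ordering itself is an assumption read off the question:

```python
# Hypothetical helper: encode nested labels using an explicit, user-chosen order.
l = [
    ['PER', 'O', 'GEO'], ['ORG', 'O'], ['O', 'PER', 'O']
]

# Assumed ordering, inferred from the integers shown in the question.
label_order = ['O', 'PER', 'GEO', 'ORG']
label_to_id = {label: i for i, label in enumerate(label_order)}

def encode(nested):
    """Map every label in a list of lists to its integer id."""
    return [[label_to_id[label] for label in sublist] for sublist in nested]

encoded = encode(l)
print(encoded)
# [[1, 0, 2], [3, 0], [0, 1, 0]]
```

Unlike `le.transform`, which raises a ValueError for unseen labels, this sketch raises a KeyError for any label missing from `label_order`, so the list has to cover every value that can occur.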