ValueError: y contains previously unseen labels

Problem description

I am using a Decision Tree Classifier and want to pass my input as strings rather than integer values, but it gives me an error like this:

Traceback (most recent call last):
  File "D:/backup code for odoo project/New folder/New folder/main.py", line 38, in <module>
    theme_res = lebel_encoder.transform(theme_input)
  File "C:\Users\Dell\AppData\Local\Programs\Python\python38\lib\site-packages\sklearn\preprocessing\_label.py", line 277, in transform
    _, y = _encode(y, uniques=self.classes_, encode=True)
  File "C:\Users\Dell\AppData\Local\Programs\Python\python38\lib\site-packages\sklearn\preprocessing\_label.py", line 121, in _encode
    return _encode_numpy(values, uniques, encode,
  File "C:\Users\Dell\AppData\Local\Programs\Python\python38\lib\site-packages\sklearn\preprocessing\_label.py", line 50, in _encode_numpy
    raise ValueError("y contains previously unseen labels: %s"
ValueError: y contains previously unseen labels: ['Food', 'cafe', 'sticky']

Code

import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn import tree

df = pd.read_csv("new_data.csv", encoding='latin1')

inputs = df.drop('selected_theme', axis='columns')
target = df['selected_theme']

lebel_encoder = LabelEncoder()

inputs['main_cat_n'] = lebel_encoder.fit_transform(inputs['main_cat'])
inputs['sub_cat_n'] = lebel_encoder.fit_transform(inputs['sub_cat'])
inputs['nav_bar_n'] = lebel_encoder.fit_transform(inputs['nav_bar'])

inputs_n = inputs.drop(['main_cat', 'sub_cat', 'nav_bar'], axis='columns')

model = tree.DecisionTreeClassifier()
model.fit(inputs_n, target)
print(model.score(inputs_n, target))

theme_input = ['Food', 'cafe', 'sticky']
theme_res = lebel_encoder.transform(theme_input)  # raises the ValueError shown above
result_theme = model.predict(theme_res)
print(result_theme)

Solution

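The error occurs before the classifier is ever called. It is raised on this line:

theme_res = lebel_encoder.transform(theme_input)

The error message tells you that your encoder has never seen categories such as 'Food', 'cafe' and 'sticky'. This happens because you overwrote your LabelEncoder: every call to fit_transform refits the same object, so after the third call it only knows the labels of the last column it was fitted on (nav_bar). You should use a separate LabelEncoder for each feature, for example: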

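categorical_features = ['main_cat', 'sub_cat', 'nav_bar']
encoders = dict()
for cat in categorical_features:
    # One encoder per column, kept so it can be reused at prediction time
    encoders[cat] = LabelEncoder()
    inputs[f'{cat}_n'] = encoders[cat].fit_transform(inputs[cat])

# Drop all of the original string columns, keeping only the encoded ones
inputs_n = inputs.drop(categorical_features, axis='columns')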
...
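
To make the prediction step concrete, here is a minimal sketch of how the per-column encoders could be reused on a new sample. The contents of new_data.csv are not shown in the question, so the small DataFrame below (its category values and theme names) is a hypothetical stand-in, and the sketch assumes theme_input holds one value per feature in the order main_cat, sub_cat, nav_bar.

```python
import pandas as pd
from sklearn import tree
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in for new_data.csv; the real values are an assumption.
df = pd.DataFrame({
    "main_cat": ["Food", "Food", "Fashion", "Fashion"],
    "sub_cat": ["cafe", "bakery", "shoes", "bags"],
    "nav_bar": ["sticky", "fixed", "sticky", "fixed"],
    "selected_theme": ["theme_a", "theme_b", "theme_c", "theme_d"],
})

categorical_features = ["main_cat", "sub_cat", "nav_bar"]
inputs = df.drop("selected_theme", axis="columns")
target = df["selected_theme"]

# Fit one LabelEncoder per column and keep each of them for later.
encoders = {}
for cat in categorical_features:
    encoders[cat] = LabelEncoder()
    inputs[f"{cat}_n"] = encoders[cat].fit_transform(inputs[cat])

inputs_n = inputs.drop(categorical_features, axis="columns")

model = tree.DecisionTreeClassifier(random_state=0)
model.fit(inputs_n, target)

# New sample: one value per feature, in the same order as categorical_features.
theme_input = ["Food", "cafe", "sticky"]

# Encode each value with the encoder that was fitted on its own column,
# then build a single-row frame with the same column names used for training.
encoded = {
    f"{cat}_n": encoders[cat].transform([value])
    for cat, value in zip(categorical_features, theme_input)
}
theme_res = pd.DataFrame(encoded)

result_theme = model.predict(theme_res)
print(result_theme)
```

Note that model.predict expects a 2D input with one row per sample, which is why the encoded values are assembled into a single-row DataFrame rather than passed as a flat list. A value that did not appear in the corresponding training column will still raise the same ValueError, so genuinely new categories need separate handling.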