Problem description
I am using a Decision Tree Classifier and I want to pass my input as strings rather than integer values, but it gives me an error like:
Traceback (most recent call last):
File "D:/backup code for odoo project/New folder/New folder/main.py",line 38,in <module>
theme_res = lebel_encoder.transform(theme_input)
File "C:\Users\Dell\AppData\Local\Programs\Python\python38\lib\site-packages\sklearn\preprocessing\_label.py",line 277,in transform
_,y = _encode(y,uniques=self.classes_,encode=True)
File "C:\Users\Dell\AppData\Local\Programs\Python\python38\lib\site-packages\sklearn\preprocessing\_label.py",line 121,in _encode
return _encode_numpy(values,uniques,encode,
File "C:\Users\Dell\AppData\Local\Programs\Python\python38\lib\site-packages\sklearn\preprocessing\_label.py",line 50,in _encode_numpy
raise ValueError("y contains previously unseen labels: %s"
ValueError: y contains previously unseen labels: ['Food','cafe','sticky']
Code:
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn import tree
df = pd.read_csv("new_data.csv",encoding='latin1')
inputs = df.drop('selected_theme',axis='columns')
target = df['selected_theme']
lebel_encoder = LabelEncoder()
inputs['main_cat_n'] = lebel_encoder.fit_transform(inputs['main_cat'])
inputs['sub_cat_n'] = lebel_encoder.fit_transform(inputs['sub_cat'])
inputs['nav_bar_n'] = lebel_encoder.fit_transform(inputs['nav_bar'])
inputs_n = inputs.drop(['main_cat','sub_cat','nav_bar'],axis='columns')
model = tree.DecisionTreeClassifier()
model.fit(inputs_n,target)
print(model.score(inputs_n,target))
theme_input = ['Food','cafe','sticky']
theme_res = lebel_encoder.transform(theme_input)
result_theme = model.predict(theme_res)
print(result_theme)
Solution
The error is raised before the classifier is ever called. It comes from this line:
theme_res = lebel_encoder.transform(theme_input)
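The error message tells you that your lebel_encoder has never seen categories such as 'Food', 'cafe' and 'sticky'. This happens because you overwrote the LabelEncoder: every call to fit_transform refits it, so after the third call it only knows the classes of nav_bar. You should use a separate LabelEncoder for each feature, for example: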
categorical_features = ['main_cat','sub_cat','nav_bar']
encoders = dict()
for cat in categorical_features:
encoders[cat] = LabelEncoder()
inputs[f'{cat}_n'] = encoders[cat].fit_transform(inputs[cat])
inputs_n = inputs.drop(['main_cat','sub_cat','nav_bar'],axis='columns')
...
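To make the idea concrete, here is a minimal end-to-end sketch of how the per-column encoders could be used at prediction time. It rests on several assumptions: the toy DataFrame and the theme names are made up to stand in for new_data.csv, the model is trained only on the three encoded columns, and the new input is given as a dict keyed by column name (a hypothetical layout). Note that model.predict expects a 2D input with one row per sample, so the encoded values are assembled into a single-row DataFrame rather than passed as a flat list.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn import tree

# Hypothetical toy data standing in for new_data.csv (all values are made up).
df = pd.DataFrame({
    "main_cat": ["Food", "Food", "Fashion", "Fashion"],
    "sub_cat": ["cafe", "bakery", "shoes", "cafe"],
    "nav_bar": ["sticky", "fixed", "sticky", "fixed"],
    "selected_theme": ["theme_a", "theme_b", "theme_c", "theme_d"],
})
categorical_features = ["main_cat", "sub_cat", "nav_bar"]

# A single reused encoder is refitted on every call, so in the end it
# only remembers the classes of the last column (nav_bar).
shared = LabelEncoder()
for cat in categorical_features:
    shared.fit_transform(df[cat])
print(list(shared.classes_))  # classes of nav_bar only

# One encoder per column keeps every vocabulary available for later use.
encoders = {}
encoded = pd.DataFrame(index=df.index)
for cat in categorical_features:
    encoders[cat] = LabelEncoder()
    encoded[f"{cat}_n"] = encoders[cat].fit_transform(df[cat])

model = tree.DecisionTreeClassifier(random_state=0)
model.fit(encoded, df["selected_theme"])

# Encode the new input column by column with the matching encoder,
# then pass it to predict() as a single row (2D input).
theme_input = {"main_cat": "Food", "sub_cat": "cafe", "nav_bar": "sticky"}
row = pd.DataFrame({
    f"{cat}_n": encoders[cat].transform([theme_input[cat]])
    for cat in categorical_features
})
result_theme = model.predict(row)
print(result_theme)
```

A value that never appeared in the training data for its column would still raise the same ValueError, so new input has to be restricted to known categories or handled separately.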