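Problem description
I am trying to fit a LightGBM regressor in Python, but it gives me an error. Basically, I have a dataset in which all the predictors are categorical and the target variable is continuous. Because all of my X variables are categorical, I used label encoding to convert them to numeric form. After that, I pass my categorical features to LGBMRegressor so that the algorithm handles them accordingly.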
# lightgbm for regression
import numpy as np
import lightgbm as lgb
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
df = pd.read_csv("TrainModelling.csv")
df.drop(df.columns[0],axis=1,inplace=True) #Remove index column
y = df["Target"]
X = df.drop("Target",axis=1)
le = preprocessing.LabelEncoder()
X = X.apply(le.fit_transform)
X_train,X_test,y_train,y_test = train_test_split( X,y,test_size=0.2,random_state=42)
hyper_params = {
    'task': 'train',
    'boosting_type': 'gbdt',
    'objective': 'regression',
    'metric': ['l2', 'auc'],
    'learning_rate': 0.005,
    'feature_fraction': 0.9,
    'bagging_fraction': 0.7,
    'bagging_freq': 10,
    'verbose': 0,
    'max_depth': 8,
    'num_leaves': 128,
    'max_bin': 512,
    'num_iterations': 100000,
    'n_estimators': 1000,
}
cat_feature_list = np.where(X.dtypes != float)[0]
gbm = lgb.LGBMRegressor(**hyper_params,categorical_feature=cat_feature_list)
gbm.fit(X_train,y_train,eval_set=[(X_test,y_test)],eval_metric='l1',early_stopping_rounds=1000)
Error:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
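Solution
This line is the problem:
cat_feature_list = np.where(X.dtypes != float)[0]
(I wish you had shared the full traceback of the error; it would have saved time.)
X.dtypes != float
gives a pandas Series of booleans, and NumPy then tries to evaluate its truth value, which raises the error. To get the names of the categorical columns as a list: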
cat_feature_list = X.select_dtypes("object").columns.tolist()
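One caveat may be worth checking here: once the columns have been label encoded they are integers, so selecting object columns afterwards would find nothing. The sketch below is only an illustration under stated assumptions: the toy columns `color` and `size` are hypothetical, and pandas category codes stand in for the LabelEncoder step. It collects the column names before encoding, and it also reproduces the quoted message by asking for the truth value of the index array. Where exactly that happens inside the library is not known, since the traceback was not shared.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the predictors in TrainModelling.csv.
X = pd.DataFrame(
    {"color": ["red", "blue", "red"], "size": ["S", "M", "L"]},
    dtype=object,  # force object dtype so select_dtypes("object") matches
)

# Collect the categorical column names BEFORE encoding, while they are still object dtype.
cat_feature_list = X.select_dtypes("object").columns.tolist()

# Stand-in for the LabelEncoder step (an assumption): map each column to integer codes.
X_encoded = X.apply(lambda col: col.astype("category").cat.codes)

# After encoding no object columns remain, so the same call returns an empty list.
cols_after_encoding = X_encoded.select_dtypes("object").columns.tolist()

# The expression from the question yields a NumPy index array, not a list.
cat_index_array = np.where(X_encoded.dtypes != float)[0]

# Using a multi-element array where a single True/False is expected raises ValueError.
try:
    bool(cat_index_array)
    error_message = ""
except ValueError as exc:
    error_message = str(exc)

print(cat_feature_list)
print(cols_after_encoding)
print(error_message)
```

If the names are collected this way, the plain Python list can be passed as categorical_feature in place of the index array, which is what the line above aims for.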