How do I fix "Input contains NaN, infinity or a value too large for dtype('float64')"?

Problem description

from time import time
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

def train_classifier(clf, X_train, y_train):
    '''Fit a classifier to the training data.'''

    # Start the clock, train the classifier, then stop the clock
    start = time()
    clf.fit(X_train, y_train)
    end = time()

    # Print the results
    print("Trained model in {:.4f} seconds".format(end - start))

def predict_labels(clf, features, target):
    '''Make predictions using a fitted classifier and score them with F1.'''

    # Start the clock, make predictions, then stop the clock
    start = time()
    y_pred = clf.predict(features)
    end = time()

    # Print and return results
    print("Made predictions in {:.4f} seconds.".format(end - start))

    # Return the F1 score (positive label 'H') and the plain accuracy
    return f1_score(target, y_pred, pos_label='H'), sum(target == y_pred) / float(len(y_pred))

def train_predict(clf, X_train, y_train, X_test, y_test):
    '''Train a classifier and report its F1 score and accuracy.'''

    # Indicate the classifier and the training set size
    print("Training a {} using a training set size of {}. . .".format(clf.__class__.__name__, len(X_train)))

    # Train the classifier
    train_classifier(clf, X_train, y_train)

    # Print the results of prediction for both training and testing
    f1, acc = predict_labels(clf, X_train, y_train)
    print(f1, acc)
    print("F1 score and accuracy score for training set: {:.4f}, {:.4f}.".format(f1, acc))

    f1, acc = predict_labels(clf, X_test, y_test)
    print("F1 score and accuracy score for test set: {:.4f}, {:.4f}.".format(f1, acc))

clf_A = LogisticRegression(random_state=42)
train_predict(clf_A, X_train, y_train, X_test, y_test)
print('')

Solution

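You split the data into train and test sets and then fit it to the model right away, without cleaning it first. That is why you get the error "Input contains NaN": the estimator rejects any feature matrix that still contains NaN or infinite values.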
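Before changing anything, it can help to see which columns actually hold the offending values. The following is a minimal sketch, assuming the features live in a pandas DataFrame; the helper name `report_bad_values` and the toy columns are hypothetical and not part of the original question.

```python
import numpy as np
import pandas as pd

def report_bad_values(df):
    """Count NaN and infinite entries per numeric column (hypothetical helper)."""
    numeric = df.select_dtypes(include=[np.number])
    nan_counts = numeric.isna().sum()
    inf_counts = np.isinf(numeric).sum()
    report = pd.DataFrame({"nan": nan_counts, "inf": inf_counts})
    # Keep only the columns that contain at least one bad value
    return report[(report["nan"] > 0) | (report["inf"] > 0)]

# Toy data standing in for the real feature table (assumed, for illustration only)
toy = pd.DataFrame({
    "goals": [1.0, np.nan, 3.0],
    "ratio": [0.5, np.inf, 0.2],
    "ok": [1, 2, 3],
})
bad_report = report_bad_values(toy)
print(bad_report)
```

Running the same helper on `X_train` and `X_test` should show whether the error comes from missing values, infinite values, or both.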

First, after reading the dataset with pandas, apply preprocessing to remove the NaN values from it. Only then split the data and build the model.
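The preprocessing step could look roughly like this. It is a sketch under stated assumptions, not the original poster's code: the helper `drop_bad_rows`, the column names and the `'NH'` label are made up for illustration, and infinite values are treated as missing as well, since the error message also covers them.

```python
import numpy as np
import pandas as pd

def drop_bad_rows(df):
    """Return a copy of df without rows containing NaN or +/-inf (hypothetical helper)."""
    # Turn infinities into NaN so a single dropna() handles both cases
    cleaned = df.replace([np.inf, -np.inf], np.nan)
    return cleaned.dropna().reset_index(drop=True)

# Toy data standing in for the dataset read with pandas (assumed, for illustration only)
raw = pd.DataFrame({
    "feature_a": [1.0, np.nan, 3.0, 4.0],
    "feature_b": [0.5, 0.1, np.inf, 0.2],
    "label": ["H", "NH", "H", "NH"],
})

clean = drop_bad_rows(raw)

# Separate features and target only after cleaning, then split and fit
X_all = clean.drop(columns=["label"])
y_all = clean["label"]
print(len(raw), "rows before cleaning,", len(clean), "rows after")
```

Dropping rows is the simplest option but throws data away. If too many rows disappear, filling the gaps instead (for example with `DataFrame.fillna` or scikit-learn's `SimpleImputer`) may be a better fit.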

To do this, you can follow Link