What is the fix for "Input contains NaN, infinity or a value too large for dtype('float64')"?

Problem description

from time import time
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

def train_classifier(clf, X_train, y_train):
    '''Fits a classifier to the training data.'''

    # Start the clock, train the classifier, then stop the clock
    start = time()
    clf.fit(X_train, y_train)
    end = time()

    # Print the results
    print("Trained model in {:.4f} seconds".format(end - start))

def predict_labels(clf, features, target):
    '''Makes predictions using a fit classifier based on F1 score.'''

    # Start the clock, make predictions, then stop the clock
    start = time()
    y_pred = clf.predict(features)
    end = time()

    # Print and return results (F1 score for class 'H', then accuracy)
    print("Made predictions in {:.4f} seconds.".format(end - start))

    return f1_score(target, y_pred, pos_label='H'), sum(target == y_pred) / float(len(y_pred))

def train_predict(clf, X_train, y_train, X_test, y_test):
    '''Train and predict using a classifier based on F1 score.'''

    # Indicate the classifier and the training set size
    print("Training a {} using a training set size of {}. . .".format(clf.__class__.__name__, len(X_train)))

    # Train the classifier
    train_classifier(clf, X_train, y_train)

    # Print the results of prediction for both training and testing
    f1, acc = predict_labels(clf, X_train, y_train)
    print(f1, acc)
    print("F1 score and accuracy score for training set: {:.4f}, {:.4f}.".format(f1, acc))

    f1, acc = predict_labels(clf, X_test, y_test)
    print("F1 score and accuracy score for test set: {:.4f}, {:.4f}.".format(f1, acc))

clf_A = LogisticRegression(random_state=42)
train_predict(clf_A, X_train, y_train, X_test, y_test)
print('')

Solution

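You split the data into training and test sets and then fit the model directly, without cleaning the data first. That is why scikit-learn raises the error "Input contains NaN, infinity or a value too large for dtype('float64')": the feature matrix still contains NaN values.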
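To confirm that this is the cause, it can help to count the problematic entries per column before fitting. The sketch below is only an illustration: the helper name `report_bad_values` and the toy columns `a` and `b` are made up, and your own DataFrame would take the place of `example`.

```python
import numpy as np
import pandas as pd


def report_bad_values(df):
    """Count NaN and infinite entries in each numeric column (hypothetical helper)."""
    numeric = df.select_dtypes(include=[np.number])
    nan_counts = numeric.isna().sum()
    # Convert to a float array so np.isinf works for every numeric dtype
    inf_counts = pd.Series(
        np.isinf(numeric.to_numpy(dtype=float)).sum(axis=0),
        index=numeric.columns,
    )
    return pd.DataFrame({"nan": nan_counts, "inf": inf_counts})


# Toy data standing in for the real feature table
example = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [0.5, np.inf, 2.0],
})

report = report_bad_values(example)
print(report)
```

Any column with a non-zero count needs to be cleaned before the data reaches `clf.fit`.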

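First, after reading the dataset with pandas, apply a preprocessing step that removes the NaN values from the dataset. Only then split the data and build the model.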

To see how to do this, you can follow Link
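As a minimal sketch of that order of operations (clean first, then split, then fit), assuming that dropping the affected rows is acceptable for your data. The column names `x1`, `x2`, and `label` and the toy values are invented for illustration; the original dataset is not shown in the question.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Toy data with one NaN and one infinite value (all names and values are made up)
raw = pd.DataFrame({
    "x1": [0.1, 0.4, np.nan, 0.8, 0.9, 0.2, 0.7, 0.3, 0.95, 0.05],
    "x2": [1.0, 0.5, 0.3, np.inf, 0.2, 0.9, 0.1, 0.8, 0.15, 1.1],
    "label": ["H", "H", "NH", "NH", "NH", "H", "NH", "H", "NH", "H"],
})

# Treat infinities as missing, then drop every row that has a missing value
clean = raw.replace([np.inf, -np.inf], np.nan).dropna()

X = clean[["x1", "x2"]]
y = clean["label"]

# Split only after cleaning, so neither subset contains NaN or inf
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

clf = LogisticRegression(random_state=42)
clf.fit(X_train, y_train)  # no ValueError: the inputs are finite
print(clf.predict(X_test))
```

Dropping rows is the simplest option but discards data. If too many rows would be lost, filling the gaps instead (for example with `DataFrame.fillna` or scikit-learn's `SimpleImputer`) is a common alternative.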

jupyter-notebook sklearn-pandas valueerror