Problem description
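When I run the following code cell by cell, it looks for special characters in the data frame. When I run the cell containing df['country'] = df['country'].replace('?',np.nan) (which replaces the special character with NaN so that the affected rows can then be dropped), it raises a KeyError.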
# Import libraries
import numpy as np # linear algebra
import pandas as pd # data processing
# Libraries for data visualization
import matplotlib.pyplot as pplt
import seaborn as sns
from pandas.plotting import scatter_matrix
# Import scikit_learn module for the algorithm/model: Logistic Regression
from sklearn.linear_model import LogisticRegression
# Import scikit_learn module to split the dataset into train/test sub-datasets
from sklearn.model_selection import train_test_split
# Import scikit_learn module for k-fold cross validation
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
# import the metrics class
from sklearn import metrics
# import stats for accuracy
import statsmodels.api as sm
import warnings
warnings.filterwarnings("ignore")
from google.colab import files
upload=files.upload()
#load the dataset provided
salary_dataset = pd.read_csv('adult.csv')
salary_dataset.head()
# salary dataset info to find columns and count of the data
salary_dataset.info()
# replace column names that contain special characters with proper names
df = pd.DataFrame([])
df.rename(columns={'capital-gain': 'capital gain','capital-loss': 'capital loss','native-country': 'country','hours-per-week': 'hours per week','marital-status': 'marital'},inplace=True)
df.columns
#Finding the special characters in the data frame
df.isin(['?']).sum(axis=0)
# replace the special character '?' with NaN so the affected rows can be dropped
df['country'] = df['country'].replace('?',np.nan)
df['workclass'] = df['workclass'].replace('?',np.nan)
df['occupation'] = df['occupation'].replace('?',np.nan)
# drop the rows that contain NaN
df.dropna(how='any',inplace=True)
Solution
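The KeyError occurs because df is created as an empty DataFrame with pd.DataFrame([]), so it never receives the columns of salary_dataset. Calling rename() on an empty frame changes nothing, df.columns is empty, and df['country'] fails because no such column exists. Building df from salary_dataset, for example with df = salary_dataset.rename(columns={...}), gives it the renamed columns, after which the replace and dropna calls work.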
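One possible fix is sketched below: derive df from the loaded data rather than from pd.DataFrame([]), so that the 'country' column actually exists. The small in-memory frame is a hypothetical stand-in for adult.csv (only a few of its columns, with made-up rows), used so the example runs on its own.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for pd.read_csv('adult.csv'); rows are invented.
salary_dataset = pd.DataFrame({
    "workclass": ["Private", "?", "State-gov", "Private"],
    "occupation": ["Sales", "?", "Adm-clerical", "Tech-support"],
    "native-country": ["United-States", "Cuba", "?", "India"],
    "hours-per-week": [40, 20, 35, 45],
})

# Rename on the loaded data, not on an empty pd.DataFrame([]).
df = salary_dataset.rename(columns={
    "native-country": "country",
    "hours-per-week": "hours per week",
})

# Replace the '?' placeholder with NaN in the affected columns.
cols = ["country", "workclass", "occupation"]
df[cols] = df[cols].replace("?", np.nan)

# Drop every row that has at least one missing value.
df = df.dropna(how="any")
print(df)
```

With the real file, the same idea applies after salary_dataset = pd.read_csv('adult.csv'). Passing na_values='?' to pd.read_csv is another option that marks those cells as NaN at load time, provided the file stores the placeholder exactly as '?'.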