How do I fix this KeyError for the 'country' key?

Problem description

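I run the following code cell by cell, and one cell counts the special characters in the DataFrame. When I run the cell containing df['country'] = df['country'].replace('?',np.nan), which replaces the special character with NaN so that the rows containing NaN can then be dropped, it raises a KeyError.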
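In pandas, df['country'] raises KeyError when the DataFrame has no column labeled 'country'. The sketch below shows that behaviour in isolation. It assumes a DataFrame created with pd.DataFrame([]) as in the code that follows, and uses the hyphenated name 'native-country' from the question's rename mapping.

```python
import pandas as pd

# An empty DataFrame, as produced by pd.DataFrame([]), has no rows and no columns
df = pd.DataFrame([])
print(df.shape)  # (0, 0)

# DataFrame.rename ignores labels that do not exist (errors='ignore' is the default),
# so this does not create a 'country' column
df = df.rename(columns={'native-country': 'country'})
print(list(df.columns))  # []

# Selecting a column label that is not present raises KeyError
try:
    df['country']
except KeyError as exc:
    print('KeyError:', exc)
```

Because rename does not complain about missing labels, the problem only becomes visible later, when the column is first accessed.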

# Import libraries 

import numpy as np # linear algebra
import pandas as pd # data processing

# Libraries for data visualization
import matplotlib.pyplot as pplt  
import seaborn as sns 
from pandas.plotting import scatter_matrix

# Import scikit-learn module for the algorithm/model: Logistic Regression
from sklearn.linear_model import LogisticRegression
# Import scikit-learn module to split the dataset into train/test sub-datasets
from sklearn.model_selection import train_test_split 
# Import scikit_learn module for k-fold cross validation
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
# Import the metrics module
from sklearn import metrics
# Import statsmodels for statistical model summaries
import statsmodels.api as sm

import warnings
warnings.filterwarnings("ignore")

from google.colab import files
upload=files.upload()

# Load the provided dataset
salary_dataset = pd.read_csv('adult.csv')
salary_dataset.head()

# Show the column names, dtypes and non-null counts of the salary dataset
salary_dataset.info()

# Rename columns whose names contain special characters (hyphens) to plain names
df = pd.DataFrame([])
df.rename(columns={'capital-gain': 'capital gain','capital-loss': 'capital loss','native-country': 'country','hours-per-week': 'hours per week','marital-status': 'marital'},inplace=True)
df.columns

# Count the '?' placeholders in each column of the DataFrame
df.isin(['?']).sum(axis=0)

# Replace the '?' placeholder with NaN so the affected rows can be dropped
df['country'] = df['country'].replace('?',np.nan)
df['workclass'] = df['workclass'].replace('?',np.nan)
df['occupation'] = df['occupation'].replace('?',np.nan)
# Drop the rows that now contain NaN
df.dropna(how='any',inplace=True)
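In the code above, df = pd.DataFrame([]) replaces the loaded data with a new, empty DataFrame, so the rename that follows has nothing to rename and df never gets a 'country' column. A likely fix is to rename the loaded salary_dataset itself. The sketch below is a minimal version of that, with two assumptions: a small hypothetical in-memory table stands in for pd.read_csv('adult.csv'), and some copies of the dataset may store the placeholder as ' ?' with a leading space, which is why the values are stripped before matching.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for salary_dataset = pd.read_csv('adult.csv');
# column names follow the rename mapping used in the question
salary_dataset = pd.DataFrame({
    'age': [39, 50, 38, 53],
    'workclass': ['State-gov', '?', 'Private', 'Private'],
    'occupation': ['Adm-clerical', 'Exec-managerial', '?', 'Handlers-cleaners'],
    'hours-per-week': [40, 13, 40, 40],
    'native-country': ['United-States', 'United-States', 'United-States', ' ?'],
})

# Rename the loaded data instead of an empty DataFrame; rename returns a new DataFrame
df = salary_dataset.rename(columns={
    'capital-gain': 'capital gain',
    'capital-loss': 'capital loss',
    'native-country': 'country',
    'hours-per-week': 'hours per week',
    'marital-status': 'marital',
})

cols_with_placeholder = ['country', 'workclass', 'occupation']

# Assumption: the placeholder may carry surrounding whitespace, so strip it first
for col in cols_with_placeholder:
    df[col] = df[col].str.strip()

# Count the '?' placeholders per column
print(df.isin(['?']).sum(axis=0))

# Replace the placeholder with NaN, then drop every row that contains NaN
for col in cols_with_placeholder:
    df[col] = df[col].replace('?', np.nan)
df = df.dropna(how='any')
print(df.shape)  # (1, 5)
```

Assigning the result of rename and dropna, rather than using inplace=True, keeps each step explicit and avoids modifying the original salary_dataset.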
