Problem description
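I'm trying to impute missing values in a dataset using deep learning. I've been stuck on this problem for a long time and need help. This line raises a KeyError:
imputer.fit(train_df = sampleDf, num_epochs = 50)
Note that the dataset is not split into training and test sets. Here is a reproducible example: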
#!pip install datawig
import numpy as np
import pandas as pd
import datawig

columns = np.arange(4).tolist()
print(len(columns))
d = {'col1': [1, 2], 'col2': [3, 4], 'col3': [5, 6], 'col4': [9, np.nan]}
sampleDf = pd.DataFrame(data=d)

# Iterate through each column of the dataset and impute that column. The resulting dataset will be used to impute the subsequent column, etc.
for i in range(len(columns)):
    # No need to impute the column if it has no null values.
    if sampleDf[sampleDf.columns[i]].isnull().sum() == 0:
        continue
    inputColumns = columns[:i] + columns[i+1:]
    print([str(value) for value in inputColumns])
    print(str(i))
    # Initialize a SimpleImputer model
    imputer = datawig.SimpleImputer(
        input_columns=[str(value) for value in inputColumns],  # Column(s) containing information about the column we want to impute
        output_column=str(i),  # The column for which we would like to impute values
        output_path='imputerModel'  # Stores model data and metrics
    )
    # Fit an imputer model on the train data
    imputer.fit(train_df=sampleDf, num_epochs=50)
    # Impute missing values and return the original dataframe with predictions
    imputedDatasetNN = imputer.predict(data_frame=sampleDf)
I've tried many variations of this code, but every version raises the same error. Any help would be greatly appreciated.
Solution
No verified solution has been posted yet.
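A likely cause, offered as an unverified reading of the code in the question: `columns` holds the integers 0 to 3, so the names passed to `SimpleImputer` are the strings '0', '1', '2', '3', while the DataFrame's labels are 'col1' to 'col4'. Selecting a label that does not exist in a pandas DataFrame raises KeyError, which would fit the error at `fit`. The minimal sketch below takes the names from `sampleDf.columns` instead. It also writes each result back into the working frame so later columns see earlier imputations, which the comment in the question intends. (The original loop stores the result in `imputedDatasetNN` and never updates `sampleDf`.) `impute_column_by_column` and `mean_fill` are hypothetical helpers. `mean_fill` is a plain column-mean stand-in for datawig's model so the sketch runs without datawig; it is not what datawig does.

```python
import numpy as np
import pandas as pd

def impute_column_by_column(df, fit_predict):
    """Impute each column that has nulls, one column at a time.

    fit_predict(df, input_columns, output_column) is a hypothetical callback
    standing in for datawig's fit/predict; it must return a Series of values
    for output_column. Each result is written back into the working copy, so
    later columns are imputed from already-imputed data.
    """
    work = df.copy()
    for target in list(work.columns):
        if work[target].isnull().sum() == 0:
            continue  # nothing to impute in this column
        # Use the real column labels, not str(i) built from a range of integers
        inputs = [c for c in work.columns if c != target]
        work[target] = fit_predict(work, inputs, target)
    return work

def mean_fill(df, input_columns, output_column):
    # Stand-in model (assumption): ignores the inputs and fills nulls with the column mean
    return df[output_column].fillna(df[output_column].mean())

sampleDf = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4], 'col3': [5, 6], 'col4': [9, np.nan]})

# The names built in the question are '0'..'3', none of which is a column label
print([str(v) for v in np.arange(4).tolist()])  # ['0', '1', '2', '3']
print(list(sampleDf.columns))                   # ['col1', 'col2', 'col3', 'col4']

print(impute_column_by_column(sampleDf, mean_fill))
```

To use datawig instead of the stand-in, `fit_predict` would wrap the `SimpleImputer(...)`, `fit` and `predict` calls from the question, passing `input_columns=inputs` and `output_column=target`. Check datawig's documentation for which column of the frame returned by `predict` holds the imputed values before writing it back.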