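fastai: evaluating a tabular prediction model with a pre-split dataset

Question

Given a dataset that has already been split into training and test sets, how do I run predictions on the test set in fastai so that I can compute MAE and RMSE?

The following example is taken from the fastai documentation, modified slightly to use sklearn's train_test_split: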

import numpy as np
from sklearn.model_selection import train_test_split
from fastai.tabular.all import *
import pandas as pd

path = untar_data(URLs.ADULT_SAMPLE)
df = pd.read_csv(path/'adult.csv')

train,test = train_test_split(df,test_size=0.20,random_state=42)

cat_names = ['workclass','education','marital-status','occupation','relationship','race']
cont_names = ['age','fnlwgt','education-num']
procs = [Categorify,FillMissing,Normalize]
dls = TabularDataLoaders.from_df(train,path,procs=procs,cat_names=cat_names,cont_names=cont_names,y_names="salary")
learn = tabular_learner(dls)


learn.fit_one_cycle(5)

epoch   train_loss  valid_loss  time
0   0.378432    0.356029    00:05
1   0.369692    0.358837    00:05
2   0.355757    0.348524    00:05
3   0.342714    0.348011    00:05
4   0.334072    0.346690    00:05


learn.unfreeze()
learn.fit_one_cycle(10,lr_max=slice(1e-3,1e-2))

epoch   train_loss  valid_loss  time
0   0.343953    0.350457    00:05
1   0.349379    0.353308    00:04
2   0.360508    0.352564    00:04
3   0.338458    0.351742    00:05
4   0.334585    0.352128    00:05
5   0.342312    0.351003    00:04
6   0.329152    0.350455    00:05
7   0.334460    0.351833    00:05
8   0.328608    0.351415    00:05
9   0.333205    0.352079    00:04

How do I now apply the trained model to the test set to compute the metrics? Something like the following does not work for me:

learn.predict(test)

This raises the following error: AttributeError: 'DataFrame' object has no attribute 'to_frame'

Thanks for your help!

Answer

I ended up writing a simple for loop that calls predict once per row.
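The loop might look roughly like the sketch below. It relies on the fact that the tabular learn.predict expects a single row (a pandas Series), which is why passing a whole DataFrame fails on to_frame. The helper name predict_rows is hypothetical and not part of fastai.

```python
def predict_rows(learn, test_df):
    """Call learn.predict once per row and collect the raw outputs.

    predict_rows is an illustrative helper, not a fastai function.
    """
    results = []
    for _, row in test_df.iterrows():
        # Each row is a pandas Series, which is what learn.predict expects.
        results.append(learn.predict(row))
    return results
```

With the objects from the question, this would be called as predict_rows(learn, test); each element of the result is whatever learn.predict returns for that row.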

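This is far from efficient, of course, but it solved my problem. If you have a suggestion for avoiding the slow loop, please leave a comment below.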
