WEKA cross-validated linear regression - can I get the RMSPE?

Question

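Is it possible to get the RMSPE after cross-validating a model? I know I can easily get the RMSE, but what about the root mean squared percentage error?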
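For reference, RMSPE is commonly defined as shown below. The symbols are notation introduced here for illustration: y_i is the actual class value of instance i, \hat{y}_i is the predicted value, and n is the number of evaluated instances. Conventions vary on whether the result is reported as a fraction or multiplied by 100, and the measure is undefined when any actual value is zero.

```latex
\mathrm{RMSPE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\frac{y_i - \hat{y}_i}{y_i}\right)^{2}}
```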

Sample code I put together for cross-validating a WEKA linear regression:

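        // Required imports: java.util.ArrayList, java.util.Random,
        // weka.classifiers.Evaluation, weka.classifiers.functions.LinearRegression,
        // weka.core.Attribute, weka.core.DenseInstance, weka.core.Instances, weka.core.Utils

        // build a small dataset in memory (numeric attributes x and y)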
        final ArrayList<Attribute> attributes = new ArrayList<>();
        attributes.add(new Attribute("x"));
        attributes.add(new Attribute("y"));

        Instances data = new Instances("name",attributes,0);
        data.add(new DenseInstance(1d,new double[]{5,80}));
        // ... add more data

        // -c last
        data.setClassIndex(data.numAttributes() - 1);

        // classifier
        final LinearRegression cls = new LinearRegression();

        // other options
        int seed = 129;
        int folds = 3;

        // perform cross-validation; crossValidateModel shuffles a copy of the
        // data with the given Random (and stratifies it when the class is
        // nominal), so no manual randomization is needed beforehand
        Evaluation eval = new Evaluation(data);

        eval.crossValidateModel(cls, data, folds, new Random(seed));

        System.out.println("rootMeanSquaredError " + eval.rootMeanSquaredError());
        System.out.println("rootRelativeSquaredError " + eval.rootRelativeSquaredError());
        System.out.println("rootMeanPriorSquaredError " + eval.rootMeanPriorSquaredError());

        // output evaluation
        System.out.println();
        System.out.println("=== Setup ===");
        System.out.println("Classifier: " + cls.getClass().getName() + " " + Utils.joinOptions(cls.getOptions()));
        System.out.println("Dataset: " + data.relationName());
        System.out.println("Folds: " + folds);
        System.out.println("Seed: " + seed);
        System.out.println();
        System.out.println(eval.toSummaryString("=== " + folds + "-fold cross-validation ===",true));


        /*

        === Setup ===
        Classifier: weka.classifiers.functions.LinearRegression -S 0 -R 1.0E-8 -num-decimal-places 4
        Dataset: name
        Folds: 3
        Seed: 129

        === 3-fold cross-validation ===
        Correlation coefficient                  0.6289
        Mean absolute error                      7.5177
        Root mean squared error                  8.262
        Relative absolute error                 85.7748 %
        Root relative squared error             77.9819 %
        Total Number of Instances               15

         */

Answer

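Weka does not compute RMSPE by default. I have put together a small Weka package, called rmspe-weka-package, that should do the trick for numeric classes (note: it has only had limited testing).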

After running an evaluation (with the package installed), you should be able to retrieve the statistic like this:

Evaluation eval = ... // initialize your evaluation object
...                   // perform your evaluation
double rmspe = eval.getPluginMetric("weka.classifiers.evaluation.RMSPE").getStatistic("RMSPE");
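If installing the package is not an option, the statistic can also be computed by hand from actual and predicted values. The following is a minimal sketch under stated assumptions, not the package's implementation: the class `Rmspe` and its method are hypothetical names introduced here, the result is returned as a fraction rather than a percentage, and instances with an actual value of zero are rejected because the percentage error is undefined for them.

```java
/** Hypothetical helper; not part of Weka or of rmspe-weka-package. */
final class Rmspe {
    private Rmspe() {}

    /**
     * Returns sqrt(mean(((actual - predicted) / actual)^2)) as a fraction.
     * Multiply the result by 100 to express it as a percentage.
     */
    static double rmspe(double[] actual, double[] predicted) {
        if (actual.length != predicted.length || actual.length == 0) {
            throw new IllegalArgumentException(
                "arrays must be non-empty and of equal length");
        }
        double sumSquared = 0.0;
        for (int i = 0; i < actual.length; i++) {
            // Assumption: a zero actual value is treated as an error,
            // since dividing by it would make the percentage error undefined.
            if (actual[i] == 0.0) {
                throw new IllegalArgumentException(
                    "percentage error is undefined for actual value 0 at index " + i);
            }
            double relative = (actual[i] - predicted[i]) / actual[i];
            sumSquared += relative * relative;
        }
        return Math.sqrt(sumSquared / actual.length);
    }
}
```

With Weka, the two arrays could be filled from `eval.predictions()` after cross-validation, reading `actual()` and `predicted()` from each entry, assuming predictions were not discarded. The value is then pooled over all folds rather than averaged per fold.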