How to use IsolationForest in Weka?

Problem description

I am trying to use isolation forest in Weka, but I can't find a simple example showing how to use it. Can anyone help me? Thanks in advance.

import weka.classifiers.misc.IsolationForest;

public class Test2 {
    public static void main(String[] args) {
        IsolationForest isolationForest = new IsolationForest();
        // ...
    }
}

Solution

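I strongly recommend that you study the IsolationForest implementation a bit. The following code loads a CSV file whose first column is the class attribute. Note: a class with a single value produces only (1 - anomaly score); if the class is binary, you also get the anomaly score. With any other number of class values it just returns an error. Also note that I skip the second column (in my case a uuid, which is not needed for anomaly detection).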

import java.io.File;
import java.io.FileWriter;

import weka.classifiers.misc.IsolationForest;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.CSVLoader;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;

private static void findOutlier(File in, File out) throws Exception {
    CSVLoader loader = new CSVLoader();
    loader.setSource(in);

    Instances data = loader.getDataSet();
    // setting class attribute if the data format does not provide this information
    // For example, the XRFF format saves the class attribute information as well
    if (data.classIndex() == -1)
        data.setClassIndex(0);

    // Drop the uuid column; -R takes 1-based attribute indices
    String[] options = new String[2];
    options[0] = "-R";                                    // "range"
    options[1] = "2";                                     // second attribute
    Remove remove = new Remove();                         // new instance of filter
    remove.setOptions(options);                           // set options
    remove.setInputFormat(data);                          // inform filter about dataset **AFTER** setting options
    Instances newData = Filter.useFilter(data, remove);   // apply filter

    IsolationForest isolationForest = new IsolationForest();
    isolationForest.buildClassifier(newData);
    // System.out.println(isolationForest);

    // try-with-resources closes (and flushes) the writer
    try (FileWriter fw = new FileWriter(out)) {
        // Header: every original attribute (class included), then the two score columns
        for (int j = 0; j < data.numAttributes(); ++j) {
            fw.write(data.attribute(j).name());
            fw.write(",");
        }
        fw.write("(1 - anomaly score),anomaly score\n");
        for (int i = 0; i < data.size(); ++i) {
            // Score the filtered instance (same format the model was trained on),
            // but write the original row so the uuid column is kept in the output
            Instance inst = newData.get(i);
            final double[] distributionForInstance = isolationForest.distributionForInstance(inst);
            fw.write(data.get(i) + "," + distributionForInstance[0] + "," + (1 - distributionForInstance[0]));
            fw.write("\n");
        }
    }
}

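The function above writes every original row to the output CSV and appends the scores as the last two columns. Note that I'm using a single class, so to get the corresponding anomaly score I compute 1 - distributionForInstance[0]; with a binary class you can simply use distributionForInstance[1] instead.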
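The two cases just described (a single class value versus a binary class) can be folded into one small helper. This is a minimal plain-Java sketch; anomalyScore is a hypothetical name, not part of Weka, and it relies only on the array layout described in this answer: the first element is (1 - anomaly score) and, for a binary class, the second element is the anomaly score.

```java
public class AnomalyScores {
    // Hypothetical helper (not a Weka API): converts the array returned by
    // IsolationForest.distributionForInstance() into the anomaly score.
    static double anomalyScore(double[] dist) {
        if (dist.length == 1) {
            // Single class value: only (1 - anomaly score) is available
            return 1.0 - dist[0];
        }
        if (dist.length == 2) {
            // Binary class: the second element already is the anomaly score
            return dist[1];
        }
        throw new IllegalArgumentException(
            "expected 1 or 2 entries, got " + dist.length);
    }

    public static void main(String[] args) {
        double[] unary = {0.25};         // made-up values for illustration
        double[] binary = {0.25, 0.75};  // made-up values for illustration
        System.out.println(anomalyScore(unary));
        System.out.println(anomalyScore(binary));
    }
}
```

Both calls in main yield the same score, which is the point: the helper hides whether the data had one or two class values.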

Sample input.csv that yields only (1 - anomaly score) (single class value):

Class,ignore,feature_0,feature_1,feature_2
A,1,21,31,31
A,2,41,61,81
A,3,37,34

Sample input.csv that yields both (1 - anomaly score) and the anomaly score (binary class):

Class,31
B,34
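The answer suggests studying the IsolationForest implementation. As background, Weka's IsolationForest is based on the isolation forest of Liu, Ting and Zhou (ICDM 2008), where the anomaly score of an instance x is s(x, n) = 2^(-E[h(x)] / c(n)). Here E[h(x)] is the average path length of x over the trees, n is the subsample size, and c(n) = 2H(n-1) - 2(n-1)/n normalizes it, with the harmonic number approximated as H(i) ≈ ln(i) + 0.5772156649. The plain-Java sketch below only evaluates that formula for a given average path length. It is not Weka's code, the class and method names are hypothetical, and the special cases for n <= 2 are a common convention assumed here. Scores close to 1 indicate likely anomalies, scores well below 0.5 indicate normal points, and a path length equal to c(n) gives exactly 0.5.

```java
public class IsolationScore {
    // Euler-Mascheroni constant, used to approximate the harmonic number H(i)
    private static final double EULER_GAMMA = 0.5772156649015329;

    // c(n) = 2 H(n-1) - 2 (n-1) / n, the average path length used for normalization.
    // The n <= 2 special cases are an assumed convention, not taken from Weka.
    static double averagePathLength(int n) {
        if (n <= 1) {
            return 0.0;
        }
        if (n == 2) {
            return 1.0;
        }
        double harmonic = Math.log(n - 1) + EULER_GAMMA;
        return 2.0 * harmonic - 2.0 * (n - 1) / (double) n;
    }

    // s = 2^(-E[h(x)] / c(n)); assumes subsampleSize > 1 so that c(n) > 0
    static double anomalyScore(double meanPathLength, int subsampleSize) {
        return Math.pow(2.0, -meanPathLength / averagePathLength(subsampleSize));
    }

    public static void main(String[] args) {
        int psi = 256; // subsample size suggested as a default in the paper
        double c = averagePathLength(psi);
        System.out.println("c(256) = " + c);
        System.out.println("short path (2.0): " + anomalyScore(2.0, psi));
        System.out.println("average path (c): " + anomalyScore(c, psi));
    }
}
```

The short path scores higher than the average one, which is why instances that are isolated after only a few splits are reported as anomalies.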
