How do I use IsolationForest in Weka?

Question

I'm trying to use isolation forest in Weka, but I can't find a simple example that shows how to use it. Can anyone help me? Thanks in advance.

import weka.classifiers.misc.IsolationForest;

public class Test2 {
    public static void main(String[] args) {
        IsolationForest isolationForest = new IsolationForest();
        .....................................................
    }
}

Answer

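I strongly suggest studying the implementation of IsolationForest a little. The following code loads a CSV file whose first column is the class. (Note: a single class value produces only (1 - anomaly score); if the class is binary you get the anomaly score as well. Anything else just returns an error.) Note that I skip the second column, which in my case is a uuid that isn't needed for anomaly detection.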

 import java.io.File;
 import java.io.FileWriter;
 import weka.classifiers.misc.IsolationForest;
 import weka.core.Instances;
 import weka.core.converters.CSVLoader;
 import weka.filters.Filter;
 import weka.filters.unsupervised.attribute.Remove;

 private static void findOutlier(File in, File out) throws Exception {
    CSVLoader loader = new CSVLoader();
    loader.setSource(in);

    Instances data = loader.getDataSet();
    // Set the class attribute if the data format does not provide this information.
    // For example, the XRFF format saves the class attribute information as well.
    if (data.classIndex() == -1)
        data.setClassIndex(0);

    Remove remove = new Remove();                         // new instance of filter
    remove.setOptions(new String[] {"-R", "2"});          // "range": second attribute (the uuid), 1-based
    remove.setInputFormat(data);                          // inform filter about dataset **AFTER** setting options
    Instances newData = Filter.useFilter(data, remove);   // apply filter

    IsolationForest isolationForest = new IsolationForest();
    isolationForest.buildClassifier(newData);
    // System.out.println(isolationForest);

    // try-with-resources closes the writer even if scoring throws
    try (FileWriter fw = new FileWriter(out)) {
        // Header: every original column (class included), then the two score columns
        for (int i = 0; i < data.numAttributes(); ++i) {
            fw.write(data.attribute(i).name());
            fw.write(",");
        }
        fw.write("(1 - anomaly score),anomaly score\n");
        for (int i = 0; i < data.numInstances(); ++i) {
            // Score the filtered row so attribute indices match the trained model,
            // but write the original row so the uuid column stays in the output
            final double[] distributionForInstance = isolationForest.distributionForInstance(newData.get(i));
            fw.write(data.get(i) + "," + distributionForInstance[0] + "," + (1 - distributionForInstance[0]) + "\n");
        }
    }
}

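The function above appends the anomaly values as the last columns of the CSV. Note that I'm using a single class, so to get the corresponding anomaly score I compute 1 - distributionForInstance[0]; otherwise you can simply use distributionForInstance[1].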
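
The choice between 1 - distributionForInstance[0] and distributionForInstance[1] can be wrapped in a small helper. This is only a sketch: the class name AnomalyScores and the method anomalyScore are made up for illustration and are not part of Weka, and it assumes the array layout described in this answer (index 0 holds 1 - anomaly score, index 1, when present, holds the anomaly score).

```java
// Hypothetical helper, not part of Weka: picks the anomaly score out of the
// array returned by distributionForInstance, following the rule in this answer.
final class AnomalyScores {
    private AnomalyScores() {}

    static double anomalyScore(double[] distribution) {
        if (distribution == null || distribution.length == 0) {
            throw new IllegalArgumentException("distribution must contain at least one value");
        }
        if (distribution.length >= 2) {
            // Binary class: the second entry is assumed to be the anomaly score.
            return distribution[1];
        }
        // Single class: only (1 - anomaly score) is available, so invert it.
        return 1.0 - distribution[0];
    }

    public static void main(String[] args) {
        System.out.println(anomalyScore(new double[] {0.75}));        // single-class case
        System.out.println(anomalyScore(new double[] {0.75, 0.25}));  // binary-class case
    }
}
```

Inside findOutlier you could then write AnomalyScores.anomalyScore(distributionForInstance) for the last column instead of hard-coding one of the two expressions.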

Example input.csv that yields only (1 - anomaly score):

Class,ignore,feature_0,feature_1,feature_2
A,1,21,31,31
A,2,41,61,81
A,3,37,34

Example input.csv that yields both (1 - anomaly score) and the anomaly score:

Class,31
B,34
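
Which of the two cases applies depends on how many distinct values the class column holds, so it can help to check that before running findOutlier. The following is a rough sketch in plain Java with no Weka dependency. The names ClassColumnCheck and distinctClassValues are hypothetical, and it assumes the class is the first column, the first line is a header, and no field contains a quoted comma.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical pre-check, not part of Weka: collects the distinct values
// found in the first (class) column of a simple CSV.
final class ClassColumnCheck {
    private ClassColumnCheck() {}

    static Set<String> distinctClassValues(List<String> csvLines) {
        Set<String> values = new LinkedHashSet<>();
        // Start at 1 to skip the header line.
        for (int i = 1; i < csvLines.size(); i++) {
            String line = csvLines.get(i).trim();
            if (line.isEmpty()) {
                continue;
            }
            int comma = line.indexOf(',');
            values.add(comma < 0 ? line : line.substring(0, comma));
        }
        return values;
    }

    public static void main(String[] args) {
        // Made-up rows for illustration only.
        List<String> lines = Arrays.asList("Class,feature_0", "A,1", "B,2", "A,3");
        Set<String> classes = distinctClassValues(lines);
        System.out.println(classes.size() + " distinct class value(s): " + classes);
    }
}
```

One distinct value corresponds to the single-class case above and two to the binary case. According to this answer, anything else just returns an error.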

weka