Problem description

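For example, I might commonly have an itemset of the characters {A, B, C, G}. I need to generate all possible premises (antecedents) for the association rules. In this case: ABC, ABG, ACG, BCG, AB, AC, AG, BC, BG, CG, A, B, C, G. I don't know where to start. Hours of research have taught me the terminology and concepts, but nothing explains how to perform this particular step. This is what I have for the method so far. All itemsets are saved as strings and stored together in an ArrayList. I have already written a working Apriori algorithm that generates the frequent itemsets.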

public static ArrayList<String> associationRules(ArrayList<String> data, ArrayList<String> freqItemsets, int minConf) {
    ArrayList<String> generatedRules = new ArrayList<String>();
    for (int i = 0; i < freqItemsets.size(); i++) {
        String currentItemset = freqItemsets.get(i);
        if (currentItemset.length() < 2) {
            continue;
        }

    }

    return null; // temporary return statement to avoid compile error
}

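Although code, feedback, and suggestions for this step and the following ones would of course be a huge help, all I really need is a plain-English explanation of how to perform this step (as opposed to pseudocode, or a working method that uses a different data type). Everything else seems manageable.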

Solution

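Assuming you have defined what you actually need (all subsets, each kept in the order of the original list), you can do this by thinking of it that way and using these properties: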

sorted in the order of your list
finite
divisible (the problem can be split into smaller subproblems)

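All you need to do is go through your list of characters many times, each time deciding for every character whether to include it this time. If you go through and capture every possibility, you are done. For that, you need a reliable way to count through the possible result strings.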
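As a quick check of how many strings that counting has to produce: each of the n characters is independently included or left out, which gives 2^n choices. Removing the empty choice gives every non-empty subset, and removing the whole itemset as well leaves the proper subsets, which are the possible premises. For the question's itemset {A, B, C, G} (n = 4):

```latex
\begin{aligned}
\text{include/exclude choices} &= 2^{n} = 2^{4} = 16\\
\text{non-empty subsets} &= 2^{n} - 1 = 15\\
\text{possible premises (proper, non-empty)} &= 2^{n} - 2 = \binom{4}{3} + \binom{4}{2} + \binom{4}{1} = 4 + 6 + 4 = 14
\end{aligned}
```

The 14 matches the question's list: four 3-item, six 2-item, and four 1-item premises.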

Iterative solution

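Think in terms of bit states. You have n characters (4 in your case) and assign one bit to each of them. Each possible bit state then defines one legal subset, with the characters kept in order; for example, for {A,B,C,G}: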

1001 would be AG
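To make the bit-to-character mapping concrete, here is a small sketch; the class name BitDecode and the method decode are hypothetical. It assumes the same convention as the loop in this answer: bit pos, counted from the least significant bit, selects elements.get(pos), so A corresponds to the rightmost digit of a written bit pattern. For 1001 the reading direction happens not to matter, because the pattern is symmetric.

```java
import java.util.List;

final class BitDecode {
    // Hypothetical helper: returns the subset encoded by `mask`.
    // Bit `pos` (least significant bit = position 0) selects elements.get(pos).
    static String decode(long mask, List<String> elements) {
        StringBuilder sb = new StringBuilder();
        for (int pos = 0; pos < elements.size(); pos++) {
            // keep the element only if its bit is set
            if (((mask >> pos) & 1L) == 1L) {
                sb.append(elements.get(pos));
            }
        }
        return sb.toString();
    }
}
```

With elements [A, B, C, G], decode(0b1001, ...) gives "AG" and decode(0b0011, ...) gives "AB" (the 0b prefix is Java's binary literal syntax).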

All possible states of a bit set are "countable": you can step from the lowest state to the highest by adding 1 each time.

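Loop from 1 to 2^n - 1 (where n is the number of characters you have) and, for each value, build a String by appending (in the correct order) every character whose bit is 1 and omitting every character whose bit is 0. That way you "count" through all possible legal arrangements.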

The implementation depends heavily on the programmer and their style, but mine looks like this:

import java.util.ArrayList;
import java.util.List;

public static List<String> associationRules(List<String> elements) {
  List<String> result = new ArrayList<>();
  // 2^n; shifting the long 1L (not the int 1) avoids overflow for 31 or more elements.
  long limit = 1L << elements.size(); // thanks to saka1029 for this correction. My code was n^2, not 2^n.

  // count from 1 to 2^n - 1 (0 would be the empty subset)
  for (long i = 1; i < limit; ++i) {
    StringBuilder seq = new StringBuilder();

    // for each position (character), decide whether to include it based on the state of its bit.
    for (int pos = 0; pos < elements.size(); ++pos) {
      // true if the bit at 'pos' in 'i' (counting from the least significant bit) is 1, false otherwise.
      boolean include = ((i >> pos) & 1) == 1;
      if (include) {
        seq.append(elements.get(pos));
      }
    }

    // add the newly generated String to the final result.
    result.add(seq.toString());
  }

  return result;
}

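For the input [A, B, C, G], the result looks like this: [A, B, AB, C, AC, BC, ABC, G, AG, BG, ABG, CG, ACG, BCG, ABCG]. The last entry, ABCG, is the whole itemset and cannot be a premise, so drop it (or end the loop at 2^n - 2) if you only want the premises.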
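Because the question stores each itemset as a String, the same counting can also run directly on one itemset string. The sketch below assumes every character of the string is one item (as in "ABCG"); Antecedents and antecedentsOf are hypothetical names. It skips mask 0 (the empty set) and the all-ones mask (the itemset itself), so only proper, non-empty subsets come out.

```java
import java.util.ArrayList;
import java.util.List;

final class Antecedents {
    // Hypothetical helper: assumes each character of `itemset` is one item,
    // e.g. "ABCG" stands for {A, B, C, G}. Returns every proper, non-empty subset.
    static List<String> antecedentsOf(String itemset) {
        int n = itemset.length();
        if (n > 62) {
            // (1L << 63) is negative in a signed long, so cap n at 62.
            throw new IllegalArgumentException("itemset too large: " + n);
        }
        List<String> result = new ArrayList<>();
        long full = (1L << n) - 1; // all n bits set = the whole itemset
        // start at 1 to skip the empty set; stop before `full` to skip the itemset itself
        for (long mask = 1; mask < full; mask++) {
            StringBuilder sb = new StringBuilder();
            for (int pos = 0; pos < n; pos++) {
                if (((mask >> pos) & 1L) == 1L) {
                    sb.append(itemset.charAt(pos));
                }
            }
            result.add(sb.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(antecedentsOf("ABCG"));
    }
}
```

Running main prints [A, B, AB, C, AC, BC, ABC, G, AG, BG, ABG, CG, ACG, BCG], i.e. the 14 premises from the question; itemsets shorter than 2 characters yield an empty list, which fits the question's `length() < 2` check.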

This is an iterative (non-recursive) solution, but there is also a recursive solution, which may (or may not) be easier to implement.

Recursive solution

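A recursive solution can work simply by creating a method that takes a sorted set of characters, a position, and a boolean state (include or not include) as parameters and returns a list of all possible sorted subsets. You then call this method from a public method that passes the characters, 0 as the position, and true or false as the initial state (the other value is used later).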

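The method then works by divide and conquer: you include the character at the given position (depending on whether the include flag is set), and then call the method again on a cloned (sub)set of characters that leaves out the first character.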

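For now, let's assume you never include the first character of each sequence at the start (it gets included later). If you pass the character set {A,B,C,G} to the method, it will (begin to) operate as follows: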

A: recurse on {B,C,G}
  B: recurse on {C,G}
    C: recurse on {G}
      G: remaining set is empty; add to the result all Strings with 'G' prefixed and without.
      G: return {"G",""}
    C: Add to the result all Strings with 'C' prefixed and without.
    C: {"CG","C","G",""}
    ...

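This way you collect all sorted subsets recursively. Depending on whether the empty string is allowed, you can either remove it at the end or never add it in the first place.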

I implemented it like this, but there are other correct ways:

public static List<String> associationRules2(List<String> elements) {
    List<String> result = new ArrayList<>();
    // an empty input has no subsets (and elements.get(0) would throw)
    if (elements.isEmpty()) {
        return result;
    }
    String thisElement = elements.get(0);

    // build the sub-list (a copy that leaves out the first element)
    List<String> remaining = new ArrayList<>(elements.subList(1, elements.size()));

    // if the sub-list is not empty, we recurse.
    if (!remaining.isEmpty()) {
        List<String> subsets = associationRules2(remaining);

        // add all subsets without thisElement.
        result.addAll(subsets);

        // add all subsets *with* thisElement.
        for (String s : subsets) {
            result.add(thisElement + s);
        }
    }

    // finally add thisElement on its own.
    result.add(thisElement);

    return result;
}

Result for [A, B, C, G]: [G, CG, C, BG, BCG, BC, B, AG, ACG, AC, ABG, ABCG, ABC, AB, A]
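One of those other ways is closer to the position-plus-flag design described at the start of the recursive solution. The sketch below is an illustration under that reading, not the code above; FlagRecursion, subsets and collect are hypothetical names. Instead of cloning the remaining list on every call, it passes an index and the prefix built so far, and the public method starts the recursion once with the flag set to true and once with it set to false.

```java
import java.util.ArrayList;
import java.util.List;

final class FlagRecursion {
    // Public entry point: start at position 0, once including and once excluding it.
    static List<String> subsets(List<String> elements) {
        List<String> out = new ArrayList<>();
        if (elements.isEmpty()) {
            return out;
        }
        collect(elements, 0, true, "", out);
        collect(elements, 0, false, "", out);
        return out;
    }

    // Applies the include flag to elements.get(pos), then decides position pos + 1 both ways.
    private static void collect(List<String> elements, int pos, boolean include,
                                String prefix, List<String> out) {
        String current = include ? prefix + elements.get(pos) : prefix;
        if (pos == elements.size() - 1) {
            // every position decided: record the subset, but never the empty string
            if (!current.isEmpty()) {
                out.add(current);
            }
            return;
        }
        collect(elements, pos + 1, true, current, out);
        collect(elements, pos + 1, false, current, out);
    }
}
```

For [A, B, C, G] this produces the same 15 non-empty subsets as the methods above, in the order ABCG, ABC, ABG, AB, ACG, AC, AG, A, BCG, BC, BG, B, CG, C, G. Because "include" is always tried first, the whole itemset is the first entry, so `result.subList(1, result.size())` leaves exactly the premises.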

Java: generating all premises of association rules