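Counting occurrences in a 2D array

Problem description

I'm trying to count the number of occurrences per line in a text file that contains a large number of codes (numbers). Example file content: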

9045,9107,2376,9017
2387,4405,4499,7120
9107,3559,3488
9045,4499

I want to compare these against a similar set of numbers that I get from a text field, for example:

9107,2387,4499

The only result I'm looking for is whether any line of the text file contains 2 or more of these numbers. In this case it would be true, because:

9045,9107,2376,9017 - false (1)
2387,4405,4499,7120 - true (2)
9107,3559,3488 - false (1)
9045,4499 - false (1)

As far as I can tell, the best way to do this is with a 2D array, and I have managed to import the file successfully:

Scanner in = null;
try { 
    in = new Scanner(new File("areas.txt"));
} catch (FileNotFoundException ex) {
    Logger.getLogger(NewJFrame.class.getName()).log(Level.SEVERE, null, ex);
}
List<String[]> lines = new ArrayList<>();
while (in.hasNextLine()) {
    String line = in.nextLine().trim();
    String[] splitted = line.split(",");
    lines.add(splitted);
}

String[][] result = new String[lines.size()][];
for (int i = 0; i < result.length; i++) {
    result[i] = lines.get(i);
}

System.out.println(Arrays.deepToString(result));

The result I get:

[[9045, 9107, 2376, 9017], [2387, 4405, 4499, 7120], [9107, 3559, 3488], [9045, 4499], [], []]

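From here I'm a bit stuck on how to check the codes line by line. Any suggestions or advice? Is a 2D array the best approach, or is there an easier or better way?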

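Solution

The expected amount of input determines the type of search algorithm you should use.

If you aren't searching through thousands of lines, a simple algorithm will do just fine. When in doubt, favour simplicity over a complex, hard-to-understand algorithm.

A simple nested for loop is not an efficient algorithm, but in most cases it will do the trick.

A simple implementation looks like this: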

// Requires java.util.ArrayList, java.util.Arrays and java.util.List
final int FOUND_THRESHOLD = 2;

String[] comparedCodes = {"9107", "4405", "2387", "4499"};
String[][] allInputs = {
    {"9045", "9107", "2376", "9017"}, // This should not match (1 code found)
    {"2387", "4499", "7120"},         // This should match (2 codes found)
    {"9107", "3559", "3488"},         // This should not match (1 code found)
    {"9045", "4499"},                 // This should not match (1 code found)
};

List<String[]> results = new ArrayList<>();
for (String[] input : allInputs) {
    int numFound = 0;

    // Compare the codes
    for (String code : input) {
        for (String c : comparedCodes) {
            if (code.equals(c)) {
                numFound++;
                break; // Breaking out here prevents unnecessary work
            }
        }

        if (numFound >= FOUND_THRESHOLD) {
            results.add(input);
            break; // Breaking out here prevents unnecessary work
        }
    }
}

for (String[] result : results) {
    System.out.println(Arrays.toString(result));
}

This gives us the output:

[2387, 4499, 7120]
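
If the file does grow to many thousands of lines, one option is to put the compared codes into a HashSet, so that each lookup no longer scans the whole comparedCodes array. The following is only a sketch under a few assumptions: the class name CodeMatcher and the methods countMatches and anyLineMatches are made up for illustration, the codes are assumed to be trimmed already, and the sample data is the file content and text field input from the question. It returns the single true/false answer the question asks for.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class CodeMatcher {

    // Counts how many codes of one line also occur in the wanted set.
    static int countMatches(String[] lineCodes, Set<String> wanted) {
        int numFound = 0;
        for (String code : lineCodes) {
            if (wanted.contains(code)) {
                numFound++;
            }
        }
        return numFound;
    }

    // Returns true as soon as one line reaches the threshold.
    static boolean anyLineMatches(String[][] allLines, String[] comparedCodes, int threshold) {
        // Build the set once; contains() is then a hash lookup instead of a loop
        Set<String> wanted = new HashSet<>(Arrays.asList(comparedCodes));
        for (String[] line : allLines) {
            if (countMatches(line, wanted) >= threshold) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[][] allInputs = {
            {"9045", "9107", "2376", "9017"},
            {"2387", "4405", "4499", "7120"},
            {"9107", "3559", "3488"},
            {"9045", "4499"},
        };
        String[] comparedCodes = {"9107", "2387", "4499"};

        System.out.println(anyLineMatches(allInputs, comparedCodes, 2)); // prints "true"
        System.out.println(anyLineMatches(allInputs, comparedCodes, 3)); // prints "false"
    }
}
```

For a handful of lines the nested loop above is just as good; the set only starts to pay off when either the file or the list of compared codes gets large.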

To expand on my comment, here's a rough outline of what you could do:

// Requires java.nio.file.Files, java.nio.file.Path, java.util.*, java.util.stream.*
String textFieldContents = ... //get it

//build a set of the user input by splitting at commas
//a stream is used to be able to trim the elements before collecting them into a set
Set<String> userInput = Arrays.stream(textFieldContents.split(","))
                              .map(String::trim)
                              .collect(Collectors.toSet());

//stream the lines in the file; try-with-resources closes the file afterwards
List<Boolean> matchResults;
try (Stream<String> fileLines = Files.lines(Path.of("areas.txt"))) {
    matchResults = fileLines
        //map each line to true/false
        .map(line -> {
            //split the line and stream the parts
            return Arrays.stream(line.split(","))
                         //trim each part
                         .map(String::trim)
                         //select only those contained in the user input set
                         .filter(part -> userInput.contains(part))
                         //count matching elements and return whether there are at least 2 or not
                         .count() >= 2L;
        })
        //collect the results into a list, each element position should correspond to the zero-based line number
        .collect(Collectors.toList());
}

If you need to collect the matching lines rather than one flag per line, you can replace map() with filter() (same content) and change the result type to List<String>.
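
As a rough illustration of that filter() variant, here is a self-contained sketch. The class name MatchingLines, the method findMatchingLines and its minMatches parameter are hypothetical, and the lines are held in memory (using the sample data from the question) instead of being read from areas.txt so that it runs without the file.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

final class MatchingLines {

    // Keeps only the lines that contain at least minMatches codes from userInput.
    static List<String> findMatchingLines(List<String> lines, Set<String> userInput, long minMatches) {
        return lines.stream()
                // same predicate as in the map() version, now used to select lines
                .filter(line -> Arrays.stream(line.split(","))
                        .map(String::trim)
                        .filter(userInput::contains)
                        .count() >= minMatches)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Sample lines from the question, kept in memory instead of a file
        List<String> lines = List.of(
                "9045,9107,2376,9017",
                "2387,4405,4499,7120",
                "9107,3559,3488",
                "9045,4499");

        // Text field input, split and trimmed the same way as above
        Set<String> userInput = Arrays.stream("9107, 2387, 4499".split(","))
                .map(String::trim)
                .collect(Collectors.toSet());

        System.out.println(findMatchingLines(lines, userInput, 2)); // prints "[2387,4405,4499,7120]"
    }
}
```

Passing the threshold in as a parameter keeps the "how many matches are enough" decision in one place, so it can be adjusted without touching the stream pipeline.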
