Java Regex: possible mistake in defining the capturing groups

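Problem description

A pseudo-sequel to "Correction of an algorithm for String analysis". You do not need to read the linked question to understand this one, but it provides context.

My goal is to extract the contents of the capturing groups from a string that satisfies a regular expression. My problem is that I (apparently) have not identified or defined the capturing groups correctly.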

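In the example below, the regular expression is ^[A-Za-z0-9][A-Za-z0-9] s(?<season>[A-Za-z0-9])e(?<episode>[A-Za-z0-9]) [A-Za-z0-9].mp4. In theory, it should be satisfied by any string that consists of:

  • a random (possibly invalid) prefix;
  • followed by the string s;
  • followed by a number of alphanumeric characters (the named group season);
  • followed by the string e;
  • followed by some alphanumeric characters (the named group episode);
  • followed by the string " " (a single space);
  • followed by a random (possibly invalid) string;
  • followed by the string .mp4.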

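In theory, the short snippet below should parse the input string (Star Wars Rebels s02e18 - The Forgotten Droid.mp4) and identify 02 as the value of the named group season and 18 as the value of the named group episode. Instead, I get a java.lang.IllegalStateException: No match found error. Worst of all, after I modified the code to print the groups, there appear to be no named groups at all. I am confused.

What am I doing wrong? Is there a problem with how I define the named groups? With how I recall them through matcher.group(...)? Is the regular expression itself written incorrectly?

Warning: this is my first time using Java regular expressions, so there may be more than one problem.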


Code

String regex = "^[A-Za-z0-9][A-Za-z0-9] s(?<season>[A-Za-z0-9])e(?<episode>[A-Za-z0-9]) [A-Za-z0-9].mp4";
String string = "Star Wars Rebels s02e18 - The Forgotten Droid.mp4";
        
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(string);
        
System.out.println("\t\t pattern: " + pattern.toString() + "\n\t\t Named groups: ");
for(int i=0 ; i<matcher.groupCount() ; i++)
    System.out.println("\t\t\t Group " + i + "-th: " + matcher.group(i));
        
System.out.println("Season value: " + matcher.group("season"));
System.out.println("\t Episode value: " + matcher.group("episode"));

Solution

Some alternative regular expressions for extracting the season and episode from the string

Alternative 1 - with named groups:

"\\w+\\s+s(?<season>\\d+)e(?<episode>\\d+).*\\w+\\.mp4"

Alternative 1 in context:

public static void main(String[] args) {
    String input = "Star Wars Rebels s02e18 - The Forgotten Droid.mp4";

    // From the input the following is matched: 'Rebels s02e18 - The Forgotten Droid.mp4'
    String regex = "\\w+\\s+s(?<season>\\d+)e(?<episode>\\d+).*\\w+\\.mp4";
    Matcher matcher = Pattern.compile(regex).matcher(input);

    // find() must return true before calling matcher.group(),
    // or 'java.lang.IllegalStateException: No match found' will be thrown.
    if(matcher.find()) { 
        // Output: 'Season value:  02'
        System.out.println("Season value:\t" + matcher.group("season"));
        // Output: 'Episode value: 18'
        System.out.println("Episode value:\t" + matcher.group("episode"));
    }
}
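
The comment about calling find() first can be illustrated with a minimal sketch. It reuses the Alternative 1 pattern; the class and method names (FindBeforeGroupDemo, groupWithoutFind, seasonAfterFind) are hypothetical and exist only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindBeforeGroupDemo {
    // Same pattern as Alternative 1.
    static final Pattern PATTERN =
            Pattern.compile("\\w+\\s+s(?<season>\\d+)e(?<episode>\\d+).*\\w+\\.mp4");

    /** Calls group() without a prior find(); returns the exception's simple name, or "none". */
    static String groupWithoutFind(String input) {
        Matcher matcher = PATTERN.matcher(input);
        try {
            matcher.group("season"); // no match attempt has been made yet
            return "none";
        } catch (IllegalStateException e) {
            return e.getClass().getSimpleName();
        }
    }

    /** Attempts a match first; returns the season group, or null if nothing matched. */
    static String seasonAfterFind(String input) {
        Matcher matcher = PATTERN.matcher(input);
        return matcher.find() ? matcher.group("season") : null;
    }

    public static void main(String[] args) {
        String input = "Star Wars Rebels s02e18 - The Forgotten Droid.mp4";
        System.out.println(groupWithoutFind(input)); // prints "IllegalStateException"
        System.out.println(seasonAfterFind(input));  // prints "02"
    }
}
```

Creating a Matcher does not run the regular expression; only find(), matches() or lookingAt() does, which is why the code in the question throws even though the input string is fine.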

Alternative 2 - no group names, matching the whole input:

"^\\S+\\s.*?\\ss(\\S+)e(\\S+).*\\S+\\.mp4"

In context:

public static void main(String[] args) {
    String input = "Star Wars Rebels s02e18 - The Forgotten Droid.mp4";

    // From the input the following is matched: 'Star Wars Rebels s02e18 - The Forgotten Droid.mp4'
    String regex = "^\\S+\\s.*?\\ss(\\S+)e(\\S+).*\\S+\\.mp4";
    // MULTILINE lets '^' also match after a line terminator; it is optional for this single-line input
    Matcher matcher = Pattern.compile(regex, Pattern.MULTILINE).matcher(input);

    // find() must return true before calling matcher.group(),
    // or 'java.lang.IllegalStateException: No match found' will be thrown.
    if(matcher.find()) {
        String season = matcher.group(1);
        String episode = matcher.group(2);
        // Output: 'Season value:  02'
        System.out.println("Season value:\t" + season);
        // Output: 'Episode value: 18'
        System.out.println("Episode value:\t" + episode);
    }
}
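
Since Alternative 2 relies on numbered groups, here is a small sketch of how all of them could be listed, which is what the loop in the question was attempting. The class and method names (NumberedGroupsDemo, allGroups) are hypothetical. The point it illustrates is that groupCount() does not count group 0 (the whole match), so the loop bound has to be inclusive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NumberedGroupsDemo {
    /** Returns group 0 (the whole match) followed by every capturing group, or an empty list if nothing matches. */
    static List<String> allGroups(String regex, String input) {
        List<String> groups = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        if (matcher.find()) {
            // groupCount() excludes group 0, hence "<=" rather than "<".
            for (int i = 0; i <= matcher.groupCount(); i++) {
                groups.add(matcher.group(i));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        String regex = "^\\S+\\s.*?\\ss(\\S+)e(\\S+).*\\S+\\.mp4";
        String input = "Star Wars Rebels s02e18 - The Forgotten Droid.mp4";
        List<String> groups = allGroups(regex, input);
        for (int i = 0; i < groups.size(); i++) {
            System.out.println("Group " + i + ": " + groups.get(i));
        }
    }
}
```

For the sample input this prints the full file name as group 0, then 02 as group 1 and 18 as group 2.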

Alternative 3:

"^\\w+\\s.+?\\ss(?<season>\\w+)e(?<episode>\\w+).*\\w+\\.mp4"

In context:

public static void main(String[] args) {
    String input = "Star Wars Rebels s02e18 - The Forgotten Droid.mp4";

    // From the input the following is matched: 'Star Wars Rebels s02e18 - The Forgotten Droid.mp4'
    String regex = "^\\w+\\s.+?\\ss(?<season>\\w+)e(?<episode>\\w+).*\\w+\\.mp4";
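    // MULTILINE lets '^' also match after a line terminator; it is optional for this single-line input
    Matcher matcher = Pattern.compile(regex, Pattern.MULTILINE).matcher(input);

    // find() must return true before calling matcher.group(),
    // or 'java.lang.IllegalStateException: No match found' will be thrown.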
    if(matcher.find()) {
        // Output: 'Season value:  02'
        System.out.println("Season value:\t" + matcher.group("season"));
        // Output: 'Episode value: 18'
        System.out.println("Episode value:\t" + matcher.group("episode"));
    }
}
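
The effect of the MULTILINE flag used above can be checked with a short sketch. It assumes the Alternative 3 pattern and a made-up two-line input; the class and method names (CaretAnchorDemo, found) are hypothetical. Without the flag, '^' matches only at the very start of the input, which is already enough for a single file name.

```java
import java.util.regex.Pattern;

class CaretAnchorDemo {
    // Same pattern as Alternative 3.
    static final String REGEX =
            "^\\w+\\s.+?\\ss(?<season>\\w+)e(?<episode>\\w+).*\\w+\\.mp4";

    /** Returns true if the pattern, compiled with the given flags, is found anywhere in the input. */
    static boolean found(int flags, String input) {
        return Pattern.compile(REGEX, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        String singleLine = "Star Wars Rebels s02e18 - The Forgotten Droid.mp4";
        // Hypothetical input where the file name sits on the second line.
        String twoLines = "first line\n" + singleLine;

        System.out.println(found(0, singleLine));                 // true: '^' matches at the start of input
        System.out.println(found(Pattern.MULTILINE, singleLine)); // true: the flag changes nothing here
        System.out.println(found(0, twoLines));                   // false: '^' cannot match after "\n"
        System.out.println(found(Pattern.MULTILINE, twoLines));   // true: '^' also matches at line starts
    }
}
```

So the flag matters only when the input may contain several lines and each line should be tried separately.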

Alternative 4:

"^[A-Za-z0-9]+\\s.+?s(?<season>[A-Za-z0-9]+)e(?<episode>[A-Za-z0-9]+).*[A-Za-z0-9]\\.mp4"

In context:

public static void main(String[] args) {
    String input = "Star Wars Rebels s02e18 - The Forgotten Droid.mp4";

    // From the input the following is matched: 'Star Wars Rebels s02e18 - The Forgotten Droid.mp4'
    String regex = "^[A-Za-z0-9]+\\s.+?s(?<season>[A-Za-z0-9]+)e(?<episode>[A-Za-z0-9]+).*[A-Za-z0-9]\\.mp4";
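    // MULTILINE lets '^' also match after a line terminator; it is optional for this single-line input
    Matcher matcher = Pattern.compile(regex, Pattern.MULTILINE).matcher(input);

    // find() must return true before calling matcher.group(),
    // or 'java.lang.IllegalStateException: No match found' will be thrown.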
    if(matcher.find()) {
        // Output: 'Season value:  02'
        System.out.println("Season value:\t" + matcher.group("season"));
        // Output: 'Episode value: 18'
        System.out.println("Episode value:\t" + matcher.group("episode"));
    }
}
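
All four alternatives use find(), which looks for the pattern anywhere in the input. As a rough sketch of how that differs from matches(), which requires the whole input to fit the pattern, the following compares Alternative 1 and Alternative 3 on the same string. The class and method names (FindVersusMatchesDemo, findsSomewhere, matchesEntirely) are hypothetical.

```java
import java.util.regex.Pattern;

class FindVersusMatchesDemo {
    /** True if the pattern occurs anywhere in the input. */
    static boolean findsSomewhere(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    /** True only if the pattern covers the entire input. */
    static boolean matchesEntirely(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String input = "Star Wars Rebels s02e18 - The Forgotten Droid.mp4";
        String alternative1 = "\\w+\\s+s(?<season>\\d+)e(?<episode>\\d+).*\\w+\\.mp4";
        String alternative3 = "^\\w+\\s.+?\\ss(?<season>\\w+)e(?<episode>\\w+).*\\w+\\.mp4";

        System.out.println(findsSomewhere(alternative1, input));  // true: matches from "Rebels" onward
        System.out.println(matchesEntirely(alternative1, input)); // false: "Star Wars " is not covered
        System.out.println(findsSomewhere(alternative3, input));  // true
        System.out.println(matchesEntirely(alternative3, input)); // true: the pattern spans the whole input
    }
}
```

Either call makes the groups available afterwards; which one fits depends on whether a partial match of the file name should count.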

There are many ways to match the input and extract the season and episode. The Javadoc for Java's Pattern class lists more options:

https://docs.oracle.com/javase/10/docs/api/java/util/regex/Pattern.html