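Problem description
A pseudo-sequel to "Correction of an algorithm for String analysis". You do not need to read the linked question to understand this one, but it provides context.
My goal is to extract the contents of capture groups from a string that satisfies a regular expression. My problem is that I am (apparently) not identifying or defining the capture groups correctly.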
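In the example below, the regular expression is ^[A-Za-z0-9][A-Za-z0-9] s(?<season>[A-Za-z0-9])e(?<episode>[A-Za-z0-9]) [A-Za-z0-9].mp4. In theory, this should be satisfied by any string that has:
- a random (possibly empty) prefix;
- followed by the string s;
- followed by a number of alphanumeric characters (the group named season);
- followed by the string e;
- followed by a number of alphanumeric characters (the group named episode);
- followed by the string " " (a single space, as in the regular expression above);
- followed by a random (possibly empty) string;
- followed by the string .mp4.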
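In theory, the short piece of code below should filter the input string (Star Wars Rebels s02e18 - The Forgotten Droid.mp4) and identify 02 as the value of the named group season and 18 as the value of the named group episode. Instead, I get the error message java.lang.IllegalStateException: No match found. Worse, after modifying the code to print the named groups, it seems that there are no named groups at all. I am confused.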
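What am I doing wrong? Is there a problem with how I define the named groups? With how I retrieve them through the method matcher.group(...)? Is the regular expression itself wrong?
Warning: this is my first time using Java regular expressions, so there may be more than one problem.
Code: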
String regex = "^[A-Za-z0-9][A-Za-z0-9] s(?<season>[A-Za-z0-9])e(?<episode>[A-Za-z0-9]) [A-Za-z0-9].mp4";
String string = "Star Wars Rebels s02e18 - The Forgotten Droid.mp4";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(string);
System.out.println("\t\t pattern: " + pattern.toString() + "\n\t\t Named groups: ");
for(int i=0 ; i<matcher.groupCount() ; i++)
System.out.println("\t\t\t Group " + i + "-th: " + matcher.group(i));
System.out.println("Season value: " + matcher.group("season"));
System.out.println("\t Episode value: " + matcher.group("episode"));
Solution
Some alternative regular expressions for extracting the season and episode from the string.
Alternative 1, with named groups:
"\\w+\\s+s(?<season>\\d+)e(?<episode>\\d+).*\\w+\\.mp4"
Alternative 1 in context:
public static void main(String[] args) {
String input = "Star Wars Rebels s02e18 - The Forgotten Droid.mp4";
// From the input the following is matched: 'Rebels s02e18 - The Forgotten Droid.mp4'
String regex = "\\w+\\s+s(?<season>\\d+)e(?<episode>\\d+).*\\w+\\.mp4";
Matcher matcher = Pattern.compile(regex).matcher(input);
// matcher.find() must return true before calling matcher.group(),
// or 'java.lang.IllegalStateException: No match found' will be thrown.
if(matcher.find()) {
// Output: 'Season value: 02'
System.out.println("Season value:\t" + matcher.group("season"));
// Output: 'Episode value: 18'
System.out.println("Episode value:\t" + matcher.group("episode"));
}
}
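The exception in the question comes from the state of the Matcher, not from the group definitions: group(...) is only valid after a successful find(), matches(), or lookingAt(). The sketch below illustrates this with the Alternative 1 pattern. The class and method names (MatchStateDemo, groupThrowsBeforeFind, seasonAfterFind) are hypothetical and exist only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MatchStateDemo {
    // Alternative 1 pattern from above.
    static final Pattern EPISODE =
            Pattern.compile("\\w+\\s+s(?<season>\\d+)e(?<episode>\\d+).*\\w+\\.mp4");

    /** Returns true if group() on a fresh matcher throws IllegalStateException. */
    static boolean groupThrowsBeforeFind(String input) {
        Matcher matcher = EPISODE.matcher(input);
        try {
            matcher.group("season"); // no match has been attempted yet
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    /** Returns the season text after a successful find(), or null if nothing matches. */
    static String seasonAfterFind(String input) {
        Matcher matcher = EPISODE.matcher(input);
        return matcher.find() ? matcher.group("season") : null;
    }

    public static void main(String[] args) {
        String input = "Star Wars Rebels s02e18 - The Forgotten Droid.mp4";
        // groupCount() is derived from the pattern, so it works before any match attempt.
        System.out.println("Group count: " + EPISODE.matcher(input).groupCount());
        System.out.println("group() throws before find(): " + groupThrowsBeforeFind(input));
        System.out.println("Season after find(): " + seasonAfterFind(input));
    }
}
```

This probably also explains why the loop in the question printed no groups: the first call to matcher.group(i) throws before anything is printed. Note as well that groups are numbered from 0 (the whole match) up to and including groupCount(), so a loop with i < matcher.groupCount() skips the last group.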
Alternative 2, without group names, matching the whole input:
"^\\S+\\s.*?\\ss(\\S+)e(\\S+).*\\S+\\.mp4"
In context:
public static void main(String[] args) {
String input = "Star Wars Rebels s02e18 - The Forgotten Droid.mp4";
// From the input the following is matched: 'Star Wars Rebels s02e18 - The Forgotten Droid.mp4'
String regex = "^\\S+\\s.*?\\ss(\\S+)e(\\S+).*\\S+\\.mp4";
// MULTILINE makes '^' also match at the start of each line; it is optional for single-line input
Matcher matcher = Pattern.compile(regex, Pattern.MULTILINE).matcher(input);
// matcher.find() must return true before calling matcher.group(),
// or 'java.lang.IllegalStateException: No match found' will be thrown.
if(matcher.find()) {
String season = matcher.group(1);
String episode = matcher.group(2);
// Output: 'Season value: 02'
System.out.println("Season value:\t" + season);
// Output: 'Episode value: 18'
System.out.println("Episode value:\t" + episode);
}
}
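Since Alternative 1 matches only part of the input while Alternative 2 matches all of it, it may help to see how find() and matches() differ. find() looks for the pattern anywhere in the input, whereas matches() succeeds only if the entire input matches. The following is a small sketch of that difference; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

final class FindVsMatchesDemo {
    // The two patterns shown above.
    static final String ALT1 = "\\w+\\s+s(?<season>\\d+)e(?<episode>\\d+).*\\w+\\.mp4";
    static final String ALT2 = "^\\S+\\s.*?\\ss(\\S+)e(\\S+).*\\S+\\.mp4";

    /** True if the regex matches some substring of the input. */
    static boolean findsSomewhere(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    /** True only if the regex matches the entire input. */
    static boolean matchesWhole(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String input = "Star Wars Rebels s02e18 - The Forgotten Droid.mp4";
        System.out.println("ALT1 find():    " + findsSomewhere(ALT1, input));
        System.out.println("ALT1 matches(): " + matchesWhole(ALT1, input));
        System.out.println("ALT2 find():    " + findsSomewhere(ALT2, input));
        System.out.println("ALT2 matches(): " + matchesWhole(ALT2, input));
    }
}
```

With the sample file name, Alternative 1 is found by find() but rejected by matches(), because its match starts at 'Rebels' rather than at the beginning of the input. Alternative 2 passes both checks, and it does so here without the MULTILINE flag.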
Alternative 3:
"^\\w+\\s.+?\\ss(?<season>\\w+)e(?<episode>\\w+).*\\w+\\.mp4"
In context:
public static void main(String[] args) {
String input = "Star Wars Rebels s02e18 - The Forgotten Droid.mp4";
// From the input the following is matched: 'Star Wars Rebels s02e18 - The Forgotten Droid.mp4'
String regex = "^\\w+\\s.+?\\ss(?<season>\\w+)e(?<episode>\\w+).*\\w+\\.mp4";
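// MULTILINE makes '^' also match at the start of each line; it is optional for single-line input
Matcher matcher = Pattern.compile(regex, Pattern.MULTILINE).matcher(input);
// matcher.find() must return true before calling matcher.group(),
// or 'java.lang.IllegalStateException: No match found' will be thrown.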
if(matcher.find()) {
// Output: 'Season value: 02'
System.out.println("Season value:\t" + matcher.group("season"));
// Output: 'Episode value: 18'
System.out.println("Episode value:\t" + matcher.group("episode"));
}
}
Alternative 4:
"^[A-Za-z0-9]+\\s.+?s(?<season>[A-Za-z0-9]+)e(?<episode>[A-Za-z0-9]+).*[A-Za-z0-9]\\.mp4"
In context:
public static void main(String[] args) {
String input = "Star Wars Rebels s02e18 - The Forgotten Droid.mp4";
// From the input the following is matched: 'Star Wars Rebels s02e18 - The Forgotten Droid.mp4'
String regex = "^[A-Za-z0-9]+\\s.+?s(?<season>[A-Za-z0-9]+)e(?<episode>[A-Za-z0-9]+).*[A-Za-z0-9]\\.mp4";
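// MULTILINE makes '^' also match at the start of each line; it is optional for single-line input
Matcher matcher = Pattern.compile(regex, Pattern.MULTILINE).matcher(input);
// matcher.find() must return true before calling matcher.group(),
// or 'java.lang.IllegalStateException: No match found' will be thrown.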
if(matcher.find()) {
// Output: 'Season value: 02'
System.out.println("Season value:\t" + matcher.group("season"));
// Output: 'Episode value: 18'
System.out.println("Episode value:\t" + matcher.group("episode"));
}
}
There are many ways to match the input and extract the season and episode. The Javadoc for the Java class Pattern lists more options:
https://docs.oracle.com/javase/10/docs/api/java/util/regex/Pattern.html
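If the extracted values are needed as numbers, the match can be wrapped in a small helper. The following is only a sketch under assumptions that go beyond the question: the class EpisodeParser, the method parse, and the pattern itself (a case-insensitive s<digits>e<digits> token of one to four digits each, in a name ending in .mp4) are all hypothetical choices made for this example.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EpisodeParser {
    // Assumed pattern: a standalone s<digits>e<digits> token, then anything, then ".mp4" at the end.
    private static final Pattern EPISODE = Pattern.compile(
            "\\bs(?<season>\\d{1,4})e(?<episode>\\d{1,4})\\b.*\\.mp4$",
            Pattern.CASE_INSENSITIVE);

    /** Returns {season, episode} as ints, or an empty Optional if the name does not match. */
    static Optional<int[]> parse(String fileName) {
        Matcher matcher = EPISODE.matcher(fileName);
        if (!matcher.find()) {
            return Optional.empty(); // never call group() without a successful match
        }
        int season = Integer.parseInt(matcher.group("season"));
        int episode = Integer.parseInt(matcher.group("episode"));
        return Optional.of(new int[] { season, episode });
    }

    public static void main(String[] args) {
        parse("Star Wars Rebels s02e18 - The Forgotten Droid.mp4")
                .ifPresent(r -> System.out.println("Season " + r[0] + ", episode " + r[1]));
    }
}
```

Returning an Optional keeps the "no match" case explicit for the caller, so the IllegalStateException from the question cannot occur.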