Problems with special characters when splitting with split

When splitting with split:

String[] a = "aa|bb|cc".split("|");

output:
[a, a, |, b, b, |, c, c]

First, look at how split is documented:

java.lang.String.split(String regex)

Splits this string around matches of the given regular expression.

This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.

The string "boo:and:foo", for example, yields the following results with these expressions:

Regex    Result
:        { "boo", "and", "foo" }
o        { "b", "", ":and:f" }

Parameters:
regex the delimiting regular expression
Returns:
the array of strings computed by splitting this string around matches of the given regular expression
Throws:
PatternSyntaxException - if the regular expression's syntax is invalid
Since:
1.4
See Also:
java.util.regex.Pattern
@spec
JSR-51
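
The documented behaviour above can be checked with a small sketch. The class and method names (`SplitJavadocDemo`, `splitDefault`, `splitKeepTrailing`) are made up for illustration; only `String.split(String)` and `String.split(String, int)` come from the JDK.

```java
import java.util.Arrays;

class SplitJavadocDemo {
    // One-argument form: behaves like limit 0, so trailing empty strings are dropped.
    static String[] splitDefault(String input, String regex) {
        return input.split(regex);
    }

    // Negative limit: the pattern is applied as often as possible
    // and trailing empty strings are kept.
    static String[] splitKeepTrailing(String input, String regex) {
        return input.split(regex, -1);
    }

    public static void main(String[] args) {
        // [boo, and, foo]
        System.out.println(Arrays.toString(splitDefault("boo:and:foo", ":")));
        // [b, , :and:f]
        System.out.println(Arrays.toString(splitDefault("boo:and:foo", "o")));
        // [b, , :and:f, , ]
        System.out.println(Arrays.toString(splitKeepTrailing("boo:and:foo", "o")));
    }
}
```

The last call shows what "trailing empty strings are not included" means in practice: the two empty strings after the final "o" matches only appear when a negative limit is passed.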

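As the documentation shows, the argument to split is a regular expression. Regular expressions contain a number of special characters to watch out for, each with its own meaning: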

http://www.fon.hum.uva.nl/praat/manual/Regular_expressions_1__Special_characters.html

The following characters are the meta characters that give special meaning to the regular expression search syntax:

\ the backslash escape character.
The backslash gives special meaning to the character following it. For example, the combination "\n" stands for the newline, one of the control characters. The combination "\w" stands for a "word" character, one of the convenience escape sequences, while "\1" is one of the substitution special characters.
Example: The regex "aa\n" tries to match two consecutive "a"s at the end of a line, including the newline character itself.
Example: "a\+" matches "a+" and not a series of one or more "a"s.
^ the caret is the start of line anchor or the negate symbol.
Example: "^a" matches "a" at the start of a line.
Example: "[^0-9]" matches any non digit.
$ the dollar is the end of line anchor.
Example: "b$" matches a "b" at the end of a line.
Example: "^$" matches the empty line.
{ } the open and close curly bracket are used as range quantifiers.
Example: "a{2,3}" matches "aa" or "aaa".
[ ] the open and close square bracket define a character class to match a single character.
The "^" as the first character following the "[" negates, and the match is for the characters not listed. The "-" denotes a range of characters. Inside a "[ ]" character class construction most special characters are interpreted as ordinary characters.
Example: "[d-f]" is the same as "[def]" and matches "d", "e" or "f".
Example: "[a-z]" matches any lowercase character in the alphabet.
Example: "[^0-9]" matches any character that is not a digit.
Example: A search for "[][()?<>$^.*?^]" in the string "[]()?<>$^.*?^" followed by a replace string "r" has the result "rrrrrrrrrrrrr". Here the search string is one character class and all the meta characters are interpreted as ordinary characters without the need to escape them.
( ) the open and close parenthesis are used for grouping characters (or other regex).
The groups can be referenced in both the search and the substitution phase. There also exist some special constructs with parenthesis.
Example: "(ab)\1" matches "abab".
. the dot matches any character except the newline.
Example: ".a" matches two consecutive characters where the last one is "a".
Example: ".*\.txt$" matches all strings that end in ".txt".
* the star is the match-zero-or-more quantifier.
Example: "^.*$" matches an entire line.
+ the plus is the match-one-or-more quantifier.
? the question mark is the match-zero-or-one quantifier. The question mark is also used in special constructs with parenthesis and in changing match behaviour.
| the vertical pipe separates a series of alternatives.
Example: "(a|b|c)a" matches "aa" or "ba" or "ca".
< > the smaller and greater signs are anchors that specify a left or right word boundary.
- the minus indicates a range in a character class (when it is not at the first position after the "[" opening bracket or the last position before the "]" closing bracket).
Example: "[A-Z]" matches any uppercase character.
Example: "[A-Z-]" or "[-A-Z]" match any uppercase character or "-".
& the and is the "substitute complete match" symbol.
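
The list above comes from the Praat manual, so not every entry carries over to Java (in `java.util.regex`, for instance, `<`, `>` and `&` are ordinary characters outside a character class). The pipe is not the only delimiter that causes trouble in split, though. The sketch below tries the dot and the star; the names `MetaCharSplitDemo` and `throwsSyntaxError` are illustrative only.

```java
import java.util.Arrays;
import java.util.regex.PatternSyntaxException;

class MetaCharSplitDemo {
    // Returns true if compiling the regex inside split fails.
    static boolean throwsSyntaxError(String input, String regex) {
        try {
            input.split(regex);
            return false;
        } catch (PatternSyntaxException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // An unescaped dot matches every character, so every piece is empty
        // and all of them are discarded as trailing empty strings.
        System.out.println("www.example.com".split(".").length);                // 0

        // Escaping the dot makes it a literal delimiter.
        System.out.println(Arrays.toString("www.example.com".split("\\.")));    // [www, example, com]

        // A lone star is a quantifier with nothing to repeat: invalid regex.
        System.out.println(throwsSyntaxError("a*b*c", "*"));                    // true

        // Escaped, it is a literal delimiter again.
        System.out.println(Arrays.toString("a*b*c".split("\\*")));              // [a, b, c]
    }
}
```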

The fix for the call above is to escape the delimiter before splitting:

String[] a="aa|bb|cc".split("\\|");
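
Besides the backslash escape, there are at least two other ways to make the pipe literal: `Pattern.quote` and a one-character character class. The sketch below compares the three; the class and method names are placeholders, and which variant reads best is a matter of taste.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class PipeSplitDemo {
    // Backslash escape: "\\|" in Java source is the regex \|
    static String[] splitEscaped(String s) {
        return s.split("\\|");
    }

    // Pattern.quote wraps the delimiter so it is treated as a literal string.
    static String[] splitQuoted(String s) {
        return s.split(Pattern.quote("|"));
    }

    // Inside a character class the pipe is an ordinary character.
    static String[] splitCharClass(String s) {
        return s.split("[|]");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitEscaped("aa|bb|cc")));   // [aa, bb, cc]
        System.out.println(Arrays.toString(splitQuoted("aa|bb|cc")));    // [aa, bb, cc]
        System.out.println(Arrays.toString(splitCharClass("aa|bb|cc"))); // [aa, bb, cc]
    }
}
```

`Pattern.quote` is convenient when the delimiter arrives at run time (for example from configuration), because it removes the need to know which characters are special.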

Summary:

When applying regular-expression operations to strings, remember to escape special characters.
