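Problem description
I am parsing some manifest files and need to clean them up before they can be loaded as XML; as they stand, the files are not valid XML.
Consider the following snippet: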
<assemblyIdentity name=""Microsoft.Windows.Shell.DevicePairingFolder"" processorArchitecture=""amd64"" version=""5.1.0.0"" type="win32" />
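There are several instances of a doubled double quote ("") that I want to replace with a single double quote (").
In essence, the example should become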
<assemblyIdentity name="Microsoft.Windows.Shell.DevicePairingFolder" processorArchitecture="amd64" version="5.1.0.0" type="win32" />
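I think a regular expression is the best approach, but regex is not my strong suit.
A few things to note:
- The manifest is a multi-line string (essentially just an XML document).
- An empty attribute such as processorArchitecture="" is valid in the document, which is why a simple string.Replace call is not suitable.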
Solution
Use the pattern
(\w+=)""(.*?)""(?=\s+\w+=|$)
and replace with $1"$2". See proof.
Explanation
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
\w+ word characters (a-z,A-Z,0-9,_) (1 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
= '='
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
"" '""'
--------------------------------------------------------------------------------
( group and capture to \2:
--------------------------------------------------------------------------------
.*? any character except \n (0 or more times
(matching the least amount possible))
--------------------------------------------------------------------------------
) end of \2
--------------------------------------------------------------------------------
"" '""'
--------------------------------------------------------------------------------
(?= look ahead to see if there is:
--------------------------------------------------------------------------------
\s+ whitespace (\n,\r,\t,\f,and " ") (1
or more times (matching the most amount
possible))
--------------------------------------------------------------------------------
\w+ word characters (a-z,A-Z,0-9,_) (1 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
= '='
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
$ before an optional \n,and the end of
the string
--------------------------------------------------------------------------------
) end of look-ahead
using System;
using System.Text.RegularExpressions;

public class Example
{
    public static void Main()
    {
        // Verbatim strings: every "" below stands for one literal " character.
        string pattern = @"(\w+=)""""(.*?)""""(?=\s+\w+=|$)";
        // Re-emit the attribute name ($1) and the value ($2) in single double quotes.
        string substitution = @"$1""$2""";
        string input = @"<assemblyIdentity name=""""Microsoft.Windows.Shell.DevicePairingFolder"""" processorArchitecture=""""amd64"""" version=""""5.1.0.0"""" type=""win32"" />";
        Regex regex = new Regex(pattern);
        string result = regex.Replace(input, substitution);
        Console.Write(result);
    }
}
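One caveat with the pattern above: `.*?` can itself match quote characters, so a match that starts at a valid empty attribute (for example name="") can run on into a following doubled attribute on the same line. The lookahead also does not accept a doubled attribute that is the last one before `/>` or `>`. The following is only a sketch of a stricter variant, under the assumption that attribute values are non-empty, do not start with whitespace, and contain no `"`, `<` or `>`. The class name `ManifestQuoteFixer`, the method `Fix`, and the sample input are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class ManifestQuoteFixer
{
    // Regex (unescaped): ([\w:.-]+=)""([^"<>\s][^"<>]*)""(?=\s+[\w:.-]+=|\s*[/?]?>|$)
    // - the value may not contain quotes or angle brackets and may not start with whitespace
    // - the lookahead accepts another attribute, the end of a tag, or the end of the input
    private static readonly Regex DoubledQuotes = new Regex(
        @"([\w:.-]+=)""""([^""<>\s][^""<>]*)""""(?=\s+[\w:.-]+=|\s*[/?]?>|$)");

    public static string Fix(string manifest)
    {
        // $1 is the attribute name plus '=', $2 is the value.
        return DoubledQuotes.Replace(manifest, @"$1""$2""");
    }

    public static void Main()
    {
        // Assumed sample: an empty attribute, a doubled one, a valid one,
        // and a doubled attribute directly before "/>".
        string input =
            "<assemblyIdentity name=\"\" version=\"\"5.1.0.0\"\" type=\"win32\" />\n" +
            "<file name=\"\"a.dll\"\" />";
        Console.WriteLine(Fix(input));
    }
}
```

With this input the sketch should print the first line as `<assemblyIdentity name="" version="5.1.0.0" type="win32" />` and the second as `<file name="a.dll" />`, leaving the empty attribute untouched.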
There are two ways:
- String replacement
var newString = s.Replace("\"\"","\"");
- Regular expression
string checkStringForDoubleQuotes = @""""""; // verbatim string holding the two characters ""
string newString = Regex.Replace(s, checkStringForDoubleQuotes, @""""); // replacement is a single "
After the update:
The regular expression you need is (https://regex101.com/r/xZUtUf/1/):
""(?=\w)|(?<=\w)""
string s = "test=\"\" test2=\"\"assdasad\"\"";
string checkStringForDoubleQuotes = "\"\"(?=\\w)|(?<=\\w)\"\"";
string newString = Regex.Replace(s, checkStringForDoubleQuotes, "\"");
Console.WriteLine(newString);
// test="" test2="assdasad"
https://dotnetfiddle.net/FmWXUa
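Since the goal in the question is to load the cleaned text as XML, here is a rough sketch that applies the lookaround pattern above and then parses the result with XDocument.Parse. The sample manifest and the class name `LookaroundCleanupDemo` are assumptions for illustration. Keep in mind that this pattern only fires next to a word character, so a doubled value that begins or ends with some other character (for example ""*"") would be left unchanged.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Hypothetical demo class, not part of the original answer.
public static class LookaroundCleanupDemo
{
    public static void Main()
    {
        // Assumed sample: one doubled attribute, one valid empty attribute, one valid attribute.
        string broken =
            "<assembly>\n" +
            "  <assemblyIdentity name=\"\"Demo.App\"\" processorArchitecture=\"\" type=\"win32\" />\n" +
            "</assembly>";

        // Collapse "" only when it is followed or preceded by a word character.
        string cleaned = Regex.Replace(broken, "\"\"(?=\\w)|(?<=\\w)\"\"", "\"");

        // Throws System.Xml.XmlException if the text is still not well-formed.
        XDocument doc = XDocument.Parse(cleaned);
        XElement identity = doc.Root.Element("assemblyIdentity");

        Console.WriteLine((string)identity.Attribute("name"));
        Console.WriteLine(((string)identity.Attribute("processorArchitecture")).Length);
    }
}
```

For this sample the two lines printed should be `Demo.App` and `0`, the latter showing that the empty attribute survived the cleanup.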
Use the hexadecimal escape \x22 for the quote character, which makes the pattern easier to write. This replaces every consecutive "" with ".
Regex.Replace(data,@"(\x22\x22)","\x22")
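Note that a blanket replacement like this one behaves the same as the plain string.Replace call: it also collapses the "" of an empty attribute, which the question says must stay intact. A small check, using a made-up input string and class name, illustrates the effect:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class, not part of the original answer.
public static class BlanketReplaceCheck
{
    public static void Main()
    {
        // Assumed sample: a doubled attribute followed by a valid empty attribute.
        string data = "name=\"\"Demo\"\" processorArchitecture=\"\"";

        // Every pair of consecutive quotes becomes one quote, including the empty value.
        string result = Regex.Replace(data, @"(\x22\x22)", "\x22");

        Console.WriteLine(result);
    }
}
```

This should print `name="Demo" processorArchitecture="`, so the empty attribute loses its closing quote and the result is still not valid XML.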