Solution
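This regex-based approach does not work in all cases (for example, a tag whose attribute value contains ">", or a tag that spans a line break), so it should not be used to process untrusted user input.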
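using System.Text.RegularExpressions;
...
// Non-greedy match of everything between '<' and the next '>'.
const string HTML_TAG_PATTERN = "<.*?>";

static string StripHTML(string inputString)
{
    // Replace every matched tag with an empty string.
    return Regex.Replace(inputString, HTML_TAG_PATTERN, string.Empty);
}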
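The sample text in the question wraps tags across lines and may contain entities, which the pattern above does not handle by default. The following is a minimal sketch, not a definitive implementation, of one way to cover those two cases: it adds RegexOptions.Singleline so that '.' also matches newlines, and decodes entities with WebUtility.HtmlDecode. The names HtmlText, TagPattern and StripAndDecode are hypothetical and chosen only for illustration. It is still a regex, not an HTML parser, so the warning about untrusted input applies here as well.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper; the class and method names are illustrative assumptions.
static class HtmlText
{
    // Singleline makes '.' match '\n', so a tag that wraps across lines is still matched.
    private static readonly Regex TagPattern =
        new Regex("<.*?>", RegexOptions.Singleline);

    public static string StripAndDecode(string html)
    {
        // Remove every tag, then turn entities such as &amp; back into characters.
        string withoutTags = TagPattern.Replace(html, string.Empty);
        return WebUtility.HtmlDecode(withoutTags);
    }
}

static class Program
{
    static void Main()
    {
        // The opening SPAN tag is deliberately split across two lines.
        string html = "<SPAN\nstyle=\"mso-bidi-font-weight: bold\"><b>BP</b></SPAN> &amp; partners";
        Console.WriteLine(HtmlText.StripAndDecode(html)); // prints "BP & partners"
    }
}
```

For input that is not under your control, a real HTML parser is the safer choice; this sketch only narrows the gap for well-formed markup like the sample in the question.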
Problem description
How do I remove the HTML tags from the following string?
<P style="MARGIN: 0cm 0cm 10pt" class=MsoNormal><SPAN style="LINE-HEIGHT: 115%;
FONT-FAMILY: 'Verdana','sans-serif'; COLOR: #333333; FONT-SIZE: 9pt">In an
email sent just three days before the Deepwater Horizon exploded,the onshore
<SPAN style="mso-bidi-font-weight: bold"><b>BP</b></SPAN> manager in charge of
the drilling rig warned his supervisor that last-minute procedural changes were
creating "chaos". April emails were given to government investigators by <SPAN
style="mso-bidi-font-weight: bold"><b>BP</b></SPAN> and reviewed by The Wall
Street Journal and are the most direct evidence yet that workers on the rig
were unhappy with the numerous changes,and had voiced their concerns to <SPAN
style="mso-bidi-font-weight: bold"><b>BP</b></SPAN>’s operations managers in
Houston. This raises further questions about whether <SPAN
style="mso-bidi-font-weight: bold"><b>BP</b></SPAN> managers properly
considered the consequences of changes they ordered on the rig,an issue
investigators say contributed to the disaster.</SPAN></p><br/>
I am writing this text to Aspose.PDF, but the HTML tags show up in the PDF. How can I remove them?