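Problem description
I am trying to demonstrate how to find and replace a text string inside a CDATA section of an XML file, similar to the goal in a related question (Easiest way to parse a Lua datastructure in C# / .Net). I want to replace the string "Building in Éclépens,Switzerland" in the CDATA section with a new string, "New Building", but I cannot seem to reference the first string correctly. Ideally, I would like to find and replace this string by index rather than hard-coding the string as a variable. The CDATA expression itself is correct and supports comments, but I cannot even show how to reference this CDATA string with a simple print statement. Below are the XML, the script I am using, and the new string to be added to the desired output XML:
XML ("foo_bar_CDATA.xml"):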
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2">
<Overlay>
<description>
<![CDATA[
<html>
<head>
<body>
<div id="view">
<div class="item">
<p><span style="font-weight:italic">Dataset:</span>
Building in Éclépens,Switzerland
</p>
</div>
</div>
</body>
</head>
</html>
]]>
</description>
</Overlay></kml>
Script ("foo_bar_CDATA.xml"):
import lxml.etree as ET
xml = ET.parse("C:\\Users\\mdl518\\Desktop\\bar_foo_CDATA.xml")
tree=xml.getroot()
cd = ET.fromstring(tree.xpath('//*[local-name()="description"]')[0].text) # get CDATA out of the XML
print(cd[0][0][0][0][0][0].text) # prints "Dataset:" text contained within the 'span' element
val_1 = 'New Building' # new string to be included in the XML
# Find and replace the CDATA string with "val_1"
for elem in tree.getiterator():
    if elem.text:
        elem.text=elem.text.replace('Building in Éclépens,Switzerland ',val_1)
output = ET.tostring(tree,encoding="UTF-8",method="xml",xml_declaration=True,pretty_print=True)
print(output.decode("utf-8"))
Desired output XML:
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2">
<Overlay>
<description>
<![CDATA[
<html>
<head>
<body>
<div id="view">
<div class="item">
<p><span style="font-weight:italic">Dataset:</span>
New Building
</p>
</div>
</div>
</body>
</head>
</html>
]]>
</description>
</Overlay></kml>
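When I run the script above, the string of interest is not changed as intended, and the opening/closing tags are not preserved in the printed view of the XML (they are shown as &lt; and &gt;). I feel the correct solution probably needs only a few small adjustments. Any help is greatly appreciated!
Solution
You have elem.text=elem.text.replace('Building in Éclépens,Switzerland ',val_1)
Use this instead: elem.text=elem.text.replace('Building in Éclépens,Switzerland',val_1)
I have removed the trailing space from the search string.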
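To see why the trailing space matters, here is a minimal sketch. It assumes, as in the XML shown in the question, that the target string is followed by a newline rather than a space. The `text` sample is an abbreviated stand-in for the CDATA content, not the full document.

```python
# Abbreviated stand-in for the CDATA payload: the target line ends with "\n", not " ".
text = "<span>Dataset:</span>\nBuilding in Éclépens,Switzerland\n</p>"

# Search string with a trailing space: str.replace finds no match and returns the text unchanged.
with_space = text.replace("Building in Éclépens,Switzerland ", "New Building")

# Search string without the trailing space: the match succeeds.
without_space = text.replace("Building in Éclépens,Switzerland", "New Building")

print(with_space == text)
print("New Building" in without_space)
```

Because `str.replace` silently returns the original string when nothing matches, a mismatch in whitespace produces no error, only an unchanged document.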
import lxml.etree as ET
xml = ET.parse("/home/cam/out.xml")
tree=xml.getroot()
cd = ET.fromstring(tree.xpath('//*[local-name()="description"]')[0].text) # get CDATA out of the XML
#print(cd[0][0][0][0][0][0].text) # prints "Dataset:" text contained within the 'span' element
val_1 = 'New Building' # new string to be included in the XML
# Find and replace the CDATA string with "val_1"
for elem in tree.iter():
    if "description" in elem.tag:
        elem.text=elem.text.replace('Building in Éclépens,Switzerland',val_1)
        elem.text = '![CDATA[' + elem.text + ']]'
root_str = ET.tostring(tree)
root_str = str(root_str.decode('utf-8').replace('&lt;','<').replace('&gt;','>').replace('\\n',''))
print(root_str)
Output:
<kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2">
<Overlay>
<description>![CDATA[
<html>
<head>
<body>
<div id="view">
<div class="item">
<p><span style="font-weight:italic">Dataset:</span>
New Building
</p>
</div>
</div>
</body>
</head>
</html>
]]</description>
</Overlay></kml>
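Note that the output above contains the literal characters `![CDATA[` and `]]` rather than a real CDATA section. As a possible alternative (not part of the original answer), the sketch below uses lxml's `ET.CDATA` wrapper so the serializer emits a genuine `<![CDATA[ ... ]]>` block, and it replaces the text by position instead of by hard-coded string: the text after `</span>` is stored by lxml as the `tail` of the `span` element. The helper name `replace_dataset_text` and the inline `KML` sample are hypothetical, and the approach assumes the CDATA payload is itself well-formed markup, as it is in the question.

```python
from lxml import etree as ET

# Inline copy of the question's document (hypothetical stand-in for the file on disk).
KML = """<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
<Overlay>
<description><![CDATA[
<html>
<head>
<body>
<div id="view">
<div class="item">
<p><span style="font-weight:italic">Dataset:</span>
Building in Éclépens,Switzerland
</p>
</div>
</div>
</body>
</head>
</html>
]]></description>
</Overlay></kml>""".encode("utf-8")


def replace_dataset_text(kml_bytes, new_text):
    """Replace the text following the first <span> inside the CDATA payload."""
    root = ET.fromstring(kml_bytes)
    # local-name() sidesteps the KML default namespace.
    description = root.xpath('//*[local-name()="description"]')[0]
    # Parse the CDATA payload as its own small XML document.
    inner = ET.fromstring(description.text)
    # Same element as inner[0][0][0][0][0][0] in this document.
    span = inner.find(".//span")
    # The string to replace comes after </span>, so it lives in span.tail.
    span.tail = "\n" + new_text + "\n"
    # Re-serialize the payload and wrap it so lxml writes a real CDATA section.
    payload = ET.tostring(inner, encoding="unicode")
    description.text = ET.CDATA("\n" + payload + "\n")
    out = ET.tostring(root, encoding="UTF-8", xml_declaration=True)
    return out.decode("utf-8")


result = replace_dataset_text(KML, "New Building")
print(result)
```

With this approach no `&lt;`/`&gt;` post-processing of the serialized string is needed, because the markup is never stored as ordinary escaped text. If the payload were not well-formed XML, parsing it with `ET.fromstring` would fail and a plain string replacement on `description.text`, followed by the same `ET.CDATA` wrapping, would be the fallback.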