xmltodict parse()/unparse() round trip does not produce the same XML file

Problem description

I am reading the sample file xmlTest1.xml shown below with the following code snippet:

import xmltodict

# The with block closes the file on exit, so no explicit close() is needed.
with open("xmlTest1.xml") as xml_file:
    xml_dict = xmltodict.parse(xml_file.read(), force_list={'team'}, attr_prefix="")

# Convert the dict back to XML text and print it.
xml_file_unparsed = xmltodict.unparse(xml_dict, pretty=True, cdata_key='#text', full_document=False, short_empty_elements=True)
print("Round trip xml file is: ", xml_file_unparsed)

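After parse(), I call unparse() and then print() the unparsed content. The unparsed output differs from the original XML file: every attribute has turned into a child element. I need the unparse() output to match the input.

Why does this happen, and how can I fix it?

I am quite lost here. Please help.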

<?xml version="1.0" encoding="UTF-8"?>
<nba>
    <date year="2020" month="10" date="7" day="3"/>
    <time hour="1" minute="08" second="52" timezone="Eastern" utc-hour="-4" utc-minute="00"/>
    <version number="4"/>
    <league global-id="1" name="National Basketball Assoc." alias="NBA" display-name=""/>
    <season season="2019"/>
    <conference id="1" label="Eastern">
      <division id="1" label="Atlantic">
        <team global-id="2" id="2" city="Boston" name="Celtics" alias="Bos" arena-id="2" arena-name="TD Garden"/>
        <team global-id="17" id="17" city="brooklyn" name="Nets" alias="Bkn" arena-id="10615" arena-name="Barclays Center"/>
        <team global-id="18" id="18" city="New York" name="Knicks" alias="NY" arena-id="18" arena-name="Madison Square Garden"/>
        <team global-id="20" id="20" city="Philadelphia" name="76ers" alias="Phi" arena-id="20" arena-name="Wells Fargo Center"/>
        <team global-id="28" id="28" city="Toronto" name="Raptors" alias="Tor" arena-id="28" arena-name="Scotiabank Arena"/>
      </division>
    </conference>
 </nba> 

Output of unparse():
<nba>
        <date>
                <year>2020</year>
                <month>10</month>
                <date>7</date>
                <day>3</day>
        </date>
        <time>
                <hour>1</hour>
                <minute>08</minute>
                <second>52</second>
                <timezone>Eastern</timezone>
                <utc-hour>-4</utc-hour>
                <utc-minute>00</utc-minute>
        </time>
        <version>
                <number>4</number>
        </version>
        <league>
                <global-id>1</global-id>
                <name>National Basketball Assoc.</name>
                <alias>NBA</alias>
                <display-name/>
        </league>
        <season>
                <season>2019</season>
        </season>
        <conference>
                <id>1</id>
                <label>Eastern</label>
                <division>
                        <id>1</id>
                        <label>Atlantic</label>
                        <team>
                                <global-id>2</global-id>
                                <id>2</id>
                                <city>Boston</city>
                                <name>Celtics</name>
                                <alias>Bos</alias>
                                <arena-id>2</arena-id>
                                <arena-name>TD Garden</arena-name>
                        </team>
                        <team>
                                <global-id>17</global-id>
                                <id>17</id>
                                <city>brooklyn</city>
                                <name>Nets</name>
                                <alias>Bkn</alias>
                                <arena-id>10615</arena-id>
                                <arena-name>Barclays Center</arena-name>
                        </team>
                        <team>
                                <global-id>18</global-id>
                                <id>18</id>
                                <city>New York</city>
                                <name>Knicks</name>
                                <alias>NY</alias>
                                <arena-id>18</arena-id>
                                <arena-name>Madison Square Garden</arena-name>
                        </team>
                        <team>
                                <global-id>20</global-id>
                                <id>20</id>
                                <city>Philadelphia</city>
                                <name>76ers</name>
                                <alias>Phi</alias>
                                <arena-id>20</arena-id>
                                <arena-name>Wells Fargo Center</arena-name>
                        </team>
                        <team>
                                <global-id>28</global-id>
                                <id>28</id>
                                <city>Toronto</city>
                                <name>Raptors</name>
                                <alias>Tor</alias>
                                <arena-id>28</arena-id>
                                <arena-name>Scotiabank Arena</arena-name>
                        </team>
                </division>
        </conference>
</nba>

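Solution

The parse() call sets attr_prefix="", so attributes are stored under plain keys with no "@" marker. unparse() relies on its default attr_prefix="@" to recognize attributes, so it cannot tell them apart from child elements and writes each one as a child element.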
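To see the effect on a small scale, the sketch below parses a one-element document twice, once with the default prefix and once with attr_prefix="", and then unparses both results. The sample string and the variable names are made up for illustration; only xmltodict.parse() and xmltodict.unparse() come from the library.

```python
import xmltodict

sample = '<team id="2" city="Boston"/>'  # hypothetical one-line input

# Default prefix: attribute keys are marked with "@".
marked = xmltodict.parse(sample)
# Empty prefix: attribute keys look exactly like child-element keys.
flat = xmltodict.parse(sample, attr_prefix="")

print(list(marked["team"]))  # keys carry the "@" marker
print(list(flat["team"]))    # keys have no marker

# unparse() expects "@" by default, so only `marked` keeps its attributes.
marked_xml = xmltodict.unparse(marked, full_document=False, short_empty_elements=True)
flat_xml = xmltodict.unparse(flat, full_document=False, short_empty_elements=True)
print(marked_xml)
print(flat_xml)
```

Once the "@" marker is gone from the dict, nothing in it records which keys used to be attributes, which is why the round trip cannot restore them.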

Remove that argument and the attributes survive the round trip:

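import xmltodict

# Parse with the default attr_prefix="@" so attributes stay marked as attributes.
with open("xmlTest1.xml") as xml_file:
    xml_dict = xmltodict.parse(xml_file.read(), force_list={'team'})

# unparse() recognizes the "@" keys and writes them back as attributes.
xml_file_unparsed = xmltodict.unparse(xml_dict, pretty=True, cdata_key='#text', full_document=False, short_empty_elements=True)
print("Round trip xml file is: ", xml_file_unparsed)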
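
Even with the attributes preserved, the printed text may not match the input byte for byte: full_document=False omits the XML declaration, and pretty printing picks its own indentation. If what matters is that both documents carry the same elements, attributes and text, one option is to compare canonical forms instead of raw strings. The following is a minimal sketch using only the standard library (Python 3.8 or later); the helper name same_xml and the short sample strings are made up for this example.

```python
import xml.etree.ElementTree as ET


def same_xml(a: str, b: str) -> bool:
    """Return True if two XML strings match after C14N canonicalization."""
    # strip_text=True drops whitespace around text, so indentation is ignored.
    return ET.canonicalize(a, strip_text=True) == ET.canonicalize(b, strip_text=True)


original = '<nba>\n    <version number="4"/>\n</nba>'
round_trip = '<nba>\n\t<version number="4"></version>\n</nba>'
broken = '<nba>\n\t<version>\n\t\t<number>4</number>\n\t</version>\n</nba>'

print(same_xml(original, round_trip))  # same attributes, different layout
print(same_xml(original, broken))      # attribute turned into a child element
```

Canonicalization also sorts attributes and expands empty elements into start and end tags, so those differences do not count as mismatches either.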