How to sort XML by an attribute in Python, with strings before numbers

Problem description

I want to sort the XML below by the "value" attribute of the "entry" tags, with strings (letters) sorted before numbers.

<test>
    <entry value="-12" />
    <entry value="0" />
    <entry value="043" />
    <entry value="14" />
    <entry value="6" />
    <entry value="_null" />
    <entry value="abc" />
    <entry value="abcd" />
    <entry value="empty" />
    <entry value="false" />
    <entry value="test1" />
    <entry value="test2" />
    <entry value="true" />
</test>

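I wrote some Python to sort this XML, but it puts the numbers first and the strings after them. I have already checked this thread, but couldn't get any of its XML-sorting solutions to work.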

import xml.etree.ElementTree as ElT
import os
from os.path import sep

def sort_xml(directory,xml_file,level1_tag,attribute,mode=0):
    #mode 0 - numbers before letters
    #mode 1 - letters before numbers

    file = directory + sep + xml_file

    tree = ElT.parse(file)
    data = tree.getroot()
    els = data.findall(level1_tag)
    
    if mode == 0:
        new_els = sorted(els,key=lambda e: (e.tag,e.attrib[attribute]))
    if mode == 1:
        new_els = sorted(els,key=lambda e: (isinstance(e.tag,(float,int)),e.attrib[attribute]))

    for el in new_els:
        # sort each entry's own children by the same attribute (a no-op here,
        # since the <entry> elements have no children)
        el[:] = sorted(el, key=lambda child: child.attrib.get(attribute, ""))
    
    data[:] = new_els

    tree.write(file,xml_declaration=True,encoding='utf-8')

    with open(file,'r') as fin:
        data = fin.read().splitlines(True)
    with open(file,'w') as fout:
        fout.writelines(data[1:])
        
        
sort_xml(os.getcwd(),"test.xml","entry","value",1)

Any idea how to do this?

Edit 1: desired output

<test>
    <entry value="_null" />
    <entry value="abc" />
    <entry value="abcd" />
    <entry value="empty" />
    <entry value="false" />
    <entry value="test1" />
    <entry value="test2" />
    <entry value="true" />
    <entry value="-12" />
    <entry value="0" />
    <entry value="043" />
    <entry value="14" />
    <entry value="6" />
</test>

Solutions

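I put the values that start with letters at the top. Having letters at the top is the actual requirement, so I didn't worry much about the order of the rest.

See below: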

import xml.etree.ElementTree as ET

xml = '''<test>
    <entry value="-12" />
    <entry value="/this" />
    <entry value="0" />
    <entry value="043" />
    <entry value="14" />
    <entry value="6" />
    <entry value="_null" />
    <entry value="abc" />
    <entry value="abcd" />
    <entry value="empty" />
    <entry value="false" />
    <entry value="test1" />
    <entry value="test2" />
    <entry value="true" />
</test>'''

root = ET.fromstring(xml)
numeric = []      # (int value, original string) pairs
non_numeric = []  # values that cannot be parsed as int
for entry in root.findall('.//entry'):
    value = entry.attrib['value']
    try:
        numeric.append((int(value), value))
    except ValueError:
        non_numeric.append(value)

# sort in place: sorted() returns a new list and would leave these unchanged
numeric.sort(key=lambda x: x[0])
non_numeric.sort()

# rebuild the document: non-numeric values first, then the numbers
root = ET.Element('test')
for value in non_numeric:
    ET.SubElement(root, 'entry', value=value)
for _, value in numeric:
    ET.SubElement(root, 'entry', value=value)
ET.indent(root)  # Python 3.9+: pretty-print with two-space indentation
ET.dump(root)

Output

<test>
  <entry value="/this" />
  <entry value="_null" />
  <entry value="abc" />
  <entry value="abcd" />
  <entry value="empty" />
  <entry value="false" />
  <entry value="test1" />
  <entry value="test2" />
  <entry value="true" />
  <entry value="-12" />
  <entry value="0" />
  <entry value="6" />
  <entry value="14" />
  <entry value="043" />
</test>
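Building a fresh ET.Element('test') is enough for this sample, but it would drop any other attributes or content on the original elements. A minimal sketch of the same partition that reorders the existing elements instead, assuming the root holds only <entry> children and each one has a value attribute (letters_first is a hypothetical helper name):

```python
import xml.etree.ElementTree as ET

def letters_first(root, tag="entry", attribute="value"):
    """Reorder root's <tag> children in place: non-integer values first
    (sorted as strings), then integer values (sorted numerically)."""
    numeric, non_numeric = [], []
    for el in root.findall(tag):
        value = el.get(attribute)
        try:
            numeric.append((int(value), el))
        except ValueError:
            non_numeric.append(el)
    non_numeric.sort(key=lambda el: el.get(attribute))
    # key on the int only, so Element objects are never compared
    numeric.sort(key=lambda pair: pair[0])
    # assumes root has no other children, since this replaces all of them
    root[:] = non_numeric + [el for _, el in numeric]

root = ET.fromstring('<test><entry value="6"/><entry value="abc"/>'
                     '<entry value="-12"/><entry value="_null"/></test>')
letters_first(root)
print([el.get("value") for el in root])  # ['_null', 'abc', '-12', '6']
```

Because the elements themselves are moved rather than recreated, anything else they carry (extra attributes, children) is preserved.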

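I think your problem is the mode 1 sort key. isinstance(e.tag,(float,int)) checks e.tag, which is the element name "entry", not the attribute value; and in any case ElementTree returns every attribute value as a string. So that check is always False, and the key falls back to plain string order.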
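A quick check of both points, plus the reason plain string sorting puts the numbers first: '-' (code point 45) and the digits (48 to 57) come before '_' (95) and the lowercase letters (97 and up).

```python
import xml.etree.ElementTree as ET

root = ET.fromstring('<test><entry value="abc"/><entry value="14"/>'
                     '<entry value="_null"/><entry value="-12"/></test>')
entry = root[0]
print(entry.tag)                             # entry (the element name)
print(type(entry.get("value")).__name__)     # str
print(isinstance(entry.tag, (float, int)))   # False, for every element
# plain string order: '-' and digits sort before '_' and letters
print(sorted(e.get("value") for e in root))  # ['-12', '14', '_null', 'abc']
```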

A sort key function like this does what you need:

def sorter(x):
    """Sort key: (is the value an integer?, the value as a string).
    False sorts before True, so non-integer values come first."""
    value = x.get("value") 
    def is_integer(i):
        try:
            int(i)
        except ValueError:
            return False
        return True
    return is_integer(value),value

It can be used like this (with StringIO standing in for a file):

from xml.etree import ElementTree
from io import StringIO

xml = """<test>
    <entry value="-12" />
    <entry value="0" />
    <entry value="043" />
    <entry value="14" />
    <entry value="6" />
    <entry value="_null" />
    <entry value="abc" />
    <entry value="abcd" />
    <entry value="empty" />
    <entry value="false" />
    <entry value="test1" />
    <entry value="test2" />
    <entry value="true" />
</test>"""

tree = ElementTree.parse(StringIO(xml))
root = tree.getroot()
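root[:] = sorted(root, key=sorter)
# each element's trailing whitespace (tail) moves with it, so re-indent
# to restore the 4-space layout after reordering (Python 3.9+)
ElementTree.indent(tree, space="    ")
tree.write("output.xml")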

The contents of output.xml:

<test>
    <entry value="_null" />
    <entry value="abc" />
    <entry value="abcd" />
    <entry value="empty" />
    <entry value="false" />
    <entry value="test1" />
    <entry value="test2" />
    <entry value="true" />
    <entry value="-12" />
    <entry value="0" />
    <entry value="043" />
    <entry value="14" />
    <entry value="6" />
</test>
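Note that sorter orders the integers as strings, which is what the desired output shows ("043" before "14" before "6"). If numeric order is wanted instead, one possible variant, assuming only integer values need special handling, returns the parsed int as the second tuple element. The two groups never compare a str with an int, because the first tuple element already differs between them (numeric_sorter is a hypothetical name):

```python
import xml.etree.ElementTree as ET

def numeric_sorter(el):
    """Sort key: non-integer values first (by string), then integers by value."""
    value = el.get("value")
    try:
        # (True, int) sorts after every (False, str), so ints never meet strs
        return (True, int(value))
    except ValueError:
        return (False, value)

root = ET.fromstring('<test><entry value="043"/><entry value="6"/>'
                     '<entry value="abc"/><entry value="-12"/></test>')
root[:] = sorted(root, key=numeric_sorter)
print([el.get("value") for el in root])  # ['abc', '-12', '6', '043']
```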