Building strings correctly with a SAX parser in Java

Problem

I am trying to read XML files whose structure is not known in advance. A file might look like this:

<S:Envelope xmlns:S="http://anamespace">envelopestart
    <S:Body>bodyStart
        <ns2:getNextResponse xmlns:ns2="http://anothernamespace">getNextResponseStart
            <nextValue>9</nextValue>
        getNextResponseEnd</ns2:getNextResponse>
    bodyEnd</S:Body>
envelopeEnd</S:Envelope>

This is the handler I am currently using:

DefaultHandler handler = new DefaultHandler() {
    StringBuilder builder;
    Map<String,String> values = new HashMap<String,String>();

    @Override
    public void startElement(String uri,String localName,String qName,Attributes attributes) throws SAXException {
        // A new builder for every element: the text collected for the parent is no longer reachable
        builder = new StringBuilder();
    }

    @Override
    public void characters(char ch[],int start,int length) throws SAXException {
        builder.append(ch,start,length);
    }

    @Override
    public void endElement(String uri,String localName,String qName) throws SAXException {
        values.put(localName,builder.toString());
        builder.setLength(0);
    }
};

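The problem I face is that if I instantiate a new builder for every tag that gets parsed, I lose all the leading text I have read so far (assuming the characters method returns all the characters in a single call):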

new Builder for the Envelope tag
reading characters: envelopestart
new Builder for the Body tag
reading characters: bodyStart
...
new Builder for the nextValue tag <- this is the last builder reference, the one I have to use from now on
reading characters: 9
endElement: saving to Map ('nextValue','9') and resetting length of the last builder instantiated
reading characters: getNextResponseEnd
endElement: saving to Map ('getNextResponse','getNextResponseEnd') and resetting length of the last builder instantiated
...

In this case the values HashMap will contain the following entries:

nextValue=9
getNextResponse=getNextResponseEnd (missing getNextResponseStart)
Body=bodyEnd (missing bodyStart)
Envelope=envelopeEnd (missing envelopestart)

Is there a way to save both the start and end strings of each tag in the map?

Solution

Just keep a stack of StringBuilders, one per open element:

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.Stack;

public class Example {
    public static void main(String... args) throws ParserConfigurationException,SAXException,IOException {
        Map<String,String> values = new HashMap<String,String>();

        DefaultHandler handler = new DefaultHandler() {
            Stack<StringBuilder> builders = new Stack<>();

            @Override
            public void startElement(String uri,String localName,String qName,Attributes attributes) throws SAXException {
                builders.push(new StringBuilder());
            }

            @Override
            public void characters(char ch[],int start,int length) throws SAXException {
                // Text always belongs to the innermost open element, which is on top of the stack
                builders.peek().append(ch,start,length);
            }

            @Override
            public void endElement(String uri,String localName,String qName) throws SAXException {
                // Popping restores the parent's builder, so its earlier text is kept
                values.put(localName,builders.pop().toString());
            }
        };

        String xml = "<S:Envelope xmlns:S=\"http://anamespace\">envelopeStart\n" +
                     "    <S:Body>bodyStart\n" +
                     "        <ns2:getNextResponse xmlns:ns2=\"http://anothernamespace\">getNextResponseStart\n" +
                     "            <nextValue>9</nextValue>\n" +
                     "        getNextResponseEnd</ns2:getNextResponse>\n" +
                     "    bodyEnd</S:Body>\n" +
                     "envelopeEnd</S:Envelope>";
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        SAXParser saxParser = spf.newSAXParser();
        XMLReader xmlReader = saxParser.getXMLReader();
        xmlReader.setContentHandler(handler);
        xmlReader.parse(new InputSource(new StringReader(xml)));
        // Show the collected text per element
        System.out.println(values);
    }
}
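
To make the result of the stack approach easier to inspect, here is a minimal sketch that wraps the same idea in a method returning the map. The class name StackTextCollector and the method collect are hypothetical names chosen for illustration, and collapsing whitespace runs into a single space is an added assumption, not part of the answer above. It uses ArrayDeque rather than Stack, which is a common substitution but not required.

```java
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

public class StackTextCollector {

    // Hypothetical helper: maps each element's local name to its own (direct) text.
    static Map<String, String> collect(String xml) throws Exception {
        Map<String, String> values = new LinkedHashMap<>();
        Deque<StringBuilder> builders = new ArrayDeque<>();

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                // One builder per open element
                builders.push(new StringBuilder());
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called several times per element; appending handles that
                builders.peek().append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                // Assumption: indentation and line breaks are not wanted in the values
                String text = builders.pop().toString().trim().replaceAll("\\s+", " ");
                values.put(localName, text);
            }
        };

        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true); // needed for localName to be populated
        SAXParser parser = spf.newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return values;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<S:Envelope xmlns:S=\"http://anamespace\">envelopeStart\n" +
                     "    <S:Body>bodyStart\n" +
                     "        <ns2:getNextResponse xmlns:ns2=\"http://anothernamespace\">getNextResponseStart\n" +
                     "            <nextValue>9</nextValue>\n" +
                     "        getNextResponseEnd</ns2:getNextResponse>\n" +
                     "    bodyEnd</S:Body>\n" +
                     "envelopeEnd</S:Envelope>";
        collect(xml).forEach((name, text) -> System.out.println(name + "=" + text));
    }
}
```

With the sample document, the Envelope entry should come out as "envelopeStart envelopeEnd", so both the leading and the trailing text survive. Note that the map is keyed by local name only, so if the same element name occurs more than once, later occurrences overwrite earlier ones.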
