How to format XML with Pentaho

Problem description

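I use several steps to generate XML. Because the XML is so complex (nesting inside nesting inside nesting), I ended up writing it with a Text file output step, simply changing the "Extension" option to ".xml".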

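The problem is that the resulting .xml file contains well-formed XML on a single line. If I copy that line and paste it into an online XML formatter, it formats fine.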

Is it possible to read a file as a String and turn it into a nicely formatted XML file?

Obtained:

obtained XML

Expected:

expected XML

Thanks.

Solution

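I suggest using the User Defined Java Class step and writing your own code to convert the single-line XML into a pretty-printed version. Pentaho already ships with various library JARs for XML manipulation, and you can use them directly.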

This is what my test transformation looks like:

[screenshot of the test transformation]

The Generate XML string step writes a single row whose 'xml' field contains the single-line XML string.

The Format XML step contains the following code under the "Processor" tab:

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import java.io.StringWriter;
import java.io.ByteArrayInputStream;

public static String toPrettyString(String xml) {
    try {
        // Turn xml string into a document
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new ByteArrayInputStream(xml.getBytes("utf-8"))));

        // Remove whitespaces outside tags
        document.normalize();
        XPath xPath = XPathFactory.newInstance().newXPath();
        NodeList nodeList = (NodeList) xPath.evaluate("//text()[normalize-space()='']",document,XPathConstants.NODESET);

        for (int i = 0; i < nodeList.getLength(); ++i) {
            Node node = nodeList.item(i);
            node.getParentNode().removeChild(node);
        }

        // Setup pretty print options
        TransformerFactory transformerFactory = TransformerFactory.newInstance();
        Transformer transformer = transformerFactory.newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING,"UTF-8");
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION,"yes");
        transformer.setOutputProperty(OutputKeys.INDENT,"yes");

        // Return pretty print xml string
        StringWriter stringWriter = new StringWriter();
        transformer.transform(new DOMSource(document),new StreamResult(stringWriter));
        return stringWriter.toString();
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}


public boolean processRow(StepMetaInterface smi,StepDataInterface sdi) throws KettleException
{
 
    // First,get a row from the default input hop
    Object[] r = getRow();
 
    // If the row object is null,we are done processing.
    if (r == null) {
        setOutputDone();
        return false;
    }

    // Init output row
    Object[] outputRow = createOutputRow(r,data.outputRowMeta.size());
 
    // Getting fields
    String xml = get(Fields.In,"xml").getString(r);
    
    // Init error handling  
    boolean rowInError = false;
    String errMsg = "";
    int errCnt = 0;

    // Init Output
    String xml_pretty = "";

    // Put the xml in pretty format
    try{
        xml_pretty = toPrettyString(xml);
    }
    catch (Exception ex) {
        errMsg = ex.getMessage();
        errCnt++;
        rowInError = true;
    }

    // Set the value in the output field
    //
    get(Fields.Out,"result").setValue(outputRow,true);
    get(Fields.Out,"xml_pretty").setValue(outputRow,xml_pretty);

    if ( !rowInError ) {
        // putRow will send the row on to the default output hop.
        //
        putRow(data.outputRowMeta,outputRow);
    }
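    else {
        // Mark the row as failed and clear the formatted output,
        // then let putError send the row on to the error hop.
        //
        get(Fields.Out,"result").setValue(outputRow,false);
        get(Fields.Out,"xml_pretty").setValue(outputRow,"");
        putError(data.outputRowMeta,outputRow,errCnt,errMsg,"","ERR_0");
    }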

    return true;
}

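How you implement toPrettyString(String xml) is up to you. Here I used the code from this SO answer. You also have to define the output fields (in this example, "result" and "xml_pretty") under the step's "Fields" tab by giving each one a name and a type.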
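Since the implementation is up to you, here is one possible alternative as a rough sketch, not the code I tested in the transformation. It skips the DOM and XPath cleanup and lets an identity Transformer re-serialize the string, which should be enough when the input really is a single line with no whitespace between tags. The class name PrettyXmlSketch and the indent parameter are made up for illustration, and the indent-amount property is a Xalan-style key that I am assuming the active TransformerFactory honors (the JDK's built-in one does; other implementations may ignore it).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class; the name and the indent parameter are illustrative.
class PrettyXmlSketch {

    // Re-serializes a single-line XML string with line breaks and indentation.
    static String toPrettyString(String xml, int indent) {
        try {
            // A Transformer created without a stylesheet performs an identity copy.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            // Assumption: Xalan-style key; implementations that do not know it ignore it.
            transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount",
                    String.valueOf(indent));

            StringWriter writer = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                    new StreamResult(writer));
            return writer.toString();
        } catch (Exception e) {
            // Same convention as the step code: surface any failure as a runtime error.
            throw new RuntimeException(e);
        }
    }
}
```

Inside the User Defined Java Class step you would paste only the method (plus the imports) and call it from processRow with a fixed indent, for example toPrettyString(xml, 2). If the input may already contain whitespace between tags, the DOM-based version above is the safer choice because it strips those text nodes first.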

The code above was tested with the Spoon / PDI client on Pentaho 8.3.0.10.
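The question also asked about reformatting a file that has already been written. As a sketch only (I have not run this inside PDI), the same Transformer idea can be applied file to file in plain Java. The class and method names below are made up for illustration, and the source and target paths are assumed to be different files, since writing to the file that is still being read would truncate it.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper; class and method names are illustrative.
class XmlFileFormatterSketch {

    // Reads the XML in 'source' and writes an indented copy to 'target'.
    // Assumption: 'source' and 'target' are different files.
    static void formatFile(Path source, Path target) throws Exception {
        // Identity transform: no stylesheet, only output formatting options.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");

        // try-with-resources closes both streams even if the transform fails.
        try (InputStream in = Files.newInputStream(source);
             OutputStream out = Files.newOutputStream(target)) {
            transformer.transform(new StreamSource(in), new StreamResult(out));
        }
    }
}
```

This keeps the XML declaration, since the result is a standalone file. How many spaces each level is indented by depends on the Java version and the transformer implementation on the classpath.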