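Problem description
I use several steps to generate XML. In the end, because of the XML's complexity (nesting within nesting within nesting), I had to fall back on the Text file output step and simply change the "Extension" option to ".xml".
The problem is that I end up with a single-line .xml file containing well-formed XML; if I copy that line and paste it into an online XML formatter, it comes out fine.
Is it possible to read a one-line file as a String and turn it into a properly indented XML file?
Thanks.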
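Solution
I suggest using the User Defined Java Class step and writing your own code to convert the single-line XML into a pretty-printed version. Pentaho already ships with various library JARs for XML manipulation, and you can use them directly.
Here is the test transformation I wrote:
Generate XML string writes a single row containing the one-line XML string to the 'xml' field.
Format XML contains the following code under the "Processor" tab: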
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import java.io.StringWriter;
import java.io.ByteArrayInputStream;
public static String toPrettyString(String xml) {
    try {
        // Turn xml string into a document
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new ByteArrayInputStream(xml.getBytes("utf-8"))));
        // Remove whitespaces outside tags
        document.normalize();
        XPath xPath = XPathFactory.newInstance().newXPath();
        NodeList nodeList = (NodeList) xPath.evaluate("//text()[normalize-space()='']", document, XPathConstants.NODESET);
        for (int i = 0; i < nodeList.getLength(); ++i) {
            Node node = nodeList.item(i);
            node.getParentNode().removeChild(node);
        }
        // Setup pretty print options
        TransformerFactory transformerFactory = TransformerFactory.newInstance();
        Transformer transformer = transformerFactory.newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        // Return pretty print xml string
        StringWriter stringWriter = new StringWriter();
        transformer.transform(new DOMSource(document), new StreamResult(stringWriter));
        return stringWriter.toString();
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}
public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException
{
    // First, get a row from the default input hop
    Object[] r = getRow();
    // If the row object is null, we are done processing.
    if (r == null) {
        setOutputDone();
        return false;
    }
    // Init output row
    Object[] outputRow = createOutputRow(r, data.outputRowMeta.size());
    // Getting fields
    String xml = get(Fields.In, "xml").getString(r);
    // Init error handling
    boolean rowInError = false;
    String errMsg = "";
    int errCnt = 0;
    // Init output
    String xml_pretty = "";
    // Put the xml in pretty format
    try {
        xml_pretty = toPrettyString(xml);
    }
    catch (Exception ex) {
        errMsg = ex.getMessage();
        errCnt++;
        rowInError = true;
    }
    // Set the values in the output fields
    get(Fields.Out, "result").setValue(outputRow, true);
    get(Fields.Out, "xml_pretty").setValue(outputRow, xml_pretty);
    if (!rowInError) {
        // putRow will send the row on to the default output hop.
        putRow(data.outputRowMeta, outputRow);
    }
    else {
        // Mark the row as failed and clear the formatted value,
        // then let putError send the row on to the error hop.
        get(Fields.Out, "result").setValue(outputRow, false);
        get(Fields.Out, "xml_pretty").setValue(outputRow, "");
        putError(data.outputRowMeta, outputRow, errCnt, errMsg, "", "ERR_0");
    }
    return true;
}
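The implementation of toPrettyString(String xml) is up to you; here I used the code from this SO answer. You also have to define the output fields under the step's "Fields" tab by giving their names and types. In this example those are result (Boolean) and xml_pretty (String).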
The code above was tested with the Spoon / PDI client on Pentaho 8.3.0.10.
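To try the formatting logic outside of Spoon first, the toPrettyString helper can be exercised as a plain Java class. The following is only a sketch under a few assumptions: the class name XmlPrettyPrinter, the method name format and its indent parameter are made up for illustration and are not part of PDI. It also sets the "{http://xml.apache.org/xslt}indent-amount" output property, because depending on which transformer implementation is on the classpath, INDENT=yes alone may add line breaks without any leading spaces. That property is understood by Xalan-style transformers, and since it is namespace-qualified, other implementations should ignore it rather than fail.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class for local testing; the name is illustrative only.
public class XmlPrettyPrinter {

    // Parses the XML string and re-serializes it with line breaks and indentation.
    public static String format(String xml, int indent) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // Drop whitespace-only text nodes so earlier formatting does not interfere.
            NodeList blanks = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("//text()[normalize-space()='']", document, XPathConstants.NODESET);
            for (int i = 0; i < blanks.getLength(); i++) {
                Node node = blanks.item(i);
                node.getParentNode().removeChild(node);
            }

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            // Assumption: the active transformer honours this Xalan-style property.
            transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount",
                    String.valueOf(indent));

            StringWriter writer = new StringWriter();
            transformer.transform(new DOMSource(document), new StreamResult(writer));
            return writer.toString();
        } catch (Exception e) {
            throw new RuntimeException("Could not pretty-print XML", e);
        }
    }

    public static void main(String[] args) {
        String oneLine = "<root><item id=\"1\"><name>a</name></item></root>";
        System.out.println(format(oneLine, 2));
    }
}
```

If this behaves as expected locally, the body of format can be pasted into the User Defined Java Class step in place of toPrettyString, with the indent width hard-coded or read from a step parameter.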