Google Cloud Data Fusion XML parsing - 'parse-xml-to-json': Mismatched close tag note at 6

Problem description

I am new to Google Cloud Data Fusion. I was able to process a CSV file and load it into BigQuery successfully. My requirement is to process XML files and load them into BigQuery. To try it out, I started with a very simple XML file.

XML file:

<?xml version="1.0" encoding="UTF-8"?> <note> <to>Tove</to <from>Jani</from> <heading>Reminder</heading> <body>Don't forget me this weekend!</body> </note>

Error message 1

java.lang.Exception: Stage:Wrangler - Reached error threshold 1, terminating processing due to error : Error encountered while executing 'parse-xml-to-json' : Mismatched close tag note at 6 [character 7 line 1]
at io.cdap.wrangler.Wrangler.transform(Wrangler.java:404) ~[1601903767453-0/:na]
at io.cdap.wrangler.Wrangler.transform(Wrangler.java:83) ~[1601903767453-0/:na]
at io.cdap.cdap.etl.common.plugin.WrappedTransform.lambda$transform$5(WrappedTransform.java:90) ~[cdap-etl-core-6.2.0.jar:na]
at io.cdap.cdap.etl.common.plugin.Caller$1.call(Caller.java:30) ~[cdap-etl-core-6.2.0.jar:na]
at io.cdap.cdap.etl.common.plugin.StageLoggingCaller.call(StageLoggingCaller.java:40) ~[cdap-etl-core-6.2.0.jar:na]
at io.cdap.cdap.etl.common.plugin.WrappedTransform.transform(WrappedTransform.java:89) ~[cdap-etl-core-6.2.0.jar:na]
at io.cdap.cdap.etl.common.TrackedTransform.transform(TrackedTransform.java:74) ~[cdap-etl-core-6.2.0.jar:na]
at io.cdap.cdap.etl.spark.function.TransformFunction.call(TransformFunction.java:50) ~[hydrator-spark-core2_2.11-6.2.0.jar:na]
at io.cdap.cdap.etl.spark.Compat$FlatMapAdapter.call(Compat.java:126) ~[hydrator-spark-core2_2.11-6.2.0.jar:na]
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$1$1.apply(JavaRDDLike.scala:125) ~[spark-core_2.11-2.3.3.jar:2.3.3]
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$1$1.apply(JavaRDDLike.scala:125) ~[spark-core_2.11-2.3.3.jar:2.3.3]
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434) ~[scala-library-2.11.8.jar:na]
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440) ~[scala-library-2.11.8.jar:na]
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439) ~[scala-library-2.11.8.jar:na]
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439) ~[scala-library-2.11.8.jar:na]
at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$4.apply(SparkHadoopWriter.scala:128) ~[spark-core_2.11-2.3.3.jar:2.3.3]
at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$4.apply(SparkHadoopWriter.scala:127) ~[spark-core_2.11-2.3.3.jar:2.3.3]
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1415) ~[spark-core_2.11-2.3.3.jar:2.3.3]
at org.apache.spark.internal.io.SparkHadoopWriter$.org$apache$spark$internal$io$SparkHadoopWriter$$executeTask(SparkHadoopWriter.scala:139) [spark-core_2.11-2.3.3.jar:2.3.3]
at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$3.apply(SparkHadoopWriter.scala:83) [spark-core_2.11-2.3.3.jar:2.3.3]
at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$3.apply(SparkHadoopWriter.scala:78) [spark-core_2.11-2.3.3.jar:2.3.3]
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) [spark-core_2.11-2.3.3.jar:2.3.3]
at org.apache.spark.scheduler.Task.run(Task.scala:109) [spark-core_2.11-2.3.3.jar:2.3.3]
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345) [spark-core_2.11-2.3.3.jar:2.3.3]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_252]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_252]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_252]

Caused by: io.cdap.wrangler.api.RecipeException: Error encountered while executing 'parse-xml-to-json' : Mismatched close tag note at 6 [character 7 line 1]
at io.cdap.wrangler.executor.RecipePipelineExecutor.execute(RecipePipelineExecutor.java:149) ~[wrangler-core-4.2.0.jar:na]
at io.cdap.wrangler.executor.RecipePipelineExecutor.execute(RecipePipelineExecutor.java:97) ~[wrangler-core-4.2.0.jar:na]
at io.cdap.wrangler.Wrangler.transform(Wrangler.java:376) ~[1601903767453-0/:na]
... 26 common frames omitted
Caused by: io.cdap.wrangler.api.DirectiveExecutionException: Error encountered while executing 'parse-xml-to-json' : Mismatched close tag note at 6 [character 7 line 1]
at io.cdap.directives.xml.XmlToJson.execute(XmlToJson.java:106) ~[na:na]
at io.cdap.directives.xml.XmlToJson.execute(XmlToJson.java:49) ~[na:na]
at io.cdap.wrangler.executor.RecipePipelineExecutor.execute(RecipePipelineExecutor.java:129) ~[wrangler-core-4.2.0.jar:na]
... 28 common frames omitted
Caused by: org.json.JSONException: Mismatched close tag note at 6 [character 7 line 1]
at org.json.JSONTokener.syntaxError(JSONTokener.java:505) ~[org.json.json-20090211.jar:na]
at org.json.XML.parse(XML.java:311) ~[org.json.json-20090211.jar:na]
at org.json.XML.toJSONObject(XML.java:520) ~[org.json.json-20090211.jar:na]
at org.json.XML.toJSONObject(XML.java:548) ~[org.json.json-20090211.jar:na]
at org.json.XML.toJSONObject(XML.java:472) ~[org.json.json-20090211.jar:na]
at io.cdap.directives.xml.XmlToJson.execute(XmlToJson.java:96) ~[na:na]
... 30 common frames omitted

Error message 2:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): UnknownReason

Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1661) ~[spark-core_2.11-2.3.3.jar:2.3.3]
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1649) ~[spark-core_2.11-2.3.3.jar:2.3.3]
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1648) ~[spark-core_2.11-2.3.3.jar:2.3.3]
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) ~[scala-library-2.11.8.jar:na]
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) ~[scala-library-2.11.8.jar:na]
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1648) ~[spark-core_2.11-2.3.3.jar:2.3.3]
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831) ~[spark-core_2.11-2.3.3.jar:2.3.3]
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831) ~[spark-core_2.11-2.3.3.jar:2.3.3]
at scala.Option.foreach(Option.scala:257) ~[scala-library-2.11.8.jar:na]
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:831) ~[spark-core_2.11-2.3.3.jar:2.3.3]
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1882) ~[spark-core_2.11-2.3.3.jar:2.3.3]
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1831) ~[spark-core_2.11-2.3.3.jar:2.3.3]
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1820) ~[spark-core_2.11-2.3.3.jar:2.3.3]
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) ~[spark-core_2.11-2.3.3.jar:2.3.3]
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:642) ~[spark-core_2.11-2.3.3.jar:2.3.3]
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2034) ~[na:2.3.3]
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2055) ~[na:2.3.3]
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2087) ~[na:2.3.3]
at org.apache.spark.internal.io.SparkHadoopWriter$.write(SparkHadoopWriter.scala:78) ~[spark-core_2.11-2.3.3.jar:2.3.3]
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1083) [spark-core_2.11-2.3.3.jar:2.3.3]
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1081) [spark-core_2.11-2.3.3.jar:2.3.3]
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1081) [spark-core_2.11-2.3.3.jar:2.3.3]
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) [spark-core_2.11-2.3.3.jar:2.3.3]
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) [spark-core_2.11-2.3.3.jar:2.3.3]
at org.apache.spark.rdd.RDD.withScope(RDD.scala:363) [spark-core_2.11-2.3.3.jar:2.3.3]
at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:1081) [spark-core_2.11-2.3.3.jar:2.3.3]
at org.apache.spark.api.java.JavaPairRDD.saveAsNewAPIHadoopDataset(JavaPairRDD.scala:831) [spark-core_2.11-2.3.3.jar:2.3.3]
at io.cdap.cdap.etl.spark.batch.SparkBatchSinkFactory.writeFromRDD(SparkBatchSinkFactory.java:98) [hydrator-spark-core2_2.11-6.2.0.jar:na]
at io.cdap.cdap.etl.spark.batch.RDDCollection$1.run(RDDCollection.java:179) [hydrator-spark-core2_2.11-6.2.0.jar:na]
at io.cdap.cdap.etl.spark.SparkPipelineRunner.runPipeline(SparkPipelineRunner.java:350) [hydrator-spark-core2_2.11-6.2.0.jar:na]
at io.cdap.cdap.etl.spark.batch.BatchSparkPipelineDriver.run(BatchSparkPipelineDriver.java:148) [hydrator-spark-core2_2.11-6.2.0.jar:na]
at io.cdap.cdap.app.runtime.spark.SparkTransactional$2.run(SparkTransactional.java:236) [io.cdap.cdap.cdap-spark-core2_2.11-6.2.0.jar:na]
at io.cdap.cdap.app.runtime.spark.SparkTransactional.execute(SparkTransactional.java:208) [io.cdap.cdap.cdap-spark-core2_2.11-6.2.0.jar:na]
at io.cdap.cdap.app.runtime.spark.SparkTransactional.execute(SparkTransactional.java:138) [io.cdap.cdap.cdap-spark-core2_2.11-6.2.0.jar:na]
at io.cdap.cdap.app.runtime.spark.AbstractSparkExecutionContext.execute(AbstractSparkExecutionContext.scala:228) [io.cdap.cdap.cdap-spark-core2_2.11-6.2.0.jar:na]
at io.cdap.cdap.app.runtime.spark.SerializableSparkExecutionContext.execute(SerializableSparkExecutionContext.scala:61) [io.cdap.cdap.cdap-spark-core2_2.11-6.2.0.jar:na]
at io.cdap.cdap.app.runtime.spark.DefaultJavaSparkExecutionContext.execute(DefaultJavaSparkExecutionContext.scala:89) [io.cdap.cdap.cdap-spark-core2_2.11-6.2.0.jar:na]
at io.cdap.cdap.api.Transactionals.execute(Transactionals.java:63) [na:na]
at io.cdap.cdap.etl.spark.batch.BatchSparkPipelineDriver.run(BatchSparkPipelineDriver.java:116) [hydrator-spark-core2_2.11-6.2.0.jar:na]
at io.cdap.cdap.app.runtime.spark.SparkMainWrapper$.main(SparkMainWrapper.scala:86) [io.cdap.cdap.cdap-spark-core2_2.11-6.2.0.jar:na]
at io.cdap.cdap.app.runtime.spark.SparkMainWrapper.main(SparkMainWrapper.scala) [io.cdap.cdap.cdap-spark-core2_2.11-6.2.0.jar:na]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_252]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_252]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_252]
at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_252]
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:56) [io.cdap.cdap.cdap-spark-core2_2.11-6.2.0.jar:2.3.3]
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894) [na:2.3.3]
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198) [na:2.3.3]
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228) [na:2.3.3]
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137) [na:2.3.3]
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) [spark-core_2.11-2.3.3.jar:2.3.3]
at io.cdap.cdap.app.runtime.spark.submit.AbstractSparkSubmitter.submit(AbstractSparkSubmitter.java:172) [io.cdap.cdap.cdap-spark-core2_2.11-6.2.0.jar:na]
at io.cdap.cdap.app.runtime.spark.submit.AbstractSparkSubmitter.access$000(AbstractSparkSubmitter.java:54) [io.cdap.cdap.cdap-spark-core2_2.11-6.2.0.jar:na]
at io.cdap.cdap.app.runtime.spark.submit.AbstractSparkSubmitter$5.run(AbstractSparkSubmitter.java:111) [io.cdap.cdap.cdap-spark-core2_2.11-6.2.0.jar:na]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_252]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_252]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_252]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_252]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_252]

Solution

Your XML seems incorrect: the closing tag </to is missing its '>'. Try with the XML below:

<?xml version="1.0" encoding="UTF-8"?> <note> <to>Tove</to> <from>Jani</from> <heading>Reminder</heading> <body>Don't forget me this weekend!</body> </note>
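To confirm the diagnosis outside Data Fusion, you can run the same conversion locally: the "Caused by" chain above shows the directive ultimately calls org.json's XML.toJSONObject. The sketch below is not part of the original answer; it assumes the org.json library on the classpath (the stack trace names org.json.json-20090211), and the class name XmlToJsonCheck is made up for illustration.

import org.json.JSONObject;
import org.json.XML;

public class XmlToJsonCheck {
    public static void main(String[] args) throws Exception {
        // The XML from the question: the closing tag </to is missing its '>'
        String broken = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><note><to>Tove</to"
                + "<from>Jani</from><heading>Reminder</heading>"
                + "<body>Don't forget me this weekend!</body></note>";
        // The corrected XML from the answer: restore the missing '>'
        String fixed = broken.replace("</to<", "</to><");

        try {
            XML.toJSONObject(broken);
        } catch (org.json.JSONException e) {
            // Throws a JSONException, as seen in the pipeline's "Caused by" chain
            System.out.println("broken XML: " + e.getMessage());
        }

        // The corrected document converts cleanly to JSON
        JSONObject json = XML.toJSONObject(fixed);
        System.out.println("fixed XML: " + json.toString(2));
    }
}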

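More generally, it may help to check that each input file is well-formed before running the pipeline, since any conforming XML parser will pinpoint the offending tag. A minimal sketch using the JDK's built-in parser (the class name ValidateXml is made up; in practice you would read the file from GCS instead of a string literal):

import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public class ValidateXml {
    public static void main(String[] args) throws Exception {
        // Same malformed input as the question
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><note><to>Tove</to"
                + "<from>Jani</from><heading>Reminder</heading>"
                + "<body>Don't forget me this weekend!</body></note>";
        try {
            // A conforming parser rejects the malformed close tag and reports its position
            DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            System.out.println("XML is well-formed");
        } catch (SAXParseException e) {
            System.out.printf("Not well-formed at line %d, column %d: %s%n",
                    e.getLineNumber(), e.getColumnNumber(), e.getMessage());
        }
    }
}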