How to deserialize a JSON array into an Apache Beam PCollection<javaObject>

Problem description

I have data like this:

[{"ProjectId":1476401625,"ProjectName":"This is project name","ProjectPostcode":4178},{"ProjectId":2343,"ProjectName":"This is project 2 name","ProjectPostcode":5323}]

I need to deserialize it into Java objects, so I use the following code:

PCollection<Project> deserialisedProjectObject = projectFile.apply("Deserialize Projects",ParseJsons.of(Project.class))
        .setCoder(SerializableCoder.of(Project.class));

However, I always get this error:

Exception in thread "main" org.apache.beam.sdk.Pipeline$PipelineExecutionException: java.lang.RuntimeException: Failed to parse a com.lendlease.dp.entity.Project from JSON value: [{"ProjectId":1476401625,"ProjectPostcode":5323}]

If I change the code to parse into Project[] instead of Project, the runner is able to deserialize the input, but I need this line to return a collection of Project, not a collection of Project arrays.

Solution

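Your input is a JSON array, so each element parses to a Project[] object, and parsing into Project[] is the correct step. To get individual Project objects out of it, apply a FlatMap transform (FlatMapElements in the Java SDK) after ParseJsons and output the elements of the array.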
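The flattening step itself is plain Java, so it can be sketched without a pipeline. The example below is only an illustration: the Project class is a hypothetical stand-in for the entity in the question, and explode and flatMap are made-up helper names, with flatMap imitating what a FlatMap-style transform does when it calls a function once per parsed array.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the Project entity from the question.
final class Project {
    final long projectId;
    final String projectName;
    final int projectPostcode;

    Project(long projectId, String projectName, int projectPostcode) {
        this.projectId = projectId;
        this.projectName = projectName;
        this.projectPostcode = projectPostcode;
    }
}

final class ProjectFlattening {
    // Per-element function: one parsed Project[] in, its Project elements out.
    // This is the function a FlatMap-style transform would call for each array.
    static Iterable<Project> explode(Project[] batch) {
        return Arrays.asList(batch);
    }

    // Imitates the transform: applies explode to every array and
    // concatenates the results into one flat list.
    static List<Project> flatMap(List<Project[]> batches) {
        List<Project> out = new ArrayList<>();
        for (Project[] batch : batches) {
            for (Project p : explode(batch)) {
                out.add(p);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // One element, shaped like the result of parsing the sample JSON array.
        Project[] parsed = {
            new Project(1476401625L, "This is project name", 4178),
            new Project(2343L, "This is project 2 name", 5323)
        };
        List<Project> projects = flatMap(Collections.singletonList(parsed));
        for (Project p : projects) {
            System.out.println(p.projectId + " " + p.projectName);
        }
    }
}
```

In the pipeline, the same function would roughly be passed as FlatMapElements.into(TypeDescriptor.of(Project.class)).via((Project[] batch) -> Arrays.asList(batch)), applied to the output of ParseJsons.of(Project[].class). Coders would still need to be set for both the array and the element type, as in the question.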

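Besides ParseJsons, you may also want to look at:

JsonToRow

Its output is a Row object, which works with schemas and gives you a lot of nice functionality; see the Beam documentation on using schemas. If you need actual POJOs in the pipeline as well as Row objects, you can use Convert.fromRows to turn the rows into POJOs.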