How to remove the metadata from all columns of a Spark table? (Java)

Problem description

I have a DataFrame df with four columns: id, ts, lat and lon. If I inspect df.schema() in debug mode, I get:

 0 = {StructField@13126} "StructField(id,LongType,true)"
  name = "id"
  dataType = {LongType$@12993} "LongType"
  nullable = true
  Metadata = {Metadata@13065} "{"encoding":"UTF-8"}"
 1 = {StructField@13127} "StructField(ts,LongType,true)"
  name = "ts"
  dataType = {LongType$@12993} "LongType"
  nullable = true
  Metadata = {Metadata@13069} "{"encoding":"UTF-8"}"
 2 = {StructField@13128} "StructField(lat,DoubleType,true)"
  name = "lat"
  dataType = {DoubleType$@13034} "DoubleType"
  nullable = true
  Metadata = {Metadata@13073} "{"encoding":"UTF-8"}"
 3 = {StructField@13129} "StructField(lon,DoubleType,true)"
  name = "lon"
  dataType = {DoubleType$@13034} "DoubleType"
  nullable = true
  Metadata = {Metadata@13076} "{"encoding":"UTF-8"}"

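Now I want to get rid of all the metadata, i.e. the "{"encoding":"UTF-8"}" on every column should be replaced with empty metadata. Note that my actual table has many columns, so the solution needs to be fairly generic. Thanks in advance!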

Solution

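Metadata belongs to the schema, not to the row values, so re-encoding the data (for example with encode("ascii", "ignore")) does not remove it. Instead, select every column again under its own name with empty metadata, using Column.as(String alias, Metadata metadata) together with Metadata.empty(). Because the column list comes from df.columns(), this works no matter how many columns the table has. It needs java.util.Arrays, org.apache.spark.sql.Column and org.apache.spark.sql.types.Metadata imported.

Example:

  Column[] cols = Arrays.stream(df.columns()).map(c -> df.col(c).as(c, Metadata.empty())).toArray(Column[]::new);
  Dataset<Row> cleaned = df.select(cols);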
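A fuller, self-contained sketch of the same idea follows, assuming Spark 3.x on the classpath and a local SparkSession. The class name StripMetadata, the helper names and the sample row are hypothetical; the sample data only mimics the {"encoding":"UTF-8"} metadata from the question. It also includes a schema-only variant that rebuilds each StructField with Metadata.empty(), which may help if you need the cleaned StructType itself.

```java
import java.util.Arrays;

import org.apache.spark.sql.Column;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.Metadata;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

// Hypothetical helper class; names are illustrative only.
public class StripMetadata {

    // Re-alias every top-level column under its own name with empty metadata.
    // Column.as(String, Metadata) replaces whatever metadata the column carried.
    public static Dataset<Row> stripColumnMetadata(Dataset<Row> df) {
        Column[] cols = Arrays.stream(df.columns())
                .map(c -> df.col(c).as(c, Metadata.empty()))
                .toArray(Column[]::new);
        return df.select(cols);
    }

    // Schema-only variant: copy each field's name, type and nullability,
    // but give it empty metadata.
    public static StructType stripSchemaMetadata(StructType schema) {
        StructField[] fields = Arrays.stream(schema.fields())
                .map(f -> new StructField(f.name(), f.dataType(), f.nullable(), Metadata.empty()))
                .toArray(StructField[]::new);
        return new StructType(fields);
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[1]")
                .appName("strip-metadata")
                .getOrCreate();

        // Hypothetical sample row, with metadata attached the same way as in the question.
        Metadata enc = Metadata.fromJson("{\"encoding\":\"UTF-8\"}");
        Dataset<Row> raw = spark.sql("SELECT 1L AS id, 2L AS ts, 48.1D AS lat, 11.5D AS lon");
        Dataset<Row> df = raw.select(
                raw.col("id").as("id", enc),
                raw.col("ts").as("ts", enc),
                raw.col("lat").as("lat", enc),
                raw.col("lon").as("lon", enc));

        Dataset<Row> cleaned = stripColumnMetadata(df);

        // Print each column's metadata as JSON to check that it is now empty.
        for (StructField f : cleaned.schema().fields()) {
            System.out.println(f.name() + " -> " + f.metadata().json());
        }
        spark.stop();
    }
}
```

Both helpers only touch top-level columns; fields nested inside struct columns keep their metadata, because it lives inside the column's data type. If you prefer applying a cleaned schema directly, spark.createDataFrame(df.javaRDD(), stripSchemaMetadata(df.schema())) is one option, although it goes through an RDD, whereas the select-based version stays a plain projection.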