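Problem description
u"Union can only be performed on tables with the compatible column types. map
This is what the schemas look like:
Dataset 1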
root
|-- name: string (nullable = true)
|-- count: struct (nullable = true)
| |-- int: integer (nullable = true)
| |-- long: null (nullable = true)
Dataset 2
root
|-- name: string (nullable = true)
|-- count: map (nullable = true)
| |-- key: string
| |-- value: integer (valueContainsNull = true)
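The union fails on these DataFrames when I run the following command:
data = dataset1_df.union(dataset2_df)
How can I resolve this?
Update: I would like to change the schemas so that both look like this:
Dataset 1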
root
|-- name: string (nullable = true)
|-- count: long
Dataset 2
root
|-- name: string (nullable = true)
|-- count: long
Solution
A simple solution is to cast the mismatched column in one DataFrame so that its type matches the other DataFrame's, as shown below:
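// `$"..."` needs spark.implicits._ and `map` comes from sql.functions
// (spark-shell imports the implicits automatically).
import spark.implicits._
import org.apache.spark.sql.functions.map

// Sample row shaped like Dataset 1: `count` is a struct with `int` and `long` fields.
val df1 = spark.sql("select 'foo' name,named_struct('int',1,'long',null) count")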
df1.show(false)
df1.printSchema()
/**
* +----+-----+
* |name|count|
* +----+-----+
* |foo |[1,] |
* +----+-----+
*
* root
* |-- name: string (nullable = false)
* |-- count: struct (nullable = false)
* | |-- int: integer (nullable = false)
* | |-- long: null (nullable = true)
*/
// Sample row shaped like Dataset 2: `count` is a map<string,int>.
val df2 = spark.sql("select 'bar' name,map('2',3) count")
df2.show(false)
df2.printSchema()
/**
* +----+--------+
* |name|count |
* +----+--------+
* |bar |[2 -> 3]|
* +----+--------+
*
* root
* |-- name: string (nullable = false)
* |-- count: map (nullable = false)
* | |-- key: string
* | |-- value: integer (valueContainsNull = false)
*/
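// Rebuild df1's struct as a map<string,int> (key = int field as string, value = long field as integer)
// so that both `count` columns have the same type before the union.
df1.withColumn("count",map($"count.int".cast("string"),$"count.long".cast("integer")))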
.union(df2)
.show(false)
/**
* +----+--------+
* |name|count |
* +----+--------+
* |foo |[1 ->] |
* |bar |[2 -> 3]|
* +----+--------+
*/
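
If the goal is the schema from the question's update (count: long on both sides), one possible approach is to flatten each column to a long before the union instead of converting the struct to a map. The following is only a sketch on the same sample rows. The names df1Long, df2Long and unioned are made up for illustration, and the rule for collapsing the map (take its single value) is an assumption about the data, not something stated in the question.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{coalesce, col, map_values}

// In spark-shell this returns the existing session.
val spark = SparkSession.builder().master("local[*]").appName("union-as-long").getOrCreate()

// Same sample rows as in the answer above.
val df1 = spark.sql("select 'foo' name, named_struct('int', 1, 'long', null) count")
val df2 = spark.sql("select 'bar' name, map('2', 3) count")

// Struct side: prefer the `long` field and fall back to `int`, both cast to long.
val df1Long = df1.withColumn(
  "count",
  coalesce(col("count.long").cast("long"), col("count.int").cast("long")))

// Map side (assumption): every map holds exactly one entry, so take its first value.
val df2Long = df2.withColumn(
  "count",
  map_values(col("count")).getItem(0).cast("long"))

// Both `count` columns are now long, so the union is type-compatible.
val unioned = df1Long.union(df2Long)
unioned.printSchema()
unioned.show(false)
```

If a map can hold several entries, or none, the single-value assumption no longer holds and a different rule (for example looking up a specific key with getItem) would have to replace the map_values step.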