Unnest map values as separate columns in Athena/Presto

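Problem description

My question is similar to this one (Athena/Presto - UNNEST MAP to columns), but in my case I know in advance which columns I need.

This is my use case.

I have a JSON blob with the following structure: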

{
  "reqId" : "1234","clientId" : "client","response" : [
                 {
                   "name" : "Susan","projects" : [
                       {
                          "name" : "project1","completed" : true
                       },{
                          "name" : "project2","completed" : false
                       }
                   ]
                 },{
                   "name" : "Adams","completed" : false
                       }
                   ]
                 }
               ]
}

I need to create a view that returns output like this:

    name  |  project    |  Completed |
----------+-------------+------------+
    Susan |  project1   |   true     |
    Susan |  project2   |   false    |
    Adams |  project1   |   true     |
    Adams |  project2   |   false    |

I tried the following, among other approaches. This is the closest I could get:

WITH dataset AS (
  SELECT 'Susan' as name,transform(filter(CAST(json_extract('{
           "projects": [{"name":"project1","completed":false},{"name":"project3","completed":false},{"name":"project2","completed":true}]}','$.projects') AS ARRAY<MAP<VARCHAR,VARCHAR>>),p -> (p['name'] != 'project1')),p -> ROW(map_values(p))) AS projects
)
SELECT * from dataset
CROSS JOIN UNNEST(projects)

This is the output I get:


    name    projects                                               _col2
1   Susan   [{field0=[project3,false]},{field0=[project2,true]}]   {field0=[project3,false]}
2   Susan   [{field0=[project3,false]},{field0=[project2,true]}]   {field0=[project2,true]}

I basically want to unnest the map's key-value pairs into separate columns. How can I do this in Presto/Athena?

Solution

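Your JSON example appears to be invalid: the entry for "name" : "Adams" is incomplete. Apart from that, you can get the expected output with the query below. It needs two UNNESTs (one over the response array, one over each response's projects array) and a few casts: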

with dataset as
(
    -- Sample data; the Adams entry is completed to match the expected output
    select json_parse('{"reqId" : "1234","clientId" : "client","response" : [{"name" : "Susan","projects" : [{"name" : "project1","completed" : true},{"name" : "project2","completed" : false}]},{"name" : "Adams","projects" : [{"name" : "project1","completed" : true},{"name" : "project2","completed" : false}]}]}') as json_col
),unnest_response as
(
    -- First UNNEST: one row per element of the response array
    select *
    from dataset
    cross join UNNEST(cast(json_extract(json_col,'$.response') as array<JSON>)) as t (response)
)
-- Second UNNEST: one row per project within each response
select
    json_extract_scalar(response,'$.name') name,
    json_extract_scalar(project,'$.name') project_name,
    json_extract_scalar(project,'$.completed') project_completed
from unnest_response
cross join UNNEST(cast(json_extract(response,'$.projects') as array<JSON>)) as t (project);
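
Since the columns are known in advance, a possible variation is to cast the JSON straight to a typed ARRAY(ROW(...)) so that UNNEST yields real columns instead of JSON values. This is only a sketch and not part of the original answer. It assumes your engine can cast JSON to a ROW type with named fields and expands an unnested ARRAY(ROW(...)) into one column per field; older Presto or Athena engine versions may not support this. The aliases r and p and the output column names are chosen here for illustration.

```sql
-- Sketch only: assumes CAST from JSON to ROW(...) with named fields is supported
-- and that UNNEST expands ARRAY(ROW(...)) into one column per field.
with dataset as
(
    select json_parse('{"response" : [{"name" : "Susan","projects" : [{"name" : "project1","completed" : true},{"name" : "project2","completed" : false}]},{"name" : "Adams","projects" : [{"name" : "project1","completed" : true},{"name" : "project2","completed" : false}]}]}') as json_col
)
select
    r.name,
    p.project,
    p.completed
from dataset
-- First UNNEST: each response element becomes (name, projects)
cross join UNNEST(
    cast(json_extract(json_col,'$.response')
         as array(row(name varchar, projects array(row(name varchar, completed boolean)))))
) as r (name, projects)
-- Second UNNEST: each project becomes (project, completed), with completed typed as boolean
cross join UNNEST(r.projects) as p (project, completed);
```

Compared with json_extract_scalar, which returns varchar, this keeps completed as a boolean, at the cost of spelling out the full structure in the cast.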