Unnest a JSON string stored in a column [BigQuery]

Problem description

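I have a table in which one column, order_lines, contains a raw JSON string:

[image: screenshot of the table]

I want to unnest this string so that the key-value pairs can be accessed individually. The sku_code ("STR_BLK_002" in the sample below) is not available in any other column, and the string can contain more than one SKU: the JSON object has one top-level key per SKU, so an order with 2 SKUs has two such keys. Sample JSON stored in order_lines: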

{
   "STR_BLK_002":{
      "amount":167,"type":"part spare","total_discount":0,"color":"Black","is_out_of_stock":false,"variable_fields":{
         "Size":"XL","trueColor":"Black"
      },"category_id":"44356721","status_list":[
         {
            "id":1,"time":"2021-04-01T15:01:54.746Z","status":"ORDER PLACED"
         },{
            "id":2,"time":"2021-04-02T10:31:00.397Z","status":"PACKED"
         },{
            "id":3,"time":"2021-04-04T10:31:01.719Z","status":"SHIPPED"
         },{
            "time":"2021-04-04T18:12:06.896Z","status":"SHIPPED"
         }
      ],"product_id":270,"price_per_quantity":167,"quantity":1,"maximum_quantity":10,"variant_name":"Helmet strap","current_status":30,"estimated_delivery":"09 Apr 2021","total_before_discount":167,"delivery_statuses":[
         {
            "time":"2021-04-01T15:10:13.594Z","status":"FULFILLABLE"
         },{
            "time":"2021-04-02T10:31:00.397Z"
         },{
            "time":"2021-04-03T10:31:01.197Z","status":"READY_TO_SHIP"
         },{
            "time":"2021-04-04T10:31:01.719Z","status":"DISPATCHED"
         },{
            "time":"2021-04-04T18:12:06.896Z"
         }
      ],"sku_code":"STR_BLK_002"
   }
}

I want to split this information into separate columns so that I can get the corresponding values for each SKU.

Solutions

The query below should give you a good start:

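select
  json_extract_scalar(line, '$.sku_code') as sku_code,
  json_extract_scalar(line, '$.amount') as amount,
  json_extract_scalar(line, '$.type') as type,
  json_extract_scalar(line, '$.total_discount') as total_discount,
  json_extract_scalar(line, '$.color') as color,
  json_extract_scalar(line, '$.variable_fields.Size') as Size,
  json_extract_scalar(line, '$.variable_fields.trueColor') as trueColor,
from `project.dataset.table`,
-- strip whitespace, rewrite every SKU key to one marker, then split on that marker
unnest(split(regexp_replace(regexp_replace(order_lines, r'\s', ''), r'"STR_BLK_\d+":{', '"STR_BLK":{'), '"STR_BLK":')) order_line with offset,
-- trim the leftover commas and braces, then re-wrap each piece as one JSON object
unnest([struct('{' || trim(order_line, ',{}}') || '}' as line)])
where offset > 0  -- offset 0 is the text before the first SKU key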

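Applied to the first example in your question, the output is:

[image: query output for the first example]

Applied to the second example in your question, the output is:

[image: query output for the second example]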

Hopefully you can extend this example to whatever your final goal is.
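To make the split-and-trim chain in the query above easier to follow, here is a rough JavaScript re-creation of the same string steps. It is only a sketch for reading along, not BigQuery code: `trimChars` and `splitOrderLines` are hypothetical helper names, and the two-SKU sample is made up from field names that appear in this thread.

```javascript
// Illustrative re-creation of the query's string steps; not BigQuery code.
// trimChars is a hypothetical helper that mirrors SQL TRIM(value, chars).
function trimChars(text, chars) {
  let start = 0;
  let end = text.length;
  while (start < end && chars.includes(text[start])) start++;
  while (end > start && chars.includes(text[end - 1])) end--;
  return text.slice(start, end);
}

function splitOrderLines(orderLines) {
  // regexp_replace(order_lines, r'\s', ''): drops all whitespace,
  // including spaces inside string values.
  const compact = orderLines.replace(/\s/g, '');
  // Rewrite every SKU key to the same marker so it can act as a delimiter.
  const normalized = compact.replace(/"STR_BLK_\d+":\{/g, '"STR_BLK":{');
  // split(...) plus "where offset > 0": skip the text before the first marker.
  const pieces = normalized.split('"STR_BLK":').slice(1);
  // trim(order_line, ',{}}'), then re-wrap in a single pair of braces.
  return pieces.map((piece) => '{' + trimChars(piece, ',{}') + '}');
}

// Made-up two-SKU sample (assumption: one top-level key per SKU).
const sampleOrderLines = JSON.stringify({
  STR_BLK_002: { amount: 167, type: 'part spare', sku_code: 'STR_BLK_002' },
  STR_BLK_003: { amount: 590, type: 'accessory', sku_code: 'STR_BLK_003' },
}, null, 2);

const lines = splitOrderLines(sampleOrderLines).map((line) => JSON.parse(line));
console.log(lines);
```

Two side effects are worth keeping in mind. Because all whitespace is removed first, a value such as "part spare" comes back as "partspare". The trailing-brace trim also appears to rely on each SKU object ending with a scalar field (here sku_code); if the last field were a nested object, its closing brace would be trimmed away as well.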

Another answer

Basically, I think what you want to do is first convert your column into an array of structs, so that instead of this:

{
   "STR_BLK_002": {...},"STR_BLK_003": {...}
}

you have something like this:

[
  {
      "amount":167,"type":"part spare","total_discount":0,...
   },{ 
      "amount":590,"type":"accessory",...
  }
]

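With the data in that format, you can use UNNEST to put each entry on its own row, and then use JSON functions such as JSON_EXTRACT_SCALAR to extract the fields into their own columns.

To do this, I built a JavaScript UDF that finds the keys of the object and then iterates over them to build an array with one entry per key, each entry being the JSON string of that key's value.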

CREATE TEMP FUNCTION format_json(str STRING)
RETURNS ARRAY<STRING>
LANGUAGE js AS r"""
  // Parse the column value and collect its top-level keys (one per SKU).
  var obj = JSON.parse(str);
  var keys = Object.keys(obj);
  var arr = [];
  // Emit each SKU object as its own JSON string.
  for (var i = 0; i < keys.length; i++) {
    arr.push(JSON.stringify(obj[keys[i]]));
  }
  return arr;
""";

SELECT
  JSON_EXTRACT_SCALAR(formatted_json, '$.is_out_of_stock') as is_out_of_stock,
  JSON_EXTRACT_SCALAR(formatted_json, '$.sku_code') as sku_code
from
testing.json_test
left join unnest(format_json(order_lines)) as formatted_json

The result looks like this:

[image: query result]
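Since the UDF body is plain JavaScript, its logic can be tried outside BigQuery before running the query. The sketch below assumes a Node.js environment; `formatJson` is a hypothetical stand-alone copy of the UDF body, and the sample object is made up from field names used in this thread.

```javascript
// Stand-alone copy of the UDF logic, for local experimentation only.
function formatJson(str) {
  const obj = JSON.parse(str);
  // One JSON string per top-level key, that is, one per SKU.
  return Object.keys(obj).map((key) => JSON.stringify(obj[key]));
}

// Made-up sample (assumption: one top-level key per SKU).
const orderLinesSample = JSON.stringify({
  STR_BLK_002: { amount: 167, is_out_of_stock: false, sku_code: 'STR_BLK_002' },
  STR_BLK_003: { amount: 590, is_out_of_stock: true, sku_code: 'STR_BLK_003' },
});

// Each array element plays the role of one row after UNNEST.
const formatted = formatJson(orderLinesSample);
for (const entry of formatted) {
  const row = JSON.parse(entry);
  console.log(row.sku_code, row.is_out_of_stock);
}
```

Because the string is parsed rather than split with regular expressions, this approach does not depend on the SKU keys following a particular naming pattern, and whitespace inside values is left untouched.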