How do I use UNNEST and SPLIT in BigQuery SQL without creating duplicate rows?

Problem description

I have the following data:

ID Historical_UTMs
1 a,b,c,d;e,f,g,h;
2 i,j,k,l;
3 m,n,o,p;q,r,s,t;u,v,w,x;

I want to end up with the following:

ID  utm_Type  utm_Timestamp  utm_Web_Page  utm_Referrer
1   a         b              c             d
1   e         f              g             h
2   i         j              k             l
3   m         n              o             p
3   q         r              s             t
3   u         v              w             x

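I want to split the contents of the Historical_UTMs field into separate rows (delimited by ;), with every row keeping the Id field, and then split each value in the new rows into separate columns (delimited by ,).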

I have the following script, which creates a table with the right columns. The problem is that all the records are duplicated.

Can anyone help me understand why this script creates duplicate rows, and how to fix it?

with Expanded as (
  select
    Lead.Id,
    Lead.Historical_UTMs
  from
    `dataset.GS_UTMs` as Lead,
    unnest(split(Historical_UTMs, ';')) as History_UTMs
)

select
  Expanded.Id,
  split(Expanded.Historical_UTMs, ',')[safe_offset(0)] as utm_Type,
  split(Expanded.Historical_UTMs, ',')[safe_offset(1)] as utm_Timestamp,
  split(Expanded.Historical_UTMs, ',')[safe_offset(2)] as utm_Web_Page,
  split(Expanded.Historical_UTMs, ',')[safe_offset(3)] as utm_Referrer
from
  Expanded

Solution

Consider the following:

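select
  Id,
  UTM[offset(0)] as utm_Type,
  UTM[offset(1)] as utm_Timestamp,
  UTM[offset(2)] as utm_Web_Page,
  UTM[offset(3)] as utm_Referrer
from `project.dataset.GS_UTMs`,
  -- trim the trailing ';' so that split does not return an empty last element
  unnest(split(trim(Historical_UTMs, ';'), ';')) Historical_UTM,
  -- split each element once on ',' (the default delimiter) and reuse it as UTM
  unnest([struct(split(Historical_UTM) as UTM)])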

If applied to the sample data in your question, the output is:

[image: query output]


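If I understand correctly, the problem is that Historical_UTMs has more than one meaning in the CTE (the original column and the unnested element), and you are using the wrong one. The CTE returns the full original string once per element, so every copy of a row splits into the same values. Perhaps something like this will work: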

with Expanded as (
      select l.Id, Historical_UTM
      from `stormgeo-bigquery.Data_to_send_to_BigQuery_from_Google_Sheet.GS_UTMs` l cross join
           unnest(split(l.Historical_UTMs, ';')) as Historical_UTM
     )
select e.Id,
       split(e.Historical_UTM, ',')[safe_offset(0)] as utm_Type,
       split(e.Historical_UTM, ',')[safe_offset(1)] as utm_Timestamp,
       split(e.Historical_UTM, ',')[safe_offset(2)] as utm_Web_Page,
       split(e.Historical_UTM, ',')[safe_offset(3)] as utm_Referrer
from Expanded e;
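
To see why the query in the question repeats rows, it may help to model the join outside BigQuery. What follows is only a rough Python approximation, not BigQuery code: the names rows, safe_offset, expand, buggy and fixed are hypothetical, Python's str.split and str.strip stand in for SPLIT and TRIM, and the sample data is copied from the question.

```python
# Sample data from the question: (Id, Historical_UTMs)
rows = [
    (1, "a,b,c,d;e,f,g,h;"),
    (2, "i,j,k,l;"),
    (3, "m,n,o,p;q,r,s,t;u,v,w,x;"),
]


def safe_offset(parts, i):
    """Approximates SAFE_OFFSET: None instead of an error when i is out of range."""
    return parts[i] if i < len(parts) else None


def expand(rows, use_element, strip_trailing):
    """Models `table, unnest(split(col, ';'))` followed by a split on ','."""
    out = []
    for row_id, utms in rows:
        source = utms.strip(";") if strip_trailing else utms
        for element in source.split(";"):  # one output row per ';' element
            # The question's CTE selected the whole column; the fix selects the element.
            text = element if use_element else utms
            parts = text.split(",")
            out.append((row_id,) + tuple(safe_offset(parts, i) for i in range(4)))
    return out


buggy = expand(rows, use_element=False, strip_trailing=False)
fixed = expand(rows, use_element=True, strip_trailing=True)

for row in fixed:
    print(row)
```

In this model, buggy contains the same tuple several times for each Id, because every unnested element is ignored in favour of the full original string. Calling expand(rows, use_element=True, strip_trailing=False) shows a second effect: the trailing ; produces one extra row per Id whose first field is an empty string, which is what the trim in the first query avoids.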