Problem description
I have the following data:
ID | Historical_UTMs |
---|---|
1 | a,b,c,d;e,f,g,h; |
2 | i,j,k,l; |
3 | m,n,o,p;q,r,s,t;u,v,w,x; |
I want to end up with the following:
ID | utm_Type | utm_Timestamp | utm_Web_Page | utm_Referrer |
---|---|---|---|---|
1 | a | b | c | d |
1 | e | f | g | h |
2 | i | j | k | l |
3 | m | n | o | p |
3 | q | r | s | t |
3 | u | v | w | x |
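I want to split the contents of the Historical_UTMs field into separate rows (delimited by ;), keeping the Id field on every row, and then split each value in the new rows into separate columns (delimited by ,).
I have the following script, which creates a table with the right columns. The problem is that all the records are duplicated.
Can anyone help me understand why this script creates duplicate rows, and how to fix it?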
with Expanded as (
select
Lead.Id,Lead.Historical_UTMs
from
`dataset.GS_UTMs` AS Lead,unnest(split(Historical_UTMs,';')) AS History_UTMs
)
select
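Expanded.Id,
split(Expanded.Historical_UTMs,',')[safe_offset(0)] as utm_Type,
split(Expanded.Historical_UTMs,',')[safe_offset(1)] as utm_Timestamp,
split(Expanded.Historical_UTMs,',')[safe_offset(2)] as utm_Web_Page,
split(Expanded.Historical_UTMs,',')[safe_offset(3)] as utm_Referrer
from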
Expanded
Solution
Consider the following:
select Id,UTM[offset(0)] as utm_Type,UTM[offset(1)] as utm_Timestamp,UTM[offset(2)] as utm_Web_Page,UTM[offset(3)] as utm_Referrer
from `project.dataset.GS_UTMs`,unnest(split(trim(Historical_UTMs,';'),';')) Historical_UTM,unnest([struct(split(Historical_UTM) as UTM)])
Applied to the sample data in your question, this returns the six rows shown in the desired output above.
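To try the query without access to the real table, here is a minimal sketch that inlines the question's sample rows. The CTE name `sample_utms` is a stand-in introduced here for illustration and is not part of the original answer. The `order by` is added only to make the result order predictable.

```sql
-- Hypothetical inline data standing in for the GS_UTMs table.
with sample_utms as (
  select 1 as Id, 'a,b,c,d;e,f,g,h;' as Historical_UTMs union all
  select 2, 'i,j,k,l;' union all
  select 3, 'm,n,o,p;q,r,s,t;u,v,w,x;'
)
select
  Id,
  UTM[offset(0)] as utm_Type,
  UTM[offset(1)] as utm_Timestamp,
  UTM[offset(2)] as utm_Web_Page,
  UTM[offset(3)] as utm_Referrer
from sample_utms,
  -- trim() removes the trailing ';' so split() does not produce an empty last entry
  unnest(split(trim(Historical_UTMs, ';'), ';')) as Historical_UTM,
  -- wrap the comma-split array in a one-element struct array so each entry is split only once
  unnest([struct(split(Historical_UTM) as UTM)])
order by Id, utm_Type;
```

Note that `offset()` raises an error if an entry has fewer than four comma-separated values; `safe_offset()` returns NULL instead, which may be the safer choice for real data.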
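If I understand correctly, the problem is that Historical_UTMs has more than one meaning inside the CTE, and you are using the wrong one: the CTE selects the original column rather than the unnested element. Perhaps something like this will work: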
with Expanded as (
select l.Id,Historical_UTM
from `stormgeo-bigquery.Data_to_send_to_BigQuery_from_Google_Sheet.GS_UTMs` l cross join
unnest(split(Historical_UTMs,';')) AS Historical_UTM
)
select e.Id,
split(e.Historical_UTM,',')[safe_offset(0)] as utm_Type,
split(e.Historical_UTM,',')[safe_offset(1)] as utm_Timestamp,
split(e.Historical_UTM,',')[safe_offset(2)] as utm_Web_Page,
split(e.Historical_UTM,',')[safe_offset(3)] as utm_Referrer
from Expanded e;
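To see where the duplicates come from, it may help to put the original column and the unnested element side by side. The sketch below uses one inlined sample row. The CTE name `sample_utms` and the output column names `parent_column` and `unnested_element` are made up for this illustration.

```sql
-- Hypothetical single-row stand-in for the GS_UTMs table.
with sample_utms as (
  select 1 as Id, 'a,b,c,d;e,f,g,h;' as Historical_UTMs
)
select
  l.Id,
  l.Historical_UTMs as parent_column,    -- what the question's CTE selected: identical on every row
  History_UTMs      as unnested_element  -- what it needed to select: one entry per row
from sample_utms as l,
  unnest(split(l.Historical_UTMs, ';')) as History_UTMs;
```

The join produces one row per `;`-separated entry, but `parent_column` carries the full original string on each of them, so splitting it by `,` yields the same values every time. Because the sample strings end with `;`, `split()` also returns a final empty element; trimming the trailing `;` first, or filtering with `where History_UTMs != ''`, avoids the extra row.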