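Problem description
I am trying to reconcile some student databases with GSuite emails, where usernames have been created inconsistently over the years.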
The gist of the query I want to run on BigQuery is:
or in SQL:
with mymatches as (
  with emaildataset as (
    select 'testA' as col
    union all
    select 'testB'
    union all
    select 'testC'
    union all
    select 'testD'
  )
  select * from emaildataset where col like '%A'
  union distinct
  select * from emaildataset where col like '%B'
), emaildataset2 as (
  select 'testA' as col
  union all
  select 'testB'
  union all
  select 'testC'
  union all
  select 'testD'
)
select * from mymatches
union distinct
select * from emaildataset2 where emaildataset2.col not in (select col from mymatches)
The real code is:
with matchedEmails as (
with g as (
select * from gsuite.StudentUsers
union all
select * from gsuite.AlumniUsers
)
select
std.STDCODE,g.*
from g
inner join quick.all_students_alumni as std
on split(lower(g.Email),'@')[offset(0)] = split(quick.studentEmail(std.FNAME,std.MNAME,std.LNAME,std.STATUSTYPE),'@')[offset(0)]
where g.OU like '/Student%' or OU like '/Alumni%'
union distinct select
std.STDCODE,'','@')[offset(0)]
where g.OU like '/Student%' or OU like '/Alumni%'
)
select * from matchedEmails
union distinct select
'NOT MATCHED' as STDCODE,g.*
from (
select * from gsuite.StudentUsers
union all
select * from gsuite.AlumniUsers
) as g
where g.Email not in (select Email from matchedEmails)
and g.OU like '/Student%' or OU like '/Alumni%'
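However, the result contains duplicates in the Email column. Based on my understanding and the test above, that should not happen, because of the where g.Email not in (select Email from matchedEmails) clause.
Am I doing something wrong?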
Solution
I think the last WHERE clause should be changed to the following:
where g.Email not in (select Email from matchedEmails)
and (g.OU like '/Student%' or OU like '/Alumni%')
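As you can see, the parentheses around g.OU like '/Student%' or OU like '/Alumni%' were missing. AND binds more tightly than OR, so without them the condition is read as (g.Email not in (...) and g.OU like '/Student%') or OU like '/Alumni%', and every alumni row passes regardless of the NOT IN check.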
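To make the effect of the parentheses concrete, here is a minimal sketch that evaluates both versions of the condition side by side. The sample rows and the example.edu addresses are invented for illustration; only the column names Email and OU and the CTE name matchedEmails come from the question.

```sql
-- Hypothetical sample data: three GSuite users, two of them already matched.
with g as (
  select 'a@example.edu' as Email, '/Student/2020' as OU
  union all
  select 'b@example.edu', '/Alumni/2015'
  union all
  select 'c@example.edu', '/Staff'
),
matchedEmails as (
  -- Assume a@ and b@ were matched in the first part of the query.
  select 'a@example.edu' as Email
  union all
  select 'b@example.edu'
)
select
  g.Email,
  -- Original clause: parsed as (NOT IN and Student) or Alumni.
  (g.Email not in (select Email from matchedEmails)
     and g.OU like '/Student%' or g.OU like '/Alumni%') as without_parens,
  -- Corrected clause: NOT IN applies to both OU patterns.
  (g.Email not in (select Email from matchedEmails)
     and (g.OU like '/Student%' or g.OU like '/Alumni%')) as with_parens
from g
order by g.Email
```

With this data the two columns differ only for b@example.edu: it is an already-matched alumni address, yet without_parens is true for it, so the unparenthesized version would emit it a second time as NOT MATCHED.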
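There may be other things that still need fixing, but this should answer the question you asked:
But the result is, I'm getting duplicates in the Email column, which based on my knowledge and the test above shouldn't happen because of the where g.Email not in (select Email from matchedEmails) clause.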
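If duplicates remain after adding the parentheses, one way to look for them is to group the final output by Email and keep only addresses that appear more than once. The following is a sketch under assumptions: results is a hypothetical stand-in for the output of the full query, and its rows are invented sample data.

```sql
-- "results" is a hypothetical stand-in for the output of the full query.
with results as (
  select 'S001' as STDCODE, 'a@example.edu' as Email
  union all
  select 'NOT MATCHED', 'a@example.edu'
  union all
  select 'S002', 'b@example.edu'
)
select
  Email,
  count(*) as occurrences,
  -- Collect the codes to see which branch each duplicate row came from.
  array_agg(STDCODE order by STDCODE) as codes
from results
group by Email
having count(*) > 1
```

For the sample rows above, only a@example.edu is returned. A duplicate that carries NOT MATCHED alongside a real STDCODE points at the final WHERE clause, while two different real codes would suggest the join itself matched one address to more than one student.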