I want to use an expression

Problem description

company  | email | phone | website | address
Amar CO LLC | amar@gmail.com | 123 | NULL | India
Amar CO | amar@gmail.com | NULL | NULL | IND
Stacks CO | stack@gmil.com | 910 | stacks.com | United Kingdom
Stacks CO LLC | stack@gmail.com | NULL | NULL | UK

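I want to remove the "CO LLC" company names and keep "Amar CO", but I want all of the column values from the "Amar CO LLC" row, because that row has the fewest NULL values (that is, the most column data).

In short: remove duplicate records by dropping the company names that end with or match "LLC" (case-insensitive), while keeping the values from whichever of the two records has the most populated columns.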

Expected output

Amar CO | amar@gmail.com | 123 | NULL | India
Stacks CO | stack@gmil.com | 910 | stacks.com | United Kingdom

Solution

You need group by together with replace, as follows:

select replace(company,' LLC','') as company,
       max(email) as email,max(phone) as phone,max(website) as website,max(address) as address
  from your_table t
group by replace(company,' LLC','')
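
As a quick way to try the group by / replace idea, here is a minimal sketch that runs the same query against the sample rows from the question. It assumes an in-memory SQLite database as a stand-in for the real one; the Python variable names are only illustrative.

```python
import sqlite3

# Assumption: SQLite stands in for the real database; rows are the question's sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table your_table "
    "(company text, email text, phone integer, website text, address text)"
)
conn.executemany(
    "insert into your_table values (?, ?, ?, ?, ?)",
    [
        ("Amar CO LLC", "amar@gmail.com", 123, None, "India"),
        ("Amar CO", "amar@gmail.com", None, None, "IND"),
        ("Stacks CO", "stack@gmil.com", 910, "stacks.com", "United Kingdom"),
        ("Stacks CO LLC", "stack@gmail.com", None, None, "UK"),
    ],
)

# Strip ' LLC' from the name, group on the stripped name,
# and take the per-column maximum (max ignores NULLs).
merged = conn.execute(
    """
    select replace(company, ' LLC', '') as company,
           max(email), max(phone), max(website), max(address)
      from your_table
     group by replace(company, ' LLC', '')
     order by 1
    """
).fetchall()

for row in merged:
    print(row)
```

Note that max() picks the largest value per column independently, so a merged row can mix values from both source rows, and for text columns "largest" means highest in sort order. Also, replace here is case-sensitive, so a name ending in ' llc' would not be merged.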

I can see that you need the data from both rows, but with priority given to the LLC record (India, IND --> India). In that case you can use the following:

select t.company,coalesce(tt.email,t.email) as email,coalesce(tt.phone,t.phone) as phone,
       coalesce(tt.website,t.website) as website,coalesce(tt.address,t.address) as address
  from your_table t join your_table tt 
    on concat(t.company,' LLC') = tt.company
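
The self-join with coalesce can be tried the same way. This is a sketch under the same assumption (in-memory SQLite instead of the real database); it uses the || operator rather than concat, because concat is not available in older SQLite versions.

```python
import sqlite3

# Assumption: SQLite stands in for the real database; rows are the question's sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table your_table "
    "(company text, email text, phone integer, website text, address text)"
)
conn.executemany(
    "insert into your_table values (?, ?, ?, ?, ?)",
    [
        ("Amar CO LLC", "amar@gmail.com", 123, None, "India"),
        ("Amar CO", "amar@gmail.com", None, None, "IND"),
        ("Stacks CO", "stack@gmil.com", 910, "stacks.com", "United Kingdom"),
        ("Stacks CO LLC", "stack@gmail.com", None, None, "UK"),
    ],
)

# t is the plain row, tt is its ' LLC' twin; coalesce prefers the LLC value
# and falls back to the plain row only when the LLC value is NULL.
rows = conn.execute(
    """
    select t.company,
           coalesce(tt.email, t.email),
           coalesce(tt.phone, t.phone),
           coalesce(tt.website, t.website),
           coalesce(tt.address, t.address)
      from your_table t
      join your_table tt on t.company || ' LLC' = tt.company
     order by t.company
    """
).fetchall()

for row in rows:
    print(row)
```

Because the LLC row always wins when it has a value, Stacks CO ends up with the LLC email and 'UK' here, which differs from the expected output in the question. The inner join also drops any company that has no ' LLC' twin.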

If you want to update the data and delete the duplicate record itself, I suggest the following delete and update:

delete from your_table where company = 'Amar CO';

update your_table
set company = replace(company,' LLC','') -- or use 'Amar CO'
where company = 'Amar CO LLC';
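
A minimal sketch of the delete-then-update sequence, again assuming an in-memory SQLite database in place of the real one. Both statements run in one transaction so the table is never left with the plain row deleted but the LLC row not yet renamed.

```python
import sqlite3

# Assumption: SQLite stands in for the real database; rows are the question's sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table your_table "
    "(company text, email text, phone integer, website text, address text)"
)
conn.executemany(
    "insert into your_table values (?, ?, ?, ?, ?)",
    [
        ("Amar CO LLC", "amar@gmail.com", 123, None, "India"),
        ("Amar CO", "amar@gmail.com", None, None, "IND"),
        ("Stacks CO", "stack@gmil.com", 910, "stacks.com", "United Kingdom"),
        ("Stacks CO LLC", "stack@gmail.com", None, None, "UK"),
    ],
)

# "with conn" commits on success and rolls back if either statement fails.
with conn:
    # Drop the sparser plain row first ...
    conn.execute("delete from your_table where company = 'Amar CO'")
    # ... then rename the richer LLC row to the plain name.
    conn.execute(
        "update your_table set company = replace(company, ' LLC', '') "
        "where company = 'Amar CO LLC'"
    )

remaining = conn.execute("select * from your_table order by company").fetchall()
for row in remaining:
    print(row)
```

Only the Amar pair is touched; the two Stacks rows stay as they were, so this approach has to be repeated (or generalized) for every duplicated company.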

-- Update

If you want to prioritize the record with the fewest NULL values, you can use the following query:

-- take the LLC row's values only when it has fewer NULLs than the plain row
select t.company,
       case when tt_nulls < t_nulls then ttemail else temail end as email,
       case when tt_nulls < t_nulls then ttphone else tphone end as phone,
       case when tt_nulls < t_nulls then ttwebsite else twebsite end as website,
       case when tt_nulls < t_nulls then ttaddress else taddress end as address
from    
(select t.company,count(case when t.email IS NULL THEN 1 end) over (partition by t.company) 
        + count(case when t.phone IS NULL THEN 1 end) over (partition by t.company) 
        + count(case when t.website IS NULL THEN 1 end) over (partition by t.company) 
        + count(case when t.address IS NULL THEN 1 end) over (partition by t.company)  
        as t_nulls,count(case when tt.email IS NULL THEN 1 end) over (partition by t.company) 
        + count(case when tt.phone IS NULL THEN 1 end) over (partition by t.company) 
        + count(case when tt.website IS NULL THEN 1 end) over (partition by t.company) 
        + count(case when tt.address IS NULL THEN 1 end) over (partition by t.company)  
        as tt_nulls,
        t.email as temail,t.phone as tphone,t.website as twebsite,t.address as taddress,tt.email as ttemail,tt.phone as ttphone,tt.website as ttwebsite,tt.address as ttaddress
   from your_table t join your_table tt 
     on concat(t.company,' LLC') = tt.company) t
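
The "fewest NULLs wins" rule can also be sketched without the self-join. The version below is not the query above but a simpler variant of the same idea: it scores each row by its NULL count and keeps the row with the lowest score per normalized company name. It assumes an in-memory SQLite database; the names scored, base and nulls are hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for the real database; rows are the question's sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table your_table "
    "(company text, email text, phone integer, website text, address text)"
)
conn.executemany(
    "insert into your_table values (?, ?, ?, ?, ?)",
    [
        ("Amar CO LLC", "amar@gmail.com", 123, None, "India"),
        ("Amar CO", "amar@gmail.com", None, None, "IND"),
        ("Stacks CO", "stack@gmil.com", 910, "stacks.com", "United Kingdom"),
        ("Stacks CO LLC", "stack@gmail.com", None, None, "UK"),
    ],
)

# In SQLite "x is null" evaluates to 1 or 0, so the sum is the row's NULL count.
best_rows = conn.execute(
    """
    with scored as (
        select replace(company, ' LLC', '') as base,
               email, phone, website, address,
               (email is null) + (phone is null)
                 + (website is null) + (address is null) as nulls
          from your_table
    )
    select s.base, s.email, s.phone, s.website, s.address
      from scored s
     where s.nulls = (select min(x.nulls) from scored x where x.base = s.base)
     order by s.base
    """
).fetchall()

for row in best_rows:
    print(row)
```

On the sample data this reproduces the expected output from the question. If two rows of the same company tie on NULL count, this sketch returns both, so a tie-breaker would be needed in practice.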

Prioritize the record with the fewest NULL values...

Below is for BigQuery Standard SQL (query #1)

#standardSQL
select 
  array_agg(t 
    order by array_length(regexp_extract_all(to_json_string(t),':null')) 
    limit 1
  )[offset(0)].* 
  replace(regexp_replace(company,r'(?i)CO LLC','CO') as company) 
from `project.dataset.table` t
group by regexp_replace(company,r'(?i)CO LLC','CO')

If applied to the sample data in your question, the output is

[image: result of query #1]

If you want to fill every field using values from all of the records, you can use the following (query #2)

select regexp_replace(company,r'(?i)CO LLC','CO') as company,max(email) email,max(phone) phone,max(website) website,max(address) address
from `project.dataset.table`
group by regexp_replace(company,r'(?i)CO LLC','CO')

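Finally, if you still want to prioritize the record with the fewest NULL values, but with its remaining NULLs replaced by values from the other rows, use the following (query #3)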

select company,ifnull(email,max_email) email,ifnull(phone,max_phone) phone,ifnull(website,max_website) website,ifnull(address,max_address) address
from (
  select array_agg(t 
      order by array_length(regexp_extract_all(to_json_string(t),':null')) 
      limit 1
    )[offset(0)].* 
    replace(regexp_replace(company,r'(?i)CO LLC','CO') as company),
    max(email) max_email,max(phone) max_phone,max(website) max_website,max(address) max_address
  from `project.dataset.table` t
  group by regexp_replace(company,r'(?i)CO LLC','CO')
)  

You can test/check the difference between this option and the previous one by applying them to the following dummy data

with `project.dataset.table` as (
  select 'Amar CO LLC' company,'amar@gmail.com' email,123 phone,NULL website,'India' address union all
  select 'Amar CO',NULL,222,'amar.com',NULL union all
  select 'Stacks CO LLC','stack@gmail.com',NULL,NULL,'UK' union all
  select 'Stacks CO','stack@gmil.com',910,'stacks.com','United Kingdom'
)

The last query (query #3) gives

[image: result of query #3]

whereas the previous one (query #2) simply gives the max value of each column across all rows

[image: result of query #2]
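
To see the difference between the two options without BigQuery, the following sketch runs rough equivalents of query #2 and query #3 on the dummy data above. It assumes an in-memory SQLite database and a plain replace of ' LLC' instead of the case-insensitive regexp_replace; the names scored, filled, base and nulls are hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for BigQuery; rows are the dummy data from the answer.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table your_table "
    "(company text, email text, phone integer, website text, address text)"
)
conn.executemany(
    "insert into your_table values (?, ?, ?, ?, ?)",
    [
        ("Amar CO LLC", "amar@gmail.com", 123, None, "India"),
        ("Amar CO", None, 222, "amar.com", None),
        ("Stacks CO LLC", "stack@gmail.com", None, None, "UK"),
        ("Stacks CO", "stack@gmil.com", 910, "stacks.com", "United Kingdom"),
    ],
)

# Equivalent of query #2: per-column maximum over each group.
query2_rows = conn.execute(
    """
    select replace(company, ' LLC', '') as company,
           max(email), max(phone), max(website), max(address)
      from your_table
     group by replace(company, ' LLC', '')
     order by 1
    """
).fetchall()

# Equivalent of query #3: keep the row with the fewest NULLs,
# then fill its remaining NULLs from the per-column maximum.
query3_rows = conn.execute(
    """
    with scored as (
        select replace(company, ' LLC', '') as base,
               email, phone, website, address,
               (email is null) + (phone is null)
                 + (website is null) + (address is null) as nulls
          from your_table
    ),
    filled as (
        select base, max(email) as max_email, max(phone) as max_phone,
               max(website) as max_website, max(address) as max_address
          from scored
         group by base
    )
    select s.base,
           coalesce(s.email, f.max_email),
           coalesce(s.phone, f.max_phone),
           coalesce(s.website, f.max_website),
           coalesce(s.address, f.max_address)
      from scored s
      join filled f on f.base = s.base
     where s.nulls = (select min(x.nulls) from scored x where x.base = s.base)
     order by s.base
    """
).fetchall()

print("query #2:", query2_rows)
print("query #3:", query3_rows)
```

In this sketch the two results differ only in the Amar CO phone: the max-based version returns 222, while the fewest-NULLs version keeps 123 from the LLC row and only borrows the missing website.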
