SQL: Percentage of missing values and unique count for each table column

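Question

Suppose I have a table with 4 columns. For each column, I want to know:

  • the percentage of missing values (null count), and
  • the count of unique values

For example, if I have a table with columns A, B, C, and D, the expected result would be: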

Column_Name | PctMissing | UniqueCount
A           | 0.15       | 16
B           | 0          | 320
C           | 0.3        | 190
D           | 0.05       | 8

Answers

If you know the number of columns, I would probably just use union all:

select 'a' as Column_Name,1.0*count(case when a is null then 1 end)/count(*) as PctMissing,count(distinct a) as UniqueCount
from t
union all
select 'b' as Column_Name,1.0*count(case when b is null then 1 end)/count(*) as PctMissing,count(distinct b) as UniqueCount
from t
union all
select 'c' as Column_Name,1.0*count(case when c is null then 1 end)/count(*) as PctMissing,count(distinct c) as UniqueCount
from t
union all
select 'd' as Column_Name,1.0*count(case when d is null then 1 end)/count(*) as PctMissing,count(distinct d) as UniqueCount
from t


Depending on your database, there are other ways to do this, but they would probably be more confusing than union all.
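
To check the union all query against known values, here is a minimal sketch that runs it in an in-memory SQLite database from Python. The sample rows and column types are made up for illustration; only the table name t and the columns a through d come from the query above. Note that PctMissing comes back as a fraction between 0 and 1, matching the expected output in the question.

```python
import sqlite3

# Hypothetical sample data for table t; None is stored as NULL.
rows = [
    (1,    "x", None, 10),
    (2,    "y", 5.0,  10),
    (None, "z", 5.0,  None),
    (2,    "w", None, 20),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table t (a integer, b text, c real, d integer)")
conn.executemany("insert into t values (?, ?, ?, ?)", rows)

# The same query as in the answer, one branch per column.
query = """
select 'a' as Column_Name, 1.0*count(case when a is null then 1 end)/count(*) as PctMissing, count(distinct a) as UniqueCount from t
union all
select 'b' as Column_Name, 1.0*count(case when b is null then 1 end)/count(*) as PctMissing, count(distinct b) as UniqueCount from t
union all
select 'c' as Column_Name, 1.0*count(case when c is null then 1 end)/count(*) as PctMissing, count(distinct c) as UniqueCount from t
union all
select 'd' as Column_Name, 1.0*count(case when d is null then 1 end)/count(*) as PctMissing, count(distinct d) as UniqueCount from t
"""

# Key the result by column name so it does not depend on row order.
stats = {name: (pct, uniq) for name, pct, uniq in conn.execute(query)}
for name in sorted(stats):
    print(name, stats[name])
```

count(distinct ...) ignores nulls, so a column whose only non-null value is repeated (c in this sample) reports a unique count of 1.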

I would write it like this:

select 'a' as column_name,avg(case when a is null then 1.0 else 0 end) as missing_ratio,count(distinct a) as unique_count
from t
union all
select 'b' as column_name,avg(case when b is null then 1.0 else 0 end) as missing_ratio,count(distinct b) as unique_count
from t
union all
select 'c' as column_name,avg(case when c is null then 1.0 else 0 end) as missing_ratio,count(distinct c) as unique_count
from t
union all
select 'd' as column_name,avg(case when d is null then 1.0 else 0 end) as missing_ratio,count(distinct d) as unique_count
from t;
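
Because every branch of the query has the same shape, the text can also be generated from a list of column names instead of being written by hand. The following is only a sketch under assumptions: profile_query is a hypothetical helper, not part of any library, the sample rows are invented, and the names are interpolated directly into the SQL, so they must be trusted identifiers. It uses SQLite through Python purely to make the example runnable.

```python
import sqlite3

def profile_query(table, columns):
    """Build the avg-based union all query shown above.

    Hypothetical helper: table and column names are pasted into the SQL
    text unescaped, so pass only trusted identifiers.
    """
    parts = [
        f"select '{col}' as column_name, "
        f"avg(case when {col} is null then 1.0 else 0 end) as missing_ratio, "
        f"count(distinct {col}) as unique_count "
        f"from {table}"
        for col in columns
    ]
    return "\nunion all\n".join(parts)

conn = sqlite3.connect(":memory:")
conn.execute("create table t (a integer, b text, c real, d integer)")
conn.executemany(
    "insert into t values (?, ?, ?, ?)",
    [  # made-up rows; None is stored as NULL
        (1,    "x", None, 10),
        (2,    "y", 5.0,  10),
        (None, "z", 5.0,  None),
        (2,    "w", None, 20),
    ],
)

sql = profile_query("t", ["a", "b", "c", "d"])
profile = {name: (ratio, uniq) for name, ratio, uniq in conn.execute(sql)}
print(profile)
```

The avg form averages a 1.0/0 flag per row, which gives the same fraction as dividing the null count by count(*) in the first answer.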