Problem description
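For each column of a table, I want to compute:
- the percentage of missing values (the null count divided by the row count), and
- the count of unique values
For example, if I have a table with columns A, B, C and D, the expected result would be: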
Column_Name | PctMissing | UniqueCount
A | 0.15 | 16
B | 0 | 320
C | 0.3 | 190
D | 0.05 | 8
Solution
If you know the columns in advance, I would probably just use union all:
select 'a' as Column_Name,1.0*count(case when a is null then 1 end)/count(*) as PctMissing,count(distinct a) as UniqueCount
from t
union all
select 'b' as Column_Name,1.0*count(case when b is null then 1 end)/count(*) as PctMissing,count(distinct b) as UniqueCount
from t
union all
select 'c' as Column_Name,1.0*count(case when c is null then 1 end)/count(*) as PctMissing,count(distinct c) as UniqueCount
from t
union all
select 'd' as Column_Name,1.0*count(case when d is null then 1 end)/count(*) as PctMissing,count(distinct d) as UniqueCount
from t
Depending on your database, there are other approaches, but they are likely to be more confusing than union all.
I would write it like this, averaging a 0/1 null flag instead of dividing two counts:
select 'a' as column_name,avg(case when a is null then 1.0 else 0 end) as missing_ratio,count(distinct a) as unique_count
from t
union all
select 'b' as column_name,avg(case when b is null then 1.0 else 0 end) as missing_ratio,count(distinct b) as unique_count
from t
union all
select 'c' as column_name,avg(case when c is null then 1.0 else 0 end) as missing_ratio,count(distinct c) as unique_count
from t
union all
select 'd' as column_name,avg(case when d is null then 1.0 else 0 end) as missing_ratio,count(distinct d) as unique_count
from t;
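As a quick sanity check, the two formulations above can be run side by side on a small table. The following is only a sketch, assuming SQLite (through Python's standard sqlite3 module) as the database; the sample rows, the `columns` list and the `profile_query` helper are made up for illustration and are not part of the original answers. Other databases may differ in details such as integer division or the result type of avg().

```python
import sqlite3

# Hypothetical sample data for table t; None is stored as NULL.
rows = [
    (1, "x", None, 10),
    (2, "y", None, 10),
    (3, "y", 5.0, None),
    (4, "z", 5.0, 10),
]
# Column names are hard-coded here; identifiers cannot be bound as
# query parameters, so never build them from untrusted input.
columns = ["a", "b", "c", "d"]

conn = sqlite3.connect(":memory:")
conn.execute("create table t (a integer, b text, c real, d integer)")
conn.executemany("insert into t values (?, ?, ?, ?)", rows)


def profile_query(template):
    """Build one SELECT per column and join them with UNION ALL."""
    return "\nunion all\n".join(template.format(col=col) for col in columns)


# First formulation: null count divided by total row count.
count_template = (
    "select '{col}' as column_name, "
    "1.0*count(case when {col} is null then 1 end)/count(*) as missing_ratio, "
    "count(distinct {col}) as unique_count from t"
)
# Second formulation: average of a 0/1 null flag.
avg_template = (
    "select '{col}' as column_name, "
    "avg(case when {col} is null then 1.0 else 0 end) as missing_ratio, "
    "count(distinct {col}) as unique_count from t"
)

# Sort the rows because UNION ALL without ORDER BY guarantees no order.
count_result = sorted(conn.execute(profile_query(count_template)).fetchall())
avg_result = sorted(conn.execute(profile_query(avg_template)).fetchall())

for row in count_result:
    print(row)
print("formulations agree:", count_result == avg_result)
```

Note that count(distinct ...) ignores NULLs, so a column that is entirely NULL reports a unique count of 0 in both versions.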