Problem description
I have a table:
id  | col1 | col2 | col3 | col4 | col5
id1 |    1 |    0 |    0 |    1 |    0
id2 |    1 |    1 |    0 |    0 |    0
id3 |    0 |    1 |    0 |    1 |    0
id4 |    0 |    0 |    1 |    0 |    1
id5 |    1 |    0 |    1 |    0 |    0
id6 |    0 |    0 |    0 |    1 |    0
...
idN
and I would like to produce a result like this:
     | col1 | col2 | col3 | col4 | col5
col1 |    3 |    1 |    1 |    1 |    0
col2 |    1 |    2 |    0 |    1 |    0
col3 |    1 |    1 |    2 |    0 |    1
col4 |    1 |    1 |    1 |    2 |    0
col5 |    0 |    0 |    1 |    0 |    1
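where each entry in the result is the number of times a value of 1 in one column occurs together with a value of 1 in another column.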
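Stated as a formula (a restatement only, assuming every column holds nothing but 0 or 1): write x_ij for the value of column j in row i, with N rows in total. Entry (j, k) of the result is then the count below, which in matrix form is the product of the table's transpose with the table itself.

```latex
C_{jk} = \sum_{i=1}^{N} x_{ij}\, x_{ik}, \qquad C = X^{\top} X
```

Because x_ij is 0 or 1, the diagonal entry C_jj reduces to the number of rows in which column j equals 1.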
I can get the diagonal values by doing the following:
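SELECT
    sum(col1), sum(col2), sum(col3), sum(col4), sum(col5)
FROM (
    SELECT
        col1, col2, col3, col4, col5,
        (col1 + col2 + col3 + col4 + col5) AS total
    FROM (
        -- number the rows of each id by date, so row_num = 1 is the earliest row per id
        SELECT
            ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) AS row_num, *
        FROM (
            SELECT DISTINCT id, date, col1, col2, col3, col4, col5
            FROM db.schema.table)
    )
    -- keep the first row per id, and only rows where at most one column is 1
    WHERE row_num = 1 AND total <= 1
    ORDER BY total DESC);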
I think I need some kind of pivot or a series of unions, but I can't seem to figure it out.
Solution
I would approach this by unpivoting the data and re-aggregating. The following gets the column pairs and their counts:
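with u as (
    -- unpivot: one row per (id, column name) where that column is 1
    select t.id, v.col
    from t cross join lateral
         (values ('col1', col1), ('col2', col2), ('col3', col3), ('col4', col4), ('col5', col5)
         ) v(col, val)
    where val = 1
)
-- pair up the columns that are 1 for the same id and count each pair
select u1.col, u2.col, count(*)
from u u1 join
     u u2
     on u1.id = u2.id
group by u1.col, u2.col;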
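If your database does not accept a correlated VALUES list inside a lateral join, the same unpivot can be sketched with UNION ALL. This is a variant, not the query above. It assumes the same table t with one row per id, and the output names col_a, col_b and cnt are made up for illustration.

```sql
-- Sketch only: assumes table t(id, col1..col5) with 0/1 values and one row per id.
with u as (
    -- unpivot by hand: one row per (id, column name) where that column is 1
    select id, 'col1' as col from t where col1 = 1
    union all
    select id, 'col2' from t where col2 = 1
    union all
    select id, 'col3' from t where col3 = 1
    union all
    select id, 'col4' from t where col4 = 1
    union all
    select id, 'col5' from t where col5 = 1
)
-- every pair of columns that are both 1 for the same id, counted over all ids
select u1.col as col_a, u2.col as col_b, count(*) as cnt
from u u1
join u u2
  on u1.id = u2.id
group by u1.col, u2.col
order by col_a, col_b;
```

Pairs that never occur together produce no row at all, so a missing pair should be read as a count of 0.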
That long format seems good enough to me, but you can pivot it into a matrix with conditional aggregation:
-- reuses the CTE u defined above
select u1.col,
       sum(case when u2.col = 'col1' then 1 else 0 end) as col1,
       sum(case when u2.col = 'col2' then 1 else 0 end) as col2,
       sum(case when u2.col = 'col3' then 1 else 0 end) as col3,
       sum(case when u2.col = 'col4' then 1 else 0 end) as col4,
       sum(case when u2.col = 'col5' then 1 else 0 end) as col5
from u u1 join
     u u2
     on u1.id = u2.id
group by u1.col;
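Because the columns hold only 0 and 1, the product of two columns is 1 exactly when both are 1, so the matrix can also be sketched without any unpivot or self-join. This is a different technique from the queries above, shown only for comparison. It assumes table t with non-NULL 0/1 columns, and the with_colN output names are hypothetical.

```sql
-- Sketch only: assumes table t with 0/1 columns col1..col5 and no NULLs.
-- Each SELECT builds one row of the matrix; sum(colA * colB) counts rows where both are 1.
select 'col1' as col,
       sum(col1 * col1) as with_col1, sum(col1 * col2) as with_col2,
       sum(col1 * col3) as with_col3, sum(col1 * col4) as with_col4,
       sum(col1 * col5) as with_col5
from t
union all
select 'col2', sum(col2 * col1), sum(col2 * col2), sum(col2 * col3), sum(col2 * col4), sum(col2 * col5)
from t
union all
select 'col3', sum(col3 * col1), sum(col3 * col2), sum(col3 * col3), sum(col3 * col4), sum(col3 * col5)
from t
union all
select 'col4', sum(col4 * col1), sum(col4 * col2), sum(col4 * col3), sum(col4 * col4), sum(col4 * col5)
from t
union all
select 'col5', sum(col5 * col1), sum(col5 * col2), sum(col5 * col3), sum(col5 * col4), sum(col5 * col5)
from t;
```

This version needs no id column, but it scans t once per output row and grows quadratically with the number of columns, so it only suits a small, fixed column list.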
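Here is an approach that showcases one of Snowflake's powerful semi-structured functions (namely FLATTEN). It also uses the OBJECT_CONSTRUCT(*) function together with two of FLATTEN's meta-attributes (SEQ and KEY), so that no unique business key is needed on the original (source) table.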
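A minimal sketch of how such a query might look in Snowflake follows. It is not the original answer's code. It assumes a table t whose 0/1 columns are named col1 to col5 (which OBJECT_CONSTRUCT(*) stores under the upper-case keys COL1 to COL5), and the names o, obj, row_seq, col_a, col_b and cnt are hypothetical. It also assumes that SEQ numbers the rows identically in both references to u; if that is a concern, materialize u into a temporary table first.

```sql
-- Sketch only (Snowflake). Assumes table t with 0/1 columns col1..col5.
with o as (
    -- one OBJECT per source row, keyed by column name
    select object_construct(*) as obj
    from t
),
u as (
    -- one output row per (source row, column) where the column's value is 1
    select f.seq as row_seq,   -- sequence number of the source row the key came from
           f.key as col        -- the column name, e.g. 'COL1'
    from o,
         lateral flatten(input => o.obj) f
    where f.key like 'COL%'            -- skip other columns such as the id
      and f.value::string = '1'
)
-- pair up the columns that are 1 within the same source row
select u1.col as col_a, u2.col as col_b, count(*) as cnt
from u u1
join u u2
  on u1.row_seq = u2.row_seq
group by u1.col, u2.col
order by col_a, col_b;
```

Here SEQ stands in for the row identifier, which is why no business key is required.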