Deduplicating transaction records

Problem description

I'm a bit stuck! I have data like the following.

[image: sample data with From, To and Frequency columns]

I need to compute the total frequency between each pair of customers. In the data above, FROM customer1 TO customer2 should be added to FROM customer2 TO customer1, as shown below.

The direction of the messages doesn't matter; I just need to total all communication between customer1 and customer2.

[image: expected result]

Solutions

You can use the greatest and least functions as follows:

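-- "from" and "to" are reserved words; quote them as your dialect requires
select least("from","to") as "from",greatest("from","to") as "to",sum(frequency) as freq
  from your_Table
 group by least("from","to"),greatest("from","to")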

If your version doesn't support greatest and least, you can use case..when instead:

select case when "from" > "to" then "to" else "from" end as "from",
       case when "from" > "to" then "from" else "to" end as "to",
       sum(frequency) as freq
  from your_Table
 group by case when "from" > "to" then "to" else "from" end,
          case when "from" > "to" then "from" else "to" end
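
As a quick sanity check, both queries can be run against an in-memory SQLite database from Python. This is only a sketch under a few assumptions: the three sample rows are the ones used in the other answers on this page, the table name your_table is illustrative, and SQLite's multi-argument min()/max() are used in place of least()/greatest().

```python
import sqlite3

# Sample rows taken from the other answers on this page.
rows = [
    ("Customer 1", "Customer 2", 2),
    ("Customer 2", "Customer 1", 4),
    ("Customer 3", "Customer 1", 4),
]

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE your_table ("from" TEXT, "to" TEXT, frequency INTEGER)')
conn.executemany("INSERT INTO your_table VALUES (?, ?, ?)", rows)

# Multi-argument min()/max() are SQLite's stand-ins for least()/greatest().
min_max_sql = """
SELECT min("from", "to") AS a, max("from", "to") AS b, sum(frequency) AS freq
  FROM your_table
 GROUP BY min("from", "to"), max("from", "to")
 ORDER BY a, b
"""

# The CASE version uses only standard expressions.
case_sql = """
SELECT CASE WHEN "from" > "to" THEN "to" ELSE "from" END AS a,
       CASE WHEN "from" > "to" THEN "from" ELSE "to" END AS b,
       sum(frequency) AS freq
  FROM your_table
 GROUP BY CASE WHEN "from" > "to" THEN "to" ELSE "from" END,
          CASE WHEN "from" > "to" THEN "from" ELSE "to" END
 ORDER BY a, b
"""

min_max_result = conn.execute(min_max_sql).fetchall()
case_result = conn.execute(case_sql).fetchall()
print(min_max_result)
print(case_result == min_max_result)
```

Both queries put the smaller name first, so the two directions of a pair fall into the same group.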

You can try sorting From and To:

import http.client
import json

import pandas as pd

conn = http.client.HTTPSConnection("abc.com")
headers = {
    'authorization': 'xyz',
    'cache-control': 'no-cache',
    'postman-token': 'xyz',
}

url = 'xxyyzz'
conn.request("GET", url, headers=headers)
res = conn.getresponse()
data = res.read()
# Parse the JSON body and load its 'rows' entry into a DataFrame
data_z = json.loads(data)
df = pd.DataFrame(data_z['rows'])
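
The snippet above only loads the rows; the sorting step itself is not shown. A minimal sketch of that step in plain Python (not pandas) might look like this. It assumes each row is a dict with From, To and Frequency keys, which is a guess about the data rather than something the snippet confirms, and the helper name total_by_pair is made up for illustration.

```python
from collections import defaultdict


def total_by_pair(rows):
    """Sum Frequency per customer pair, ignoring direction (hypothetical helper)."""
    totals = defaultdict(int)
    for row in rows:
        # Sorting the two names gives both directions the same key.
        first, second = sorted((row["From"], row["To"]))
        totals[(first, second)] += row["Frequency"]
    return dict(totals)


# Assumed row layout; the values mirror the sample data used in the other answers.
rows = [
    {"From": "Customer 1", "To": "Customer 2", "Frequency": 2},
    {"From": "Customer 2", "To": "Customer 1", "Frequency": 4},
    {"From": "Customer 3", "To": "Customer 1", "Frequency": 4},
]

totals = total_by_pair(rows)
print(totals)
```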



You can group by the sorted pair:

WITH tab AS
(SELECT * FROM (VALUES ('Customer 1','Customer 2',2),('Customer 2','Customer 1',4),('Customer 3','Customer 1',4)
) a ([From],[To],[Frequency])
)
SELECT IIF([From] > [To],[To],[From]) [From],IIF([From] > [To],[From],[To]) [To],SUM([Frequency]) Frequency
FROM tab
GROUP BY IIF([From] > [To],[To],[From]),IIF([From] > [To],[From],[To])

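Two map() values with the same key-value pairs in a different order are equal, because maps are unordered by definition. You can use this property to aggregate the frequencies.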

Demo using your sample data:

with mytable as(
select stack (3,
'Customer 1','Customer 2',2,
'Customer 2','Customer 1',4,
'Customer 3','Customer 1',4
) as (`from`,`to`,frequency)
)

select map_keys(vmap)[0] as `from`,map_keys(vmap)[1] as `to`,frequency
from
(
select map(`from`,1,`to`,1) vmap,sum(frequency) frequency
 from mytable group by map(`from`,1,`to`,1)
)s;

Result:

from          to              frequency
Customer 2    Customer 1      6
Customer 3    Customer 1      4
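
The same idea of grouping on an unordered key can be sketched outside Hive. In Python, a frozenset plays a similar role to the map above: two frozensets with the same members compare equal regardless of the order in which they were built. The function name total_by_unordered_pair is made up for illustration, and the rows repeat the sample data from the demo.

```python
from collections import Counter


def total_by_unordered_pair(records):
    """Sum frequencies keyed by the unordered {sender, receiver} pair (hypothetical helper)."""
    totals = Counter()
    for sender, receiver, frequency in records:
        # frozenset((a, b)) == frozenset((b, a)), so direction is ignored.
        totals[frozenset((sender, receiver))] += frequency
    return totals


records = [
    ("Customer 1", "Customer 2", 2),
    ("Customer 2", "Customer 1", 4),
    ("Customer 3", "Customer 1", 4),
]

pair_totals = total_by_unordered_pair(records)
for pair, freq in pair_totals.items():
    print(sorted(pair), freq)
```

One caveat of this sketch: a row whose sender and receiver are the same customer collapses to a one-element key, so such rows would need separate handling if they can occur.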