Problem description
ID | timestamp | Phase | current
========================================
001 | 2020-09-20 07:00 | A | 1.4
001 | 2020-09-20 07:00 | B | 2.0
001 | 2020-09-20 07:00 | C | 1.6
002 | 2020-09-20 09:00 | A | 1.4
002 | 2020-09-20 09:00 | B | 1.23
002 | 2020-09-20 09:00 | C | 1.46
I need to compute the percentage difference between phases for each ID/timestamp group, so I created a groupby:
imbalanced = df.groupby(['timestamp','ID']).apply(calcImbalance)
Here is calcImbalance:
def calcImbalance(pole):
    phA = pole.loc[pole['Phase'] == 'A']['current'].astype('float')
    phB = pole.loc[pole['Phase'] == 'B']['current'].astype('float')
    phC = pole.loc[pole['Phase'] == 'C']['current'].astype('float')
    imb = abs((phA-phB)/phB)
    print ('imb:',imb)
    if imb >= 0.3:
        return pole
    imb = abs((phB-phA)/phA)
    if imb >= 0.3:
        return pole
    imb = abs((phA-phC)/phC)
    if imb >= 0.3:
        return pole
    imb = abs((phC-phA)/phA)
    if imb >= 0.3:
        return pole
But this only prints:
imb: 2661 NaN
2662 NaN
Name: Amps, dtype: float64
imb: 2661 NaN
2662 NaN
Name: Amps, dtype: float64
and then raises an exception:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
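What I want is a DataFrame containing only the rows of df whose phases differ by more than 30%. I think I have gone down a rabbit hole, because this seems like it should be trivial.
In the example above, the "imbalanced" DataFrame should contain:
ID | timestamp | Phase | current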
========================================
001 | 2020-09-20 07:00 | A | 1.4
001 | 2020-09-20 07:00 | B | 2.0
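Note that the apply function does not test the imbalance between phases B and C; it only tests A & B and A & C.
Solution
IIUC, you can find the required rows using pandas functions: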
df['cng'] = (df.groupby('ID')['current'].pct_change() + 1).groupby(df.ID).cumprod()-1
df[df.groupby('ID')['cng'].transform(lambda x: x.fillna(x.max())) > .30]
Output
ID timestamp Phase current cng
0 1 2020-09-20 07:00 A 1.4 NaN
1 1 2020-09-20 07:00 B 2.0 0.428571
How this works
To find the groups in which the change between phases is > 0.30:
df[df.groupby('ID')['current'].pct_change().groupby(df.ID).transform('max') > .30]
Output
ID timestamp Phase current
0 1 2020-09-20 07:00 A 1.4
1 1 2020-09-20 07:00 B 2.0
2 1 2020-09-20 07:00 C 1.6
This gives the percentage change from one row to the next within each group:
df.groupby('ID')['current'].pct_change()
Output
0 NaN
1 0.428571
2 -0.200000
3 NaN
4 -0.121429
5 0.186992
The cumulative change within each group:
(df.groupby('ID')['current'].pct_change() + 1).groupby(df.ID).cumprod()
Output
0 NaN
1 1.428571
2 1.142857
3 NaN
4 0.878571
5 1.042857
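To make the cumulative product concrete: multiplying the successive ratios within a group telescopes to the ratio against the group's first row, so cng should match current / first - 1 on every row except the first of each group (NaN in one, 0 in the other). The following is only a small check on the question's data, assuming the column names used above; via_first is a name introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ID': ['001'] * 3 + ['002'] * 3,
    'Phase': ['A', 'B', 'C'] * 2,
    'current': [1.4, 2.0, 1.6, 1.4, 1.23, 1.46],
})

# Cumulative change as computed in the answer.
cng = (df.groupby('ID')['current'].pct_change() + 1).groupby(df['ID']).cumprod() - 1

# The same quantity computed directly against the first row of each group.
via_first = df['current'] / df.groupby('ID')['current'].transform('first') - 1

# Compare only the rows where cng is defined (all but each group's first row).
mask = cng.notna()
print(np.allclose(cng[mask], via_first[mask]))
```

If the two agree, the filter on cng can be read as "phase B or C is more than 30% above phase A".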
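What can this solution detect?
import pandas as pd
df = pd.DataFrame(
    [('001', '2020-09-20 07:00', 'A', 1.4), ('001', '2020-09-20 07:00', 'B', 2.0),
     ('001', '2020-09-20 07:00', 'C', 1.6), ('002', '2020-09-20 09:00', 'A', 1.4),
     ('002', '2020-09-20 09:00', 'B', 1.2), ('002', '2020-09-20 09:00', 'C', 2.0),
     ('003', '2020-09-20 09:00', 'A', 1.4), ('003', '2020-09-20 09:00', 'B', 2.0),
     ('003', '2020-09-20 09:00', 'C', 1.6), ('003', '2020-09-20 09:00', 'D', 2.0)],
    columns=['ID', 'timestamp', 'Phase', 'current'])
Given this DataFrame: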
ID timestamp Phase current
0 001 2020-09-20 07:00 A 1.4
1 001 2020-09-20 07:00 B 2.0
2 001 2020-09-20 07:00 C 1.6
3 002 2020-09-20 09:00 A 1.4
4 002 2020-09-20 09:00 B 1.2
5 002 2020-09-20 09:00 C 2.0
6 003 2020-09-20 09:00 A 1.4
7 003 2020-09-20 09:00 B 2.0
8 003 2020-09-20 09:00 C 1.6
9 003 2020-09-20 09:00 D 2.0
Applying this solution:
df['cng'] = (df.groupby('ID')['current'].pct_change() + 1).groupby(df.ID).cumprod()-1
df[df.groupby('ID')['cng'].transform(lambda x: x.fillna(x.max())) > .30]
Result. Note that cng is a cumulative product, so it gives the change relative to the first row of each group.
ID timestamp Phase current cng
0 001 2020-09-20 07:00 A 1.4 NaN
1 001 2020-09-20 07:00 B 2.0 0.428571
3 002 2020-09-20 09:00 A 1.4 NaN
5 002 2020-09-20 09:00 C 2.0 0.428571
6 003 2020-09-20 09:00 A 1.4 NaN
7 003 2020-09-20 09:00 B 2.0 0.428571
9 003 2020-09-20 09:00 D 2.0 0.428571
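Because cng measures each row against the first row of its group and the filter keeps only values above 0.30, a large drop relative to the first row, or a gap between two later rows such as B and C, is not flagged by this test. If that matters, one possible alternative is to compare the largest and smallest current in each group. This is a sketch under that assumption rather than part of the answer above: it assumes positive currents, it flags whole groups instead of individual rows, group 004 is made-up data, and spread and threshold are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['001'] * 3 + ['002'] * 3 + ['004'] * 3,
    'timestamp': ['2020-09-20 07:00'] * 3 + ['2020-09-20 09:00'] * 6,
    'Phase': ['A', 'B', 'C'] * 3,
    # Group 004 is hypothetical: B and C differ by 50%, with A in between.
    'current': [1.4, 2.0, 1.6, 1.4, 1.23, 1.46, 1.3, 1.0, 1.5],
})
threshold = 0.30

grouped = df.groupby(['timestamp', 'ID'])['current']
# Largest relative gap between any two (positive) currents in the group.
spread = grouped.transform('max') / grouped.transform('min') - 1

imbalanced = df[spread >= threshold]
print(imbalanced)
```

On this data the filter keeps groups 001 and 004 and drops 002. Group 004 would be missed by the cng test, since neither B nor C is more than 30% above A.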
Based on your code, this may work. It collects the currents of each group into a list and passes that list to the calcImbalance function.
import pandas as pd
dd = {
    'ID': [1, 1, 1, 2, 2, 2],
    'timestamp': ['2020-09-20 07:00'] * 3 + ['2020-09-20 09:00'] * 3,
    'Phase': ['A', 'B', 'C', 'A', 'B', 'C'],
    'current': [1.4, 1.5, 1.6, 1.4, 1.23, 1.46]
}
df = pd.DataFrame(dd)
def calcImbalance(pole):
    phA, phB, phC = tuple(pole)  # currents in group, as plain floats
    print('ph >', phA, phB, phC)
    imb = abs((phA - phB) / phB)
    print('imb:', imb)
    if imb >= 0.3:
        return pole
    imb = abs((phB - phA) / phA)
    if imb >= 0.3:
        return pole
    imb = abs((phA - phC) / phC)
    if imb >= 0.3:
        return pole
    imb = abs((phC - phA) / phA)
    if imb >= 0.3:
        return pole
# Collect each group's currents into a list, then test each list.
gb = df.groupby(['timestamp', 'ID'])['current'].apply(lambda x: [i for i in x]).apply(calcImbalance)
print('\n', gb)
Output
ph > 1.4 1.5 1.6
imb: 0.06666666666666672
ph > 1.4 1.23 1.46
imb: 0.13821138211382109
timestamp ID
2020-09-20 07:00 1 None
2020-09-20 09:00 2 None
Name: current, dtype: object
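For context, both symptoms in the question (the NaN values and the ValueError) are consistent with how pandas treats one-row Series: arithmetic aligns on index labels, and a Series cannot be used directly as an if condition. The sketch below reproduces this in isolation. The index labels 2661 and 2662 are borrowed from the printed output in the question; assigning them to phases A and B is an assumption. It also shows why working with plain floats, as the list-based version above does, avoids both problems.

```python
import pandas as pd

# Two one-row Series with different index labels, similar to what
# pole.loc[pole['Phase'] == 'A']['current'] and its 'B' counterpart return.
phA = pd.Series([1.4], index=[2661])
phB = pd.Series([2.0], index=[2662])

# Arithmetic aligns on index labels; no label is shared, so every result is NaN.
imb = abs((phA - phB) / phB)
print(imb.isna().all())

# Using a multi-element Series as a condition raises ValueError.
try:
    if imb >= 0.3:
        pass
    raised = False
except ValueError:
    raised = True
print(raised)

# Extracting plain floats sidesteps both issues.
a, b = phA.iloc[0], phB.iloc[0]
print(abs((a - b) / b) >= 0.3)
```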
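-Update-
Given the update to your post, this may not be a complete answer, but it may still help you find a solution.
Edit: this code answers the question, including the edits.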
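import pandas as pd
def calc_imbalance(current):
    # Index pairs (numerator, denominator) to compare; assumed here to mirror
    # the four A/B and A/C checks in the question.
    pairs_to_test = [[0, 1], [1, 0], [0, 2], [2, 0]]
    for pair in pairs_to_test:
        abs_percentage_imbalance = abs((current[pair[0]] - current[pair[1]]) / current[pair[1]])
        if abs_percentage_imbalance >= .3:
            return pair
    return []
df = pd.DataFrame(
    [('001', '2020-09-20 07:00', 'A', 1.4), ('001', '2020-09-20 07:00', 'B', 2.0),
     ('001', '2020-09-20 07:00', 'C', 1.6), ('002', '2020-09-20 09:00', 'A', 1.4),
     ('002', '2020-09-20 09:00', 'B', 1.23), ('002', '2020-09-20 09:00', 'C', 1.46)],
    columns=['ID', 'timestamp', 'Phase', 'current'])
# Remember each row's original label so it survives the aggregation.
df['original_index'] = df.index
all_index_to_keep = []
for _, group in df.groupby(['timestamp', 'ID']).agg(list).reset_index().iterrows():
    index_to_keep = calc_imbalance(group['current'])
    all_index_to_keep += [v for k, v in enumerate(group['original_index']) if k in index_to_keep]
df.drop('original_index', axis=1, inplace=True)
print(df.loc[all_index_to_keep, :])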
Returns:
ID timestamp Phase current
0 001 2020-09-20 07:00 A 1.4
1 001 2020-09-20 07:00 B 2.0
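If the B and C comparison should be covered as well, and every row involved in any imbalanced pair should be kept rather than only the first pair found, one possible variation is to loop over all ordered pairs with itertools.permutations. This is a sketch under those assumptions, not the answer's own code; imbalanced_labels and threshold are hypothetical names introduced here.

```python
from itertools import permutations

import pandas as pd

def imbalanced_labels(group, threshold=0.3):
    """Return the sorted row labels that take part in any imbalanced pair."""
    keep = set()
    # Every ordered pair of rows, so both (A-B)/B and (B-A)/A are tested.
    for i, j in permutations(group.index, 2):
        num, den = group.at[i, 'current'], group.at[j, 'current']
        if abs((num - den) / den) >= threshold:
            keep.update((i, j))
    return sorted(keep)

df = pd.DataFrame(
    [('001', '2020-09-20 07:00', 'A', 1.4), ('001', '2020-09-20 07:00', 'B', 2.0),
     ('001', '2020-09-20 07:00', 'C', 1.6), ('002', '2020-09-20 09:00', 'A', 1.4),
     ('002', '2020-09-20 09:00', 'B', 1.23), ('002', '2020-09-20 09:00', 'C', 1.46)],
    columns=['ID', 'timestamp', 'Phase', 'current'])

labels = []
for _, group in df.groupby(['timestamp', 'ID']):
    labels.extend(imbalanced_labels(group))

print(df.loc[labels])
```

On the question's data this selects the phase A and B rows of ID 001, since B and C in that group differ by at most 25%.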