Problem description
I have a data frame like this:
df1:
  col1 col2      time
0    A  A_1  05:02:03
1    A  A_2  15:36:14
2    A  A_1  28:21:47
3    A  A_1  47:21:17
4    A  A_1  32:28:01
5    A  A_2  37:27:14
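I want to check whether each value in the "time" column is below 24, above 24 but below 48, above 48 but below 72, or above 72 (in hours), and put the counts into another data frame like this: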
df2:
  col1 col2  time<24  24<time<48  48<time<72  time>72
0    A  A_1        1           3         NaN      NaN
1    A  A_2        1           1         NaN      NaN
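So basically, what I want in df2 is the number of files that satisfy each comparison. For example, three files in the "time" column that belong to A and A_1 have a time between 24 and 48.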
Solution
Let's try:
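- Convert the time values to Timedelta to get the number of whole days
- clip to make sure the values do not exceed 3 days
- Use pivot_table, then clean up the columns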
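To see what the first two steps do on their own, here is a minimal sketch. The values '75:00:00' and '99:59:59' are made up for illustration (they are not in the question's data); they are only there to show the clipping at 3 days.

```python
import pandas as pd

# Sample times; the last two are hypothetical values beyond 72 hours
times = pd.Series(['05:02:03', '28:21:47', '47:21:17', '75:00:00', '99:59:59'])

# Whole days contained in each duration (hours above 24 roll over into days)
days = pd.to_timedelta(times).dt.days

# Everything at or beyond 3 days lands in the same bucket
clipped = days.clip(lower=0, upper=3)

print(days.tolist())     # [0, 1, 1, 3, 4]
print(clipped.tolist())  # [0, 1, 1, 3, 3]
```

Each clipped value is then the position of the bucket the row falls into (0 for under 24 hours, 3 for 72 hours and above). The full solution: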
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': {0: 'A', 1: 'A', 2: 'A', 3: 'A', 4: 'A', 5: 'A'},
    'col2': {0: 'A_1', 1: 'A_2', 2: 'A_1', 3: 'A_1', 4: 'A_1', 5: 'A_2'},
    'time': {0: '05:02:03', 1: '15:36:14', 2: '28:21:47',
             3: '47:21:17', 4: '32:28:01', 5: '37:27:14'}
})

df['days'] = (
    pd.to_timedelta(df['time']).dt.days  # Get days from the Timedelta
    .clip(lower=0, upper=3)              # Clip at 3 days
)

time_cols = ['time < 24', '24 <= time < 48', '48 <= time < 72', 'time >= 72']

df = (
    df.pivot_table(index=['col1', 'col2'], columns='days',
                   aggfunc='count', fill_value=np.nan)
    .droplevel(0, axis=1)        # Remove the outer level of the column MultiIndex
    .reset_index()               # Turn col1/col2 back into columns
    .rename_axis(None, axis=1)   # Remove the axis name
    .rename(columns={i: v for i, v in enumerate(time_cols)})  # Day number -> label
)

# Add missing columns
df[list(set(time_cols).difference(df.columns))] = np.nan

# Reorder columns
df = df[['col1', 'col2', *time_cols]]

print(df)
df:
  col1 col2  time < 24  24 <= time < 48  48 <= time < 72  time >= 72
0    A  A_1          1                3              NaN         NaN
1    A  A_2          1                1              NaN         NaN
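As a possible alternative (not part of the answer above), the same counts could be produced by binning the hours directly with pd.cut and counting with pd.crosstab. This is only a sketch under the assumption that the bins are half-open, [0, 24), [24, 48), [48, 72) and [72, inf); the names labels, hours, bucket and counts are introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': ['A'] * 6,
    'col2': ['A_1', 'A_2', 'A_1', 'A_1', 'A_1', 'A_2'],
    'time': ['05:02:03', '15:36:14', '28:21:47',
             '47:21:17', '32:28:01', '37:27:14'],
})

labels = ['time < 24', '24 <= time < 48', '48 <= time < 72', 'time >= 72']

# Duration of each row expressed in hours
hours = pd.to_timedelta(df['time']).dt.total_seconds() / 3600

# Assign each row to a half-open hour bin, then keep the label as plain text
bucket = pd.cut(hours, bins=[0, 24, 48, 72, np.inf],
                right=False, labels=labels).astype(str)

counts = (
    pd.crosstab([df['col1'], df['col2']], bucket)  # Count rows per group and bin
    .reindex(columns=labels)     # Add empty bins as NaN and fix the column order
    .rename_axis(None, axis=1)   # Remove the axis name
    .reset_index()               # Turn col1/col2 back into columns
)

print(counts)
```

Because the bin edges are given in hours, this variant would also work for cut-offs that are not multiples of a day.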