Problem
I have the following DataFrame:
import pandas as pd
hits = {'id': ['A','A','A','A','A','A','B','B','B','C','C','C'],
        'datetime': ['2010-01-02 03:00:00','2010-01-02 03:00:14','2010-01-02 03:00:35',
                     '2010-01-02 03:00:38','2010-01-02 03:29:10','2010-01-02 03:29:35',
                     '2010-01-02 03:45:20','2010-01-02 06:10:05','2010-01-02 06:10:15',
                     '2010-01-02 07:40:15','2010-01-02 07:40:20','2010-01-02 07:40:25'],
        'uri_len': [10,20,25,15,20,10,20,25,15,30,40,45]}
df = pd.DataFrame(hits, columns=['id','datetime','uri_len'])
# %S (capital S) is the seconds directive
df['datetime'] = pd.to_datetime(df['datetime'], format='%Y-%m-%d %H:%M:%S')
print(df)
id datetime uri_len
0 A 2010-01-02 03:00:00 10
1 A 2010-01-02 03:00:14 20
2 A 2010-01-02 03:00:35 25
3 A 2010-01-02 03:00:38 15
4 A 2010-01-02 03:29:10 20
5 A 2010-01-02 03:29:35 10
6 B 2010-01-02 03:45:20 20
7 B 2010-01-02 06:10:05 25
8 B 2010-01-02 06:10:15 15
9 C 2010-01-02 07:40:15 30
10 C 2010-01-02 07:40:20 40
11 C 2010-01-02 07:40:25 45
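I want to group the hits into sessions, using id as the grouping variable. For me, a new session starts when the idle time exceeds 15 seconds (computed from the datetime column) or when the uri_len column decreases; in both cases consecutive hits are compared.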
I know how to group by each condition separately:
df['session1'] = (df.groupby('id')['datetime']
.transform(lambda x: x.diff().gt('15Sec').cumsum())
)
df['session2'] = (df.groupby('id')['uri_len']
.transform(lambda x: x.diff().lt(0).cumsum())
)
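On the sample data, the two helper counters come out as follows. This is a minimal sketch that rebuilds the frame from the hits shown in the printed output; it uses an explicit pd.Timedelta(seconds=15) threshold instead of the '15Sec' string, which is a choice made here, not something from the question.

```python
import pandas as pd

hits = {'id': ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
        'datetime': ['2010-01-02 03:00:00', '2010-01-02 03:00:14', '2010-01-02 03:00:35',
                     '2010-01-02 03:00:38', '2010-01-02 03:29:10', '2010-01-02 03:29:35',
                     '2010-01-02 03:45:20', '2010-01-02 06:10:05', '2010-01-02 06:10:15',
                     '2010-01-02 07:40:15', '2010-01-02 07:40:20', '2010-01-02 07:40:25'],
        'uri_len': [10, 20, 25, 15, 20, 10, 20, 25, 15, 30, 40, 45]}
df = pd.DataFrame(hits)
df['datetime'] = pd.to_datetime(df['datetime'], format='%Y-%m-%d %H:%M:%S')

# Counter that goes up whenever a hit comes more than 15 s after the previous hit of the same id
df['session1'] = (df.groupby('id')['datetime']
                  .transform(lambda x: x.diff().gt(pd.Timedelta(seconds=15)).cumsum()))
# Counter that goes up whenever uri_len is smaller than for the previous hit of the same id
df['session2'] = (df.groupby('id')['uri_len']
                  .transform(lambda x: x.diff().lt(0).cumsum()))

print(df['session1'].tolist())  # [0, 0, 1, 1, 2, 3, 0, 1, 1, 0, 0, 0]
print(df['session2'].tolist())  # [0, 0, 0, 1, 1, 2, 0, 0, 1, 0, 0, 0]
```

Neither counter on its own matches the desired session column; row 3, for example, is 1 in session1 and 1 in session2 but should be 2.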
Is there a way to combine the two transformations in a single line, so that the output is this?
id datetime uri_len session
0 A 2010-01-02 03:00:00 10 0
1 A 2010-01-02 03:00:14 20 0
2 A 2010-01-02 03:00:35 25 1
3 A 2010-01-02 03:00:38 15 2
4 A 2010-01-02 03:29:10 20 3
5 A 2010-01-02 03:29:35 10 4
6 B 2010-01-02 03:45:20 20 0
7 B 2010-01-02 06:10:05 25 1
8 B 2010-01-02 06:10:15 15 2
9 C 2010-01-02 07:40:15 30 0
10 C 2010-01-02 07:40:20 40 0
11 C 2010-01-02 07:40:25 45 0
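The expected output above follows from a simple row-by-row rule, sketched here in plain Python as a reference for checking a vectorised version. The helper name assign_sessions is hypothetical, and the sketch assumes the hits of each id appear in time order. A hit that meets both conditions at once (row 5: 25 seconds idle and uri_len dropping from 20 to 10) opens only one new session, which is what the expected output shows.

```python
from datetime import datetime, timedelta


def assign_sessions(rows, gap=timedelta(seconds=15)):
    """Return one session number per (id, timestamp, uri_len) row.

    Hypothetical helper: assumes the hits of each id are in time order.
    """
    last = {}        # id -> (timestamp, uri_len, session) of that id's previous hit
    sessions = []
    for hit_id, ts, uri_len in rows:
        if hit_id not in last:
            session = 0                          # first hit of an id opens session 0
        else:
            prev_ts, prev_len, prev_session = last[hit_id]
            # One break if either condition holds; both at once still count once
            is_break = (ts - prev_ts) > gap or uri_len < prev_len
            session = prev_session + 1 if is_break else prev_session
        last[hit_id] = (ts, uri_len, session)
        sessions.append(session)
    return sessions


raw = [('A', '2010-01-02 03:00:00', 10), ('A', '2010-01-02 03:00:14', 20),
       ('A', '2010-01-02 03:00:35', 25), ('A', '2010-01-02 03:00:38', 15),
       ('A', '2010-01-02 03:29:10', 20), ('A', '2010-01-02 03:29:35', 10),
       ('B', '2010-01-02 03:45:20', 20), ('B', '2010-01-02 06:10:05', 25),
       ('B', '2010-01-02 06:10:15', 15), ('C', '2010-01-02 07:40:15', 30),
       ('C', '2010-01-02 07:40:20', 40), ('C', '2010-01-02 07:40:25', 45)]
rows = [(i, datetime.strptime(t, '%Y-%m-%d %H:%M:%S'), u) for i, t, u in raw]
result = assign_sessions(rows)
print(result)  # [0, 0, 1, 2, 3, 4, 0, 1, 2, 0, 0, 0]
```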
Solution
If I understand correctly, you want to combine both conditions into one session counter?
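In a single line, combine the two break conditions with the | operator before taking the cumulative sum per id. Simply adding the two separate counters would count row 5 twice (it comes 25 seconds after the previous hit and its uri_len drops from 20 to 10), giving 5 instead of the expected 4:
df['session'] = (df.groupby('id')['datetime'].diff().gt(pd.Timedelta(seconds=15))
                 | df.groupby('id')['uri_len'].diff().lt(0)).astype(int).groupby(df['id']).cumsum()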
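Whether to add the two counters or to combine the break conditions first can be checked directly on the sample data. This sketch builds the two per-id break flags (the names gap_break, len_break, summed and combined are illustrative) and compares adding two separate cumulative sums with a single cumulative sum of the OR of the flags.

```python
import pandas as pd

hits = {'id': ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
        'datetime': ['2010-01-02 03:00:00', '2010-01-02 03:00:14', '2010-01-02 03:00:35',
                     '2010-01-02 03:00:38', '2010-01-02 03:29:10', '2010-01-02 03:29:35',
                     '2010-01-02 03:45:20', '2010-01-02 06:10:05', '2010-01-02 06:10:15',
                     '2010-01-02 07:40:15', '2010-01-02 07:40:20', '2010-01-02 07:40:25'],
        'uri_len': [10, 20, 25, 15, 20, 10, 20, 25, 15, 30, 40, 45]}
df = pd.DataFrame(hits)
df['datetime'] = pd.to_datetime(df['datetime'], format='%Y-%m-%d %H:%M:%S')

g = df.groupby('id')
# Break flags relative to the previous hit of the same id (False for each id's first hit)
gap_break = g['datetime'].diff().gt(pd.Timedelta(seconds=15))
len_break = g['uri_len'].diff().lt(0)

# Option 1: add two separate per-id counters (a hit with both flags is counted twice)
summed = (gap_break.astype(int).groupby(df['id']).cumsum()
          + len_break.astype(int).groupby(df['id']).cumsum())
# Option 2: OR the flags first, then take one per-id cumulative sum
combined = (gap_break | len_break).astype(int).groupby(df['id']).cumsum()

print(summed.tolist())    # [0, 0, 1, 2, 3, 5, 0, 1, 2, 0, 0, 0]
print(combined.tolist())  # [0, 0, 1, 2, 3, 4, 0, 1, 2, 0, 0, 0]
```

Only row 5, where both flags are True, differs between the two, and the OR version is the one that reproduces the session column in the question.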