Get the median by name and time

Problem description

Suppose I have a DataFrame:

 ID              Timestamp   Temp     Height
 01    2016-01-01 09:00:00     17         10
 01    2016-01-01 10:15:00     20          5
 02    2019-05-23 07:30:00     12          3
 01    2012-01-01 09:00:00     15         12
 01    2018-01-01 09:00:00     13         18
 02    2013-11-23 17:40:00     22          9

What I want is the median of the "Height" column, grouped by "ID" and by the month and hour extracted from "Timestamp". An example of the output I want:

 ID    Month     Hour     Height_median
 01       01       09                12
 01       01       10                 5
 02       05       07                 3
 02       11       17                 9

Solution

Here you go:

import pandas as pd 

# Sample data from the question (Temp is omitted; it is not needed for the median)
df = pd.DataFrame(dict(
    ID=[1, 1, 2, 1, 1, 2],
    Timestamp=['2016-01-01 09:00:00', '2016-01-01 10:15:00', '2019-05-23 07:30:00',
               '2012-01-01 09:00:00', '2018-01-01 09:00:00', '2013-11-23 17:40:00'],
    Height=[10, 5, 3, 12, 18, 9]))

# Parse the strings into datetimes, then extract month and hour
df['Timestamp'] = pd.to_datetime(df['Timestamp'])
df['month'] = df['Timestamp'].dt.month
df['hour'] = df['Timestamp'].dt.hour

# Median Height per (ID, month, hour) group
df.groupby(['ID', 'month', 'hour'])[['Height']].agg('median').reset_index()
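
As a variation, here is a minimal sketch that skips the helper columns and names the result column Height_median, as in the desired output. It assumes the same sample data as in the question. The names ts and result are only illustrative, and the Month and Hour labels come from renaming the grouping Series.

```python
import pandas as pd

# Same sample data as in the question (Temp omitted)
df = pd.DataFrame({
    "ID": [1, 1, 2, 1, 1, 2],
    "Timestamp": pd.to_datetime([
        "2016-01-01 09:00:00", "2016-01-01 10:15:00", "2019-05-23 07:30:00",
        "2012-01-01 09:00:00", "2018-01-01 09:00:00", "2013-11-23 17:40:00",
    ]),
    "Height": [10, 5, 3, 12, 18, 9],
})

ts = df["Timestamp"]

# Group by the ID column plus two Series derived from the timestamp;
# the renamed Series become the Month and Hour columns after reset_index()
result = (
    df.groupby(["ID", ts.dt.month.rename("Month"), ts.dt.hour.rename("Hour")])
      .agg(Height_median=("Height", "median"))
      .reset_index()
)
print(result)
```

Month and Hour come out as integers here. If zero-padded strings such as 01 and 09 are wanted, ts.dt.strftime('%m') and ts.dt.strftime('%H') could be used as the grouping keys instead.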