Problem description
I want to sort the values by id_, Code, and Timestamp (the time order matters), then groupby on id_ and Code, and then use ffill on only the V1 and V2 columns to fill the NaN values within each group, while keeping the other columns unchanged and returning the full table.

d1:

  Type_x          id_                Timestamp    V1  Code  Type_y   V2
0   abcd  39-38-30-34  2012-09-20 23:46:05.870  35.5     2     NaN    0
1   abcd  39-38-30-34  2012-09-20 23:46:23.870  44.5     0     NaN    1
2   abcd  39-38-30-34  2012-09-20 23:48:07.870  43.5     0     NaN    1
3   abcd  39-38-30-34  2012-09-20 23:49:48.870  42.5     0     NaN  NaN
4   abcd  39-38-30-34  2012-09-20 23:50:44.870  34.5     2     NaN  NaN

My attempt returned only two columns.

How should I do this correctly?
Solution
What do you need returned?
d2 = d1.sort_values(by=['id_', 'Code', 'Timestamp']).groupby(['id_', 'Code']).ffill()

  Type_x                  Timestamp    V1  Type_y   V2
1   abcd   39-38-30-34 23:46:23.870  44.5     NaN  1.0
2   abcd   39-38-30-34 23:48:07.870  43.5     NaN  1.0
3   abcd   39-38-30-34 23:49:48.870  42.5     NaN  1.0
0   abcd   39-38-30-34 23:46:05.870  35.5     NaN  0.0
4   abcd  39-38-30-34- 23:50:44.870  34.5     NaN  0.0
Or:

d2 = d1.sort_values(by=['id_', 'Code']).ffill().dropna(axis=1)
print(d2)

  Type_x                  Timestamp    V1   V2
1   abcd   39-38-30-34 23:46:23.870  44.5  1.0
2   abcd   39-38-30-34 23:48:07.870  43.5  1.0
3   abcd   39-38-30-34 23:49:48.870  42.5  1.0
0   abcd   39-38-30-34 23:46:05.870  35.5  0.0
4   abcd  39-38-30-34- 23:50:44.870  34.5  0.0
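The second variant calls ffill without groupby, so a fill can carry a value from the last row of one group into the first row of the next. The sample data happens not to show this, because each Code group starts with a non-NaN V2. A minimal sketch of the difference, using a hypothetical toy frame (the names key and val are illustrative, not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: group "b" starts with a NaN.
df = pd.DataFrame({
    "key": ["a", "a", "b", "b"],
    "val": [1.0, np.nan, np.nan, 2.0],
})

# Plain ffill ignores group boundaries: the last value of "a" leaks into "b".
plain = df["val"].ffill()

# Grouped ffill only fills within each key, so the leading NaN in "b" stays.
grouped = df.groupby("key")["val"].ffill()

print(plain.tolist())    # [1.0, 1.0, 1.0, 2.0]
print(grouped.tolist())  # [1.0, 1.0, nan, 2.0]
```

If values must never cross from one (id_, Code) group into another, the grouped form is the safer choice.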
If your actual DataFrame has other columns besides the ones to ffill and the ones to groupby, you can use transform and do it column by column:

d2 = d1.sort_values(by=['id_', 'Code', 'Timestamp'])
d2['V1'] = d2.groupby(['id_', 'Code'])['V1'].transform(lambda x: x.ffill())
d2['V2'] = d2.groupby(['id_', 'Code'])['V2'].transform(lambda x: x.ffill())
d2

Out[1]:
  Type_x          id_                Timestamp    V1  Code  Type_y   V2
1   abcd  39-38-30-34  2012-09-20 23:46:23.870  44.5     0     NaN  1.0
2   abcd  39-38-30-34  2012-09-20 23:48:07.870  43.5     0     NaN  1.0
3   abcd  39-38-30-34  2012-09-20 23:49:48.870  42.5     0     NaN  1.0
0   abcd  39-38-30-34  2012-09-20 23:46:05.870  35.5     2     NaN  0.0
4   abcd  39-38-30-34  2012-09-20 23:50:44.870  34.5     2     NaN  0.0
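The column-by-column approach can also be written as a single assignment. The sketch below is self-contained: it rebuilds d1 from the table in the question (the dtypes are an assumption, since the original construction code is not shown) and fills only V1 and V2 within each (id_, Code) group. The name fill_cols is introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question's d1 table (dtypes are assumed).
d1 = pd.DataFrame({
    "Type_x": ["abcd"] * 5,
    "id_": ["39-38-30-34"] * 5,
    "Timestamp": pd.to_datetime([
        "2012-09-20 23:46:05.870",
        "2012-09-20 23:46:23.870",
        "2012-09-20 23:48:07.870",
        "2012-09-20 23:49:48.870",
        "2012-09-20 23:50:44.870",
    ]),
    "V1": [35.5, 44.5, 43.5, 42.5, 34.5],
    "Code": [2, 0, 0, 0, 2],
    "Type_y": [np.nan] * 5,
    "V2": [0, 1, 1, np.nan, np.nan],
})

fill_cols = ["V1", "V2"]  # only these columns are forward-filled

# Sort so that, within each (id_, Code) group, rows are in time order.
d2 = d1.sort_values(by=["id_", "Code", "Timestamp"]).copy()

# The grouped ffill returns just fill_cols, aligned on d2's index, so
# assigning it back leaves every other column (including Type_y) untouched.
d2[fill_cols] = d2.groupby(["id_", "Code"])[fill_cols].ffill()

print(d2)
```

Selecting the columns before calling ffill is what limits the result to two columns; assigning that result back into the sorted frame is what returns the full table.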