在Python中对指定列进行分组和填充

问题描述

我想按id_CodeTimestamp对值进行排序(因为时间顺序很重要),然后使用d1和{{1 }},然后仅对id_Code的每个列使用ffill分别对NaN进行填充,同时保持其他列不变,并返回完整表格。

V1

V2@H_404_27@

尝试:

d1@H_404_27@

仅返回两列:


    Type_x  id_             Timestamp               V1   Code Type_y    V2
0   abcd    39-38-30-34     2012-09-20 23:46:05.870 35.5    2    NaN    0
1   abcd    39-38-30-34     2012-09-20 23:46:23.870 44.5    0    NaN    1
2   abcd    39-38-30-34     2012-09-20 23:48:07.870 43.5    0    NaN    1
3   abcd    39-38-30-34     2012-09-20 23:49:48.870 42.5    0    NaN    NaN
4   abcd    39-38-30-34     2012-09-20 23:50:44.870 34.5    2    NaN    NaN
@H_404_27@

我应该如何正确做?

解决方法

您需要退还什么?

d2 = d1.sort_values(by = ['id_','Code','Timestamp']).groupby(['id_','Code']).ffill()

    

             Type_x     Timestamp    V1  Type_y   V2
1 abcd   39-38-30-34  23:46:23.870  44.5     NaN  1.0
2 abcd   39-38-30-34  23:48:07.870  43.5     NaN  1.0
3 abcd   39-38-30-34  23:49:48.870  42.5     NaN  1.0
0 abcd   39-38-30-34  23:46:05.870  35.5     NaN  0.0
4 abcd  39-38-30-34-  23:50:44.870  34.5     NaN  0.0

d2 = d1.sort_values(by = ['id_','Code']).ffill().dropna(1)
print(d2)

 

             Type_x     Timestamp    V1   V2
1 abcd   39-38-30-34  23:46:23.870  44.5  1.0
2 abcd   39-38-30-34  23:48:07.870  43.5  1.0
3 abcd   39-38-30-34  23:49:48.870  42.5  1.0
0 abcd   39-38-30-34  23:46:05.870  35.5  0.0
4 abcd  39-38-30-34-  23:50:44.870  34.5  0.0
,

如果实际数据帧中除了要transform的列和要groupby的列以外的其他列,可以使用ffill并逐列进行操作:

d2 = d1.sort_values(by = ['id_','Timestamp'])
d2['V1'] = d2.groupby(['id_','Code'])['V1'].transform(lambda x: x.ffill())
d2['V2'] = d2.groupby(['id_','Code'])['V2'].transform(lambda x: x.ffill())
d2
Out[1]: 
  Type_x          id_                Timestamp    V1  Code  Type_y   V2
1  abcd   39-38-30-34  2012-09-20 23:46:23.870  44.5  0    NaN      1.0
2  abcd   39-38-30-34  2012-09-20 23:48:07.870  43.5  0    NaN      1.0
3  abcd   39-38-30-34  2012-09-20 23:49:48.870  42.5  0    NaN      1.0
0  abcd   39-38-30-34  2012-09-20 23:46:05.870  35.5  2    NaN      0.0
4  abcd   39-38-30-34  2012-09-20 23:50:44.870  34.5  2    NaN      0.0