问题描述
我有一个下表列出的条目。表格中有缺失的日期条目,我想用月末的缺失日期值填充。对于第一列,应该使用相同的值填充缺少日期条目的添加行,对于值列,我希望它们填充0。
FirstName MiddleName LastName Date Value1 Value2 Value3
first1 middle1 last1 1/31/2020 51 80 19
first1 middle1 last1 2/29/2020 14 44 56
first1 middle1 last1 4/30/2020 57 96 40
first1 middle1 last1 6/30/2020 58 65 3
first1 middle1 last1 8/31/2020 1 34 4
first1 middle1 last1 10/31/2020 40 38 53
first1 middle1 last1 12/31/2020 93 65 41
first1 middle1 last1 2/28/2021 3 43 0
first1 middle1 last1 4/30/2021 46 61 52
first2 middle2 last2 1/31/2020 64 19 33
first2 middle2 last2 2/29/2020 28 71 16
first2 middle2 last2 4/30/2020 2 94 78
first2 middle2 last2 5/31/2020 78 99 87
first2 middle2 last2 6/30/2020 10 70 14
first2 middle2 last2 7/31/2020 30 30 59
first2 middle2 last2 8/31/2020 55 96 73
first2 middle2 last2 10/31/2020 22 43 23
first2 middle2 last2 11/30/2020 12 4 84
first2 middle2 last2 1/31/2021 59 93 1
first2 middle2 last2 2/28/2021 19 33 52
first2 middle2 last2 3/31/2021 46 12 97
first2 middle2 last2 4/30/2021 41 44 59
first2 middle2 last2 5/31/2021 67 84 96
first2 middle2 last2 6/30/2021 52 69 78
first3 middle3 last3 4/30/2020 5 63 30
first3 middle3 last3 5/31/2020 45 22 7
first3 middle3 last3 6/30/2020 76 2 33
first3 middle3 last3 8/31/2020 81 25 52
first3 middle3 last3 9/30/2020 55 3 32
first3 middle3 last3 11/30/2020 46 45 80
first3 middle3 last3 12/31/2020 17 81 74
first3 middle3 last3 1/31/2021 98 6 55
'''
预期产量
FirstName MiddleName LastName Date Value1 Value2 Value3
first1 middle1 last1 1/31/2020 51 80 19
first1 middle1 last1 2/29/2020 14 44 56
first1 middle1 last1 3/31/2020 0 0 0
first1 middle1 last1 4/30/2020 57 96 40
first1 middle1 last1 5/31/2020 0 0 0
first1 middle1 last1 6/30/2020 58 65 3
first1 middle1 last1 7/31/2020 0 0 0
first1 middle1 last1 8/31/2020 1 34 4
first1 middle1 last1 9/30/2020 0 0 0
first1 middle1 last1 10/31/2020 40 38 53
first1 middle1 last1 11/30/2020 0 0 0
first1 middle1 last1 12/31/2020 93 65 41
first1 middle1 last1 1/31/2021 0 0 0
first1 middle1 last1 2/28/2021 3 43 0
first1 middle1 last1 3/31/2021 0 0 0
first1 middle1 last1 4/30/2021 46 61 52
first2 middle2 last2 1/31/2020 64 19 33
first2 middle2 last2 2/29/2020 28 71 16
first2 middle2 last2 3/31/2020 0 0 0
first2 middle2 last2 4/30/2020 2 94 78
first2 middle2 last2 5/31/2020 78 99 87
first2 middle2 last2 6/30/2020 10 70 14
first2 middle2 last2 7/31/2020 30 30 59
first2 middle2 last2 8/31/2020 55 96 73
first2 middle2 last2 9/30/2020 0 0 0
first2 middle2 last2 10/31/2020 22 43 23
first2 middle2 last2 11/30/2020 12 4 84
first2 middle2 last2 12/31/2020 0 0 0
first2 middle2 last2 1/31/2021 59 93 1
first2 middle2 last2 2/28/2021 19 33 52
first2 middle2 last2 3/31/2021 46 12 97
first2 middle2 last2 4/30/2021 41 44 59
first2 middle2 last2 5/31/2021 67 84 96
first2 middle2 last2 6/30/2021 52 69 78
first3 middle3 last3 4/30/2020 5 63 30
first3 middle3 last3 5/31/2020 45 22 7
first3 middle3 last3 6/30/2020 76 2 33
first3 middle3 last3 7/31/2020 0 0 0
first3 middle3 last3 8/31/2020 81 25 52
first3 middle3 last3 9/30/2020 55 3 32
first3 middle3 last3 10/31/2020 0 0 0
first3 middle3 last3 11/30/2020 46 45 80
first3 middle3 last3 12/31/2020 17 81 74
first3 middle3 last3 1/31/2021 98 6 55
我尝试对数据帧使用重采样,但未获得所需的输出。 df = df.set_index('Date')。resample('M')。ffill()。reset_index()
解决方法
编辑:比下面的第二个解决方案好得多:
您的逻辑很接近。您可以在.groupby()
之前加入.resample
,在.asfreq()
之后加入。{p>
df['Date'] = pd.to_datetime(df['Date'])
cols = ['FirstName','MiddleName','LastName']
df = (df.set_index('Date')
.groupby(cols)
.resample('M')
.asfreq(fill_value=0)
.drop(cols,axis=1)
.reset_index())
df
Out[128]:
FirstName MiddleName LastName Date Value1 Value2 Value3
0 first1 middle1 last1 2020-01-31 51 80 19
1 first1 middle1 last1 2020-02-29 14 44 56
2 first1 middle1 last1 2020-03-31 0 0 0
3 first1 middle1 last1 2020-04-30 57 96 40
4 first1 middle1 last1 2020-05-31 0 0 0
5 first1 middle1 last1 2020-06-30 58 65 3
6 first1 middle1 last1 2020-07-31 0 0 0
7 first1 middle1 last1 2020-08-31 1 34 4
8 first1 middle1 last1 2020-09-30 0 0 0
9 first1 middle1 last1 2020-10-31 40 38 53
10 first1 middle1 last1 2020-11-30 0 0 0
11 first1 middle1 last1 2020-12-31 93 65 41
12 first1 middle1 last1 2021-01-31 0 0 0
13 first1 middle1 last1 2021-02-28 3 43 0
14 first1 middle1 last1 2021-03-31 0 0 0
15 first1 middle1 last1 2021-04-30 46 61 52
16 first2 middle2 last2 2020-01-31 64 19 33
17 first2 middle2 last2 2020-02-29 28 71 16
18 first2 middle2 last2 2020-03-31 0 0 0
19 first2 middle2 last2 2020-04-30 2 94 78
20 first2 middle2 last2 2020-05-31 78 99 87
21 first2 middle2 last2 2020-06-30 10 70 14
22 first2 middle2 last2 2020-07-31 30 30 59
23 first2 middle2 last2 2020-08-31 55 96 73
24 first2 middle2 last2 2020-09-30 0 0 0
25 first2 middle2 last2 2020-10-31 22 43 23
26 first2 middle2 last2 2020-11-30 12 4 84
27 first2 middle2 last2 2020-12-31 0 0 0
28 first2 middle2 last2 2021-01-31 59 93 1
29 first2 middle2 last2 2021-02-28 19 33 52
30 first2 middle2 last2 2021-03-31 46 12 97
31 first2 middle2 last2 2021-04-30 41 44 59
32 first2 middle2 last2 2021-05-31 67 84 96
33 first2 middle2 last2 2021-06-30 52 69 78
34 first3 middle3 last3 2020-04-30 5 63 30
35 first3 middle3 last3 2020-05-31 45 22 7
36 first3 middle3 last3 2020-06-30 76 2 33
37 first3 middle3 last3 2020-07-31 0 0 0
38 first3 middle3 last3 2020-08-31 81 25 52
39 first3 middle3 last3 2020-09-30 55 3 32
40 first3 middle3 last3 2020-10-31 0 0 0
41 first3 middle3 last3 2020-11-30 46 45 80
42 first3 middle3 last3 2020-12-31 17 81 74
43 first3 middle3 last3 2021-01-31 98 6 55
使用reindex
的方法2(更复杂)
它变得更加复杂,因为您需要在组中填充值。我不确定是否有比以下解决方案更简单的方法:
- 您可以使用
date_range()
创建一个freq='M'
,其中将包含缺少的月末日期值。您稍后将使用它来重新编制索引。 - 您正在组内重新建立索引,因此可以为组创建唯一的标识符,称为
FullName
。重新编制索引时还将传递此新列,以便您在组中重新编制索引。 - 通过利用
MultiIndex
创建由以上#1和#2组成的pd.MultiIndex.from_product()
- 使用
set_index()
并将FirstName
和Date
设置为索引。接下来,您将.reindex
传递通过我们在#3中创建的mmultiindex的数据框。 - 清理数据框,以便使用
FirstName
和MiddleName
用正确的值填充LastName
,.groupby
和.transform(max)
- 最后,清理列的顺序,不包括我先前创建的临时列
FullName
。
您需要使用pd.MultiIndex.from_product()
并通过FirstName
和date_range
的{{1}}日期重新索引。这是关键所在,可让您s
和set_index
填写遗漏的日期。
reindex