缺少日期值

问题描述

我有一个下表列出的条目。表格中有缺失的日期条目,我想用月末的缺失日期值填充。对于第一列,应该使用相同的值填充缺少日期条目的添加行,对于值列,我希望它们填充0。

FirstName   MiddleName  LastName    Date    Value1  Value2  Value3
first1  middle1 last1   1/31/2020   51  80  19
first1  middle1 last1   2/29/2020   14  44  56
first1  middle1 last1   4/30/2020   57  96  40
first1  middle1 last1   6/30/2020   58  65  3
first1  middle1 last1   8/31/2020   1   34  4
first1  middle1 last1   10/31/2020  40  38  53
first1  middle1 last1   12/31/2020  93  65  41
first1  middle1 last1   2/28/2021   3   43  0
first1  middle1 last1   4/30/2021   46  61  52
first2  middle2 last2   1/31/2020   64  19  33
first2  middle2 last2   2/29/2020   28  71  16
first2  middle2 last2   4/30/2020   2   94  78
first2  middle2 last2   5/31/2020   78  99  87
first2  middle2 last2   6/30/2020   10  70  14
first2  middle2 last2   7/31/2020   30  30  59
first2  middle2 last2   8/31/2020   55  96  73
first2  middle2 last2   10/31/2020  22  43  23
first2  middle2 last2   11/30/2020  12  4   84
first2  middle2 last2   1/31/2021   59  93  1
first2  middle2 last2   2/28/2021   19  33  52
first2  middle2 last2   3/31/2021   46  12  97
first2  middle2 last2   4/30/2021   41  44  59
first2  middle2 last2   5/31/2021   67  84  96
first2  middle2 last2   6/30/2021   52  69  78
first3  middle3 last3   4/30/2020   5   63  30
first3  middle3 last3   5/31/2020   45  22  7
first3  middle3 last3   6/30/2020   76  2   33
first3  middle3 last3   8/31/2020   81  25  52
first3  middle3 last3   9/30/2020   55  3   32
first3  middle3 last3   11/30/2020  46  45  80
first3  middle3 last3   12/31/2020  17  81  74
first3  middle3 last3   1/31/2021   98  6   55

'''

预期产量

FirstName   MiddleName  LastName    Date    Value1  Value2  Value3
first1  middle1 last1   1/31/2020   51  80  19
first1  middle1 last1   2/29/2020   14  44  56
first1  middle1 last1   3/31/2020   0   0   0
first1  middle1 last1   4/30/2020   57  96  40
first1  middle1 last1   5/31/2020   0   0   0
first1  middle1 last1   6/30/2020   58  65  3
first1  middle1 last1   7/31/2020   0   0   0
first1  middle1 last1   8/31/2020   1   34  4
first1  middle1 last1   9/30/2020   0   0   0
first1  middle1 last1   10/31/2020  40  38  53
first1  middle1 last1   11/30/2020  0   0   0
first1  middle1 last1   12/31/2020  93  65  41
first1  middle1 last1   1/31/2021   0   0   0
first1  middle1 last1   2/28/2021   3   43  0
first1  middle1 last1   3/31/2021   0   0   0
first1  middle1 last1   4/30/2021   46  61  52
first2  middle2 last2   1/31/2020   64  19  33
first2  middle2 last2   2/29/2020   28  71  16
first2  middle2 last2   3/31/2020   0   0   0
first2  middle2 last2   4/30/2020   2   94  78
first2  middle2 last2   5/31/2020   78  99  87
first2  middle2 last2   6/30/2020   10  70  14
first2  middle2 last2   7/31/2020   30  30  59
first2  middle2 last2   8/31/2020   55  96  73
first2  middle2 last2   9/30/2020   0   0   0
first2  middle2 last2   10/31/2020  22  43  23
first2  middle2 last2   11/30/2020  12  4   84
first2  middle2 last2   12/31/2020  0   0   0
first2  middle2 last2   1/31/2021   59  93  1
first2  middle2 last2   2/28/2021   19  33  52
first2  middle2 last2   3/31/2021   46  12  97
first2  middle2 last2   4/30/2021   41  44  59
first2  middle2 last2   5/31/2021   67  84  96
first2  middle2 last2   6/30/2021   52  69  78
first3  middle3 last3   4/30/2020   5   63  30
first3  middle3 last3   5/31/2020   45  22  7
first3  middle3 last3   6/30/2020   76  2   33
first3  middle3 last3   7/31/2020   0   0   0
first3  middle3 last3   8/31/2020   81  25  52
first3  middle3 last3   9/30/2020   55  3   32
first3  middle3 last3   10/31/2020  0   0   0
first3  middle3 last3   11/30/2020  46  45  80
first3  middle3 last3   12/31/2020  17  81  74
first3  middle3 last3   1/31/2021   98  6   55

我尝试对数据帧使用重采样,但未获得所需的输出。 df = df.set_index('Date')。resample('M')。ffill()。reset_index()

解决方法

编辑:比下面的第二个解决方案好得多:

您的逻辑很接近。您可以在.groupby()之前加入.resample,在.asfreq()之后加入。{p>

df['Date'] = pd.to_datetime(df['Date'])
cols = ['FirstName','MiddleName','LastName']
df = (df.set_index('Date')
        .groupby(cols)
        .resample('M')
        .asfreq(fill_value=0)
        .drop(cols,axis=1)
        .reset_index())
df
Out[128]: 
   FirstName MiddleName LastName       Date  Value1  Value2  Value3
0     first1    middle1    last1 2020-01-31      51      80      19
1     first1    middle1    last1 2020-02-29      14      44      56
2     first1    middle1    last1 2020-03-31       0       0       0
3     first1    middle1    last1 2020-04-30      57      96      40
4     first1    middle1    last1 2020-05-31       0       0       0
5     first1    middle1    last1 2020-06-30      58      65       3
6     first1    middle1    last1 2020-07-31       0       0       0
7     first1    middle1    last1 2020-08-31       1      34       4
8     first1    middle1    last1 2020-09-30       0       0       0
9     first1    middle1    last1 2020-10-31      40      38      53
10    first1    middle1    last1 2020-11-30       0       0       0
11    first1    middle1    last1 2020-12-31      93      65      41
12    first1    middle1    last1 2021-01-31       0       0       0
13    first1    middle1    last1 2021-02-28       3      43       0
14    first1    middle1    last1 2021-03-31       0       0       0
15    first1    middle1    last1 2021-04-30      46      61      52
16    first2    middle2    last2 2020-01-31      64      19      33
17    first2    middle2    last2 2020-02-29      28      71      16
18    first2    middle2    last2 2020-03-31       0       0       0
19    first2    middle2    last2 2020-04-30       2      94      78
20    first2    middle2    last2 2020-05-31      78      99      87
21    first2    middle2    last2 2020-06-30      10      70      14
22    first2    middle2    last2 2020-07-31      30      30      59
23    first2    middle2    last2 2020-08-31      55      96      73
24    first2    middle2    last2 2020-09-30       0       0       0
25    first2    middle2    last2 2020-10-31      22      43      23
26    first2    middle2    last2 2020-11-30      12       4      84
27    first2    middle2    last2 2020-12-31       0       0       0
28    first2    middle2    last2 2021-01-31      59      93       1
29    first2    middle2    last2 2021-02-28      19      33      52
30    first2    middle2    last2 2021-03-31      46      12      97
31    first2    middle2    last2 2021-04-30      41      44      59
32    first2    middle2    last2 2021-05-31      67      84      96
33    first2    middle2    last2 2021-06-30      52      69      78
34    first3    middle3    last3 2020-04-30       5      63      30
35    first3    middle3    last3 2020-05-31      45      22       7
36    first3    middle3    last3 2020-06-30      76       2      33
37    first3    middle3    last3 2020-07-31       0       0       0
38    first3    middle3    last3 2020-08-31      81      25      52
39    first3    middle3    last3 2020-09-30      55       3      32
40    first3    middle3    last3 2020-10-31       0       0       0
41    first3    middle3    last3 2020-11-30      46      45      80
42    first3    middle3    last3 2020-12-31      17      81      74
43    first3    middle3    last3 2021-01-31      98       6      55

使用reindex的方法2(更复杂)

它变得更加复杂,因为您需要在组中填充值。我不确定是否有比以下解决方案更简单的方法:

  1. 您可以使用date_range()创建一个freq='M',其中将包含缺少的月末日期值。您稍后将使用它来重新编制索引。
  2. 您正在组内重新建立索引,因此可以为组创建唯一的标识符,称为FullName。重新编制索引时还将传递此新列,以便您在组中重新编制索引。
  3. 通过利用MultiIndex创建由以上#1和#2组成的pd.MultiIndex.from_product()
  4. 使用set_index()并将FirstNameDate设置为索引。接下来,您将.reindex 传递通过我们在#3中创建的mmultiindex的数据框。
  5. 清理数据框,以便使用FirstNameMiddleName用正确的值填充LastName.groupby.transform(max)
  6. 最后,清理列的顺序,不包括我先前创建的临时列FullName

您需要使用pd.MultiIndex.from_product()并通过FirstNamedate_range的{​​{1}}日期重新索引。这是关键所在,可让您sset_index填写遗漏的日期。

reindex

相关问答

错误1:Request method ‘DELETE‘ not supported 错误还原:...
错误1:启动docker镜像时报错:Error response from daemon:...
错误1:private field ‘xxx‘ is never assigned 按Alt...
报错如下,通过源不能下载,最后警告pip需升级版本 Requirem...