Find a previous row's value from the next row

Problem description

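I'm curious how to find the value in a previous row when the next row's value is a cumulative sum. For example, here total_deaths grows by exactly new_deaths from one row to the next. How can I fill in the missing values in the dataset? I can work them out by hand by subtracting, but is there a programmatic way to do it?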

date        total_cases   new_cases    total_deaths   new_deaths    population  lockdown_date
2020-29-04  1012583.0   24132.0         58355.0           2110.0    54225.446       2020-03-13
2020-04-30  1039909.0   27326.0         60966.0           2611.0    54225.446       2020-03-13
2020-05-01  1069826.0   29917.0         NaN                  NaN    54225.446       2020-03-13
2020-05-02  1103781.0   33955.0         65068.0           2062.0    54225.446       2020-03-13
2020-05-03  1133069.0   29288.0         66385.0           1317.0    54225.446       2020-03-13
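Done by hand, the subtraction mentioned in the question looks like this for the missing 2020-05-01 row. This is a minimal sketch in plain Python using the numbers from the table, with None standing in for NaN. It first checks that the known rows really satisfy total[i] - total[i-1] == new[i], then recovers the gap with T[i] = T[i+1] - N[i+1] and N[i] = T[i] - T[i-1].

```python
# Cumulative and daily deaths from the table; None marks the missing 2020-05-01 row.
total_deaths = [58355.0, 60966.0, None, 65068.0, 66385.0]
new_deaths = [2110.0, 2611.0, None, 2062.0, 1317.0]

# Sanity check: wherever two consecutive totals are known, their difference
# should equal the daily count of the later row.
for i in range(1, len(total_deaths)):
    if total_deaths[i] is not None and total_deaths[i - 1] is not None:
        assert total_deaths[i] - total_deaths[i - 1] == new_deaths[i]

# Recover the gap by subtraction:
#   T[i] = T[i+1] - N[i+1]   (the next row's total minus the next row's daily count)
#   N[i] = T[i] - T[i-1]     (difference of consecutive totals)
i = total_deaths.index(None)
total_deaths[i] = total_deaths[i + 1] - new_deaths[i + 1]
new_deaths[i] = total_deaths[i] - total_deaths[i - 1]
print(total_deaths[i], new_deaths[i])  # 63006.0 2040.0
```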

Solution

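You can combine shift and fillna to realign the subtracted column (total_deaths minus new_deaths) and fill in the missing cumulative value, then use diff to recover the missing new deaths: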

from io import StringIO

import pandas as pd

txt = """
date        total_cases   new_cases    total_deaths   new_deaths    population  lockdown_date
2020-29-04  1012583.0   24132.0         58355.0           2110.0    54225.446       2020-03-13
2020-04-30  1039909.0   27326.0         60966.0           2611.0    54225.446       2020-03-13
2020-05-01  1069826.0   29917.0         NaN                  NaN    54225.446       2020-03-13
2020-05-02  1103781.0   33955.0         65068.0           2062.0    54225.446       2020-03-13
2020-05-03  1133069.0   29288.0         66385.0           1317.0    54225.446       2020-03-13
"""

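# Raw string so that "\s" is not treated as an (invalid) escape sequence.
df = pd.read_csv(StringIO(txt), sep=r"\s+")

# assign evaluates its keyword arguments in order, so the new_deaths lambda
# already sees the filled total_deaths column.
df_filled = df.assign(
    # A missing total equals the next row's total minus the next row's new deaths;
    # shift(-1) moves that value up one row so it lines up with the gap.
    total_deaths=lambda f: f["total_deaths"].fillna(
        f["total_deaths"].sub(f["new_deaths"]).shift(-1)
    ),
    # The missing daily value is then the difference between consecutive totals.
    new_deaths=lambda f: f["new_deaths"].fillna(f["total_deaths"].diff()),
)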
print(df_filled)

         date  total_cases  new_cases  total_deaths  new_deaths  population  \
0  2020-29-04    1012583.0    24132.0       58355.0      2110.0   54225.446   
1  2020-04-30    1039909.0    27326.0       60966.0      2611.0   54225.446   
2  2020-05-01    1069826.0    29917.0       63006.0      2040.0   54225.446   
3  2020-05-02    1103781.0    33955.0       65068.0      2062.0   54225.446   
4  2020-05-03    1133069.0    29288.0       66385.0      1317.0   54225.446   

  lockdown_date  
0    2020-03-13  
1    2020-03-13  
2    2020-03-13  
3    2020-03-13  
4    2020-03-13
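The same shift/fillna/diff idea can be wrapped in a small helper and applied to several (cumulative, daily) column pairs at once. The helper name fill_from_next_row and its pairs argument are hypothetical, not part of pandas; it only uses the existing Series methods shift, fillna and diff. To exercise the second pair, the sketch deliberately blanks out the 2020-05-01 total_cases and new_cases values, which are actually known in the table, so the recovered numbers can be compared with the originals.

```python
import numpy as np
import pandas as pd


def fill_from_next_row(df: pd.DataFrame, pairs: dict[str, str]) -> pd.DataFrame:
    """Hypothetical helper: for each total_col -> new_col pair, fill a missing
    cumulative total from the following row, then fill the missing daily value."""
    out = df.copy()
    for total_col, new_col in pairs.items():
        # T[i] = T[i+1] - N[i+1]; shift(-1) moves row i+1's result up to row i.
        back_filled = (out[total_col] - out[new_col]).shift(-1)
        out[total_col] = out[total_col].fillna(back_filled)
        # N[i] = T[i] - T[i-1], computed from the totals filled just above.
        out[new_col] = out[new_col].fillna(out[total_col].diff())
    return out


# Data from the question; the 2020-05-01 case values are blanked on purpose
# (assumption for the demo) so the cases pair also has a gap to fill.
df = pd.DataFrame({
    "date": ["2020-29-04", "2020-04-30", "2020-05-01", "2020-05-02", "2020-05-03"],
    "total_cases": [1012583.0, 1039909.0, np.nan, 1103781.0, 1133069.0],
    "new_cases": [24132.0, 27326.0, np.nan, 33955.0, 29288.0],
    "total_deaths": [58355.0, 60966.0, np.nan, 65068.0, 66385.0],
    "new_deaths": [2110.0, 2611.0, np.nan, 2062.0, 1317.0],
})

filled = fill_from_next_row(
    df, {"total_cases": "new_cases", "total_deaths": "new_deaths"}
)
print(filled)
# Row 2 becomes total_cases 1069826.0, new_cases 29917.0,
# total_deaths 63006.0, new_deaths 2040.0.
```

Like the answer above, this only recovers a gap when the following row is complete. With two or more consecutive missing rows, only the total of the last row in the run can be recovered; its daily value and the earlier rows stay NaN, because the data does not determine them.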
