Problem description
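If we know that each row's cumulative value is just the running sum of the daily values, how can we recover a missing value from the neighbouring rows? For example, here total_deaths grows by exactly new_deaths each day, and the 2020-05-01 row is missing both total_deaths and new_deaths. How can missing values like these be filled in? They can be worked out by subtracting by hand, but is there a programmatic way to do it?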
date total_cases new_cases total_deaths new_deaths population lockdown_date
2020-29-04 1012583.0 24132.0 58355.0 2110.0 54225.446 2020-03-13
2020-04-30 1039909.0 27326.0 60966.0 2611.0 54225.446 2020-03-13
2020-05-01 1069826.0 29917.0 NaN NaN 54225.446 2020-03-13
2020-05-02 1103781.0 33955.0 65068.0 2062.0 54225.446 2020-03-13
2020-05-03 1133069.0 29288.0 66385.0 1317.0 54225.446 2020-03-13
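The question relies on the relationship total_deaths[i] = total_deaths[i-1] + new_deaths[i]. A minimal sketch of checking that relationship on the rows above is to compare diff() of the cumulative column with the daily column wherever both are known. To keep the example short, it assumes only the two death columns are needed and retypes them by hand:

```python
import numpy as np
import pandas as pd

# Only the two death columns from the table above are retyped here
df = pd.DataFrame({
    "total_deaths": [58355.0, 60966.0, np.nan, 65068.0, 66385.0],
    "new_deaths": [2110.0, 2611.0, np.nan, 2062.0, 1317.0],
})

# Each cumulative value minus the previous one should equal that day's count
recovered_new = df["total_deaths"].diff()

# Compare only the rows where both sides are available
known = recovered_new.notna() & df["new_deaths"].notna()
print((recovered_new[known] == df["new_deaths"][known]).all())
```

Only rows 1 and 4 can be checked, because the gap in row 2 also makes diff() undefined for row 3.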
Solution
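You can combine shift and fillna. Subtracting new_deaths from total_deaths gives the previous day's total for each row. Shifting that result up one row lines it up with the gap, and fillna then uses it to fill the missing cumulative value. After that, diff recovers the missing daily value: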
from io import StringIO
import pandas as pd
txt = """
date total_cases new_cases total_deaths new_deaths population lockdown_date
2020-29-04 1012583.0 24132.0 58355.0 2110.0 54225.446 2020-03-13
2020-04-30 1039909.0 27326.0 60966.0 2611.0 54225.446 2020-03-13
2020-05-01 1069826.0 29917.0 NaN NaN 54225.446 2020-03-13
2020-05-02 1103781.0 33955.0 65068.0 2062.0 54225.446 2020-03-13
2020-05-03 1133069.0 29288.0 66385.0 1317.0 54225.446 2020-03-13
"""
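df = pd.read_csv(StringIO(txt), sep=r"\s+")  # raw string avoids an invalid-escape warning
df_filled = df.assign(
    # previous total = next total - next new; shift(-1) aligns it with the gap
    total_deaths=lambda f: f["total_deaths"].fillna(
        f["total_deaths"].sub(f["new_deaths"]).shift(-1)
    ),
    # new = total - previous total (sees the total_deaths column filled just above)
    new_deaths=lambda f: f["new_deaths"].fillna(f["total_deaths"].diff()),
)
print(df_filled)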
date total_cases new_cases total_deaths new_deaths population \
0 2020-29-04 1012583.0 24132.0 58355.0 2110.0 54225.446
1 2020-04-30 1039909.0 27326.0 60966.0 2611.0 54225.446
2 2020-05-01 1069826.0 29917.0 63006.0 2040.0 54225.446
3 2020-05-02 1103781.0 33955.0 65068.0 2062.0 54225.446
4 2020-05-03 1133069.0 29288.0 66385.0 1317.0 54225.446
lockdown_date
0 2020-03-13
1 2020-03-13
2 2020-03-13
3 2020-03-13
4 2020-03-13
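The same shift/fillna/diff idea can be wrapped in a reusable function for any cumulative/daily column pair, such as total_cases/new_cases. The sketch below generalizes the answer and was not part of it. fill_cumulative_pair is a hypothetical name. Before applying the backward rule used above, the function also tries the forward rule total[i] = total[i-1] + new[i]. This covers rows where only the cumulative value is missing.

```python
import numpy as np
import pandas as pd


def fill_cumulative_pair(df, total_col, new_col):
    """Hypothetical helper: fill gaps in a cumulative column and its daily column."""
    total, new = df[total_col], df[new_col]
    # Forward rule: today's total = yesterday's total + today's new value
    total = total.fillna(total.shift(1) + new)
    # Backward rule: today's total = tomorrow's total - tomorrow's new value
    total = total.fillna((total - new).shift(-1))
    # Daily value = today's total - yesterday's total
    new = new.fillna(total.diff())
    return df.assign(**{total_col: total, new_col: new})


df = pd.DataFrame({
    "total_deaths": [58355.0, 60966.0, np.nan, 65068.0, 66385.0],
    "new_deaths": [2110.0, 2611.0, np.nan, 2062.0, 1317.0],
})
print(fill_cumulative_pair(df, "total_deaths", "new_deaths"))
```

On these rows it reproduces the 63006.0 / 2040.0 values shown above. If two consecutive rows are both missing their total and daily values, the individual values are not uniquely determined, so some NaNs remain after a single pass.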