Problem description
import pandas as pd
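# None marks a missing FVal. Values the question does not show are inferred from the output printed in the answers.
df = pd.DataFrame([['NewJersy',0,'2020-08-29'],['NewJersy',12,'2020-08-30'],['NewJersy',12,'2020-08-31'],['NewJersy',None,'2020-09-01'],['NewJersy',None,'2020-09-02'],['NewJersy',None,'2020-09-03'],['NewJersy',5,'2020-09-04'],['NewJersy',5,'2020-09-05'],['NewJersy',None,'2020-09-06'],['NewYork',None,'2020-08-29'],['NewYork',None,'2020-08-30'],['NewYork',8,'2020-08-31'],['NewYork',7,'2020-09-01'],['NewYork',None,'2020-09-02'],['NewYork',None,'2020-09-03']],columns=['FName','FVal','GDate'])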
print(df)
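I want to fill NULL values with the value from the previous record. For example, the column FVal is NULL from 2020-09-01 to 2020-09-03. Those NULL values should be replaced with 12, the last valid value before them (the one for 2020-08-31).
If the value for 2020-08-29 is NULL, it should be replaced with zero, because it is the first date and there is no previous record.
I tried the following code, but it does not work:
df['F'] = df['F'].fillna(method='ffill')
The expected result is shown here: Fill Null Values (image)
Thanks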
Solution
First, make sure your DataFrame is sorted by date, just in case:
df = df.sort_values('GDate').reset_index(drop=True)
Then fill the first value with 0 if it is missing:
if pd.isnull(df.loc[0,'FVal']):
    df.loc[0,'FVal'] = 0
Then forward fill as you did:
df['FVal'] = df['FVal'].fillna(method='ffill')
Note that the column name is FVal, not F.
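The three steps above can be sketched as one helper. This is only an illustration: the function name `fill_from_previous` and the small sample frame are hypothetical, and it calls `Series.ffill()` directly because recent pandas versions deprecate the `method` argument of `fillna`.

```python
import pandas as pd

def fill_from_previous(df: pd.DataFrame) -> pd.DataFrame:
    """Sort by date, zero a missing first value, then forward fill FVal."""
    out = df.sort_values('GDate').reset_index(drop=True)
    # The first row has no previous record, so a missing value becomes 0.
    if pd.isnull(out.loc[0, 'FVal']):
        out.loc[0, 'FVal'] = 0
    # Every remaining gap takes the last valid value before it.
    out['FVal'] = out['FVal'].ffill()
    return out

# Hypothetical sample: missing first value followed by a gap.
nan = float('nan')
sample = pd.DataFrame({
    'FName': ['NewJersy'] * 4,
    'FVal': [nan, 12, nan, nan],
    'GDate': ['2020-08-29', '2020-08-30', '2020-08-31', '2020-09-01'],
})
result = fill_from_previous(sample)
print(result)
```

Sorting by GDate alone interleaves rows that share a date but differ in FName, so with several names in one frame the fill can carry a value from one name into another.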
I'm not sure this is what you want, but this is what I would do:
>>> import math
>>> for i, row in df.iterrows():
...     if i > 0 and math.isnan(row['FVal']):
...         df.iloc[i, 1] = df.iloc[i - 1, 1]  # copy the previous row's FVal
...
>>> df
       FName  FVal       GDate
0   NewJersy   0.0  2020-08-29
1   NewJersy  12.0  2020-08-30
2   NewJersy  12.0  2020-08-31
3   NewJersy  12.0  2020-09-01
4   NewJersy  12.0  2020-09-02
5   NewJersy  12.0  2020-09-03
6   NewJersy   5.0  2020-09-04
7   NewJersy   5.0  2020-09-05
8   NewJersy   5.0  2020-09-06
9    NewYork   5.0  2020-08-29
10   NewYork   5.0  2020-08-30
11   NewYork   8.0  2020-08-31
12   NewYork   7.0  2020-09-01
13   NewYork   7.0  2020-09-02
14   NewYork   7.0  2020-09-03
>>>
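You can try the following:
import numpy as np

df.GDate = pd.to_datetime(df.GDate)
for i in range(len(df)):
    # Only rows after the first can inherit from a previous record.
    if np.isnan(df.loc[i, 'FVal']) and i > 0:
        if (df.loc[i, 'GDate'] - df.loc[i - 1, 'GDate']).days == 1:
            # The previous row is the preceding day: carry its value forward.
            df.loc[i, 'FVal'] = df.loc[i - 1, 'FVal']
        else:
            # No record for the preceding day: start from zero.
            df.loc[i, 'FVal'] = 0
Output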
       FName  FVal       GDate
0   NewJersy   0.0  2020-08-29
1   NewJersy  12.0  2020-08-30
2   NewJersy  12.0  2020-08-31
3   NewJersy  12.0  2020-09-01
4   NewJersy  12.0  2020-09-02
5   NewJersy  12.0  2020-09-03
6   NewJersy   5.0  2020-09-04
7   NewJersy   5.0  2020-09-05
8   NewJersy   5.0  2020-09-06
9    NewYork   0.0  2020-08-29
10   NewYork   0.0  2020-08-30
11   NewYork   8.0  2020-08-31
12   NewYork   7.0  2020-09-01
13   NewYork   7.0  2020-09-02
14   NewYork   7.0  2020-09-03
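A loop-free variant is also possible if "no previous record" is taken to mean "no earlier row for the same FName". The sketch below is a different technique from the loop above, a grouped forward fill, and it rests on that assumption. The sample frame is hypothetical and deliberately small.

```python
import pandas as pd

# Hypothetical sample: two names, each starting with a missing value.
nan = float('nan')
sample = pd.DataFrame({
    'FName': ['NewJersy', 'NewJersy', 'NewJersy', 'NewYork', 'NewYork', 'NewYork'],
    'FVal': [nan, 12, nan, nan, 8, nan],
    'GDate': ['2020-08-29', '2020-08-30', '2020-08-31',
              '2020-08-29', '2020-08-30', '2020-08-31'],
})
sample['GDate'] = pd.to_datetime(sample['GDate'])
sample = sample.sort_values(['FName', 'GDate']).reset_index(drop=True)

# Forward fill within each name, then zero whatever has no earlier record.
sample['FVal'] = sample.groupby('FName')['FVal'].ffill().fillna(0)
print(sample)
```

Unlike the loop, this does not check that consecutive rows are exactly one day apart. If a name has missing dates, the last known value is still carried across the gap instead of being reset to zero.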