python – 当缺少多天数据时,用NaN填充数据帧

我有一个pandas数据帧,我插入它来获取每日数据帧.原始数据框如下所示:

               col_1      vals 
2017-10-01  0.000000  0.112869 
2017-10-02  0.017143  0.112869 
2017-10-12  0.003750  0.117274 
2017-10-14  0.000000  0.161556 
2017-10-17  0.000000  0.116264   

在插值数据帧中,我想将数据值更改为NaN,其中日期差距超过5天.例如.在上述数据框中,2017-10-02和2017-10-12之间的差距超过5天,因此在插值数据框中,应删除这两个日期之间的所有值.我不知道怎么做,也许combine_first?

–EDIT:插值数据帧如下所示:

            col_1      vals 
2017-10-01  0.000000  0.112869 
2017-10-02  0.017143  0.112869 
2017-10-03  0.015804  0.113309 
2017-10-04  0.014464  0.113750 
2017-10-05  0.013125  0.114190 
2017-10-06  0.011786  0.114631 
2017-10-07  0.010446  0.115071 
2017-10-08  0.009107  0.115512 
2017-10-09  0.007768  0.115953 
2017-10-10  0.006429  0.116393 
2017-10-11  0.005089  0.116834 
2017-10-12  0.003750  0.117274 
2017-10-13  0.001875  0.139415 
2017-10-14  0.000000  0.161556 
2017-10-15  0.000000  0.146459 
2017-10-16  0.000000  0.131361 
2017-10-17  0.000000  0.116264

预期产量:

               col_1      vals
2017-10-01  0.000000  0.112869
2017-10-02  0.017143  0.112869
2017-10-12  0.003750  0.117274
2017-10-13  0.001875  0.139415
2017-10-14  0.000000  0.161556
2017-10-15  0.000000  0.146459
2017-10-16  0.000000  0.131361
2017-10-17  0.000000  0.116264

解决方法:

我首先要确定缺口超过5天的位置.从那里,我生成一个数组,确定这些差距之间的组.最后,我将使用groupby转向每日频率并进行插值.

# convenience: assign string to variable for easier access
daytype = 'timedelta64[D]'

# define five days for use when evaluating size of gaps
five = np.array(5, dtype=daytype)

# get the size of gaps
deltas = np.diff(df.index.values).astype(daytype)

# identify groups between gaps
groups = np.append(False, deltas > five).cumsum()

# handy function to turn to daily frequency and interpolate
to_daily = lambda x: x.asfreq('D').interpolate()

# and finally...
df.groupby(groups, group_keys=False).apply(to_daily)

               col_1      vals
2017-10-01  0.000000  0.112869
2017-10-02  0.017143  0.112869
2017-10-12  0.003750  0.117274
2017-10-13  0.001875  0.139415
2017-10-14  0.000000  0.161556
2017-10-15  0.000000  0.146459
2017-10-16  0.000000  0.131361
2017-10-17  0.000000  0.116264

如果您想提供自己的插值方法.您可以像这样修改上面的内容

daytype = 'timedelta64[D]'
five = np.array(5, dtype=daytype)
deltas = np.diff(df.index.values).astype(daytype)
groups = np.append(False, deltas > five).cumsum()

# custom interpolation function that takes a dataframe
def my_interpolate(df):
    """This can be whatever you want.
    I just provided what will result
    in the same thing as before."""
    return df.interpolate()

to_daily = lambda x: x.asfreq('D').pipe(my_interpolate)

df.groupby(groups, group_keys=False).apply(to_daily)

               col_1      vals
2017-10-01  0.000000  0.112869
2017-10-02  0.017143  0.112869
2017-10-12  0.003750  0.117274
2017-10-13  0.001875  0.139415
2017-10-14  0.000000  0.161556
2017-10-15  0.000000  0.146459
2017-10-16  0.000000  0.131361
2017-10-17  0.000000  0.116264

相关文章

转载:一文讲述Pandas库的数据读取、数据获取、数据拼接、数...
Pandas是一个开源的第三方Python库,从Numpy和Matplotlib的基...
整体流程登录天池在线编程环境导入pandas和xrld操作EXCEL文件...
 一、numpy小结             二、pandas2.1为...
1、时间偏移DateOffset对象DateOffset类似于时间差Timedelta...
1、pandas内置样式空值高亮highlight_null最大最小值高亮背景...