Compare two datetime columns in a dataframe and put the difference in another column

Problem description

I have a dataframe like this:

         datetime1                datetime2             
0   2021-05-09 19:52:14      2021-05-09 20:52:14  
1   2021-05-09 19:52:14      2021-05-09 21:52:14  

I want to compare them and create a new column that contains the difference between them.

The ideal output is:

         datetime1                datetime2              Difference in H:m:s
0   2021-05-09 19:52:14      2021-05-09 20:52:14                  01:00:00
1   2021-05-09 19:52:14      2021-05-09 21:52:14                  02:00:00

Edit:

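@Andrej The solution you gave me works fine when there are timestamps in both datetime1 and datetime2. It fails when I have a df like the one below, because in some rows there is nothing to compare: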

df1:

         datetime1                datetime2             
0   2021-05-09 19:52:14      2021-05-09 20:52:14  
1   2021-05-09 19:52:14      2021-05-09 21:52:14 
2           NaN                      NaN
3  2021-05-09 16:30:14               NaN
4           NaN                      NaN
5  2021-05-09 12:30:14        2021-05-09 14:30:14

df2 (ideal output):

         datetime1            datetime2        Difference in H:m:s    Compared with datetime.Now()
0   2021-05-09 19:52:14  2021-05-09 20:52:14         01:00:00           NaN
1   2021-05-09 19:52:14  2021-05-09 21:52:14         02:00:00           NaN
2           NaN               NaN                      NaN              NaN
3   2021-05-09 16:30:14       NaN                      NaN       e.g(04:00:00)
4           NaN               NaN                      NaN              NaN
5  2021-05-09 12:30:14   2021-05-09 14:30:14         02:00:00           NaN

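In my real data, some rows have no value in either datetime1 or datetime2, and some have a value in datetime1 but not in datetime2. Is there a way to get NaN in "Difference" when neither datetime1 nor datetime2 has a timestamp, and, when only datetime1 has a timestamp, compute the difference against datetime.Now() and put it in another column?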

Solution

Try:

import pandas as pd


def strfdelta(tdelta, fmt):
    # split a Timedelta into days plus the sub-day hours/minutes/seconds
    d = {"days": tdelta.days}
    d["hours"], rem = divmod(tdelta.seconds, 3600)
    d["minutes"], d["seconds"] = divmod(rem, 60)
    return fmt.format(**d)


# if datetime1/datetime2 aren't already datetime, apply `pd.to_datetime()`:
df["datetime1"] = pd.to_datetime(df["datetime1"])
df["datetime2"] = pd.to_datetime(df["datetime2"])

df["Difference in H:m:s"] = df.apply(
    lambda x: strfdelta(
        x["datetime2"] - x["datetime1"],
        "{hours:02d}:{minutes:02d}:{seconds:02d}",
    ),
    axis=1,
)
print(df)

Prints:

            datetime1           datetime2 Difference in H:m:s
0 2021-05-09 19:52:14 2021-05-09 20:52:14            01:00:00
1 2021-05-09 19:52:14 2021-05-09 21:52:14            02:00:00
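One caveat about `strfdelta`: `tdelta.seconds` only holds the part of the gap below one day, so `{hours}` never exceeds 23 and whole days live in `tdelta.days`. The standalone check below uses a hypothetical 26-hour gap to show that the H:m:s format silently drops the day unless `{days}` is added to the format string.

```python
import pandas as pd


def strfdelta(tdelta, fmt):
    # Same helper as above: days are kept separately, and hours/minutes/seconds
    # come from the sub-day remainder tdelta.seconds (always 0..86399)
    d = {"days": tdelta.days}
    d["hours"], rem = divmod(tdelta.seconds, 3600)
    d["minutes"], d["seconds"] = divmod(rem, 60)
    return fmt.format(**d)


hms = "{hours:02d}:{minutes:02d}:{seconds:02d}"
# hypothetical input: 1 day and 2 hours apart
gap = pd.Timestamp("2021-05-10 21:52:14") - pd.Timestamp("2021-05-09 19:52:14")

print(strfdelta(gap, hms))               # the day is silently dropped
print(strfdelta(gap, "{days}d " + hms))  # the day is kept
```

The same applies to negative gaps: if datetime2 can be earlier than datetime1, a difference of minus one hour has `days == -1` and `seconds == 82800`, so it is printed as 23:00:00.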

Edit: handling NaN

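import numpy as np

# if datetime1/datetime2 aren't already datetime, convert them (NaN becomes NaT):
df["datetime1"] = pd.to_datetime(df["datetime1"])
df["datetime2"] = pd.to_datetime(df["datetime2"])

fmt = "{hours:02d}:{minutes:02d}:{seconds:02d}"  # used with strfdelta() from above
df["Difference in H:m:s"] = df.apply(
    lambda x: strfdelta(x["datetime2"] - x["datetime1"], fmt)
    if pd.notna(x["datetime1"]) and pd.notna(x["datetime2"])
    else np.nan,
    axis=1,
)

# only rows with datetime1 set and datetime2 missing are compared with now()
df["Compared with datetime.now()"] = df.apply(
    lambda x: strfdelta(pd.Timestamp.now() - x["datetime1"], fmt)
    if pd.notna(x["datetime1"]) and pd.isna(x["datetime2"])
    else np.nan,
    axis=1,
)
print(df)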

Prints:

            datetime1           datetime2 Difference in H:m:s Compared with datetime.now()
0 2021-05-09 19:52:14 2021-05-09 20:52:14            01:00:00                          NaN
1 2021-05-09 19:52:14 2021-05-09 21:52:14            02:00:00                          NaN
2                 NaT                 NaT                 NaN                          NaN
3 2021-05-09 16:30:14                 NaT                 NaN                     03:00:20
4                 NaT                 NaT                 NaN                          NaN
5 2021-05-09 12:30:14 2021-05-09 14:30:14            02:00:00                          NaN
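A vectorized variant without `apply` is also possible, sketched here under the assumption that both columns are converted with `pd.to_datetime` so missing values are `NaT`. Subtracting the columns yields `NaT` wherever either side is missing, so the first column needs no explicit `notna` check, and `Series.where` keeps the "now" comparison only for rows whose datetime2 is missing. The `fmt_hms` helper is hypothetical (not a pandas API) and formats total seconds, so gaps longer than a day show as e.g. 26:00:00 instead of losing the day. `now` is pinned to an arbitrary value only to make the output reproducible; in practice you would use `pd.Timestamp.now()`.

```python
import numpy as np
import pandas as pd


def fmt_hms(td):
    # Hypothetical helper: format a Timedelta as total HH:MM:SS, NaN for NaT
    if pd.isna(td):
        return np.nan
    total = int(td.total_seconds())
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


# df1 from the question
df = pd.DataFrame(
    {
        "datetime1": ["2021-05-09 19:52:14", "2021-05-09 19:52:14", None,
                      "2021-05-09 16:30:14", None, "2021-05-09 12:30:14"],
        "datetime2": ["2021-05-09 20:52:14", "2021-05-09 21:52:14", None,
                      None, None, "2021-05-09 14:30:14"],
    }
)
df["datetime1"] = pd.to_datetime(df["datetime1"])
df["datetime2"] = pd.to_datetime(df["datetime2"])

now = pd.Timestamp("2021-05-09 20:30:14")  # pinned for reproducibility

# NaT propagates through subtraction, so incomplete rows become NaT automatically
diff = df["datetime2"] - df["datetime1"]
# keep the "now" comparison only where datetime2 is missing (NaT elsewhere)
since_now = (now - df["datetime1"]).where(df["datetime2"].isna())

df["Difference in H:m:s"] = diff.map(fmt_hms)
df["Compared with datetime.now()"] = since_now.map(fmt_hms)
print(df)
```

With this pinned `now`, row 3 gets 04:00:00, matching the example in the question's ideal output.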