Subtract two date columns, conditional on the value in another column?

Problem description

DataFrame (in Excel format):

     A                      B                        C
1  this 9/20/2020  2:33:59 PM    9/20/2020  2:34:04 PM
2  this 9/17/2020  6:39:19 PM    9/17/2020  6:24:11 PM
3  not  9/22/2020  1:23:45 AM    9/22/2020  1:23:41 AM
4  this

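I am trying to find the difference between C and B by computing C-B, but only for rows where column A == 'this', and to put the result into a new column D.

Ideally the result would be expressed in hours only, and blank rows should not be counted. Columns B and C are already correctly formatted as date and time. I then plan to use these hour values to group the rows into date ranges for a report.

Here is what I have so far: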

import pandas as pd

df = pd.read_excel('df.xlsx')
print(df)

# df['D'] = (df['C']-df['B'])

df.loc[df['A'].eq('this'),'D'] = (df['C']-df['B'])

Solution

There may be a more elegant way to solve this, but here is an approach using list comprehensions.

import pandas as pd

# Create some data
df_arr = [["this",4.0,6.0],["this",5.0,9.0],["not",10.0,12.0],["this",14.0,20.0]]

# Initiate DataFrame
df = pd.DataFrame(df_arr,columns = ["A","B","C"])

DataFrame:

┌───┬──────┬──────┬──────┐
│   │  A   │  B   │  C   │
├───┼──────┼──────┼──────┤
│ 0 │ this │  4.0 │  6.0 │
│ 1 │ this │  5.0 │  9.0 │
│ 2 │ not  │ 10.0 │ 12.0 │
│ 3 │ this │ 14.0 │ 20.0 │
└───┴──────┴──────┴──────┘

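Use list comprehension 1 or 2 (both produce the same values).

Solution 1. Select the columns to iterate over directly and use "zip()"
Solution 2. Use ".iterrows()" to iterate over the rows of the DataFrame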

import numpy as np
# List comprehension 1
df["D1"] = [(val_c - val_b) if val_a == "this" else
            np.nan for val_a,val_b,val_c in zip(df["A"],df["B"],df["C"])] 

# OR

# List comprehension 2
# Access the row values by column label rather than by position
df["D2"] = [(row["C"] - row["B"]) if row["A"] == "this" else
            np.nan for idx,row in df.iterrows()]

Result:


┌───┬──────┬──────┬──────┬─────┬─────┐
│   │  A   │  B   │  C   │ D1  │ D2  │
├───┼──────┼──────┼──────┼─────┼─────┤
│ 0 │ this │  4.0 │  6.0 │ 2.0 │ 2.0 │
│ 1 │ this │  5.0 │  9.0 │ 4.0 │ 4.0 │
│ 2 │ not  │ 10.0 │ 12.0 │ NaN │ NaN │
│ 3 │ this │ 14.0 │ 20.0 │ 6.0 │ 6.0 │
└───┴──────┴──────┴──────┴─────┴─────┘

Naturally, the "np.nan" value used when column "A" equals "not" can be replaced with some other value.
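For comparison, the same conditional difference can probably be computed without a Python-level loop. This is a minimal sketch, assuming the toy DataFrame above; the column name "D3" is made up for illustration and is not part of the original answer.

```python
import pandas as pd

# Same toy data as above
df = pd.DataFrame(
    [["this", 4.0, 6.0], ["this", 5.0, 9.0], ["not", 10.0, 12.0], ["this", 14.0, 20.0]],
    columns=["A", "B", "C"],
)

# Series.where keeps the values where the condition is True
# and fills the remaining positions with NaN
df["D3"] = (df["C"] - df["B"]).where(df["A"].eq("this"))
print(df)
```

The subtraction runs on whole columns, and the condition only decides which results are kept.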

Try numpy.where(condition, x, y):
where condition is true it yields x, otherwise it yields y.
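A small numeric illustration of that rule, using a toy array that is not part of the question:

```python
import numpy as np

values = np.array([1, 2, 3, 4])

# Keep each value where it is greater than 2, otherwise use 0
result = np.where(values > 2, values, 0)
print(result)  # [0 0 3 4]
```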

import pandas as pd
import numpy as np

# load your DataFrame

df['D'] = np.where(df.A == 'this',df.C - df.B,np.nan)

print(df)
      A                   B                   C                 D
0  this 2020-09-20 14:33:59 2020-09-20 14:34:04   0 days 00:00:05
1  this 2020-09-17 18:39:19 2020-09-17 18:24:11 -1 days +23:44:52
2   not 2020-09-22 01:23:45 2020-09-22 01:23:41               NaT

The values in column D become timedelta objects (the difference between two datetime objects).
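Since the question asks for the result in hours, the timedelta values can be converted with ".dt.total_seconds()". This is a sketch under assumptions: the DataFrame is rebuilt by hand from the three complete rows of the question, and the column name "D_hours" is hypothetical. It uses "Series.where" rather than "np.where" so that the unmatched rows stay as NaT inside a timedelta column.

```python
import pandas as pd

# The three complete rows from the question, rebuilt by hand
df = pd.DataFrame({
    "A": ["this", "this", "not"],
    "B": pd.to_datetime(["2020-09-20 14:33:59", "2020-09-17 18:39:19", "2020-09-22 01:23:45"]),
    "C": pd.to_datetime(["2020-09-20 14:34:04", "2020-09-17 18:24:11", "2020-09-22 01:23:41"]),
})

# Keep C - B only where A == 'this'; other rows become NaT
delta = (df["C"] - df["B"]).where(df["A"].eq("this"))

# Convert the timedeltas to fractional hours; NaT becomes NaN
df["D_hours"] = delta.dt.total_seconds() / 3600
print(df)
```

The resulting float column can then be binned or grouped like any other numeric column.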
