How to compare two dates by iterating over a pandas DataFrame and create a new column

Problem description

I have a pandas DataFrame of customer transactions, shown below, and I want to create a column called "Label" that holds 2 different values:

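  • New transaction performed before the end date of the previous transaction

  • New transaction performed after the end date of the previous transaction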

Input

Transaction ID  Transaction Start Date  Transaction End Date
1               23-jun-2014             15-Jul-2014
2               14-jul-2014             8-Aug-2014
3               13-Aug-2014             22-Aug-2014
4               21-Aug-2014             28-Aug-2014
5               29-Aug-2014             05-Sep-2014
6               06-Sep-2014             15-Sep-2014
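To reproduce the question, the input table can be typed in as a DataFrame. This is a minimal sketch: the column names and values are copied from the table above, and format='%d-%b-%Y' (day, abbreviated month name, four-digit year) is an assumption inferred from the sample values. pandas matches %b case-insensitively, so both 'jun' and 'Jul' parse.

```python
import pandas as pd

# Sample data copied from the "Input" table; dates start out as plain strings
df = pd.DataFrame({
    'Transaction ID': [1, 2, 3, 4, 5, 6],
    'Transaction Start Date': ['23-jun-2014', '14-jul-2014', '13-Aug-2014',
                               '21-Aug-2014', '29-Aug-2014', '06-Sep-2014'],
    'Transaction End Date': ['15-Jul-2014', '8-Aug-2014', '22-Aug-2014',
                             '28-Aug-2014', '05-Sep-2014', '15-Sep-2014'],
})

# Assumed format: day-abbreviated month-year; %b also accepts lowercase 'jun'
for col in ['Transaction Start Date', 'Transaction End Date']:
    df[col] = pd.to_datetime(df[col], format='%d-%b-%Y')

print(df)
```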

Expected output

Transaction ID  Transaction Start Date  Transaction End Date  Label
1               23-jun-2014             15-Jul-2014
2               14-jul-2014             8-Aug-2014            New Transaction performed before end date of previous transaction
3               13-Aug-2014             22-Aug-2014           New Transaction after the end date of previous transaction.
4               21-Aug-2014             28-Aug-2014           New Transaction performed before end date of previous transaction
5               29-Aug-2014             05-Sep-2014           New Transaction after the end date of previous transaction.
6               06-Sep-2014             15-Sep-2014           New Transaction after the end date of previous transaction.
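The title asks about iterating, so for comparison here is a minimal row-by-row sketch of the rule that reproduces the expected output. It assumes the dates are parsed with format='%d-%b-%Y'; the names BEFORE, AFTER and prev_end are illustrative, not from the question. A start date equal to the previous end date counts as "after", which matches Series.lt in the answers.

```python
import pandas as pd

# Illustrative label constants, copied from the expected output
BEFORE = 'New Transaction performed before end date of previous transaction'
AFTER = 'New Transaction after the end date of previous transaction.'

df = pd.DataFrame({
    'Transaction ID': [1, 2, 3, 4, 5, 6],
    'Transaction Start Date': pd.to_datetime(
        ['23-jun-2014', '14-jul-2014', '13-Aug-2014', '21-Aug-2014', '29-Aug-2014', '06-Sep-2014'],
        format='%d-%b-%Y'),  # assumed format
    'Transaction End Date': pd.to_datetime(
        ['15-Jul-2014', '8-Aug-2014', '22-Aug-2014', '28-Aug-2014', '05-Sep-2014', '15-Sep-2014'],
        format='%d-%b-%Y'),
})

labels = []
prev_end = None  # the first transaction has no predecessor
for start, end in zip(df['Transaction Start Date'], df['Transaction End Date']):
    if prev_end is None:
        labels.append('')
    elif start < prev_end:  # started before the previous transaction ended
        labels.append(BEFORE)
    else:
        labels.append(AFTER)
    prev_end = end  # remember this row's end date for the next row

df['Label'] = labels
print(df)
```

A Python loop is fine for a handful of rows; on large frames the vectorized shift-based answers are typically much faster.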

Solution

Use numpy.where with Series.shift:

import numpy as np

# Columns must already be datetime64; shift() aligns each start date with the previous end date
df['Label'] = np.where(df['Transaction Start Date'].lt(df['Transaction End Date'].shift()),
                       'New Transaction performed before end date of previous transaction',
                       'New Transaction after the end date of previous transaction.')
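The condition in this answer compares each start date with the previous row's end date. A small sketch on the first three sample rows (parsed with an assumed format='%d-%b-%Y') shows what shift() produces: the first element becomes NaT, and comparing anything with NaT yields False, so on its own this code gives the first row the "after" label even though it has no previous transaction.

```python
import pandas as pd

# First three rows of the sample data; the date format is an assumption
start = pd.to_datetime(pd.Series(['23-jun-2014', '14-jul-2014', '13-Aug-2014']), format='%d-%b-%Y')
end = pd.to_datetime(pd.Series(['15-Jul-2014', '8-Aug-2014', '22-Aug-2014']), format='%d-%b-%Y')

prev_end = end.shift()              # end dates moved down one row; row 0 becomes NaT
started_early = start.lt(prev_end)  # any comparison with NaT is False

print(pd.DataFrame({'start': start, 'prev_end': prev_end, 'started_early': started_early}))
```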

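First convert both columns with to_datetime, then use numpy.where with Series.lt to compare each start date with the end date shifted down one row by Series.shift, and finally set the first value to an empty string: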

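import numpy as np
import pandas as pd

# Parse the strings (e.g. '23-jun-2014'); the format is inferred from the sample data
df['Transaction End Date'] = pd.to_datetime(df['Transaction End Date'], format='%d-%b-%Y')
df['Transaction Start Date'] = pd.to_datetime(df['Transaction Start Date'], format='%d-%b-%Y')
df['Label'] = np.where(df['Transaction Start Date'].lt(df['Transaction End Date'].shift()),
                       'New Transaction performed before end date of previous transaction',
                       'New Transaction after the end date of previous transaction.')
df.loc[df.index[0], 'Label'] = ''  # the first row has no previous transaction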
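For reference, a self-contained sketch of this answer on the sample data. The date format and the BEFORE/AFTER constant names are assumptions for illustration; df.index[0] is used instead of the literal 0 so that the first row is blanked even when the index does not start at 0.

```python
import numpy as np
import pandas as pd

# Illustrative label constants, copied from the expected output
BEFORE = 'New Transaction performed before end date of previous transaction'
AFTER = 'New Transaction after the end date of previous transaction.'

df = pd.DataFrame({
    'Transaction ID': [1, 2, 3, 4, 5, 6],
    'Transaction Start Date': ['23-jun-2014', '14-jul-2014', '13-Aug-2014',
                               '21-Aug-2014', '29-Aug-2014', '06-Sep-2014'],
    'Transaction End Date': ['15-Jul-2014', '8-Aug-2014', '22-Aug-2014',
                             '28-Aug-2014', '05-Sep-2014', '15-Sep-2014'],
})

# Convert the string dates so that lt() compares dates, not text (format assumed)
for col in ['Transaction Start Date', 'Transaction End Date']:
    df[col] = pd.to_datetime(df[col], format='%d-%b-%Y')

# True where a transaction starts before the previous one ended
m = df['Transaction Start Date'].lt(df['Transaction End Date'].shift())
df['Label'] = np.where(m, BEFORE, AFTER)  # both branches are required
df.loc[df.index[0], 'Label'] = ''         # no previous transaction for the first row
print(df[['Transaction ID', 'Label']])
```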

Alternative solution:

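# True where a transaction starts before the previous one ended
m = df['Transaction Start Date'].lt(df['Transaction End Date'].shift())

# Drop the first computed label and use '' instead, since row 0 has no predecessor
df['Label'] = [''] + np.where(m, 'New Transaction performed before end date of previous transaction',
                              'New Transaction after the end date of previous transaction.')[1:].tolist()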
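A variant that is not part of the original answer: np.select takes a list of conditions and, for each row, picks the choice of the first condition that is True, so the empty label for the first row can come out of the same call instead of being patched afterwards. A sketch, assuming the same date format and illustrative BEFORE/AFTER names:

```python
import numpy as np
import pandas as pd

# Illustrative label constants, copied from the expected output
BEFORE = 'New Transaction performed before end date of previous transaction'
AFTER = 'New Transaction after the end date of previous transaction.'

df = pd.DataFrame({
    'Transaction Start Date': pd.to_datetime(
        ['23-jun-2014', '14-jul-2014', '13-Aug-2014', '21-Aug-2014', '29-Aug-2014', '06-Sep-2014'],
        format='%d-%b-%Y'),  # assumed format
    'Transaction End Date': pd.to_datetime(
        ['15-Jul-2014', '8-Aug-2014', '22-Aug-2014', '28-Aug-2014', '05-Sep-2014', '15-Sep-2014'],
        format='%d-%b-%Y'),
})

prev_end = df['Transaction End Date'].shift()

# Conditions are checked in order; a missing previous end date (first row) is handled first
df['Label'] = np.select(
    [prev_end.isna(), df['Transaction Start Date'].lt(prev_end)],
    ['', BEFORE],
    default=AFTER,
)
print(df['Label'].tolist())
```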

print(df)
   Transaction ID Transaction Start Date Transaction End Date  \
0               1             2014-06-23           2014-07-15   
1               2             2014-07-14           2014-08-08   
2               3             2014-08-13           2014-08-22   
3               4             2014-08-21           2014-08-28   
4               5             2014-08-29           2014-09-05   
5               6             2014-09-06           2014-09-15   

                                               Label  
0
1  New Transaction performed before end date of p...  
2  New Transaction after the end date of previous...  
3  New Transaction performed before end date of p...  
4  New Transaction after the end date of previous...  
5  New Transaction after the end date of previous...