Decimal date manipulation with Pandas in Python

Problem description

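This may be a silly question, but I have searched for examples of using pandas to manipulate dates in a DataFrame. What confuses me is that my dates are in this format: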

Time A B C D

1.000347257 626.9966431 0   0   -99.98999786
1.001041651 626.9967651 0   0   -99.98999786
1.001736164 627.0130005 0   0   -99.98999786
1.002430558 627.0130005 0   0   -99.98999786
1.003124952 627.0455933 0   0   -99.98999786
1.003819466 627.0618286 0   0   -99.98999786

...

1.998263836 627.7052002 0.3417936265    0.2321419418    0.07069379836
1.998958349 627.7216187 0.3260073066    0.2284916639    0.073251158
1.999652743 627.6726074 0.3180454969    0.2164463699    0.07418025285
2.000347137 627.7371826 0.3161731362    0.2277853489    0.07479456067
2.001041651 627.7365723 0.301556468     0.2394933105    0.07920494676
2.001736164 627.7686157 0.3718534708    0.2506033182    0.07810453326

...

366.996887  625.413574  3.168393    2.114161    2.119713
366.997559  625.413391  3.163851    2.104703    2.117746
366.998261  625.461792  3.184296    2.113827    2.117964
366.998962  625.449463  3.163331    2.117869    2.116489
366.999664  625.510681  3.166895    2.126145    2.110077

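This is an excerpt from the file where I store my data. Is there a way to use the datetime library to convert this format into something like 2010-10-23? The year here is 2011, but it is not specified in the data.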

Thanks!


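I went through the pandas documentation and, although I didn't fully understand it, I got it working. The time is a decimal number of days, so I just specified that unit and used a Timestamp as the origin to supply the year I already know.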

import pandas as pd

# Time holds fractional days; count them from the start of the known year.
df['Time'] = pd.to_datetime(
    df['Time'], unit='D', origin=pd.Timestamp('2011-01-01')
)

The result is exactly what I wanted. It covers 366 days, as shown below:

Time A B C D
2016-01-02 00:00:30.003004800   626.996643  0.000000    0.000000    -99.989998
2016-01-02 00:01:29.998646400   626.996765  0.000000    0.000000    -99.989998
2016-01-02 00:02:30.004569600   627.013000  0.000000    0.000000    -99.989998
2016-01-02 00:03:30.000211200   627.013000  0.000000    0.000000    -99.989998
2016-01-02 00:04:29.995852800   627.045593  0.000000    0.000000    -99.989998
...     ...     ...     ...     ...
2017-01-01 23:55:31.054080000   625.413574  2.706322    2.086675    2.094654
2017-01-01 23:56:29.063040000   625.413391  2.738388    2.082261    2.092784
2017-01-01 23:57:29.707200000   625.461792  2.762815    2.097127    2.091273
2017-01-01 23:58:30.351360000   625.449463  2.698989    2.105750    2.090060
2017-01-01 23:59:30.995520000   625.510681  2.751848    2.109448    2.090664
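A minimal sketch of the same unit='D' plus origin approach, run on a few Time values copied from the question's excerpt (the three-row DataFrame is illustrative, not the full file). It also adds a YYYY-MM-DD string column, the format the question asked about, using .dt.strftime.

```python
import pandas as pd

# A few 'Time' values copied from the excerpt in the question (fractional days).
df = pd.DataFrame({
    "Time": [1.000347257, 1.001041651, 366.999664],
    "A": [626.9966431, 626.9967651, 625.510681],
})

# unit='D' tells pandas the numbers are days; origin fixes day 0 at the known year.
df["Time"] = pd.to_datetime(df["Time"], unit="D", origin=pd.Timestamp("2011-01-01"))

# Calendar-date strings in the YYYY-MM-DD form asked about in the question.
df["date"] = df["Time"].dt.strftime("%Y-%m-%d")
print(df)
```

With origin 2011-01-01, a value of 1.0 maps to 2011-01-02, and because 2011 has 365 days the last sample, 366.999664, lands on 2012-01-02. If the instrument counts January 1 as day 1.0 (an assumption worth checking against the data source), an origin one day earlier, such as pd.Timestamp('2010-12-31'), would map it to January 1 instead.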

Solution

You can use pd.to_datetime() to convert the column to datetime:

df.Time = pd.to_datetime(df.Time)

df.head(2)
                             Time            A  B   C            D
0   1970-01-01 00:00:01.000347257   626.996643  0   0   -99.989998
1   1970-01-01 00:00:01.001041651   626.996765  0   0   -99.989998

Your Time column appears to be a number of days. If you know the year, you can convert it to a datetime column with:

import pandas as pd

# 1 - convert the year to nanoseconds since the epoch
# 2 - add the day fraction, after converting it to nanoseconds as well
# 3 - convert the resulting nanoseconds since the epoch to datetime
year = '2011'
df['datetime'] = pd.to_datetime(pd.to_datetime(year).value + df['Time']*86400*1e9)

This gives you, for example:

df
       Time           A  B  C          D                      datetime
0  1.000347  626.996643  0  0 -99.989998 2011-01-02 00:00:30.003004928
1  1.001042  626.996765  0  0 -99.989998 2011-01-02 00:01:29.998646272
2  1.001736  627.013000  0  0 -99.989998 2011-01-02 00:02:30.004569600
3  1.002431  627.013000  0  0 -99.989998 2011-01-02 00:03:30.000211200
4  1.003125  627.045593  0  0 -99.989998 2011-01-02 00:04:29.995852800
5  1.003819  627.061829  0  0 -99.989998 2011-01-02 00:05:30.001862400
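The nanosecond arithmetic in this answer can also be written as the start of the year plus a Timedelta. The sketch below, assuming the same year = '2011' and a few Time values from the question, compares both forms. The hand-built version passes through float64 values near 1.3e18 ns, so it can pick up rounding on the order of 100 ns, while pd.to_timedelta converts only the day count itself and keeps it precise.

```python
import pandas as pd

df = pd.DataFrame({"Time": [1.000347257, 1.001041651, 1.001736164]})
year = "2011"

# Method from the answer: nanoseconds since the epoch, built by hand from floats.
manual = pd.to_datetime(pd.to_datetime(year).value + df["Time"] * 86400 * 1e9)

# Same idea expressed with a Timedelta: start of the year plus the elapsed days.
via_timedelta = pd.Timestamp(year) + pd.to_timedelta(df["Time"], unit="D")

# The two differ only by float64 rounding, well under a microsecond here.
max_diff = (manual - via_timedelta).abs().max()
print(max_diff)
```

The Timedelta form also reads closer to the intent ("January 1 plus N days") than raw nanosecond arithmetic does.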