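Problem description
This may be a silly question, but I have searched for examples of manipulating dates in a pandas DataFrame and am still confused, because my dates are in this format: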
Time A B C D
1.000347257 626.9966431 0 0 -99.98999786
1.001041651 626.9967651 0 0 -99.98999786
1.001736164 627.0130005 0 0 -99.98999786
1.002430558 627.0130005 0 0 -99.98999786
1.003124952 627.0455933 0 0 -99.98999786
1.003819466 627.0618286 0 0 -99.98999786
...
1.998263836 627.7052002 0.3417936265 0.2321419418 0.07069379836
1.998958349 627.7216187 0.3260073066 0.2284916639 0.073251158
1.999652743 627.6726074 0.3180454969 0.2164463699 0.07418025285
2.000347137 627.7371826 0.3161731362 0.2277853489 0.07479456067
2.001041651 627.7365723 0.301556468 0.2394933105 0.07920494676
2.001736164 627.7686157 0.3718534708 0.2506033182 0.07810453326
...
366.996887 625.413574 3.168393 2.114161 2.119713
366.997559 625.413391 3.163851 2.104703 2.117746
366.998261 625.461792 3.184296 2.113827 2.117964
366.998962 625.449463 3.163331 2.117869 2.116489
366.999664 625.510681 3.166895 2.126145 2.110077
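This is an excerpt from the file where my data is stored. Is there a way to use the datetime library to convert this format to something like 2010-10-23? The year here is 2011, but it is not specified in the data.
Thanks!
I looked through the pandas documentation, and although I do not fully understand it, the following works well. The time is in decimal format, measured in days. So I simply declare that unit and use a Timestamp to supply the year, which I already know.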
df['Time'] = pd.to_datetime(
df['Time'],unit='D',origin=pd.Timestamp('2011-01-01')
)
The result is what I wanted. It covers 366 days, as shown below:
Time A B C D
2016-01-02 00:00:30.003004800 626.996643 0.000000 0.000000 -99.989998
2016-01-02 00:01:29.998646400 626.996765 0.000000 0.000000 -99.989998
2016-01-02 00:02:30.004569600 627.013000 0.000000 0.000000 -99.989998
2016-01-02 00:03:30.000211200 627.013000 0.000000 0.000000 -99.989998
2016-01-02 00:04:29.995852800 627.045593 0.000000 0.000000 -99.989998
... ... ... ... ...
2017-01-01 23:55:31.054080000 625.413574 2.706322 2.086675 2.094654
2017-01-01 23:56:29.063040000 625.413391 2.738388 2.082261 2.092784
2017-01-01 23:57:29.707200000 625.461792 2.762815 2.097127 2.091273
2017-01-01 23:58:30.351360000 625.449463 2.698989 2.105750 2.090060
2017-01-01 23:59:30.995520000 625.510681 2.751848 2.109448 2.090664
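For reference, here is a minimal self-contained sketch of the same unit/origin conversion, using three rows copied from the excerpt. The year 2011 is taken from the question text; the small DataFrame is only illustrative. Note that with this call a Time of 1.0 maps to 2 January, because the values are treated as days elapsed since the origin.

```python
import pandas as pd

# A few sample rows copied from the excerpt (illustrative subset only).
df = pd.DataFrame({
    "Time": [1.000347257, 1.001041651, 366.999664],
    "A": [626.9966431, 626.9967651, 625.510681],
})

# Treat Time as fractional days elapsed since the start of the assumed year.
origin = pd.Timestamp("2011-01-01")
df["Time"] = pd.to_datetime(df["Time"], unit="D", origin=origin)

print(df)
```

If the values are actually day-of-year numbers where 1.0 should mean 1 January, one option (an assumption about the data, not something the excerpt confirms) is to shift the origin back by one day, for example origin - pd.Timedelta(days=1).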
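Solution
You can use pd.to_datetime() to convert the column to datetime. Without a unit or origin, the values are taken as offsets from the Unix epoch, so the result falls in 1970: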
df.Time = pd.to_datetime(df.Time)
df.head(2)
Time A B C D
0 1970-01-01 00:00:01.000347257 626.996643 0 0 -99.989998
1 1970-01-01 00:00:01.001041651 626.996765 0 0 -99.989998
Alternatively, your Time column appears to hold a number of days. If you know the year, you can use:
# 1 - convert the year to nanoseconds since the epoch
# 2 - add the day count (including its fractional part), after converting it to nanoseconds as well
# 3 - convert the resulting nanoseconds since the epoch to datetime
year = '2011'
df['datetime'] = pd.to_datetime(pd.to_datetime(year).value + df['Time']*86400*1e9)
This gives you, for example:
df
Time A B C D datetime
0 1.000347 626.996643 0 0 -99.989998 2011-01-02 00:00:30.003004928
1 1.001042 626.996765 0 0 -99.989998 2011-01-02 00:01:29.998646272
2 1.001736 627.013000 0 0 -99.989998 2011-01-02 00:02:30.004569600
3 1.002431 627.013000 0 0 -99.989998 2011-01-02 00:03:30.000211200
4 1.003125 627.045593 0 0 -99.989998 2011-01-02 00:04:29.995852800
5 1.003819 627.061829 0 0 -99.989998 2011-01-02 00:05:30.001862400
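The same idea can also be sketched without manual nanosecond arithmetic, by converting the day counts to timedeltas and adding them to the start of the year. This is an alternative formulation, not the code from the answer above; the column name datetime_rounded is hypothetical, and the rounding step is optional and only there to hide sub-second floating-point noise.

```python
import pandas as pd

# Three Time values copied from the excerpt (illustrative subset only).
df = pd.DataFrame({"Time": [1.000347257, 1.001041651, 1.001736164]})

# Start of the year, assumed to be 2011 as stated in the question.
year_start = pd.Timestamp("2011-01-01")

# Convert fractional days to timedeltas and offset them from the year start.
df["datetime"] = year_start + pd.to_timedelta(df["Time"], unit="D")

# Optional: round to whole seconds for readability.
df["datetime_rounded"] = df["datetime"].dt.round("s")

print(df)
```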