Problem description
I extracted a DataFrame from an Excel file. It has a datetime column, but some of its values are in Excel's serial date format, as shown below:
import pandas as pd
import numpy as np
import xlrd
rnd = np.random.randint(0,1000,size=(10,1))
test = pd.DataFrame(data=rnd,index=range(0,10),columns=['rnd'])
test['Date'] = pd.date_range(start='1/1/1979',periods=len(test),freq='D')
r1 = np.random.randint(0,5)
r2 = np.random.randint(6,10)
test.loc[r1,'Date'] = 44305
test.loc[r2,'Date'] = 44287
test
rnd Date
0 56 1979-01-01 00:00:00
1 557 1979-01-02 00:00:00
2 851 1979-01-03 00:00:00
3 553 44305
4 258 1979-01-05 00:00:00
5 946 1979-01-06 00:00:00
6 930 1979-01-07 00:00:00
7 805 1979-01-08 00:00:00
8 362 44287
9 705 1979-01-10 00:00:00
When I convert only the faulty dates with xlrd.xldate_as_datetime, I get a correctly formatted Series:
# Identifying the index of dates in int format
idx_ints = test[test['Date'].map(lambda x: isinstance(x,int))].index
test.loc[idx_ints,'Date'].map(lambda x: xlrd.xldate_as_datetime(x,0))
3 2021-04-19
8 2021-04-01
Name: Date, dtype: datetime64[ns]
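For reference, the same conversion can be done in pandas without xlrd. The sketch below assumes the workbook uses Excel's 1900 date system (xlrd `datemode=0`). Excel counts days in that system as if 1900 were a leap year, so for any date after February 1900 the effective day zero is 1899-12-30. `excel_serial_to_timestamp` is a hypothetical helper name used only for illustration.

```python
import pandas as pd


def excel_serial_to_timestamp(serial: float) -> pd.Timestamp:
    # Assumes Excel's 1900 date system (xlrd datemode=0). Because Excel
    # treats 1900 as a leap year, serials after February 1900 line up
    # with a day-zero origin of 1899-12-30.
    return pd.to_datetime(serial, unit="D", origin="1899-12-30")


print(excel_serial_to_timestamp(44305))  # 2021-04-19 00:00:00
print(excel_serial_to_timestamp(44287))  # 2021-04-01 00:00:00
```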
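However, when I try to apply the change in place, I get completely different integers:
test.loc[idx_ints,'Date'] = test.loc[idx_ints,'Date'].map(lambda x: xlrd.xldate_as_datetime(x,0))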
test
rnd Date
0 56 1979-01-01 00:00:00
1 557 1979-01-02 00:00:00
2 851 1979-01-03 00:00:00
3 553 1618790400000000000
4 258 1979-01-05 00:00:00
5 946 1979-01-06 00:00:00
6 930 1979-01-07 00:00:00
7 805 1979-01-08 00:00:00
8 362 1617235200000000000
9 705 1979-01-10 00:00:00
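For what it's worth, those large integers are not random. They are the correct dates expressed as nanoseconds since 1970-01-01, which is how pandas stores `datetime64[ns]` values internally. It appears that the `.loc` assignment into the object-dtype `Date` column wrote that raw int64 representation instead of `Timestamp` objects. A quick check:

```python
import pandas as pd

# The two "wrong" values from the output above.
broken = [1618790400000000000, 1617235200000000000]

for raw in broken:
    # pd.Timestamp treats a bare integer as nanoseconds since the Unix epoch.
    print(raw, "->", pd.Timestamp(raw))
# 1618790400000000000 -> 2021-04-19 00:00:00
# 1617235200000000000 -> 2021-04-01 00:00:00

# Round trip: .value returns the same int64 nanosecond count.
print(pd.Timestamp("2021-04-19").value)  # 1618790400000000000
```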
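Any ideas, or alternative solutions to my integer-date conversion problem? Thanks!
Solution
Reversing the logic from the answer I linked, this works for your test df: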
# where you have numeric values, i.e. "Excel datetime format":
nums = pd.to_numeric(test['Date'],errors='coerce') # timestamps will give NaN here
# now first convert the excel dates:
test.loc[nums.notna(),'datetime'] = pd.to_datetime(nums[nums.notna()],unit='d',origin='1899-12-30')
# ...and the other, "parseable" timestamps:
test.loc[nums.isna(),'datetime'] = pd.to_datetime(test['Date'][nums.isna()])
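If you would rather overwrite `Date` itself, as the question first tried, one possible variant (a sketch, not part of the answer above) is to convert each subset separately and then assign the combined Series to the whole column. This gives the column a single datetime64 dtype instead of writing datetimes cell by cell into the existing object column. The sketch uses an `isinstance` check instead of `pd.to_numeric` to find the serial dates and assumes Excel's 1900 date system. `df`, `is_serial`, `from_excel` and `already_dates` are hypothetical names.

```python
import numbers

import pandas as pd

# Small stand-in for the question's `test` frame: an object-dtype column
# mixing Timestamps with integer Excel serial dates.
dates = pd.Series(pd.date_range("1979-01-01", periods=5, freq="D"), dtype=object)
dates[1] = 44305
dates[3] = 44287
df = pd.DataFrame({"rnd": range(5), "Date": dates})

# True where the cell holds a plain number, i.e. an Excel serial date.
is_serial = df["Date"].map(lambda x: isinstance(x, numbers.Number))

# Convert each subset on its own (1900 date system assumed for the serials)...
from_excel = pd.to_datetime(
    df.loc[is_serial, "Date"].astype("int64"), unit="D", origin="1899-12-30"
)
already_dates = pd.to_datetime(df.loc[~is_serial, "Date"])

# ...then replace the whole column at once; the Series aligns on the index.
df["Date"] = pd.concat([from_excel, already_dates]).sort_index()

print(df)
print(df["Date"].dtype)
```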
test
rnd Date datetime
0 840 44305 2021-04-19
1 298 1979-01-02 00:00:00 1979-01-02
2 981 1979-01-03 00:00:00 1979-01-03
3 806 1979-01-04 00:00:00 1979-01-04
4 629 1979-01-05 00:00:00 1979-01-05
5 540 1979-01-06 00:00:00 1979-01-06
6 499 1979-01-07 00:00:00 1979-01-07
7 155 1979-01-08 00:00:00 1979-01-08
8 208 44287 2021-04-01
9 737 1979-01-10 00:00:00 1979-01-10
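If your input already contains datetime objects rather than timestamp strings, I think you can skip that second conversion and simply copy those values over to the new column.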