Problem description
I have data like this:
00052600150.00942615
00052601000.01014910
00052601050.02709672
00052601100.11454732
00052601150.23151254
00052601200.36262522
00052601250.66432348
00052601301.07723763
00052601351.26019487
00052601401.20568581
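The first 10 digits are the timestep in YYMMDDhhmm format, followed by a number.
Each line should be split like 0005260010, 0.00799872, where the first block is the timestep and the second block is a measurement.
I tried reading the data with pandas and converting it to str, but then I lose the leading zeros. Is there a way to separate the float from the digits in front of it?
Regards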
Solution
A regular expression with pandas can split the columns without a delimiter:
import pandas as pd
# sample data
df = pd.DataFrame({'A': [
    '00052600150.00942615', '00052601000.01014910', '00052601050.02709672',
    '00052601100.11454732', '00052601150.23151254', '00052601200.36262522',
    '00052601250.66432348', '00052601301.07723763', '00052601351.26019487',
    '00052601401.20568581',
]})
# five two-digit groups (YY MM DD hh mm), then the decimal reading
df3 = df['A'].str.extract(
    r'(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})(\d+\.\d*)', expand=True)
df3.columns = ['Year', 'Month', 'Day', 'Hour', 'Minute', 'Reading']
print(df3)
Output
   Year  Month  Day  Hour  Minute     Reading
0    00     05   26    00      15  0.00942615
1    00     05   26    01      00  0.01014910
2    00     05   26    01      05  0.02709672
3    00     05   26    01      10  0.11454732
4    00     05   26    01      15  0.23151254
5    00     05   26    01      20  0.36262522
6    00     05   26    01      25  0.66432348
7    00     05   26    01      30  1.07723763
8    00     05   26    01      35  1.26019487
9    00     05   26    01      40  1.20568581
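The extracted columns are still strings. If you want a real timestamp and a numeric reading, something along these lines might work. This is only a sketch: the column names stamp and reading are made up for illustration, and it assumes the two-digit year 00 means 2000, which is how %y interprets it.

```python
import pandas as pd

raw = pd.Series(['00052600150.00942615', '00052601301.07723763'])

# Named groups become the column names of the resulting DataFrame.
# 'stamp' and 'reading' are illustrative names, not from the question.
parts = raw.str.extract(r'^(?P<stamp>\d{10})(?P<reading>\d+\.\d+)$')

# Assumption: YY is a year in the 2000s (%y maps 00 to 2000).
parts['stamp'] = pd.to_datetime(parts['stamp'], format='%y%m%d%H%M')
parts['reading'] = parts['reading'].astype(float)

print(parts)
print(parts.dtypes)
```

Because the regex never converts the text to a number before splitting, the leading zeros of the timestamp are still there when to_datetime parses it.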
Alternatively, you can read the column as str and then split the values by position:
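import pandas as pd
# read every line as text so the leading zeros are kept
df = pd.read_csv('yourfile.csv', header=None, dtype=str, names=['col1'])
# the first 10 characters are the YYMMDDhhmm timestep, the rest is the value
df['time'] = pd.to_datetime(df.col1.str[:10], format='%y%m%d%H%M')
df['value'] = df.col1.str[10:].astype(float)
print(df)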
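To see why the leading zeros disappear in the first place, here is a small self-contained comparison that reads two of the sample lines from an in-memory buffer instead of a file. It is a sketch rather than a drop-in solution: the variable names are invented for the example, and it again assumes that YY refers to the 2000s.

```python
import io
import pandas as pd

sample = "00052600150.00942615\n00052601301.07723763\n"

# Default parsing treats each line as a single float, so the zeros vanish.
as_float = pd.read_csv(io.StringIO(sample), header=None, names=['col1'])

# dtype=str keeps the text exactly as written.
as_text = pd.read_csv(io.StringIO(sample), header=None, names=['col1'], dtype=str)

# Split by position: 10 characters of timestep, then the measurement.
as_text['time'] = pd.to_datetime(as_text['col1'].str[:10], format='%y%m%d%H%M')
as_text['value'] = as_text['col1'].str[10:].astype(float)

print(as_float['col1'].dtype)
print(as_text)
```

The first read gives a float column, which is where the zeros are lost; the second keeps the original string, so slicing at position 10 lands on the boundary between timestep and measurement.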