Problem description
| Month | day | hour | Temperature |
|-----------|-----|------|-------------|
| September | 01 | 0:00 | 19,11 |
| September | 01 | 1:00 | 18,67 |
| September | 01 | 2:00 | 18,22 |
| September | 01 | 3:00 | 17,77 |
Convert to:
| Month | day | hour | Temperature |
|-----------|-----|------|-------------|
| September | 01 | 0:00 | T = 19,11 |
| September | 01 | 0:15 | T2 = T + (18,67 - 19,11)/ 4 |
| September | 01 | 0:30 | T3 = T2 + (18,67 - 19,11)/ 4 |
| September | 01 | 0:45 | T4 = T3 + (18,67 - 19,11)/ 4 |
| September | 01 | 1:00 | T = 18,67 |
| September | 01 | 1:15 | T2 = T + (18,22 - 18,67)/ 4 |
| September | 01 | 1:30 | T3 = T2 + (18,22 - 18,67)/ 4 |
| September | 01 | 1:45 | T4 = T3 + (18,22 - 18,67)/ 4 |
| September | 01 | 2:00 | T = 18,22 |
...
I have an Excel file and want to make these changes in Python. I have already loaded the dataset into a DataFrame. Can anyone help me?
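The target table above amounts to linear interpolation: each quarter-hour value adds one quarter of the difference between the next hourly reading and the current one. A minimal sketch of that arithmetic in plain Python, assuming the decimal commas have already been converted to floats (the function name `quarter_hour_steps` is hypothetical):

```python
def quarter_hour_steps(t_start, t_next, steps=4):
    """Return `steps` values from t_start towards t_next, excluding t_next."""
    increment = (t_next - t_start) / steps
    return [t_start + k * increment for k in range(steps)]

# First hour of the example: 19,11 at 0:00 and 18,67 at 1:00
first_hour = quarter_hour_steps(19.11, 18.67)
print([round(t, 2) for t in first_hour])  # [19.11, 19.0, 18.89, 18.78]
```

These are the values for 0:00, 0:15, 0:30 and 0:45; the 1:00 value is the next hourly reading itself.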
Solution
Here is some sample code:
x = df.Temperature.str.split(",",expand=True)
x:
0 1
0 19 11
1 18 67
2 18 22
3 17 77
y = x[0].astype(int).diff().div(4).fillna(x.iloc[0,0]).astype(float).cumsum()
y:
0 19.00
1 18.75
2 18.75
3 18.50
Name: 0, dtype: float64
Do the same for the other column, then join the two together to get "<num1>,<num2>".
Phase 1: resampling:
df[['temp1','temp2']] = df.Temperature.str.split(",",expand=True)
df['temp1'] = df['temp1'].astype(int)
df['temp2'] = df['temp2'].astype(int)
u = pd.to_datetime(df['hour'],format='%H:%M')#.dt.hour
df['hr'] = u.dt.hour
df = df.set_index(u)
df1 = df.resample('900s').ffill()  # .pad() was deprecated and later removed in newer pandas; .ffill() is the equivalent
df1:
Phase 2
<to be continued>
Edit 2:
df['hour'] = pd.to_datetime(df['hour'],format='%H:%M')
df.set_index('hour',inplace=True)
# '15min' replaces the '15T' alias, which is deprecated in newer pandas
v = df.resample('15min').bfill().reset_index()
v[['temp1','temp2']] = v.Temperature.str.split(",",expand=True)
v['temp1'] = v['temp1'].astype(int)
v['temp2'] = v['temp2'].astype(int)
t = v.groupby(v['hour'].dt.hour)
def calc(val1,val2):
    # val1: all rows of one hour group; val2: the first row of that group
    diff1 = (val1['temp1']-val2['temp1'])
    diff1.iloc[0]= val1['temp1'].iloc[0]*4
    diff2 = (val1['temp2']-val2['temp2'])
    diff2.iloc[0]= val1['temp2'].iloc[0]*4
    t1_group = diff1.div(4).cumsum()
    t2_group = diff2.div(4).cumsum()
    return list(zip(t1_group,t2_group))
concat_res = []
for _,gr in t:
    concat_res.append(calc(gr,gr.iloc[0]))
flatten = lambda t: [item for sublist in t for item in sublist]
v['Temperature'] = flatten(concat_res)
v = v.drop(['temp1','temp2'],axis=1)
v:
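pandas has built-in tools that can do this; the main obstacle is getting the data into a friendlier format. Here is a "worst case", where the timestamp is split across columns, the temperatures use a comma as the decimal separator, and everything is a string: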
import pandas as pd
df = pd.DataFrame({'month': 'September','day': '01','hour': ['0:00','1:00','2:00','3:00'],'temperature': ['19,11','18,67','18,22','17,77']})
# month day hour temperature
# 0 September 01 0:00 19,11
# 1 September 01 1:00 18,67
# 2 September 01 2:00 18,22
# 3 September 01 3:00 17,77
Here is how to convert the time information into datetime objects using pd.to_datetime:
dates = df['hour'] + ',' + df['month'] + ' ' + df['day'] + ',' + '2020'
dates = pd.to_datetime(dates)
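If you would rather not rely on automatic format detection, you can pass an explicit `format` string. This is only a sketch: it assumes the same string layout as above, English month names, and the placeholder year 2020 (the variable name `stamp` is just for illustration):

```python
import pandas as pd

df = pd.DataFrame({'month': 'September', 'day': '01',
                   'hour': ['0:00', '1:00', '2:00', '3:00']})

# Build strings such as "0:00,September 01,2020" (the year is a placeholder)
stamp = df['hour'] + ',' + df['month'] + ' ' + df['day'] + ',' + '2020'

# Spell out the layout instead of letting pandas guess it
dates = pd.to_datetime(stamp, format='%H:%M,%B %d,%Y')
print(dates)
```

An explicit format fails loudly on rows that do not match, which can be useful when the Excel data is not perfectly clean.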
Here is how to convert the temperatures to floats:
temps = df['temperature'].str.replace(',','.').astype(float)
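As an alternative, the decimal commas can sometimes be handled at load time. A sketch, assuming the data is available as a semicolon-separated CSV export (the inline sample text below is made up to mirror the question's table); `read_csv` accepts a `decimal` argument for this:

```python
import io
import pandas as pd

# Stand-in for a CSV export of the spreadsheet (hypothetical sample data)
raw = io.StringIO(
    "Month;day;hour;Temperature\n"
    "September;01;0:00;19,11\n"
    "September;01;1:00;18,67\n"
)

# decimal=',' parses "19,11" as 19.11; dtype keeps the leading zero in "01"
df = pd.read_csv(raw, sep=';', decimal=',', dtype={'day': str})
print(df.dtypes)
```

With this, the Temperature column arrives as floats and the `str.replace` step is not needed.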
You can then create a new DataFrame containing only the dates and temperatures, and use resample and interpolate to get the interpolated temperatures:
df = pd.DataFrame({'temperature': temps.values},index=dates)
result = df.resample('15min').interpolate()  # '15min' replaces the '15T' alias, which is deprecated in newer pandas
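To check that resampling plus linear interpolation reproduces the formula from the question, T + k * (T_next - T) / 4, here is a small sketch on the first two readings only (the names `hourly`, `quarterly`, `expected` and `matches` are just for illustration):

```python
import numpy as np
import pandas as pd

index = pd.to_datetime(['2020-09-01 00:00', '2020-09-01 01:00'])
hourly = pd.DataFrame({'temperature': [19.11, 18.67]}, index=index)

# Upsample to 15-minute rows, then fill the new rows linearly
quarterly = hourly.resample('15min').interpolate()

# The question's formula, applied by hand for k = 0..4
step = (18.67 - 19.11) / 4
expected = [19.11 + k * step for k in range(5)]

matches = np.allclose(quarterly['temperature'], expected)
print(matches)
```

The default interpolation is linear over equally spaced rows, so it should agree with the hand-written formula as long as the hourly readings have no gaps.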
Result:
temperature
2020-09-01 00:00:00 19.1100
2020-09-01 00:15:00 19.0000
2020-09-01 00:30:00 18.8900
2020-09-01 00:45:00 18.7800
2020-09-01 01:00:00 18.6700
2020-09-01 01:15:00 18.5575
2020-09-01 01:30:00 18.4450
2020-09-01 01:45:00 18.3325
2020-09-01 02:00:00 18.2200
2020-09-01 02:15:00 18.1075
2020-09-01 02:30:00 17.9950
2020-09-01 02:45:00 17.8825
2020-09-01 03:00:00 17.7700
If you want to put the time information back into separate columns, you can do the following:
formatted = result.index.strftime('%B,%d,%H:%M').str.split(',').to_list()
result[['month','day','hour']] = formatted
result = result.reset_index(drop=True)
result
result is now:
temperature month day hour
0 19.1100 September 01 00:00
1 19.0000 September 01 00:15
2 18.8900 September 01 00:30
3 18.7800 September 01 00:45
4 18.6700 September 01 01:00
5 18.5575 September 01 01:15
6 18.4450 September 01 01:30
7 18.3325 September 01 01:45
8 18.2200 September 01 02:00
9 18.1075 September 01 02:15
10 17.9950 September 01 02:30
11 17.8825 September 01 02:45
12 17.7700 September 01 03:00
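If the output should use decimal commas again, as in the original spreadsheet, the floats can be formatted back into strings. A minimal sketch, assuming two decimal places are enough (this rounds values such as 18.5575) and using a small stand-in frame rather than the full result:

```python
import pandas as pd

# Stand-in for the interpolated result (first three rows only)
result = pd.DataFrame({'temperature': [19.11, 19.0, 18.89]})

# Format with two decimals, then swap the decimal point for a comma
result['temperature'] = (
    result['temperature'].map('{:.2f}'.format).str.replace('.', ',', regex=False)
)
print(result['temperature'].tolist())  # ['19,11', '19,00', '18,89']
```

Do this only as the last step before export, since the column holds strings afterwards and can no longer be used for arithmetic.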