问题描述
我有两个 Pandas DataFrame 对象需要通过多个索引和列连接。
DF1
包含每日数据(索引为 RNK、R_ID、纬度和经度):
Date FFDI
RNK R_ID latitude longitude
1 0 -39.20000 140.80000 1973-04-02 5.40000
1 -39.20000 140.83786 1973-04-02 5.40000
2 -39.20000 140.87572 1973-04-02 5.40000
3 -39.20000 140.91359 1973-04-02 5.40000
4 -39.20000 140.95145 1973-04-02 5.40000
5 -39.20000 140.98930 1973-04-02 5.40000
6 -39.20000 141.02716 1973-04-02 5.40000
7 -39.20000 141.06502 1973-05-31 5.40000
8 -39.20000 141.10289 1973-05-31 5.50000
9 -39.20000 141.14075 1973-05-31 6.00000
10 -39.20000 141.17860 1973-05-31 6.40000
11 -39.20000 141.21646 1973-05-31 6.80000
12 -39.20000 141.25432 1973-05-31 7.70000
13 -39.20000 141.29219 1973-05-31 7.90000
14 -39.20000 141.33005 1973-05-31 7.00000
15 -39.20000 141.36790 1973-05-31 6.60000
16 -39.20000 141.40576 1973-05-31 6.10000
17 -39.20000 141.44362 1973-05-31 5.00000
18 -39.20000 141.48149 1973-05-31 4.40000
19 -39.20000 141.51935 1972-04-21 4.40000
20 -39.20000 141.55721 1972-04-21 4.40000
21 -39.20000 141.59506 1972-04-21 4.50000
22 -39.20000 141.63292 1972-04-21 4.60000
23 -39.20000 141.67079 1972-04-21 4.70000
24 -39.20000 141.70865 1972-04-21 4.70000
25 -39.20000 141.74651 1972-04-21 4.70000
26 -39.20000 141.78436 1972-04-21 4.70000
27 -39.20000 141.82222 1972-04-21 4.70000
28 -39.20000 141.86009 1972-04-21 4.70000
29 -39.20000 141.89795 1972-04-21 4.70000
... ... ... ... ... ... ... ... ... ... ... ...
... ... ... ... ... ... ... ... ... ... ... ...
... ... ... ... ... ... ... ... ... ... ... ...
5 36082 -33.90000 148.90205 1972-12-24 35.70000
36083 -33.90000 148.93991 1974-11-12 36.30000
36084 -33.90000 148.97778 1974-11-12 35.90000
36085 -33.90000 149.01564 1973-11-20 36.80000
36086 -33.90000 149.05350 1973-11-20 37.00000
36087 -33.90000 149.09135 1974-11-12 35.60000
36088 -33.90000 149.12921 1973-01-03 35.90000
36089 -33.90000 149.16708 1973-01-03 34.40000
36090 -33.90000 149.20494 1973-01-03 32.90000
36091 -33.90000 149.24280 1973-01-03 32.20000
36092 -33.90000 149.28065 1973-01-03 32.30000
36093 -33.90000 149.31851 1973-01-03 32.20000
36094 -33.90000 149.35638 1973-01-03 30.20000
36095 -33.90000 149.39424 1973-11-20 28.60000
36096 -33.90000 149.43210 1973-11-20 28.70000
36097 -33.90000 149.46996 1973-11-20 29.10000
36098 -33.90000 149.50781 1973-11-20 30.10000
36099 -33.90000 149.54568 1973-11-20 30.80000
36100 -33.90000 149.58354 1973-01-09 30.60000
36101 -33.90000 149.62140 1973-01-09 30.10000
36102 -33.90000 149.65926 1973-01-09 29.50000
36103 -33.90000 149.69711 1973-01-09 29.20000
36104 -33.90000 149.73499 1973-01-09 29.90000
36105 -33.90000 149.77284 1973-01-09 29.90000
36106 -33.90000 149.81070 1973-01-09 27.60000
36107 -33.90000 149.84856 1973-01-09 24.40000
36108 -33.90000 149.88641 1973-01-09 23.80000
36109 -33.90000 149.92429 1973-01-09 23.80000
36110 -33.90000 149.96214 1973-01-09 24.10000
36111 -33.90000 150.00000 1973-01-09 25.30000
DF2
每小时数据(索引 = R_ID):
latitude longitude time T_SFC
R_ID
0 -39.20000 140.80000 1972-01-20 00:00:00 15.80000
0 -39.20000 140.80000 1972-01-20 01:00:00 15.90000
0 -39.20000 140.80000 1972-01-20 02:00:00 16.00000
0 -39.20000 140.80000 1972-01-20 03:00:00 16.20000
0 -39.20000 140.80000 1972-01-20 04:00:00 16.60000
0 -39.20000 140.80000 1972-01-20 05:00:00 16.60000
0 -39.20000 140.80000 1972-01-20 06:00:00 16.50000
0 -39.20000 140.80000 1972-01-20 07:00:00 16.50000
0 -39.20000 140.80000 1972-01-20 08:00:00 16.50000
0 -39.20000 140.80000 1972-01-20 09:00:00 16.40000
0 -39.20000 140.80000 1972-01-20 10:00:00 16.40000
0 -39.20000 140.80000 1972-01-20 11:00:00 16.40000
0 -39.20000 140.80000 1972-01-20 12:00:00 16.50000
0 -39.20000 140.80000 1972-01-20 13:00:00 16.60000
0 -39.20000 140.80000 1972-01-20 14:00:00 16.60000
0 -39.20000 140.80000 1972-01-20 15:00:00 16.70000
0 -39.20000 140.80000 1972-01-20 16:00:00 16.70000
0 -39.20000 140.80000 1972-01-20 17:00:00 16.60000
0 -39.20000 140.80000 1972-01-20 18:00:00 16.60000
0 -39.20000 140.80000 1972-01-20 19:00:00 16.60000
0 -39.20000 140.80000 1972-01-20 20:00:00 16.50000
0 -39.20000 140.80000 1972-01-20 21:00:00 16.50000
0 -39.20000 140.80000 1972-01-20 22:00:00 16.50000
0 -39.20000 140.80000 1972-01-20 23:00:00 16.40000
0 -39.20000 140.80000 1972-01-21 00:00:00 16.40000
0 -39.20000 140.80000 1972-01-21 01:00:00 16.30000
0 -39.20000 140.80000 1972-01-21 02:00:00 16.30000
0 -39.20000 140.80000 1972-01-21 03:00:00 16.30000
0 -39.20000 140.80000 1972-01-21 04:00:00 16.10000
0 -39.20000 140.80000 1972-01-21 05:00:00 16.00000
... ... ... ... ...
36111 -38.87551 141.14075 1974-12-30 18:00:00 14.10000
36111 -38.87551 141.14075 1974-12-30 19:00:00 14.10000
36111 -38.87551 141.14075 1974-12-30 20:00:00 14.10000
36111 -38.87551 141.14075 1974-12-30 21:00:00 14.10000
36111 -38.87551 141.14075 1974-12-30 22:00:00 14.20000
36111 -38.87551 141.14075 1974-12-30 23:00:00 14.30000
36111 -38.87551 141.14075 1974-12-31 00:00:00 14.40000
36111 -38.87551 141.14075 1974-12-31 01:00:00 14.50000
36111 -38.87551 141.14075 1974-12-31 02:00:00 14.50000
36111 -38.87551 141.14075 1974-12-31 03:00:00 14.50000
36111 -38.87551 141.14075 1974-12-31 04:00:00 14.50000
36111 -38.87551 141.14075 1974-12-31 05:00:00 14.50000
36111 -38.87551 141.14075 1974-12-31 06:00:00 14.60000
36111 -38.87551 141.14075 1974-12-31 07:00:00 14.50000
36111 -38.87551 141.14075 1974-12-31 08:00:00 14.30000
36111 -38.87551 141.14075 1974-12-31 09:00:00 14.40000
36111 -38.87551 141.14075 1974-12-31 10:00:00 14.30000
36111 -38.87551 141.14075 1974-12-31 11:00:00 14.30000
36111 -38.87551 141.14075 1974-12-31 12:00:00 14.40000
36111 -38.87551 141.14075 1974-12-31 13:00:00 14.50000
36111 -38.87551 141.14075 1974-12-31 14:00:00 14.40000
36111 -38.87551 141.14075 1974-12-31 15:00:00 14.30000
36111 -38.87551 141.14075 1974-12-31 16:00:00 14.30000
36111 -38.87551 141.14075 1974-12-31 17:00:00 14.30000
36111 -38.87551 141.14075 1974-12-31 18:00:00 14.30000
36111 -38.87551 141.14075 1974-12-31 19:00:00 14.40000
36111 -38.87551 141.14075 1974-12-31 20:00:00 14.50000
36111 -38.87551 141.14075 1974-12-31 21:00:00 14.60000
36111 -38.87551 141.14075 1974-12-31 22:00:00 14.70000
36111 -38.87551 141.14075 1974-12-31 23:00:00 14.80000
DF1
有一个日期列,其每日值从 1972-01-20 到 1974-12-31,而 DF2
有一个时间列,其中包含从 1972-01-20T00:00:00到 1974-12-31T23:00:00。 DF1
按 RNK(等级)和 FFDI 排序,而 DF2
按 R_ID 和时间排序。一个 R_ID 是一个参考 ID,对应于一对唯一的纬度和经度。 DF2
将加入具有相同 R_ID 和 DF1
的时间列所属的相同日期的 DF2
。也就是说,DF1
中的每一行(天)将有来自 DF2
的 24(小时)行,并且具有相同的天值。
输出 df 将如下所示:
time T_SFC
RNK R_ID latitude longitude Date FFDI
1 0 -39.20000 140.80000 1973-04-02 5.40000 1973-04-02 00:00:00 13.8
1973-04-02 01:00:00 13.9
1973-04-02 02:00:00 13.0
1973-04-02 03:00:00 13.2
1973-04-02 04:00:00 13.6
... ... ... ...
1 -39.20000 140.83786 1973-04-02 5.40000 1973-04-02 00:00:00 13.8
1973-04-02 01:00:00 13.9
1973-04-02 02:00:00 13.0
1973-04-02 03:00:00 13.2
1973-04-02 04:00:00 13.6
... ... ... ...
2 -39.20000 140.87572 1973-04-02 5.40000 1973-04-02 00:00:00 13.8
1973-04-02 01:00:00 13.9
1973-04-02 02:00:00 13.0
1973-04-02 03:00:00 13.2
1973-04-02 04:00:00 13.6
... ... ... ...
... ... ... ...
2 0 -39.20000 140.80000 1974-03-07 5.60000 1974-03-07 00:00:00 15.8
1974-03-07 01:00:00 15.9
1974-03-07 02:00:00 16.0
1974-03-07 03:00:00 16.2
1974-03-07 04:00:00 16.6
... ... ... ...
1 -39.20000 140.83786 1973-03-09 5.40000 1973-03-09 00:00:00 15.8
1973-03-09 01:00:00 15.9
1973-03-09 02:00:00 16.0
1973-03-09 03:00:00 15.2
1973-03-09 04:00:00 15.6
... ... ... ...
... ... ... ...
... ... ... ...
5 36082 -33.90000 148.90205 1972-12-24 35.70000 1972-12-24 00:00:00 19.8
1972-12-24 01:00:00 19.1
1972-12-24 02:00:00 22.0
1972-12-24 03:00:00 24.2
1972-12-24 04:00:00 21.6
... ... ... ...
... ... ... ...
36111 -33.90000 150.00000 1973-01-09 25.30000 1973-01-09 00:00:00 19.8
1973-01-09 01:00:00 19.1
1973-01-09 02:00:00 22.0
1973-01-09 03:00:00 24.2
1973-01-09 04:00:00 21.6
... ... ... ...
1973-01-09 23:00:00 19.1
4,333,440 rows x 2 columns
按照@politinsa 的回答,我试过了
# Add a new column Date and save date part of the time column to it.
df2['Date'] = df2['time'].dt.date.astype('datetime64[ns]')
df_joined = pd.merge(df1,df2,on=['REF_ID','Date'],how='inner')
输出的问题是 df1 的多索引没有保留,输出 df 中缺少 RNK。
print(df_joined)
time FFDI latitude longitude T_SFC time_original
REF_ID
0 1973-04-02 5.40000 -39.20000 140.80000 16.40000 1973-04-02 00:00:00
0 1973-04-02 5.40000 -39.20000 140.80000 16.00000 1973-04-02 01:00:00
0 1973-04-02 5.40000 -39.20000 140.80000 15.70000 1973-04-02 02:00:00
0 1973-04-02 5.40000 -39.20000 140.80000 15.40000 1973-04-02 03:00:00
0 1973-04-02 5.40000 -39.20000 140.80000 15.20000 1973-04-02 04:00:00
0 1973-04-02 5.40000 -39.20000 140.80000 15.10000 1973-04-02 05:00:00
0 1973-04-02 5.40000 -39.20000 140.80000 15.10000 1973-04-02 06:00:00
0 1973-04-02 5.40000 -39.20000 140.80000 15.10000 1973-04-02 07:00:00
0 1973-04-02 5.40000 -39.20000 140.80000 15.10000 1973-04-02 08:00:00
0 1973-04-02 5.40000 -39.20000 140.80000 15.10000 1973-04-02 09:00:00
0 1973-04-02 5.40000 -39.20000 140.80000 15.10000 1973-04-02 10:00:00
0 1973-04-02 5.40000 -39.20000 140.80000 15.20000 1973-04-02 11:00:00
0 1973-04-02 5.40000 -39.20000 140.80000 15.20000 1973-04-02 12:00:00
0 1973-04-02 5.40000 -39.20000 140.80000 15.20000 1973-04-02 13:00:00
0 1973-04-02 5.40000 -39.20000 140.80000 15.00000 1973-04-02 14:00:00
0 1973-04-02 5.40000 -39.20000 140.80000 15.10000 1973-04-02 15:00:00
0 1973-04-02 5.40000 -39.20000 140.80000 15.30000 1973-04-02 16:00:00
0 1973-04-02 5.40000 -39.20000 140.80000 15.40000 1973-04-02 17:00:00
0 1973-04-02 5.40000 -39.20000 140.80000 15.40000 1973-04-02 18:00:00
... ... ... ...
12000 rows × 6 columns
解决方法
您可以在 DF2 中创建一列包含日期(而不是日期时间),即在第 1973-04-02 01:00:00
行,您将有一列 Date
包含 {{ 1}}。
然后使用经典的内连接 (1973-04-02
) 就可以了。