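Pandas - Joining two DataFrames on multiple indexes and columns

Problem description

I have two Pandas DataFrame objects that need to be joined on multiple indexes and columns.

DF1 contains daily data (indexed by RNK, R_ID, latitude and longitude):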

                                Date        FFDI
RNK R_ID latitude   longitude               
1   0   -39.20000   140.80000   1973-04-02  5.40000
    1   -39.20000   140.83786   1973-04-02  5.40000
    2   -39.20000   140.87572   1973-04-02  5.40000
    3   -39.20000   140.91359   1973-04-02  5.40000
    4   -39.20000   140.95145   1973-04-02  5.40000
    5   -39.20000   140.98930   1973-04-02  5.40000
    6   -39.20000   141.02716   1973-04-02  5.40000
    7   -39.20000   141.06502   1973-05-31  5.40000
    8   -39.20000   141.10289   1973-05-31  5.50000
    9   -39.20000   141.14075   1973-05-31  6.00000
    10  -39.20000   141.17860   1973-05-31  6.40000
    11  -39.20000   141.21646   1973-05-31  6.80000
    12  -39.20000   141.25432   1973-05-31  7.70000
    13  -39.20000   141.29219   1973-05-31  7.90000
    14  -39.20000   141.33005   1973-05-31  7.00000
    15  -39.20000   141.36790   1973-05-31  6.60000
    16  -39.20000   141.40576   1973-05-31  6.10000
    17  -39.20000   141.44362   1973-05-31  5.00000
    18  -39.20000   141.48149   1973-05-31  4.40000
    19  -39.20000   141.51935   1972-04-21  4.40000
    20  -39.20000   141.55721   1972-04-21  4.40000
    21  -39.20000   141.59506   1972-04-21  4.50000
    22  -39.20000   141.63292   1972-04-21  4.60000
    23  -39.20000   141.67079   1972-04-21  4.70000
    24  -39.20000   141.70865   1972-04-21  4.70000
    25  -39.20000   141.74651   1972-04-21  4.70000
    26  -39.20000   141.78436   1972-04-21  4.70000
    27  -39.20000   141.82222   1972-04-21  4.70000
    28  -39.20000   141.86009   1972-04-21  4.70000
    29  -39.20000   141.89795   1972-04-21  4.70000
... ... ... ... ... ... ... ... ... ... ... ...
... ... ... ... ... ... ... ... ... ... ... ...
... ... ... ... ... ... ... ... ... ... ... ...
5   36082   -33.90000   148.90205   1972-12-24  35.70000
    36083   -33.90000   148.93991   1974-11-12  36.30000
    36084   -33.90000   148.97778   1974-11-12  35.90000
    36085   -33.90000   149.01564   1973-11-20  36.80000
    36086   -33.90000   149.05350   1973-11-20  37.00000
    36087   -33.90000   149.09135   1974-11-12  35.60000
    36088   -33.90000   149.12921   1973-01-03  35.90000
    36089   -33.90000   149.16708   1973-01-03  34.40000
    36090   -33.90000   149.20494   1973-01-03  32.90000
    36091   -33.90000   149.24280   1973-01-03  32.20000
    36092   -33.90000   149.28065   1973-01-03  32.30000
    36093   -33.90000   149.31851   1973-01-03  32.20000
    36094   -33.90000   149.35638   1973-01-03  30.20000
    36095   -33.90000   149.39424   1973-11-20  28.60000
    36096   -33.90000   149.43210   1973-11-20  28.70000
    36097   -33.90000   149.46996   1973-11-20  29.10000
    36098   -33.90000   149.50781   1973-11-20  30.10000
    36099   -33.90000   149.54568   1973-11-20  30.80000
    36100   -33.90000   149.58354   1973-01-09  30.60000
    36101   -33.90000   149.62140   1973-01-09  30.10000
    36102   -33.90000   149.65926   1973-01-09  29.50000
    36103   -33.90000   149.69711   1973-01-09  29.20000
    36104   -33.90000   149.73499   1973-01-09  29.90000
    36105   -33.90000   149.77284   1973-01-09  29.90000
    36106   -33.90000   149.81070   1973-01-09  27.60000
    36107   -33.90000   149.84856   1973-01-09  24.40000
    36108   -33.90000   149.88641   1973-01-09  23.80000
    36109   -33.90000   149.92429   1973-01-09  23.80000
    36110   -33.90000   149.96214   1973-01-09  24.10000
    36111   -33.90000   150.00000   1973-01-09  25.30000

DF2 contains hourly data (index = R_ID):

     latitude   longitude    time                T_SFC
R_ID                
0   -39.20000   140.80000   1972-01-20 00:00:00 15.80000
0   -39.20000   140.80000   1972-01-20 01:00:00 15.90000
0   -39.20000   140.80000   1972-01-20 02:00:00 16.00000
0   -39.20000   140.80000   1972-01-20 03:00:00 16.20000
0   -39.20000   140.80000   1972-01-20 04:00:00 16.60000
0   -39.20000   140.80000   1972-01-20 05:00:00 16.60000
0   -39.20000   140.80000   1972-01-20 06:00:00 16.50000
0   -39.20000   140.80000   1972-01-20 07:00:00 16.50000
0   -39.20000   140.80000   1972-01-20 08:00:00 16.50000
0   -39.20000   140.80000   1972-01-20 09:00:00 16.40000
0   -39.20000   140.80000   1972-01-20 10:00:00 16.40000
0   -39.20000   140.80000   1972-01-20 11:00:00 16.40000
0   -39.20000   140.80000   1972-01-20 12:00:00 16.50000
0   -39.20000   140.80000   1972-01-20 13:00:00 16.60000
0   -39.20000   140.80000   1972-01-20 14:00:00 16.60000
0   -39.20000   140.80000   1972-01-20 15:00:00 16.70000
0   -39.20000   140.80000   1972-01-20 16:00:00 16.70000
0   -39.20000   140.80000   1972-01-20 17:00:00 16.60000
0   -39.20000   140.80000   1972-01-20 18:00:00 16.60000
0   -39.20000   140.80000   1972-01-20 19:00:00 16.60000
0   -39.20000   140.80000   1972-01-20 20:00:00 16.50000
0   -39.20000   140.80000   1972-01-20 21:00:00 16.50000
0   -39.20000   140.80000   1972-01-20 22:00:00 16.50000
0   -39.20000   140.80000   1972-01-20 23:00:00 16.40000
0   -39.20000   140.80000   1972-01-21 00:00:00 16.40000
0   -39.20000   140.80000   1972-01-21 01:00:00 16.30000
0   -39.20000   140.80000   1972-01-21 02:00:00 16.30000
0   -39.20000   140.80000   1972-01-21 03:00:00 16.30000
0   -39.20000   140.80000   1972-01-21 04:00:00 16.10000
0   -39.20000   140.80000   1972-01-21 05:00:00 16.00000
... ... ... ... ...
36111   -38.87551   141.14075   1974-12-30 18:00:00 14.10000
36111   -38.87551   141.14075   1974-12-30 19:00:00 14.10000
36111   -38.87551   141.14075   1974-12-30 20:00:00 14.10000
36111   -38.87551   141.14075   1974-12-30 21:00:00 14.10000
36111   -38.87551   141.14075   1974-12-30 22:00:00 14.20000
36111   -38.87551   141.14075   1974-12-30 23:00:00 14.30000
36111   -38.87551   141.14075   1974-12-31 00:00:00 14.40000
36111   -38.87551   141.14075   1974-12-31 01:00:00 14.50000
36111   -38.87551   141.14075   1974-12-31 02:00:00 14.50000
36111   -38.87551   141.14075   1974-12-31 03:00:00 14.50000
36111   -38.87551   141.14075   1974-12-31 04:00:00 14.50000
36111   -38.87551   141.14075   1974-12-31 05:00:00 14.50000
36111   -38.87551   141.14075   1974-12-31 06:00:00 14.60000
36111   -38.87551   141.14075   1974-12-31 07:00:00 14.50000
36111   -38.87551   141.14075   1974-12-31 08:00:00 14.30000
36111   -38.87551   141.14075   1974-12-31 09:00:00 14.40000
36111   -38.87551   141.14075   1974-12-31 10:00:00 14.30000
36111   -38.87551   141.14075   1974-12-31 11:00:00 14.30000
36111   -38.87551   141.14075   1974-12-31 12:00:00 14.40000
36111   -38.87551   141.14075   1974-12-31 13:00:00 14.50000
36111   -38.87551   141.14075   1974-12-31 14:00:00 14.40000
36111   -38.87551   141.14075   1974-12-31 15:00:00 14.30000
36111   -38.87551   141.14075   1974-12-31 16:00:00 14.30000
36111   -38.87551   141.14075   1974-12-31 17:00:00 14.30000
36111   -38.87551   141.14075   1974-12-31 18:00:00 14.30000
36111   -38.87551   141.14075   1974-12-31 19:00:00 14.40000
36111   -38.87551   141.14075   1974-12-31 20:00:00 14.50000
36111   -38.87551   141.14075   1974-12-31 21:00:00 14.60000
36111   -38.87551   141.14075   1974-12-31 22:00:00 14.70000
36111   -38.87551   141.14075   1974-12-31 23:00:00 14.80000

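DF1 has a Date column with daily values from 1972-01-20 to 1974-12-31, while DF2 has a time column with hourly values from 1972-01-20T00:00:00 to 1974-12-31T23:00:00. DF1 is sorted by RNK (rank) and FFDI, whereas DF2 is sorted by R_ID and time. An R_ID is a reference ID that corresponds to a unique latitude/longitude pair. DF2 should be joined to DF1 on the same R_ID and on the date that each value in DF2's time column falls on, matched against DF1's Date column. In other words, every row (day) in DF1 will get 24 (hourly) rows from DF2 that share the same day.

The output df should look like this: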

                                                              time                   T_SFC
RNK  R_ID       latitude    longitude   Date        FFDI
1    0          -39.20000   140.80000   1973-04-02  5.40000   1973-04-02 00:00:00    13.8
                                                              1973-04-02 01:00:00    13.9
                                                              1973-04-02 02:00:00    13.0
                                                              1973-04-02 03:00:00    13.2
                                                              1973-04-02 04:00:00    13.6
                                                              ... ... ... ...
     1          -39.20000   140.83786   1973-04-02  5.40000   1973-04-02 00:00:00    13.8
                                                              1973-04-02 01:00:00    13.9
                                                              1973-04-02 02:00:00    13.0
                                                              1973-04-02 03:00:00    13.2
                                                              1973-04-02 04:00:00    13.6
                                                              ... ... ... ...
     2          -39.20000   140.87572   1973-04-02  5.40000   1973-04-02 00:00:00    13.8
                                                              1973-04-02 01:00:00    13.9
                                                              1973-04-02 02:00:00    13.0
                                                              1973-04-02 03:00:00    13.2
                                                              1973-04-02 04:00:00    13.6
                                                              ... ... ... ...
     ... ... ... ...
2    0          -39.20000   140.80000   1974-03-07  5.60000   1974-03-07 00:00:00    15.8
                                                              1974-03-07 01:00:00    15.9
                                                              1974-03-07 02:00:00    16.0
                                                              1974-03-07 03:00:00    16.2
                                                              1974-03-07 04:00:00    16.6
                                                              ... ... ... ...
     1          -39.20000   140.83786   1973-03-09  5.40000   1973-03-09 00:00:00    15.8
                                                              1973-03-09 01:00:00    15.9
                                                              1973-03-09 02:00:00    16.0
                                                              1973-03-09 03:00:00    15.2
                                                              1973-03-09 04:00:00    15.6
                                                              ... ... ... ...
... ... ... ...
... ... ... ...
5    36082     -33.90000    148.90205   1972-12-24  35.70000  1972-12-24 00:00:00    19.8
                                                              1972-12-24 01:00:00    19.1
                                                              1972-12-24 02:00:00    22.0
                                                              1972-12-24 03:00:00    24.2
                                                              1972-12-24 04:00:00    21.6
                                                              ... ... ... ...
     ... ... ... ...
     36111     -33.90000    150.00000   1973-01-09  25.30000  1973-01-09 00:00:00    19.8
                                                              1973-01-09 01:00:00    19.1
                                                              1973-01-09 02:00:00    22.0
                                                              1973-01-09 03:00:00    24.2
                                                              1973-01-09 04:00:00    21.6
                                                              ... ... ... ...
                                                              1973-01-09 23:00:00    19.1
4,333,440 rows x 2 columns

Following @politinsa's answer, I tried:

# Add a new column Date and save date part of the time column to it.
df2['Date'] = df2['time'].dt.date.astype('datetime64[ns]')

df_joined = pd.merge(df1,df2,on=['REF_ID','Date'],how='inner')

The problem with the output is that df1's multi-index is not preserved, and RNK is missing from the output df.

print(df_joined)

        time    FFDI         latitude   longitude   T_SFC       time_original
REF_ID                      
0   1973-04-02  5.40000     -39.20000   140.80000   16.40000    1973-04-02 00:00:00
0   1973-04-02  5.40000     -39.20000   140.80000   16.00000    1973-04-02 01:00:00
0   1973-04-02  5.40000     -39.20000   140.80000   15.70000    1973-04-02 02:00:00
0   1973-04-02  5.40000     -39.20000   140.80000   15.40000    1973-04-02 03:00:00
0   1973-04-02  5.40000     -39.20000   140.80000   15.20000    1973-04-02 04:00:00
0   1973-04-02  5.40000     -39.20000   140.80000   15.10000    1973-04-02 05:00:00
0   1973-04-02  5.40000     -39.20000   140.80000   15.10000    1973-04-02 06:00:00
0   1973-04-02  5.40000     -39.20000   140.80000   15.10000    1973-04-02 07:00:00
0   1973-04-02  5.40000     -39.20000   140.80000   15.10000    1973-04-02 08:00:00
0   1973-04-02  5.40000     -39.20000   140.80000   15.10000    1973-04-02 09:00:00
0   1973-04-02  5.40000     -39.20000   140.80000   15.10000    1973-04-02 10:00:00
0   1973-04-02  5.40000     -39.20000   140.80000   15.20000    1973-04-02 11:00:00
0   1973-04-02  5.40000     -39.20000   140.80000   15.20000    1973-04-02 12:00:00
0   1973-04-02  5.40000     -39.20000   140.80000   15.20000    1973-04-02 13:00:00
0   1973-04-02  5.40000     -39.20000   140.80000   15.00000    1973-04-02 14:00:00
0   1973-04-02  5.40000     -39.20000   140.80000   15.10000    1973-04-02 15:00:00
0   1973-04-02  5.40000     -39.20000   140.80000   15.30000    1973-04-02 16:00:00
0   1973-04-02  5.40000     -39.20000   140.80000   15.40000    1973-04-02 17:00:00
0   1973-04-02  5.40000     -39.20000   140.80000   15.40000    1973-04-02 18:00:00

... ... ... ...
12000 rows × 6 columns

Solution

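You can create a column in DF2 that contains the date (rather than the datetime): for example, the row with time 1973-04-02 01:00:00 gets a Date column containing 1973-04-02.

Then a classic inner join (pd.merge with how='inner' on the ID and Date columns) will do the trick.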
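
To also keep RNK and the other index levels, one option is to move every index level into ordinary columns before merging and rebuild the MultiIndex afterwards; the pandas merging guide notes that index levels not used as join keys can be dropped from the result. The following is only a minimal sketch on made-up toy data, not the asker's real frames. It assumes the ID is named R_ID in both frames, that df1's Date is already datetime64, and that latitude/longitude can be taken from df1 alone. The names left, right and joined are illustrative.

```python
import numpy as np
import pandas as pd

# Toy stand-in for DF1 (daily rows); the values are invented for illustration.
df1 = pd.DataFrame({
    "RNK": [1, 1, 2],
    "R_ID": [0, 1, 0],
    "latitude": [-39.2, -39.2, -39.2],
    "longitude": [140.8, 140.83786, 140.8],
    "Date": pd.to_datetime(["1973-04-02", "1973-04-02", "1973-04-03"]),
    "FFDI": [5.4, 5.4, 5.6],
}).set_index(["RNK", "R_ID", "latitude", "longitude"])

# Toy stand-in for DF2: 48 hourly rows (two full days) for each of two R_IDs.
hours = pd.Timestamp("1973-04-02") + pd.to_timedelta(np.arange(48), unit="h")
df2 = pd.DataFrame({
    "R_ID": np.repeat([0, 1], len(hours)),
    "latitude": np.repeat([-39.2, -39.2], len(hours)),
    "longitude": np.repeat([140.8, 140.83786], len(hours)),
    "time": list(hours) * 2,
    "T_SFC": np.linspace(13.0, 17.0, 2 * len(hours)),
}).set_index("R_ID")

# Turn all index levels into columns so the merge cannot drop any of them.
left = df1.reset_index()
right = df2.reset_index()

# Calendar day of each hourly timestamp, kept as datetime64 to match df1's Date.
right["Date"] = right["time"].dt.normalize()

# Assumption: coordinates come from df1, so drop df2's copies to avoid _x/_y columns.
right = right.drop(columns=["latitude", "longitude"])

# Inner join: each daily row is paired with the hourly rows of the same R_ID and day.
joined = left.merge(right, on=["R_ID", "Date"], how="inner")

# Rebuild the MultiIndex shown in the desired output.
df_joined = joined.set_index(
    ["RNK", "R_ID", "latitude", "longitude", "Date", "FFDI"]
)
print(df_joined.head())
```

With the toy data, each of the three daily rows picks up its 24 hourly rows, and only time and T_SFC remain as columns. On the real frames, the key name (R_ID versus REF_ID) would need to match whatever the data actually uses.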