Replicating Excel VLOOKUP in Python

Question

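I have two tables, Table 1 and Table 2. Table 2 is sorted by date, from most recent to oldest. In Excel, when I do a lookup from Table 1 into Table 2, it picks only the first matching value from Table 2 and does not keep searching for the same value after that first match. I tried to replicate this in Python with the merge function, but found that it repeats each row as many times as the value appears in the second table.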

pd.merge(Table1,Table2,left_on='Country',right_on='Country',how='left',indicator='indicator_column')
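To make the symptom concrete, here is a minimal sketch of the row duplication. The tables were posted as images, so the values below are made up for illustration and only the column names follow the question.

```python
import pandas as pd

# Hypothetical data; table2 is sorted from newest to oldest date.
table1 = pd.DataFrame({
    "Country": ["GERM", "GHAN", "LIB"],
    "Name": ["Dave", "Delali", "Mike"],
})
table2 = pd.DataFrame({
    "Country": ["GERM", "GHAN", "GERM", "GHAN"],
    "Age": [35, 40, 87, 90],
    "Date": ["7/10/2020", "7/9/2020", "7/7/2020", "7/6/2020"],
})

merged = pd.merge(table1, table2, on="Country", how="left",
                  indicator="indicator_column")

# GERM and GHAN each occur twice in table2, so each produces two rows;
# LIB has no match and produces a single row filled with NaN.
print(len(table1), len(merged))  # 3 5
```

VLOOKUP would return three rows here, one per row of table1.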

Table 1

Table 2

Merge result

Expected result (Excel VLOOKUP)

Can this be achieved with the merge function or any other Python function?

Answers

Edit: since you want a straight VLOOKUP, just use join. It appears to pick up the first match.

table1.join(table2,rsuffix='r',lsuffix='l')

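The documentation seems to suggest that it behaves similarly to VLOOKUP: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.join.html. Note that join matches on the index, so Country would need to be the index of table2 (and either the index of table1 or passed via on='Country'), and it returns one row per match if table2 contains repeated keys.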
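A minimal sketch, on made-up data, of what join returns with and without repeated keys in the right-hand table. The frame names (left, right) and their values are hypothetical, not the tables from the question.

```python
import pandas as pd

# Hypothetical frames, both indexed by Country.
left = pd.DataFrame(
    {"Name": ["Dave", "Delali"]},
    index=pd.Index(["GERM", "GHAN"], name="Country"),
)
right = pd.DataFrame(
    {"Age": [35, 40, 87]},
    index=pd.Index(["GERM", "GHAN", "GERM"], name="Country"),
)

# With repeated keys on the right, GERM matches twice and appears twice.
joined_all = left.join(right)

# Keeping only the first occurrence of each key gives one row per left row.
first_only = right[~right.index.duplicated(keep="first")]
joined_first = left.join(first_only)

print(len(joined_all), len(joined_first))  # 3 2
```

So join reproduces the first-match behavior only once the right-hand index has been reduced to unique keys.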

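I would suggest that this is more like a SQL join than a VLOOKUP. VLOOKUP searches from top to bottom for the first matching row, which can be completely arbitrary depending on how the table or array is sorted in Excel. For good reason, "true" database systems and their related functions are more explicit than that.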

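To join only one row from the right table to each row of the left table, you need some kind of aggregation or selection. In your case, that would be MAX or MIN.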

The question is, which column matters more: Date or Age?

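import pandas as pd

df1 = pd.DataFrame({
    'Country': ['GERM', 'LIB', 'ARG', 'BNG', 'LITH', 'GHAN'],
    'Name': ['Dave', 'Mike', 'Pete', 'Shirval', 'Kwasi', 'Delali'],
})
df2 = pd.DataFrame({
    # Assumption: the three countries repeat so that every column has 9 values;
    # the original listed 3 countries against 9 ages and dates, which raises an error.
    'Country': ['GERM', 'GHAN', 'BNG'] * 3,
    'Age': [35, 40, 27, 87, 90, 30, 61, 18, 45],
    'Date': ['7/10/2020', '7/9/2020', '7/8/2020', '7/7/2020', '7/6/2020',
             '7/5/2020', '7/4/2020', '7/3/2020', '7/2/2020'],
})
# Parse the dates so that 'max' compares chronologically rather than as strings
df2['Date'] = pd.to_datetime(df2['Date'], format='%m/%d/%Y')

# Aggregate df2 to one row per country, then left-join it onto df1 by Country
df1.set_index('Country').join(
    df2.groupby('Country').agg({'Age': 'max', 'Date': 'max'}), how='left')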
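One thing to watch with per-column aggregation is that the maximum Age and the maximum Date can come from different rows. If Date is the deciding column, a possible alternative is to select the whole latest row per country. This is a minimal sketch with made-up data; the names lookup, people, and latest are hypothetical.

```python
import pandas as pd

# Hypothetical lookup table with several rows per country.
lookup = pd.DataFrame({
    "Country": ["GERM", "GHAN", "GERM", "GHAN"],
    "Age": [35, 40, 87, 90],
    "Date": pd.to_datetime(
        ["2020-07-10", "2020-07-09", "2020-07-07", "2020-07-06"]
    ),
})
people = pd.DataFrame({
    "Country": ["GERM", "GHAN", "LIB"],
    "Name": ["Dave", "Delali", "Mike"],
})

# idxmax gives the row label of the latest Date within each country;
# .loc then pulls those whole rows, so Age and Date stay paired.
latest = lookup.loc[lookup.groupby("Country")["Date"].idxmax()]

# One row per person; countries without a match get NaN/NaT.
result = people.merge(latest, on="Country", how="left")
print(result)
```

With this data, GERM gets Age 35 (the row dated 2020-07-10), whereas aggregating each column with max would pair Age 87 with that same date.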

Since you included your data as images rather than text, I am typing this blind.

# The index is a very important element of a DataFrame,
# as we will see in a moment
result = table1.set_index('Country')

# For each country, keep only the first row
tmp = table2.drop_duplicates(subset='Country').set_index('Country')

# When you assign one or more columns of a DataFrame to one or more columns of
# another DataFrame, the assignment is aligned on the index of the two
# frames. This is the equivalent of VLOOKUP
result.loc[:, ['Age', 'Date']] = tmp[['Age', 'Date']]
result = result.reset_index()
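The same first-match idea can also be written with merge, which is what the question started from. The sketch below uses made-up data and assumes table2 is already sorted newest to oldest. The validate argument is an optional guard that makes merge raise an error if the right-hand keys are not unique.

```python
import pandas as pd

# Hypothetical data; table2 is sorted from newest to oldest date.
table1 = pd.DataFrame({
    "Country": ["GERM", "GHAN", "LIB"],
    "Name": ["Dave", "Delali", "Mike"],
})
table2 = pd.DataFrame({
    "Country": ["GERM", "GHAN", "GERM", "GHAN"],
    "Age": [35, 40, 87, 90],
    "Date": ["7/10/2020", "7/9/2020", "7/7/2020", "7/6/2020"],
})

# Keep only the first row per country, as VLOOKUP would find it.
first_match = table2.drop_duplicates(subset="Country", keep="first")

# validate="many_to_one" raises pandas.errors.MergeError if a key
# is repeated in the right-hand table.
result = table1.merge(first_match, on="Country", how="left",
                      validate="many_to_one")
print(result)
```

The result has exactly one row per row of table1, with NaN for countries that have no match.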