Fuzzy-matching corresponding rows between pandas DataFrames

Problem description

I want to fuzzy-match large['name'] against small['name'] and add a new column, 'matched_name', that holds the highest-scoring match for each row.

large looks like this:

    name
0   24/7 CUSTOMER
1   3 K TECHNOLOGIES
2   3I INFOTECH B P O
3   3I INFOTECH CONSULTANCY SERVICES
4   3I INFOTECH
... ...
889 ZIRCON TECHNOLOGIES
890 ZOETIS
891 ZOHO CORPORATION
892 ZOOM COMMUNICATIONS
893 ZYLOG SYSTEMS

small looks like this:

        name        city        country_code
0       Wetpaint    New York    USA
1       Zoho        Pleasanton  USA
2       Digg        New York    USA
3       Facebook    Menlo Park  USA
4       Accel       Palo Alto   USA
... ... ... ...
1161387 TKX         NaN         NaN
1161388 Digitalhype NaN         NaN
1161389 TK Research NaN         NaN
1161390 Kyodo News  Tokyo       JPN
1161391 TKO Mobile  NaN         NaN

The output I want looks like this:

        name        matched_name        city        country_code
0       Wetpaint    WETPAINT CO         New York    USA
1       Zoho        ZOHO CORPORATION    Pleasanton  USA
2       Digg        DIGG CO             New York    USA
3       Facebook    FACEBOOK            Menlo Park  USA
4       Accel       ACCEL               Palo Alto   USA
... ... ... ...
1161387 TKX         TKX CO              NaN         NaN
1161388 Digitalhype DIGITAL HYPE        NaN         NaN
1161389 TK Research TK RESEARCH CO      NaN         NaN
1161390 Kyodo News  KYODO               Tokyo       JPN
1161391 TKO Mobile  TKO CO              NaN         NaN

This is what I have so far:

# create a new column with match all company_name
for i in large['name']:
    for j in small['company_name']:
        large['matched_name'] = process.extractOne(i,j)

But I get an error: ValueError: Length of values (0) does not match length of index (1161392)

Solution
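One possible way to approach this, offered as a sketch rather than a confirmed fix: the nested loop in the question overwrites the whole 'matched_name' column on every iteration and passes a single string as the set of choices. pandas raises this kind of ValueError when the value assigned to a column has a length different from the index. Computing one best match per row and assigning a list of the same length as the frame avoids that. The sketch below uses the standard library's difflib as a stand-in scorer so that it runs without extra packages; it is not the `process.extractOne` scorer from the question. The helper name `best_match` and the short sample lists are hypothetical, and the candidate names are assumed to be upper case already, as they appear in large.

```python
import difflib


def best_match(query, choices):
    """Return the entry of `choices` most similar to `query`, or None.

    Assumption: `choices` are already upper case, so only the query
    is normalised before comparison.
    """
    if not isinstance(query, str):
        return None  # e.g. a NaN in the name column
    # cutoff=0.0 keeps every candidate; n=1 returns only the top-scoring one
    hits = difflib.get_close_matches(query.upper(), choices, n=1, cutoff=0.0)
    return hits[0] if hits else None


# Illustrative sample data only, loosely based on the frames in the question
large_names = [
    "24/7 CUSTOMER",
    "3I INFOTECH",
    "FACEBOOK",
    "ZOHO CORPORATION",
    "ZOOM COMMUNICATIONS",
]
small_names = ["Zoho", "Facebook", float("nan")]

# One result per input row, so the list can be assigned as a column
matched = [best_match(name, large_names) for name in small_names]
print(matched)
```

With the actual DataFrames, the same idea would read `small['matched_name'] = small['name'].map(lambda n: best_match(n, large['name'].tolist()))`. If the `process.extractOne` function from the question is preferred, the full list of candidate names would go in as the second argument, and the matched string is the first element of the tuple it returns. Note that about 1.16 million rows compared against roughly 900 candidates is on the order of a billion comparisons, so either approach may be slow without further optimisation.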
