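Problem description
I want to fuzzy-match small['name'] against large['name'] and add a 'matched_name' column to small that holds the best-matching name from large for each row.
large looks like this: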
name
0 24/7 CUSTOMER
1 3 K TECHNOLOGIES
2 3I INFOTECH B P O
3 3I INFOTECH CONSULTANCY SERVICES
4 3I INFOTECH
... ...
889 ZIRCON TECHNOLOGIES
890 ZOETIS
891 ZOHO CORPORATION
892 ZOOM COMMUNICATIONS
893 ZYLOG SYSTEMS
small looks like this:
name city country_code
0 Wetpaint New York USA
1 Zoho Pleasanton USA
2 Digg New York USA
3 Facebook Menlo Park USA
4 Accel Palo Alto USA
... ... ... ...
1161387 TKX NaN NaN
1161388 Digitalhype NaN NaN
1161389 TK Research NaN NaN
1161390 Kyodo News Tokyo JPN
1161391 TKO Mobile NaN NaN
The output I want looks like this:
name matched_name city country_code
0 Wetpaint WETPAINT CO New York USA
1 Zoho ZOHO CORPORATION Pleasanton USA
2 Digg DIGG CO New York USA
3 Facebook FACEBOOK Menlo Park USA
4 Accel ACCEL Palo Alto USA
... ... ... ... ...
1161387 TKX TKX CO NaN NaN
1161388 Digitalhype DIGITAL HYPE NaN NaN
1161389 TK Research TK RESEARCH CO NaN NaN
1161390 Kyodo News KYODO Tokyo JPN
1161391 TKO Mobile TKO CO NaN NaN
This is what I have so far:
# create a new column with match all company_name
for i in large['name']:
    for j in small['company_name']:
        large['matched_name'] = process.extractOne(i,j)
But I get a ValueError: Length of values (0) does not match length of index (1161392)
Solution
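One possible approach, offered as a sketch under assumptions rather than as an answer from the original thread. The loop in the question passes a single string where a collection of choices is expected, and it assigns to the whole column on every pass. Each row needs exactly one matched value, so the sketch below builds a list with one result per row. It swaps in difflib.get_close_matches from the Python standard library in place of process.extractOne, so the scores will differ from those of the asker's library. The names large_names, small_names, best_match and the 0.3 cutoff are hypothetical stand-ins chosen for illustration.

```python
import difflib
from functools import lru_cache

# Hypothetical stand-ins for large['name'] and small['name'].
large_names = [
    "24/7 CUSTOMER",
    "3I INFOTECH",
    "ZIRCON TECHNOLOGIES",
    "ZOHO CORPORATION",
    "ZYLOG SYSTEMS",
]
small_names = ["Zoho", "3i Infotech", "Digg", "Zoho"]

CUTOFF = 0.3  # assumed similarity threshold in [0, 1]; tune it for real data


@lru_cache(maxsize=None)
def best_match(name):
    """Return the most similar entry of large_names, or None below CUTOFF."""
    # Upper-case the query because the reference names are upper-case.
    hits = difflib.get_close_matches(name.upper(), large_names, n=1, cutoff=CUTOFF)
    return hits[0] if hits else None


# One result per input row, so the list length equals the number of rows.
matched = [best_match(name) for name in small_names]
print(matched)
```

With the DataFrames from the question, and assuming every entry of small['name'] is a string, the same function could be applied with small['matched_name'] = small['name'].map(best_match). The cache means each distinct name is scored only once, which helps when small has over a million rows and many repeated names.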