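Appending fuzzy-matched pandas DataFrames

Problem description

I want to fuzzy-match the values in small['company_name'] against large['name'] so that I can eventually append the matches to dico, producing the following output: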

 0 uuid    name    company_name   type    primary_role    cb_url  domain  homepage_url    combined_stock_symbols  city    region  country_code    short_description
0   e1393508-30ea-8a36-3f96-dd3226033abd    Wetpaint    organization    company https://www.crunchbase.com/organization/wetpai...   wetpaint.com    http://www.wetpaint.com/    NaN New York    New York    USA Wetpaint offers an online social publishing pl...
1   bf4d7b0e-b34d-2fd8-d292-6049c4f7efc7    Zoho    organization    company https://www.crunchbase.com/organization/zoho?u...   zoho.com    https://www.zoho.com/   NaN Pleasanton  California  USA Zoho offers a suite of business,collaboration...
2   5f2b40b8-d1b3-d323-d81a-b7a8e89553d0    Digg    organization    company https://www.crunchbase.com/organization/digg?u...   digg.com    http://www.digg.com NaN New York    New York    USA Digg Inc. operates a website that enables its ...
3   df662812-7f97-0b43-9d3e-12f64f504fbb    Facebook    organization    company https://www.crunchbase.com/organization/facebo...   facebook.com    http://www.facebook.com nasdaq:FB   Menlo Park  California  USA Facebook is an online social networking servic...
4   b08efc27-da40-505a-6f9d-c9e14247bf36    Accel   organization    investor    https://www.crunchbase.com/organization/accel?...   accel.com   http://www.accel.com    NaN Palo Alto   California  USA Accel is an early and growth-stage venture cap...

large looks like this:

uuid    name    type    primary_role    cb_url  domain  homepage_url    combined_stock_symbols  city    region  country_code    short_description
0   e1393508-30ea-8a36-3f96-dd3226033abd    Wetpaint    organization    company https://www.crunchbase.com/organization/wetpai...   wetpaint.com    http://www.wetpaint.com/    NaN New York    New York    USA Wetpaint offers an online social publishing pl...
1   bf4d7b0e-b34d-2fd8-d292-6049c4f7efc7    Zoho    organization    company https://www.crunchbase.com/organization/zoho?u...   zoho.com    https://www.zoho.com/   NaN Pleasanton  California  USA Zoho offers a suite of business,collaboration...
2   5f2b40b8-d1b3-d323-d81a-b7a8e89553d0    Digg    organization    company https://www.crunchbase.com/organization/digg?u...   digg.com    http://www.digg.com NaN New York    New York    USA Digg Inc. operates a website that enables its ...
3   df662812-7f97-0b43-9d3e-12f64f504fbb    Facebook    organization    company https://www.crunchbase.com/organization/facebo...   facebook.com    http://www.facebook.com nasdaq:FB   Menlo Park  California  USA Facebook is an online social networking servic...
4   b08efc27-da40-505a-6f9d-c9e14247bf36    Accel   organization    investor    https://www.crunchbase.com/organization/accel?...   accel.com   http://www.accel.com    NaN Palo Alto   California  USA Accel is an early and growth-stage venture cap...

and small looks like this:

company_name
0   24/7 CUSTOMER
1   3 K TECHNOLOGIES
2   3I INFOTECH B P O
3   3I INFOTECH CONSULTANCY SERVICES
4   3I INFOTECH

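Below is the code I wrote, but it never seems to finish, as if it were stuck in an infinite loop:

import pandas as pd
from fuzzywuzzy import fuzz

# Build every (name, company_name) pair: len(large) * len(small) entries
comb = pd.MultiIndex.from_product((large['name'], small['company_name']))
# Score each pair (fuzz.partial_ratio(*x) is an alternative)
scores = comb.map(lambda x: fuzz.ratio(*x))
# Keep only pairs scoring above the threshold (adjust as needed)
d = dict(a for a, b in zip(comb, scores) if b > 90)
out = large.assign(SurName=large['name'].map(d)).dropna(subset=['SurName'])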

Solution

No verified solution has been posted for this problem yet.
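
One possible direction, offered as an untested-on-the-real-data sketch rather than a confirmed fix: the code in the question is probably not looping forever. MultiIndex.from_product materializes every (name, company_name) pair, so both memory and run time grow with len(large) * len(small), and each pair is then scored in Python. Note also that fuzz.ratio is case-sensitive, while the sample company_name values are upper case and the name values are not, so few pairs would clear a threshold of 90 anyway. The sketch below avoids building the full product and compares case-insensitively. It swaps fuzzywuzzy for the standard-library difflib.get_close_matches, whose scores run from 0 to 1 instead of 0 to 100. The helper name best_matches, the cutoff of 0.9, and the sample query values are assumptions made for illustration.

```python
from difflib import get_close_matches


def best_matches(queries, candidates, cutoff=0.9):
    """Map each query string to its closest candidate, or None if no candidate clears the cutoff.

    Hypothetical helper: assumes every value is a string (drop NaN values first).
    """
    # Compare case-insensitively, remembering one original spelling per normalized name.
    normalized = {}
    for name in candidates:
        normalized.setdefault(name.casefold().strip(), name)
    keys = list(normalized)

    result = {}
    # dict.fromkeys drops duplicate queries while keeping their order.
    for query in dict.fromkeys(queries):
        hits = get_close_matches(query.casefold().strip(), keys, n=1, cutoff=cutoff)
        result[query] = normalized[hits[0]] if hits else None
    return result


# Names taken from the sample rows of large; the query values are made up for the demo.
large_names = ["Wetpaint", "Zoho", "Digg", "Facebook", "Accel"]
small_names = ["ZOHO", "FACEBOOK", "3I INFOTECH", "24/7 CUSTOMER"]
matches = best_matches(small_names, large_names)
```

With the real frames, the same helper could be called as best_matches(small['company_name'].dropna(), large['name'].dropna()), and the resulting dictionary passed to small['company_name'].map(...) to get a column that can be merged against large['name']. This still compares each query with every candidate, so on very large tables it may be worth deduplicating the names first or switching to a dedicated fuzzy-matching library.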