Extracting domains from a list of emails in a database

Problem description

I need to extract the domain from each email address in a dataset and find the 5 most common domains.

import re
from collections import Counter

with open("emails") as f:
    for email in f:
        # Match "@" followed by the characters of the domain
        domain = re.search(r'@[\w.]+', email)
        if domain:
            print(domain.group())

 [email protected]  http://www.bentonjohnbjr.com
 josephine_darakjy@darakjy.org  http://www.chanayjeffreyaesq.com
 [email protected] http://www.chemeljameslcpa.com
 [email protected]  http://www.feltzprintingservice.com
 [email protected] http://www.printingdimensions.com

Solution

This will list the top 5 domains:

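import re
from collections import Counter

resultList = []
with open("emails", "r") as email:
    for x in email:
        # Capture everything between "@" and the last space on the line
        result = re.search('@(.*) ', x)
        if result:  # skip lines with no match
            resultList.append(result.group(1))
# Count each domain and print the five most common
occurrence_count = Counter(resultList)
print(occurrence_count.most_common(5))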

Output:

[('gmail.com ', 1), ('darakjy.org ', 1), ('venere.org', 1), ('hotmail.com ', 1), ('cox.net', 1)]

The output lists the 5 most common domains, each with its count.
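
Some entries in that output carry a trailing space ('gmail.com ' but 'venere.org'). The pattern '@(.*) ' is greedy, so on lines where two spaces separate the columns it captures up to the last one, and the same domain could then be counted under two different keys. A minimal sketch of a stricter variant follows, assuming the domain ends at the first whitespace character. The function name top_domains and the sample lines are hypothetical and not part of the original answer; lowercasing the domain is an extra assumption.

```python
import re
from collections import Counter

# Hypothetical sample lines standing in for the "emails" file (made-up addresses).
SAMPLE_LINES = [
    "alice@gmail.com  http://www.example.com",
    "bob@Gmail.com http://www.example.org",
    "carol@darakjy.org  http://www.example.net",
    "no address on this line",
]

# "@" followed by one or more non-whitespace characters,
# so the capture stops at the first space, tab, or newline.
DOMAIN_RE = re.compile(r"@(\S+)")


def top_domains(lines, n=5):
    """Return the n most common email domains found in an iterable of lines."""
    counts = Counter()
    for line in lines:
        match = DOMAIN_RE.search(line)
        if match:  # skip lines that contain no address
            # Assumption: domains are compared case-insensitively.
            counts[match.group(1).lower()] += 1
    return counts.most_common(n)


print(top_domains(SAMPLE_LINES))
# → [('gmail.com', 2), ('darakjy.org', 1)]
```

To run it against the real file, pass the open file object instead of the sample list, for example print(top_domains(open("emails"))).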