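Extracting domains from a list of email addresses

Problem description

I need to extract the domain from each email address in a dataset and find the 5 most common domains.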

import re
from collections import Counter

with open("emails") as f:
    for email in f:
        # Match "@" followed by the characters of the domain
        domain = re.search(r'@[\w.]+', email)
        if domain:
            print(domain.group())

 jbutt@gmail.com  http://www.bentonjohnbjr.com
 josephine_darakjy@darakjy.org  http://www.chanayjeffreyaesq.com
 art@venere.org http://www.chemeljameslcpa.com
 lpaprocki@hotmail.com  http://www.feltzprintingservice.com
 donette.foller@cox.net http://www.printingdimensions.com

Solution

This lists the top 5 domains:

import re
from collections import Counter

domains = []
with open("emails", "r") as emails:
    for line in emails:
        # Capture only the domain characters after "@", so trailing spaces are excluded
        match = re.search(r'@([\w.-]+)', line)
        if match:  # skip lines that contain no email address
            domains.append(match.group(1))

occurrence_count = Counter(domains)
print(occurrence_count.most_common(5))

Output:

[('gmail.com', 1), ('darakjy.org', 1), ('venere.org', 1), ('hotmail.com', 1), ('cox.net', 1)]

The output is the 5 most common domains, each paired with its number of occurrences.
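
As a variation on the approach above, here is a minimal sketch that wraps the counting in a function accepting any iterable of lines, so it can be tried without a file. The name `top_domains`, the `DOMAIN_RE` pattern, and the lowercasing step are assumptions that are not in the original answer. Lowercasing is there so that `Gmail.com` and `gmail.com` count as one domain.

```python
import re
from collections import Counter

# Hypothetical helper names; not part of the original answer.
DOMAIN_RE = re.compile(r'@([\w.-]+)')

def top_domains(lines, n=5):
    """Return the n most common email domains found in lines."""
    counts = Counter()
    for line in lines:
        match = DOMAIN_RE.search(line)
        if match:  # skip lines without an email address
            # Lowercase so that case differences do not split the count
            counts[match.group(1).lower()] += 1
    return counts.most_common(n)

sample = [
    "jbutt@gmail.com  http://www.bentonjohnbjr.com",
    "art@venere.org http://www.chemeljameslcpa.com",
    "someone@Gmail.com http://www.example.com",  # made-up line for illustration
]
print(top_domains(sample, 2))
```

Because an open file is itself an iterable of lines, the same function could be called on the file directly, for example `with open("emails") as f: print(top_domains(f))`.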
