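MemoryError when extracting emails with exchangelib

Problem description

I have a question about bulk-saving email data with exchangelib. Currently, when there are many emails it takes a long time, and after a few minutes it raises this error: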

    ERROR:    MemoryError:
    Retry: 0
    Waited: 10
    Timeout: 120
    Session: 25999
    Thread: 28148
    Auth type: <requests.auth.HTTPBasicAuth object at 0x1FBFF1F0>
    URL: https://outlook.office365.com/EWS/Exchange.asmx
    HTTP adapter: <requests.adapters.HTTPAdapter object at 0x1792CE68>
    Allow redirects: False
    Streaming: False
    Response time: 411.93799999996554
    Status code: 503
    Request headers: {'X-AnchorMailbox': 'myworkemail@workdomain.com'}
    Response headers: {}

This is the code I use to connect and read:

    from bs4 import BeautifulSoup as bs
    from exchangelib import (
        DELEGATE, Account, Configuration, Credentials, EWSDateTime, EWSTimeZone,
    )


    def connect_mail():
        # Connect to Office 365 with username/password and delegate access
        config = Configuration(
            server="outlook.office365.com",
            credentials=Credentials(
                username="myworkemail@workdomain.com", password="*******"
            ),
        )
        return Account(
            primary_smtp_address="myworkemail@workdomain.com",
            config=config,
            access_type=DELEGATE,
        )


    def import_email(account):
        tz = EWSTimeZone.localzone()
        start = EWSDateTime(2020, 10, 26, 22, 15, tzinfo=tz)
        # Unread messages received after `start`, newest first
        for item in account.inbox.filter(
            datetime_received__gt=start, is_read=False
        ).order_by("-datetime_received"):
            email_body = item.body
            email_subject = item.subject
            soup = bs(email_body, "html.parser")
            tables = soup.find_all("table")
            item.is_read = True
            item.save()
            # Some code here for saving the email to a database

Solution

You are getting a MemoryError, which means Python cannot allocate any more memory on your machine.

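There are a few things you can do to reduce your script's memory consumption. One is to use .iterator() to disable the internal caching of query results. Another is to use .only() to fetch only the fields you actually need.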
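To get a feel for why result caching matters, the effect can be reproduced without Exchange at all. The sketch below is only an illustration using the standard-library tracemalloc module: fake_messages, cached and streamed are hypothetical stand-ins, and the message count and size are made-up numbers, not measurements from exchangelib.

```python
import tracemalloc


def fake_messages(count, size=10_000):
    """Yield `count` stand-in message bodies one at a time (made-up data)."""
    for i in range(count):
        yield "x" * size + str(i)


def peak_bytes(consume):
    """Run consume() and return the peak memory traced while it ran."""
    tracemalloc.start()
    try:
        consume()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def cached():
    # Keeps every body alive at the same time, as a result cache would.
    bodies = list(fake_messages(500))
    return sum(len(b) for b in bodies)


def streamed():
    # Each body can be freed as soon as the next one is produced.
    return sum(len(b) for b in fake_messages(500))


cached_peak = peak_bytes(cached)
streamed_peak = peak_bytes(streamed)
print(f"cached:   {cached_peak:,} bytes at peak")
print(f"streamed: {streamed_peak:,} bytes at peak")
```

The cached variant's peak grows with the number of messages, while the streamed variant's peak stays roughly the size of one message. That is the difference .iterator() is meant to make in the answer above.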

When you use .only(), all other fields are None. Remember to save only the one field you actually changed: item.save(update_fields=['is_read'])

Here is an example that uses both improvements:

    for item in account.inbox.filter(
        datetime_received__gt=start, is_read=False
    ).only('is_read', 'subject', 'body').order_by('-datetime_received').iterator():
        item.is_read = True
        item.save(update_fields=['is_read'])  # write back only the changed field
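The same loop could be wrapped in a small reusable function. This is a minimal sketch, not the library's prescribed pattern: it assumes an exchangelib version whose querysets provide .iterator(), as in the answer above, and both process_unread and the handle callback are hypothetical names introduced here for illustration.

```python
def process_unread(queryset, handle):
    """Stream items from an exchangelib-style queryset and mark each as read.

    `handle` is a hypothetical callback that receives (subject, body),
    for example to parse the HTML body and write the result to a database.
    Returns the number of items processed.
    """
    lazy_items = (
        queryset
        .only("is_read", "subject", "body")  # fetch just these fields
        .order_by("-datetime_received")
        .iterator()                          # do not cache the results
    )
    processed = 0
    for item in lazy_items:
        handle(item.subject, item.body)
        item.is_read = True
        # Only is_read was changed; the other fields were never fetched,
        # so they are not written back.
        item.save(update_fields=["is_read"])
        processed += 1
    return processed
```

It would be called with the filtered inbox from the question, for instance process_unread(account.inbox.filter(datetime_received__gt=start, is_read=False), save_to_db), where save_to_db stands for whatever database code you already have. Keeping the parsing inside the callback means no reference to earlier messages survives from one iteration to the next.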
