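Problem description
I have a question about saving email data in bulk with exchangelib. Right now it takes a very long time when there are many emails, and after a few minutes it raises this error: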
ERROR: MemoryError:
Retry: 0
Waited: 10
Timeout: 120
Session: 25999
Thread: 28148
Auth type: <requests.auth.HTTPBasicAuth object at 0x1FBFF1F0>
URL: https://outlook.office365.com/EWS/Exchange.asmx
HTTP adapter: <requests.adapters.HTTPAdapter object at 0x1792CE68>
Allow redirects: False
Streaming: False
Response time: 411.93799999996554
Status code: 503
Request headers: {'X-AnchorMailbox': 'myworkemail@workdomain.com'}
Response headers: {}
This is the code I use to connect and read the emails:
from bs4 import BeautifulSoup as bs
from exchangelib import (
    DELEGATE, Account, Configuration, Credentials, EWSDateTime, EWSTimeZone,
)


def connect_mail():
    config = Configuration(
        server="outlook.office365.com",
        credentials=Credentials(
            username="myworkemail@workdomain.com", password="*******"
        ),
    )
    return Account(
        primary_smtp_address="myworkemail@workdomain.com",
        config=config,
        access_type=DELEGATE,
    )


def import_email(account):
    tz = EWSTimeZone.localzone()
    start = EWSDateTime(2020, 10, 26, 22, 15, tzinfo=tz)
    for item in account.inbox.filter(
        datetime_received__gt=start, is_read=False
    ).order_by("-datetime_received"):
        email_body = item.body
        email_subject = item.subject
        soup = bs(email_body, "html.parser")
        tables = soup.find_all("table")
        item.is_read = True
        item.save()
        # Some code here for saving the email to a database
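Solution
You are getting a MemoryError, which means Python cannot allocate any more memory on your machine.
There are a few things you can do to reduce your script's memory consumption. One is to use .iterator(), which disables the internal caching of query results. Another is to use .only() to fetch only the fields you actually need. When you use .only(), all other fields are None, so remember to save only the one field you actually changed: item.save(update_fields=['is_read'])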
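To illustrate why caching results matters for memory, here is a minimal sketch that uses only the standard library and does not touch exchangelib. The names fake_results, peak_bytes, cached and streamed are hypothetical, and the large strings merely stand in for email bodies. It compares the peak memory of keeping every result in a list against consuming the results one at a time.

```python
import tracemalloc


def fake_results(n):
    """Yield n large strings, standing in for email bodies (illustrative only)."""
    for i in range(n):
        yield "x" * 10_000 + str(i)


def peak_bytes(consume):
    """Return the peak traced memory, in bytes, while consume() runs."""
    tracemalloc.start()
    consume()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak


def cached():
    # Keeps every result alive at once, similar to a cached result set.
    results = list(fake_results(500))
    return sum(len(r) for r in results)


def streamed():
    # Holds only one result at a time, similar to iterating without a cache.
    return sum(len(r) for r in fake_results(500))


cached_peak = peak_bytes(cached)
streamed_peak = peak_bytes(streamed)
print(cached_peak > streamed_peak)
```

The exact numbers depend on the Python build, but the cached variant should peak far higher because all 500 strings exist at the same time.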
Here is an example that uses both improvements:
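for item in account.inbox.filter(
    datetime_received__gt=start, is_read=False,
).only('is_read', 'subject', 'body').order_by('-datetime_received').iterator():
    # Process item.subject and item.body here, then mark the item as read
    item.is_read = True
    item.save(update_fields=['is_read'])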
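The per-item work in a loop like the one above could be factored into a small helper that accepts any iterable of items, which makes it possible to try the logic without a mailbox. This is only a sketch: process_unread, handle and FakeItem are hypothetical names, and FakeItem is a stand-in that records the save call instead of contacting the server. The only exchangelib behavior it assumes is the one described above, namely item.save(update_fields=['is_read']).

```python
class FakeItem:
    """Hypothetical stand-in for an exchangelib message; not part of the library."""

    def __init__(self, subject, body):
        self.subject = subject
        self.body = body
        self.is_read = False
        self.saved_fields = None

    def save(self, update_fields=None):
        # Record which fields were requested instead of calling the server.
        self.saved_fields = update_fields


def process_unread(items, handle):
    """Pass each item's subject and body to handle(), then mark the item read."""
    count = 0
    for item in items:
        handle(item.subject, item.body)
        item.is_read = True
        # Save only the changed field; fields skipped by .only() are None.
        item.save(update_fields=["is_read"])
        count += 1
    return count


stored = []
fakes = [FakeItem("a", "<table></table>"), FakeItem("b", "<p>hi</p>")]
processed = process_unread(iter(fakes), lambda s, b: stored.append((s, len(b))))
print(processed, stored)
```

With a real account, the first argument would be the query ending in .iterator(), so items are still consumed one at a time rather than collected in a list first.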