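How can I improve my use of multiprocessing?

Problem description

I have a script that takes xlsx files, reformats them, and creates a txt document that logs the reformatting. The script works well and does what I want. However, it is not as fast as I would like, because it does not make full use of multiprocessing. Sometimes only a handful of the files in each "files_xlsx" are being reformatted at a time. If I remove processes.join(), it eventually crashes. Ideally, I would like it to work on multiple xlsx sheets across multiple "files_xlsx"/directories at once, but I have had no luck writing that code. Is there a simple way to adjust the current code so that it works on more xlsx files at once?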

Solution

The most direct way to make full use of Python's multiprocessing library is to use a Pool.
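As a minimal sketch of the pattern, independent of openpyxl: `Pool.map` hands each item of an iterable to a worker process and returns the results in input order. Here `square` and `run_in_pool` are hypothetical names, and `square` merely stands in for the per-file work.

```python
from multiprocessing import Pool


def square(n):
    # Hypothetical stand-in for the per-file work. It has to be a
    # top-level function so that worker processes can locate it.
    return n * n


def run_in_pool(values, workers=4):
    # Pool.map distributes `values` across the workers and returns
    # the results in the same order as the inputs.
    with Pool(workers) as pool:
        return pool.map(square, values)


if __name__ == "__main__":
    # The guard matters on Windows, where each worker re-imports this module.
    print(run_in_pool(range(10)))
```

The worker takes exactly one argument, which is why the function in the answer is reduced to handling a single file.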

Have a look at the modifications to your code below. Note that I have not modified def rename_sheets in any way.

import os
from multiprocessing import Pool
from pathlib import Path  # available from Python 3.4 onwards

import openpyxl

# rename_sheets() is the function from the question, unchanged;
# it must be defined or imported at module level as well.

REPORT_TYPE = "Annual"


def convert_excel_txt(fil):
    # Processes one and only one workbook; Pool takes care of the parallelism.
    # *file* is not a good variable name, so *fil* is used instead.
    open_xl = openpyxl.load_workbook(fil)
    titles = open_xl.sheetnames
    workbook_path = Path(fil)
    # The log is written to Reference_Sheets inside the workbook's own folder;
    # adjust this path if that folder lives elsewhere.
    log_path = workbook_path.parent / "Reference_Sheets" / (workbook_path.stem + ".txt")
    with open(log_path, 'a', encoding='utf-8') as outfile:
        for count, title in enumerate(titles, start=1):
            sheet_title_value = rename_sheets(title, count, open_xl, fil)
            outfile.write('\n' + str(count) + ". " + sheet_title_value)


if __name__ == "__main__":
    # The guard is required on Windows, where every worker re-imports this module.
    files_xlsx = []
    with open(r"C:\Python38\Projects\s_&p_500_links_test.txt", "r") as directories:
        for directory in directories:
            directory = directory.strip()  # drop the trailing newline
            if not directory:
                continue
            print(directory)
            report_dir = os.path.join(directory, REPORT_TYPE)
            # Collect full paths from every directory, not just the last listing.
            files_xlsx += [os.path.join(report_dir, f)
                           for f in os.listdir(report_dir) if f.endswith('.xlsx')]
    print(files_xlsx)

    with Pool(24) as pool:  # 24 worker processes
        pool.map(convert_excel_txt, files_xlsx)
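To let the pool work on workbooks from every directory at once, the list of paths can be built up front and passed to `pool.map` in a single call. The following is a sketch under the assumption that each directory contains an `Annual` subfolder, as in the code above; `collect_xlsx` is a hypothetical helper name, not part of the original script.

```python
import os
import tempfile


def collect_xlsx(directories, report_type="Annual"):
    """Return full paths of every .xlsx file under <directory>/<report_type>."""
    paths = []
    for directory in directories:
        report_dir = os.path.join(directory, report_type)
        if not os.path.isdir(report_dir):
            continue  # skip directories without the expected subfolder
        for name in sorted(os.listdir(report_dir)):
            if name.lower().endswith(".xlsx"):
                paths.append(os.path.join(report_dir, name))
    return paths


if __name__ == "__main__":
    # Throwaway layout for demonstration; real code would read the
    # directory list from the text file instead.
    with tempfile.TemporaryDirectory() as root:
        annual = os.path.join(root, "CompanyA", "Annual")
        os.makedirs(annual)
        open(os.path.join(annual, "report.xlsx"), "w").close()
        print(collect_xlsx([os.path.join(root, "CompanyA")]))
```

Returning full paths rather than bare file names means the worker does not depend on the current working directory, which differs from what `os.chdir` set in the parent once several directories are involved.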

To time the various versions of the code, proceed as follows:

import time
import datetime

overall_start_time = time.time()
print('Started at ',time.strftime('%X %x %Z'))

# timed code goes here

print ("Time elapsed overall (hours:min:sec): %s" % str(datetime.timedelta(seconds=(time.time()- overall_start_time)))) 
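When several versions are compared in one run, the same measurement can be wrapped in a small context manager. This is only a sketch: `timed` is a hypothetical helper, and it uses `time.perf_counter` instead of `time.time` because that clock is intended for measuring intervals.

```python
import datetime
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results=None):
    # Measures the wall-clock time of the enclosed block.
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        if results is not None:
            results[label] = elapsed  # keep the raw seconds for later comparison
        print("%s took (hours:min:sec): %s"
              % (label, datetime.timedelta(seconds=elapsed)))


if __name__ == "__main__":
    timings = {}
    with timed("serial version", timings):
        sum(i * i for i in range(100000))  # placeholder for the timed code
```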

Reference: https://docs.python.org/2/library/multiprocessing.html
