How can I make better use of multiprocessing?

Problem description

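I have a script that reads xlsx files, reformats them, and creates a txt document that records the reformatting. The script works well and does what I want. However, it is not as fast as I would like, because it does not make full use of multiprocessing. Sometimes a given "files_xlsx" list contains only a few files to reformat. If I remove processes.join(), the script eventually crashes. Ideally, I would like it to work on several xlsx workbooks at once, across multiple "files_xlsx" lists and directories, but I have not had much luck writing that code. Is there a simple way to adjust my current code so that it processes more xlsx files at the same time?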

Solution

The most direct way to make full use of Python's multiprocessing library is to use a Pool.
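As a minimal illustration of how a Pool divides the work (process_one and the file names are hypothetical stand-ins, not part of the question), each item in the list is handed to whichever worker process is free, and map returns the results in the same order as the input:

```python
import os
from multiprocessing import Pool


def process_one(path):
    # Hypothetical stand-in for convert_excel_txt: it handles exactly one item
    # and returns the item plus the worker's process id so you can see the spread.
    return path, os.getpid()


if __name__ == "__main__":
    paths = [f"report_{i}.xlsx" for i in range(8)]  # hypothetical file names
    # The Pool starts its worker processes once; map blocks until every item
    # has been processed and returns the results in input order.
    with Pool(processes=4) as pool:
        results = pool.map(process_one, paths)
    for path, pid in results:
        print(path, "handled by process", pid)
```

The function passed to map must be defined at module level so that the worker processes can import it.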

See the changes to the code below. Note that I have not modified def rename_sheets at all.

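import os
from multiprocessing import Pool
# From Python 3.4 onwards, you can use pathlib
from pathlib import Path

import openpyxl

# rename_sheets is the unchanged function from the question; it must be
# defined at module level in this file so the worker processes can find it.


def convert_excel_txt(fil):
    # directories is a globally defined variable, so it is not needed as an argument.
    # Naming the parameter *file* is not a good idea, hence *fil*.

    # This function processes one and only one file (given as a full path);
    # the multiprocessing is taken care of by Pool.
    path = Path(fil)
    open_xl = openpyxl.load_workbook(fil)
    titles = open_xl.sheetnames  # xls was undefined here; openpyxl exposes the names as .sheetnames
    # The reference file goes into <workbook folder>\Reference_Sheets\<name>.txt
    out_path = path.parent / "Reference_Sheets" / (path.stem + ".txt")
    count = 1
    for title in titles:
        sheet_title_value = rename_sheets(title, count, open_xl, fil)
        with open(out_path, 'a', encoding='utf-8') as outfile:
            outfile.write('\n' + str(count) + ". " + sheet_title_value)
        count += 1


if __name__ == "__main__":
    # On Windows, worker processes re-import this module, so the Pool must be
    # created only under this guard.
    report_type = "Annual"
    files_xlsx = []
    with open(r"C:\Python38\Projects\s_&p_500_links_test.txt", "r") as directories:
        for directory in directories:
            directory = directory.strip()  # remove the trailing newline
            if not directory:
                continue
            print(directory)
            report_dir = os.path.join(directory, report_type)
            files = os.listdir(report_dir)
            print(files)
            # Accumulate full paths from every directory; reassigning files on
            # each iteration would keep only the last directory's files.
            files_xlsx += [os.path.join(report_dir, f) for f in files if f.endswith('.xlsx')]

    with Pool(24) as pool:
        pool.map(convert_excel_txt, files_xlsx)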
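If you would rather keep the directory scan separate from the Pool code, the helper below is a minimal sketch (the name collect_xlsx is hypothetical, not from the question). It gathers the full .xlsx paths from every listed directory, so the Pool receives the files from all directories in a single map call instead of only those from the last directory. It assumes the same <directory>\Annual layout as above and skips the "~$" lock files that Excel creates while a workbook is open.

```python
from pathlib import Path


def collect_xlsx(directories, report_type="Annual"):
    """Return full paths to every .xlsx file under <dir>/<report_type>, for all dirs."""
    files_xlsx = []
    for directory in directories:
        directory = directory.strip()  # drop the trailing newline from the text file
        if not directory:
            continue  # skip blank lines
        report_dir = Path(directory) / report_type
        # sorted() gives a deterministic order; "~$" files are Excel lock files
        files_xlsx.extend(
            sorted(str(p) for p in report_dir.glob("*.xlsx") if not p.name.startswith("~$"))
        )
    return files_xlsx
```

You would then pass the result straight to the Pool, for example pool.map(convert_excel_txt, collect_xlsx(open(path_to_list))), where path_to_list is the text file of directories.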

To time the different versions of the code, do the following:

import time
import datetime

overall_start_time = time.time()
print('Started at ',time.strftime('%X %x %Z'))

# timed code goes here

print ("Time elapsed overall (hours:min:sec): %s" % str(datetime.timedelta(seconds=(time.time()- overall_start_time))))
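A possible alternative, sketched under the assumption that you only need the elapsed wall-clock time of a block of code: a small context manager (timed is a hypothetical name) built on time.perf_counter, a high-resolution clock intended for measuring durations. It prints the same hours:min:sec format and also stores the number of seconds so that runs can be compared.

```python
import datetime
import time
from contextlib import contextmanager


@contextmanager
def timed(label):
    # The yielded dict is filled in when the with-block exits (even on error).
    result = {}
    start = time.perf_counter()
    try:
        yield result
    finally:
        result["seconds"] = time.perf_counter() - start
        print(f"{label} (hours:min:sec): {datetime.timedelta(seconds=result['seconds'])}")


if __name__ == "__main__":
    with timed("example") as t:
        sum(range(1_000_000))
    print(round(t["seconds"], 3))
```

Wrapping the serial loop in one timed block and the pool.map call in another makes the two versions directly comparable.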

Reference: https://docs.python.org/2/library/multiprocessing.html
