How do I correctly parallelize a class method?

Problem description

I have a class representing a custom object, a report extracted from FactSet:

import pandas as pd
import pyodbc
import sys
import multiprocessing as mp

class FactSetReshapedobject(object):
    """
    This class represents an abstract FactSet report file.
    """
    def __init__(self,file_path):
        """
        Initializes object with given file_path

        Parameters
        ----------
        file_path : string
            path to excel file.

        Returns
        -------
        None.

        """
        self.file_path = file_path
        self.data_frame = None
        # target data_frame with new column names
        self.target_data_frame = pd.DataFrame(columns=['Date','Ticker','Company','Ending Price','Port. Weight'])
        
    def file_reader(self,skip_n_rows,skip_n_footer):
        """
        Reads excel file from path

        Parameters
        ----------
        skip_n_rows : integer
            rows to skip at the beginning of the file.
        skip_n_footer : integer
            rows to skip at the end of the file.

        Returns
        -------
        None.

        """
        file_reader = pd.read_excel(self.file_path,skiprows=skip_n_rows,skipfooter=skip_n_footer)
        self.data_frame = pd.DataFrame(file_reader)
        
    def replace_unnamed_columns(self):
        """
        Loops through the columns of self.data_frame and renames 'Unnamed: 0' to 'Ticker' and 'Unnamed: 1' to 'Company'.
        Each remaining 'Unnamed' column takes the name of the column before it (a date).
        
        Returns
        -------
        None.
        """
        for i,k in enumerate(self.data_frame.columns):
            if k == 'Unnamed: 0':
                self.data_frame.columns.values[i] = 'Ticker'
                i += 1
            elif k == 'Unnamed: 1':
                self.data_frame.columns.values[i] = 'Company'
                i += 1
            elif 'Unnamed' in k:
                self.data_frame.columns.values[i] = self.data_frame.columns.values[i - 1]
                i += 1
    
    def append_values_into_target_data_frame(self):
        """
        Appends values into target_data_frame, which is a new data frame built from the original data

        Returns
        -------
        None.

        """
        # iterate through the old data frame,starting with the 2nd row; append values under each column
        for index,row in self.data_frame[1:].iterrows():
            for i in range(2,len(self.data_frame.columns),2):
                self.target_data_frame = self.target_data_frame.append([{'Date': self.data_frame.columns[i],'Ticker': row[0],'Company': row[1],'Ending Price': row[i],'Port. Weight': row[i + 1]}])

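Originally, when pulled into a data frame, the data looks like this:

[Image: the raw data frame as loaded]

It is very ugly, so I built a few methods to reshape it, which let me work with it further and get the structure I want:

[Image: the reshaped target data frame]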

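However, the main method, append_values_into_target_data_frame, is inefficient on larger volumes because it iterates row by row and appends values to the target structure. I don't know how to do this efficiently and was thinking of running it on multiple cores. This is completely uncharted territory for me, though, and I don't know how to go about it in this case. I tried the following, but calling it on a class instance raises TypeError: 'module' object is not callable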

def data_shaper(self):
    pararellism = 3
    # self.mapper = mp.Pool()
    self.p=mp.pool(pararellism)
    self.p.map(self.append_values_into_target_data_frame) 

I'd appreciate any hints. Maybe there is a different approach I'm not aware of that achieves the same result and that I should consider?
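
On the TypeError itself: the message suggests that `mp.pool` resolved to the `multiprocessing.pool` submodule, which cannot be called; the pool class is `mp.Pool`, with a capital P. Two further points would likely come up next. `Pool.map` takes a function and an iterable of inputs, not a function alone. And when a bound method is sent to a worker, the instance is pickled and copied, so anything a worker writes to `self.target_data_frame` never reaches the parent process; results have to be returned instead. A minimal sketch, where `square` and `run_in_pool` are hypothetical names used only for illustration:

```python
import multiprocessing as mp


def square(x):
    """Hypothetical worker. Defined at module level so it can be pickled."""
    return x * x


def run_in_pool(values, processes=3):
    """Map `square` over `values` using a pool of worker processes."""
    # mp.Pool (capital P) creates the pool. mp.pool is a submodule,
    # and calling a module raises "TypeError: 'module' object is not callable".
    with mp.Pool(processes) as pool:
        # map needs both the function and an iterable of inputs;
        # the workers' return values are collected into a list.
        return pool.map(square, values)


if __name__ == "__main__":
    # The guard keeps child processes from re-running this block on import.
    print(run_in_pool([1, 2, 3]))
```

Starting processes and pickling data both add overhead, so a pool tends to pay off only when each task does a substantial amount of work.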

Solution
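
The row-by-row loop is slow mostly because every `append` call copies the whole target frame (and `DataFrame.append` was removed in pandas 2.0). One option that stays on a single core is to build one long-format block per pair of date columns and concatenate once at the end. The sketch below assumes, as the `self.data_frame[1:]` slice implies, that row 0 holds sub-headers and that the columns are Ticker, Company, then (Ending Price, Port. Weight) pairs named by date. `reshape_wide_to_long` is a hypothetical helper name, not part of the original class.

```python
import pandas as pd


def reshape_wide_to_long(df):
    """Reshape the wide report into Date/Ticker/Company/Ending Price/Port. Weight rows."""
    body = df.iloc[1:]  # skip the sub-header row, as the original loop does
    pieces = []
    # Columns 0 and 1 are Ticker and Company; the rest come in pairs per date.
    # Positional access (iloc) is used because the date names are duplicated.
    for i in range(2, len(df.columns) - 1, 2):
        pieces.append(pd.DataFrame({
            'Date': df.columns[i],  # scalar, broadcast to every row of the block
            'Ticker': body.iloc[:, 0].to_numpy(),
            'Company': body.iloc[:, 1].to_numpy(),
            'Ending Price': body.iloc[:, i].to_numpy(),
            'Port. Weight': body.iloc[:, i + 1].to_numpy(),
        }))
    # A single concat replaces the repeated per-row appends.
    return pd.concat(pieces, ignore_index=True)


if __name__ == "__main__":
    # Made-up sample data in the assumed layout, for illustration only.
    sample = pd.DataFrame(
        [["", "", "Ending Price", "Port. Weight"],
         ["AAA", "Alpha Corp", 10.0, 0.6],
         ["BBB", "Beta Inc", 20.0, 0.4]],
        columns=["Ticker", "Company", "2020-01-31", "2020-01-31"],
    )
    print(reshape_wide_to_long(sample))
```

Rows come out grouped by date rather than by ticker, unlike the original loop, so sort afterwards if the order matters. Inside the class, the result could be assigned to `self.target_data_frame`.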
