Is there a way to make my binary particle swarm optimization code run faster?

Problem description

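I am writing BPSO code for feature selection on a breast cancer dataset. The code is written in Python, and I am using pyswarms for the PSO. The data is fairly large: 1095 samples with 20531 features. I am trying to adapt the code from the pyswarms website (https://pyswarms.readthedocs.io/en/development/examples/feature_subset_selection.html) to my data. The code on the website is slightly off, so I fixed it. At first I got this error: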

ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  extra_warning_msg=_LOGISTIC_SOLVER_CONVERGENCE_MSG)
/usr/local/lib/python3.7/dist-packages/sklearn/linear_model/_logistic.py:940: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

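So I changed max_iter to 1000. I ran the code again, and after about an hour I decided to leave it running overnight and went to bed. Twelve hours later I got the same error. So I decided to scale my data with min_max_scaler. I started that run about an hour ago and it is still running. I am worried that something in my code is making it take this long. I know the size of the data plays a part, but the other members of my project group are not seeing the same running times.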

# Import modules
import numpy as np
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from IPython.display import Image
from sklearn import preprocessing

# Import PySwarms
import pyswarms as ps
import pyswarms.backend as P
from pyswarms.backend.swarms import Swarm
from pyswarms.backend.topology import Star
from pyswarms.utils.functions import single_obj as fx
from pyswarms.utils.plotters import (plot_cost_history,plot_contour,plot_surface)

%load_ext autoreload
%autoreload 2
%matplotlib inline

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#OPEN FILE
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
file = pd.read_csv("Preprocessed_Data_With_Class_Encoded.csv",header=None)
#file
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#FORMAT DATA
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
new = file.loc[1:,2:] # REAL DATA
#new = file.loc[1:rows,2:5] #trial data
X_train = np.array(new, dtype=float)  # np.float was removed in NumPy 1.24; the builtin float is equivalent

#FORMAT TARGET
a = file.loc[:,1]
b = a[1:]
y=np.array(b,dtype=np.int64)

# X.shape = samples,features >> (10,16382)
#y.shape = samples,>> (10,)
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#SCALE DATA
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


min_max_scaler = preprocessing.MinMaxScaler()
X = min_max_scaler.fit_transform(X_train)
X.shape

#simple logistic regression technique using sklearn.linear_model.LogisticRegression to perform classification. 
#A simple test of accuracy will be used to assess the performance of the classifier.
from sklearn import linear_model

# Create an instance of the classifier
classifier = linear_model.LogisticRegression(random_state=0,solver='sag',max_iter=1000)

# Define objective function
def f_per_particle(m,alpha):
    """Computes for the objective function per particle

    Inputs
    ------
    m : numpy.ndarray
        Binary mask that can be obtained from BinaryPSO,will
        be used to mask features.
    alpha: float (default is 0.5)
        Constant weight for Trading-off classifier performance
        and number of features

    Returns
    -------
    numpy.ndarray
        Computed objective function
    """
    total_features = X.shape[1]  # total number of features in the full data set
    # Get the subset of the features from the binary mask
    if np.count_nonzero(m) == 0:
        X_subset = X
    else:
        X_subset = X[:,m==1]
    # Perform classification and store performance in P
    classifier.fit(X_subset,y)
    P = (classifier.predict(X_subset) == y).mean()
    # Compute for the objective function
    j = (alpha * (1.0 - P)
        + (1.0 - alpha) * (1 - (X_subset.shape[1] / total_features)))

    return j


def f(x,alpha=0.88):
    """Higher-level method to do classification in the
    whole swarm.

    Inputs
    ------
    x: numpy.ndarray of shape (n_particles,dimensions)
        The swarm that will perform the search

    Returns
    -------
    numpy.ndarray of shape (n_particles,)
        The computed loss for each particle
    """
    n_particles = x.shape[0]
    j = [f_per_particle(x[i],alpha) for i in range(n_particles)]
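    return np.array(j)


# NOTE: n (number of particles, also passed as neighbour count k) is not
# defined in this post; it must be set before this point.
options = {'c1': 0.5,'c2': 0.5,'w':0.9,'k': n,'p':2}

# Call instance of PSO
dimensions = X.shape[1]  # one binary dimension per feature column

optimizer = ps.discrete.BinaryPSO(n_particles=n,dimensions=dimensions,options=options)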
optimizer.reset()
# Perform optimization
cost,pos = optimizer.optimize(f,iters=10,verbose=True)
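
In the code above, f calls f_per_particle once per particle, and each call fits the logistic regression from scratch, so a full run performs roughly n_particles * iters model fits. A rough way to check whether the long runtime is simply that multiplication is to time a single evaluation and extrapolate. The following is only a minimal sketch: estimate_runtime and dummy_evaluation are hypothetical helper names, and n_particles=30 is an assumed value, since the post does not state it.

```python
import time


def estimate_runtime(evaluate_once, n_particles, iters, repeats=3):
    """Time one objective evaluation and extrapolate to a full PSO run.

    evaluate_once : zero-argument callable performing a single per-particle
        evaluation (a hypothetical stand-in for one call to f_per_particle).
    Returns (seconds_per_call, estimated_total_seconds).
    """
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        evaluate_once()
        timings.append(time.perf_counter() - start)
    per_call = min(timings)  # best of several runs reduces timing noise
    total_calls = n_particles * iters  # one model fit per particle per iteration
    return per_call, per_call * total_calls


def dummy_evaluation():
    # Placeholder workload so the sketch runs on its own. On the real data,
    # pass something like: lambda: f_per_particle(mask, 0.88)
    return sum(i * i for i in range(100_000))


# n_particles=30 is an assumption; iters=10 matches the optimize() call above.
per_call, total = estimate_runtime(dummy_evaluation, n_particles=30, iters=10)
print(f"{per_call:.4f} s per evaluation, about {total:.1f} s for the whole run")
```

If one evaluation on the full 1095 x 20531 matrix already takes minutes, the estimate shows how quickly that adds up over all particles and iterations, which may help when comparing settings with the other group members.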
