python – numpy.argmax比MATLAB慢[〜,idx] = max()？

我正在为正态分布编写Bayseian分类器.我在python和MATLAB中都有几乎相同的代码.但是,MATLAB代码的运行速度比我的Python脚本快50倍.我是Python的新手,所以也许我做的事情非常糟糕.我假设它是我循环数据集的地方.

可能numpy.argmax()比[〜,idx] = max()慢得多？循环数据框架很慢？字典的使用不好(以前我试过一个对象,它甚至很慢)？

欢迎任何建议.

Python代码

import numpy as np
import pandas as pd

#import the data as a data frame
train_df = pd.read_table('hw1_traindata.txt',header = None)#training
train_df.columns = [1, 2] #rename column titles

这里的数据是2列(300行/样品用于训练,300000用于测试).这是功能参数; mi和Si是样本均值和协方差.

case3_p = {'w': [], 'w0': [], 'W': []}
case3_p['w']={1:S1.I*m1,2:S2.I*m2,3:S3.I*m3}
case3_p['w0']={1: -1.0/2.0*(m1.T*S1.I*m1)-

1.0/2.0*np.log(np.linalg.det(S1)),
            2: -1.0/2.0*(m2.T*S2.I*m2)-1.0/2.0*np.log(np.linalg.det(S2)),
            3: -1.0/2.0*(m3.T*S3.I*m3)-1.0/2.0*np.log(np.linalg.det(S3))}
case3_p['W']={1: -1.0/2.0*S1.I,
           2: -1.0/2.0*S2.I,
           3: -1.0/2.0*S3.I}
#W1=-1.0/2.0*S1.I
#w1_3=S1.I*m1
#w01_3=-1.0/2.0*(m1.T*S1.I*m1)-1.0/2.0*np.log(np.linalg.det(S1))    
def g3(x,W,w,w0):
    return x.T*W*x+w.T*x+w0

这是分类器/循环

train_df['case3'] = 0

for i in range(train_df.shape[0]):
    x = np.mat(train_df.loc[i,[1, 2]]).T#observation

    #case 3    
    vals = [g3(x,case3_p['W'][1],case3_p['w'][1],case3_p['w0'][1]),
            g3(x,case3_p['W'][2],case3_p['w'][2],case3_p['w0'][2]),
            g3(x,case3_p['W'][3],case3_p['w'][3],case3_p['w0'][3])]
    train_df.loc[i,'case3'] = np.argmax(vals) + 1 #add one to make it the class value

对应的MATLAB代码

train = load('hw1_traindata.txt');

判别函数

W1=-1/2*S1^-1;%there isn't one for the other cases
w1_3=S1^-1*m1';%fix the transpose thing
w10_3=-1/2*(m1*S1^-1*m1')-1/2*log(det(S1));
g1_3=@(x) x'*W1*x+w1_3'*x+w10_3';

W2=-1/2*S2^-1;
w2_3=S2^-1*m2';
w20_3=-1/2*(m2*S2^-1*m2')-1/2*log(det(S2));
g2_3=@(x) x'*W2*x+w2_3'*x+w20_3';

W3=-1/2*S3^-1;
w3_3=S3^-1*m3';
w30_3=-1/2*(m3*S3^-1*m3')-1/2*log(det(S3));
g3_3=@(x) x'*W3*x+w3_3'*x+w30_3';

分类器

case3_class_tr = Inf(size(act_class_tr));
for i=1:length(train)
    x=train(i,:)';%current sample

    %case3
    vals = [g1_3(x),g2_3(x),g3_3(x)];%compute discriminant function value
    [~, case3_class_tr(i)] = max(vals);%get location of max

end

解决方法:

在这种情况下,最好对您的代码进行分析.首先,我创建了一些模拟数据：

import numpy as np
import pandas as pd

fname = 'hw1_traindata.txt'
ar = np.random.rand(1000, 2)
np.savetxt(fname, ar, delimiter='\t')

m1, m2, m3 = [np.mat(ar).T for ar in np.random.rand(3, 2)]
S1, S2, S3 = [np.mat(ar) for ar in np.random.rand(3, 2, 2)]

然后我将你的代码包装在一个函数中并使用lprun(line_profiler)IPython魔术进行分析.这些是结果：

%lprun -f train train(fname, m1, S1, m2, S2, m3, S3)
Timer unit: 5.59946e-07 s

Total time: 4.77361 s
File: <ipython-input-164-563f57dadab3>
Function: train at line 1

Line #   Hits     Time  Per Hit  %Time  Line Contents
=====================================================
     1                                 def train(fname, m1, S1, m2, S2, m3, S3):
     2      1     9868   9868.0   0.1      train_df = pd.read_table(fname ,header = None)#training
     3      1      328    328.0   0.0      train_df.columns = [1, 2] #rename column titles
     4                                 
     5      1       17     17.0   0.0      case3_p = {'w': [], 'w0': [], 'W': []}
     6      1      877    877.0   0.0      case3_p['w']={1:S1.I*m1,2:S2.I*m2,3:S3.I*m3}
     7      1      356    356.0   0.0      case3_p['w0']={1: -1.0/2.0*(m1.T*S1.I*m1)-
     8                                 
     9      1      204    204.0   0.0      1.0/2.0*np.log(np.linalg.det(S1)),
    10      1      498    498.0   0.0                  2: -1.0/2.0*(m2.T*S2.I*m2)-1.0/2.0*np.log(np.linalg.det(S2)),
    11      1      502    502.0   0.0                  3: -1.0/2.0*(m3.T*S3.I*m3)-1.0/2.0*np.log(np.linalg.det(S3))}
    12      1      235    235.0   0.0      case3_p['W']={1: -1.0/2.0*S1.I,
    13      1      229    229.0   0.0                 2: -1.0/2.0*S2.I,
    14      1      230    230.0   0.0                 3: -1.0/2.0*S3.I}
    15                                 
    16      1     1818   1818.0   0.0      train_df['case3'] = 0
    17                                 
    18   1001    17409     17.4   0.2      for i in range(train_df.shape[0]):
    19   1000  4254511   4254.5  49.9          x = np.mat(train_df.loc[i,[1, 2]]).T#observation
    20                                 
    21                                         #case 3    
    22   1000   298245    298.2   3.5          vals = [g3(x,case3_p['W'][1],case3_p['w'][1],case3_p['w0'][1]),
    23   1000   269825    269.8   3.2                  g3(x,case3_p['W'][2],case3_p['w'][2],case3_p['w0'][2]),
    24   1000   274279    274.3   3.2                  g3(x,case3_p['W'][3],case3_p['w'][3],case3_p['w0'][3])]
    25   1000  3395654   3395.7  39.8          train_df.loc[i,'case3'] = np.argmax(vals) + 1
    26                                 
    27      1       45     45.0   0.0      return train_df

有两条线共占90％的时间.所以让我们将这些行分开一点然后重新运行探查器：

%lprun -f train train(fname, m1, S1, m2, S2, m3, S3)
Timer unit: 5.59946e-07 s

Total time: 6.15358 s
File: <ipython-input-197-92d9866b57dc>
Function: train at line 1

Line #   Hits      Time  Per Hit  %Time  Line Contents
======================================================
...     
    19   1000   5292988   5293.0   48.2          thing = train_df.loc[i,[1, 2]]  # Observation
    20   1000    265101    265.1    2.4          x = np.mat(thing).T
...     
    26   1000    143142    143.1    1.3          index = np.argmax(vals) + 1  # Add one to make it the class value
    27   1000   4164122   4164.1   37.9          train_df.loc[i,'case3'] = index

大部分时间用于索引Pandas数据帧！取argmax仅占总执行时间的1.5％.

通过预先分配train_df [‘case3’]并使用.iloc,可以稍微改善这种情况：

%lprun -f train train(fname, m1, S1, m2, S2, m3, S3)
Timer unit: 5.59946e-07 s

Total time: 3.26716 s
File: <ipython-input-192-f6173cdf9990>
Function: train at line 1

Line #   Hits      Time  Per Hit  %Time  Line Contents
======= ======= ======================================
    16      1      1548   1548.0    0.0      train_df['case3'] = np.zeros(len(train_df))
...             
    19   1000   2608489   2608.5   44.7          thing = train_df.iloc[i,[0, 1]]  # Observation
    20   1000    228959    229.0    3.9          x = np.mat(thing).T
...             
    26   1000    123165    123.2    2.1          index = np.argmax(vals) + 1  # Add one to make it the class value
    27   1000   1849283   1849.3   31.7          train_df.iloc[i,2] = index

尽管如此,在紧密循环中迭代Pandas数据帧中的各个值是一个坏主意.在这种情况下,使用Pandas仅用于加载文本数据(它非常擅长),但除此之外使用“原始”Numpy数组.例如.使用train_data = pd.read_table(fname,header = None).values.当你到达分析阶段时,可能会回到熊猫身上.

其他一些说法：

>使用Python的基于零的索引,不要忘记使用
基于单一的索引.
>考虑使用普通的Numpy数组而不是矩阵.当你使用
矩阵你倾向于将它们与数组混合在一起并且难以调试
问题.
> MATLAB有一个JIT编译器,所以Python和Python之间存在速度差异
MATLAB预计用于循环繁重的代码.

python – numpy.argmax比MATLAB慢[〜,idx] = max()？

相关文章