multivariate_normal.pdf() 返回零?

问题描述

我正在尝试使用 Expectation Maximization (EM) 对我的数据进行分类。但是由于我对 Python 和算法的经验有限,当我编写算法的 E 步时,方法 multivariate_normal.pdf() 返回零,我不知道为什么会发生这种情况以及如何解决它。

我的参数设置:

n_clusters = 2
n_points = len(X)
m1 = X[:,0].mean() 
m2 = X[:,1].mean() 
Mu = [[m1,m2*1.1],[m1,m2*0.9]]
Var = [[1,1],[1,1]]
Pi = [1 / n_clusters] * 2
W = np.ones((n_points,n_clusters)) / n_clusters 
Pi = W.sum(axis=0) / W.sum()

E-步骤:

enter image description here

E-step 的功能:

def update_W(X,Mu,Var,Pi):
    n_points,n_clusters = len(X),len(Pi)
    pdfs = np.zeros(((n_points,n_clusters)))
    for i in range(n_clusters):
        pdfs[:,i] = Pi[i] * multivariate_normal.pdf(X,Mu[i],np.diag(Var[i]))
    W = pdfs / pdfs.sum(axis=1).reshape(-1,1)
    return W

问题在于代码

multivariate_normal.pdf(X,np.diag(Var[i]))

返回零,对算法没有意义。

我的数据在这里:

X = np.array([[5.04729,37744.57848238945],[5.0008,35899.206357479095],[4.98011,33056.83955574036],[4.9931,33704.46286725998],[4.96448,32990.76251792908],[4.99008,32283.805765628815],[5.04159,34409.80894064903],[5.00075,35270.9743578434],[5.00899,36284.97508478165],[4.97215,35458.088497161865],[4.9763,35349.08898703754],[5.02148,38928.51481676102],[4.96585,33278.32470035553],[5.02545,41023.1229429245],[5.00402,39417.85598278046],[5.02445,38158.13270044327],[5.02238,36516.55680179596],[4.98606,34646.19884157181],[5.0033,50677.906705379486],[5.02733,49949.25080680847],[5.02716,39155.43921852112],[5.04183,39191.40325546265],[4.96896,38266.83190345764],[4.99652,36774.488584041595],[5.03478,33207.51669681072],[5.03832,34724.067808151245],[5.02347,32981.52464199066],[4.96275,33158.869076013565],[4.97127,33810.3498980999],[5.01482,35844.073690891266],[4.97365,36869.271273851395],[5.00685,34672.39159619808],[4.98033,34688.13522028923],[5.00265,32825.98570096493],[5.00083,34160.511184334755],[5.01027,33884.14221894741],[4.95678,38457.50146174431],[4.96867,36320.28945875168],[5.00721,33037.772767663],[5.04357,33117.379173755646],[5.0004,34557.091692984104],[5.02337,34421.044409394264],[5.01924,36974.69707453251],[5.04621,38812.16114796698],[4.96225,33006.868394851685],[5.03166,33958.025827884674],[4.95124,34349.11486387253],[5.02567,34259.3874104023],[5.04868,41895.62332487106],[5.00072,34009.417558074],[5.04818,35927.81018912792],[4.97368,33997.19724833965],[4.97671,33999.519283890724],[4.95128,31926.064946889877],[4.95742,33177.98467195034],[4.9803,36231.32753157616],[4.99018,35190.21499049879],[5.00053,34551.94658563426],[4.96793,33994.06255364418],[4.98345,32120.49098587036],[5.04564,30271.260227680206],[4.95421,30653.037763834],[5.03442,31966.675320148468],[4.9722,33757.07542133331],[5.01377,33405.48862147331],[5.03417,33558.4126021862],[5.02044,30911.697856664658],[4.98318,33883.324236392975],[4.99876,36436.7568192482],[4.98475,34489.98227977753],[5.00355,30205.769625544548],[5.03646,33422.97079265118],[5.04884,34117.2653195858],[5.02259,34993.6039069891],[4.9737,47981.5806927681],[5.00593,56685.62618494034],[5.00434,54395.169895286555],[4.95768,35611.35310435295],[5.02978,37536.10489606857],[4.95214,35577.900700092316],[4.99925,33014.68128633499],[5.03029,29865.81333065033],28760.52792918682],[5.02051,37038.269605994225],[5.04974,36642.584582567215],29041.81211209297],[4.97972,37114.91027057171],[4.99165,36424.69217658043],[4.95779,33859.12481832504],[5.01495,32957.517973423004],[4.95671,32964.28135430813],[5.00732,34051.169529914856],[4.98396,32881.329072237015],[5.02704,34020.69495886564],[4.98323,34851.629052996635],[5.02463,34762.618599653244],[4.95826,31249.48672771454],[5.02317,35223.82992994785],[4.98121,31181.02133178711],[5.00584,28375.87954646349],[4.98506,31617.50852251053],[5.02145,31114.64755296707],[5.03362,31182.910351395607],[5.03503,30093.33342540264],[4.95797,36398.88142120838],[4.97227,35468.531022787094],[5.00365,35569.601973861456],[4.96697,33233.918072104454],[5.01004,31831.104349255562],[5.01838,36176.26664984226],[5.01747,35596.35544002056],[5.0358,32386.7973613739],[4.973,33052.33617353439],[5.03017,32395.227126598358],[4.98371,34437.29500854015],[5.02143,31013.464814782143],[5.02339,36368.26537466049],[5.02942,36371.447618722916],[5.04456,38037.0499355793],[5.02219,40177.06857061386],[4.95168,37964.482201099396],[4.99589,38579.15203022957],[4.99336,34731.92454767227],[4.98333,34161.41907787323],[4.95841,30232.777291665203],[4.95133,30595.70202767849],[5.0252,30708.32647585869],[5.04505,35107.191153526306],[4.98019,35513.3441824913],[4.97158,30879.74526977539],[4.96096,33175.533081412315]])

提前致谢!

解决方法

让我们看看一些实际数字:

>>> Mu[0]
[5.000155496183207,38591.28312042943]
>>> Var[0]
[1,1]
>>> X[:5]
array([[5.04729,37744.5785],[5.00080,35899.2064],[4.98011,33056.8396],[4.99310,33704.4629],[4.96448,32990.7625]])

现在,让我们尝试以下值,看看我们得到了什么:

>>> multivariate_normal.pdf([5,38561.5785],Mu[0],np.diag(Var[0]))
3.970168119488602e-193

你得到一个非常低的概率值,但至少它不是零。

让我们暂时忽略第一个维度,因为它不是问题所在:3856130 相差大约 38591 标准差。

对于单变量正态分布,我们知道偏离均值超过 3 个标准差的概率仅为 0.3%,并且下降得非常快:

enter image description here

查看存储在 X 中的值 - 它们与 Mu[0] 的距离远超过 30 个标准差。这些值不太可能来自这种分布,就 Python 而言,发生这种情况的概率是 0.0

更好的主意是从数据的实际协方差开始:

Var = np.cov(X.T) + 1
Var = [Var,Var]

然后突然你得到了不错的概率:)

附言我将协方差矩阵加 1 以避免 singularity

相关问答

错误1:Request method ‘DELETE‘ not supported 错误还原:...
错误1:启动docker镜像时报错:Error response from daemon:...
错误1:private field ‘xxx‘ is never assigned 按Alt...
报错如下,通过源不能下载,最后警告pip需升级版本 Requirem...