R和Matlab中的拟合分布给出了截然不同的结果

问题描述

我正在将波高数据集（m）拟合到广义的pareto分布。我已经选择了0.15m的阈值。我正在从Matlab过渡到R，所以我想比较两个程序的结果。我知道这两个程序不会提供相同的确切答案，但是我希望得到的结果彼此接近。

当然，对于两种拟合，我使用相同的数据和相同的阈值。两种方法都通过MLE估计参数。在Matlab中，fitdist 函数不会估计阈值。假定它是已知的（在这种情况下为0.15m），并在调用它之前从数据中减去（我已经这样做了）。在R中，阈值是输入的一部分。

我使用的功能是： Matlab： fitdist(data_above_threshold,'Generalized Pareto') R： fevd(data,threshold =0.15,type = 'GP',method = "MLE")。（extRemes软件包）

在Matlab中，我得到： Shape:-1.0301 Scale:0.7534 在R中我得到Shape:-0.5848505 Scale:0.3620191

我在这里附加数据，以便您自己查看结果：

c(0.194101337780203,0.289791483274648,0.313940773547535,0.184577674010614,0.102266008573448,0.045464156804826,0.0486387113946889,0.143761972140945,0.267342847246331,0.242966803074167,0.069386693178437,0.0099771715681416,0.00736950172646855,0.056121590070795,0.0759625562574393,0.145009118586962,0.258045937376017,0.379926158236833,0.236844447793717,0.0861664817248564,0.0503393656392586,0.156233436601121,0.118138781522764,4.44089209850063e-16,0.0442170103588082,0.0143988726040223,0.148183673176825,0.197729400168618,0.026416829265647,0.133558046673528,0.209180472082052,0.236050809146251,0.0976175536382913,0.0292512530065965,0.101812500774896,0.174260371593558,0.219271020599832,0.463144839271102,0.579809720448572,0.533211794147367,0.354756475417204,0.360085192050189,0.632756755929504,0.651577329569406,0.413372358380034,0.39353139219339,0.343985665201597,0.0630375839987112,0.0434233717113424,0.0566884748189849,0.147730165378273,0.175734271938852,0.0827651732357175,0.0385481628769098,0.0378679011790819,0.0463711724019298,0.0200677200859207,0.0956901454944461,0.128909591738371,0.141154302299272,0.0459176646033779,0.0285709913087686,0.251696828196291,0.642847304447283,0.731394702114536,0.603732256822183,0.427997984883332,0.635251048821539)

所以，我的问题是：如果两个都使用MLE估计参数，那么结果为何如此不同？那么哪种配合更可靠，或者您建议使用哪种呢？

解决方法

我认为问题在于您的数据暗示了某种多峰性，而Matlab使用的是次优起始值。

数据的经验密度图：

plot(density(x,bw = "SJ"))

首先让我们绘制R拟合：

library(SpatialExtremes)
library(extRemes)
plot(fevd(x,threshold =0.15,type = 'GP',method = "MLE"))

左下角的图表明我们已经达到了不错的配合。（请注意，此图中的内核密度估计使用了与上面不同的带宽。）

现在，我们将您的Matlab结果用作起始值：

fevd(x,method = "MLE",initial = list(shape = -1.0301,scale = 0.7534))
#fevd(x = x,threshold = 0.15,type = "GP",#    initial = list(shape = -1.0301,scale = 0.7534))
#
#[1] "Estimation Method used: MLE"
#
#
# Negative Log-Likelihood Value:  -18.74149 
#
#
# Estimated parameters:
#     scale      shape 
# 0.6242056 -1.0736348 
#
# Standard Error Estimates:
#      scale       shape 
#0.000000020 0.001368606 
#
# Estimated parameter covariance matrix.
#              scale         shape
#scale  4.000000e-16 -2.598691e-17
#shape -2.598691e-17  1.873082e-06
#
# AIC = -33.48297 
#
# BIC = -30.5515 
#Warning messages:
#1: In log(z) : NaNs produced
#2: In log(z) : NaNs produced
plot(fevd(x,scale = 0.7534)))
#Warning messages:
#1: In log(z) : NaNs produced
#2: In log(z) : NaNs produced

我们可以清楚地看到拟合度更差，并且我们有一个错误的收敛性或局部最小值问题。警告进一步降低了我们对此配合的信心。我建议您尝试使用具有不同起始值的Matlab。

data-fitting matlab r r