问题描述
我正在将波高数据集(m)拟合到广义的pareto分布。我已经选择了0.15m的阈值。我正在从Matlab过渡到R,所以我想比较两个程序的结果。我知道这两个程序不会提供相同的确切答案,但是我希望得到的结果彼此接近。
当然,对于两种拟合,我使用相同的数据和相同的阈值。两种方法都通过MLE估计参数。在Matlab中,fitdist函数不会估计阈值。假定它是已知的(在这种情况下为0.15m),并在调用它之前从数据中减去(我已经这样做了)。在R中,阈值是输入的一部分。
我使用的功能是:
Matlab:
fitdist(data_above_threshold,'Generalized Pareto')
R:
fevd(data,threshold =0.15,type = 'GP',method = "MLE")
。 (extRemes软件包)
在Matlab中,我得到:
Shape:-1.0301 Scale:0.7534
在R中我得到Shape:-0.5848505 Scale:0.3620191
我在这里附加数据,以便您自己查看结果:
c(0.194101337780203,0.289791483274648,0.313940773547535,0.184577674010614,0.102266008573448,0.045464156804826,0.0486387113946889,0.143761972140945,0.267342847246331,0.242966803074167,0.069386693178437,0.0099771715681416,0.00736950172646855,0.056121590070795,0.0759625562574393,0.145009118586962,0.258045937376017,0.379926158236833,0.236844447793717,0.0861664817248564,0.0503393656392586,0.156233436601121,0.118138781522764,4.44089209850063e-16,0.0442170103588082,0.0143988726040223,0.148183673176825,0.197729400168618,0.026416829265647,0.133558046673528,0.209180472082052,0.236050809146251,0.0976175536382913,0.0292512530065965,0.101812500774896,0.174260371593558,0.219271020599832,0.463144839271102,0.579809720448572,0.533211794147367,0.354756475417204,0.360085192050189,0.632756755929504,0.651577329569406,0.413372358380034,0.39353139219339,0.343985665201597,0.0630375839987112,0.0434233717113424,0.0566884748189849,0.147730165378273,0.175734271938852,0.0827651732357175,0.0385481628769098,0.0378679011790819,0.0463711724019298,0.0200677200859207,0.0956901454944461,0.128909591738371,0.141154302299272,0.0459176646033779,0.0285709913087686,0.251696828196291,0.642847304447283,0.731394702114536,0.603732256822183,0.427997984883332,0.635251048821539)
所以,我的问题是:如果两个都使用MLE估计参数,那么结果为何如此不同?那么哪种配合更可靠,或者您建议使用哪种呢?
解决方法
我认为问题在于您的数据暗示了某种多峰性,而Matlab使用的是次优起始值。
数据的经验密度图:
plot(density(x,bw = "SJ"))
首先让我们绘制R拟合:
library(SpatialExtremes)
library(extRemes)
plot(fevd(x,threshold =0.15,type = 'GP',method = "MLE"))
左下角的图表明我们已经达到了不错的配合。 (请注意,此图中的内核密度估计使用了与上面不同的带宽。)
现在,我们将您的Matlab结果用作起始值:
fevd(x,method = "MLE",initial = list(shape = -1.0301,scale = 0.7534))
#fevd(x = x,threshold = 0.15,type = "GP",# initial = list(shape = -1.0301,scale = 0.7534))
#
#[1] "Estimation Method used: MLE"
#
#
# Negative Log-Likelihood Value: -18.74149
#
#
# Estimated parameters:
# scale shape
# 0.6242056 -1.0736348
#
# Standard Error Estimates:
# scale shape
#0.000000020 0.001368606
#
# Estimated parameter covariance matrix.
# scale shape
#scale 4.000000e-16 -2.598691e-17
#shape -2.598691e-17 1.873082e-06
#
# AIC = -33.48297
#
# BIC = -30.5515
#Warning messages:
#1: In log(z) : NaNs produced
#2: In log(z) : NaNs produced
plot(fevd(x,scale = 0.7534)))
#Warning messages:
#1: In log(z) : NaNs produced
#2: In log(z) : NaNs produced
我们可以清楚地看到拟合度更差,并且我们有一个错误的收敛性或局部最小值问题。警告进一步降低了我们对此配合的信心。我建议您尝试使用具有不同起始值的Matlab。