Kernel smoothing in Python

Question

I have some R code that I want to replicate in Python. In the R file I have a data frame, and I smooth one of its columns using:

smoothedTime <- ksmooth(1:length(df$time),df$time,bandwidth=100,x.points=(1:length(df$time)))$y

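In Python, I do the smoothing with the scikit-fda library, using skfda.preprocessing.smoothing.kernel_smoothers.NadarayaWatsonSmoother() with smoothing_parameter set to 100, since that is the bandwidth passed to ksmooth in R. The problem is that the smoothed values I get are not the same. The kernel argument of ksmooth defaults to c("box", "normal"), so the box kernel is used, but I don't see a box kernel for NadarayaWatsonSmoother(). Since NadarayaWatsonSmoother() uses a normal kernel by default, I tried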

smoothedTime <- ksmooth(1:length(df$time),df$time,kernel="normal",bandwidth=100,x.points=(1:length(df$time)))$y

but the results still differ. I would like to know why I am not getting the same answer, and how I can get it.

The relevant code is:

Python code:

import skfda
from skfda import FDataGrid
from skfda.misc import kernels
import skfda.preprocessing.smoothing.kernel_smoothers as ks

myTime = [-0.01,-0.02,-0.01,-0.04,-0.05,-0.07,-0.1,-0.12,-0.15,-0.19,-0.22,-0.26,-0.27,-0.31,-0.33,-0.36,-0.38,-0.4,-0.42,-0.44,-0.46,-0.47,-0.48,-0.49,-0.5,-0.51,-0.45,-0.43,-0.41,-0.39,-0.37,-0.34,-0.32,-0.35,-0.52,-0.55,-0.58,-0.6,-0.6]
fd = FDataGrid(sample_points=[*range(1,len(myTime)+1)],data_matrix=[myTime])
smoother = ks.NadarayaWatsonSmoother(smoothing_parameter=100)
smoothed = smoother.fit_transform(fd)

R code:

df$time <- c(-0.01,-0.6)
smoothedTime <- ksmooth(1:length(df$time),df$time,kernel="normal",bandwidth=100,x.points=(1:length(df$time)))$y

Answer

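This happens because R's ksmooth function rescales the bandwidth differently for each kernel (see the source code), whereas scikit-fda simply divides by the bandwidth you pass before applying the kernel. If you multiply smoothing_parameter by 0.3706506 (for the normal kernel) or by 0.5 (for the box kernel; note that this kernel is also available in scikit-fda by passing kernel=skfda.misc.kernels.uniform), the results should match those of ksmooth.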
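To make the scaling concrete, here is a minimal pure-Python sketch of a Nadaraya-Watson smoother that applies the ksmooth-style factors before evaluating the kernel. It does not use scikit-fda; ksmooth_like and R_SCALE are hypothetical names chosen for illustration. The factors follow from R's documentation, which says the kernels are scaled so that their quartiles sit at ±0.25*bandwidth: 0.3706506 is 0.25 divided by the 0.75 quantile of the standard normal. The truncation of the normal kernel at 4 scaled bandwidths is my reading of the C source and should be treated as an assumption.

```python
import math

# Factors by which ksmooth rescales `bandwidth`, as quoted in the answer.
R_SCALE = {"normal": 0.3706506, "box": 0.5}


def ksmooth_like(x, y, bandwidth, kernel="normal", x_points=None):
    """Nadaraya-Watson estimate with ksmooth-style bandwidth scaling."""
    if x_points is None:
        x_points = x
    h = bandwidth * R_SCALE[kernel]  # effective kernel scale
    # Assumption: the normal kernel is cut off at 4*h, the box kernel at h.
    cutoff = 4.0 * h if kernel == "normal" else h
    smoothed = []
    for x0 in x_points:
        num = den = 0.0
        for xi, yi in zip(x, y):
            u = abs(xi - x0)
            if u > cutoff:
                continue  # point lies outside the kernel support
            w = math.exp(-0.5 * (u / h) ** 2) if kernel == "normal" else 1.0
            num += w * yi
            den += w
        # No points in the window: return NaN for that position.
        smoothed.append(num / den if den > 0 else float("nan"))
    return smoothed


# Data vector from the question.
my_time = [-0.01, -0.02, -0.01, -0.04, -0.05, -0.07, -0.1, -0.12, -0.15,
           -0.19, -0.22, -0.26, -0.27, -0.31, -0.33, -0.36, -0.38, -0.4,
           -0.42, -0.44, -0.46, -0.47, -0.48, -0.49, -0.5, -0.51, -0.45,
           -0.43, -0.41, -0.39, -0.37, -0.34, -0.32, -0.35, -0.52, -0.55,
           -0.58, -0.6, -0.6]
x = list(range(1, len(my_time) + 1))

smoothed_normal = ksmooth_like(x, my_time, bandwidth=100, kernel="normal")
smoothed_box = ksmooth_like(x, my_time, bandwidth=100, kernel="box")
```

With bandwidth=100 the box window has half-width 50, which covers all 39 sample points, so every box-smoothed value is simply the mean of the series. In the question's scikit-fda code, the equivalent change would be passing smoothing_parameter=100 * 0.3706506 for the normal kernel.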

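Disclaimer: I am a maintainer of scikit-fda. Sorry for the late reply, but I don't get notified when questions mentioning it are posted on this site. If you have questions about the package in the future, you can try opening an issue or a discussion. I do get notified about those, and can usually reply within a few hours or days.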
