KS test: R vs Python

Problem description

I get different KS test values in R and in Python.

input_test = [2457.145, 878.081, 855.118, 1157.135, 1099.82, 880.0, 1399.0339999999999]

R code:

> parameters
 meanlog    sdlog 
9.621626 2.220691 

H <- 846.6572

truncgof::ks.test(input_test, 'plnorm', parameters, H = H, sim = 50)

R output:

data:  input_test
KS = 2.3246, p-value < 2.2e-16
alternative hypothesis: two.sided

treshold = 846.6572, simulations: 50

Python code:

import scipy.stats as st
p = [2.22096211e+00, 1.50686480e+04]  # lognorm shape s (= sdlog) and scale (close to exp(meanlog))
# lognorm.cdf(x, s, loc, scale): loc = 0 keeps p[1] as the scale; N is ignored for an array sample
ks = st.kstest(input_test, st.lognorm.cdf, args=(p[0], 0, p[1]), N=50, alternative='two-sided')
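A minimal sketch of the parametrization, assuming R's plnorm(meanlog, sdlog) corresponds to scipy's lognorm with s = sdlog, loc = 0 and scale = exp(meanlog). It checks that mapping numerically against the normal CDF and then runs an ordinary, untruncated KS test. No threshold is applied, so its D is not expected to match truncgof's result.

```python
import numpy as np
import scipy.stats as st

# Sample and fitted parameters taken from the question
x = np.array([2457.145, 878.081, 855.118, 1157.135, 1099.82, 880.0, 1399.0339999999999])
meanlog, sdlog = 9.621626, 2.220691

# Assumed mapping: R's plnorm(q, meanlog, sdlog) == scipy lognorm(s=sdlog, loc=0, scale=exp(meanlog))
dist = st.lognorm(s=sdlog, loc=0, scale=np.exp(meanlog))

# Check the mapping against the log-normal definition Phi((log x - meanlog) / sdlog)
assert np.allclose(dist.cdf(x), st.norm.cdf((np.log(x) - meanlog) / sdlog))

# Ordinary one-sample KS test: no threshold H is applied here
res = st.kstest(x, dist.cdf, alternative='two-sided')
print(f"D = {res.statistic:.4f}, p-value = {res.pvalue:.3e}")
```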

Does anyone know how to get the same (or similar) results? Is it possible to set the threshold (H in R) in Python?

Solution

Try the following. It uses the truncdist package to create a truncated log-normal distribution function, then runs the KS test with base R's ks.test. The result is similar, but not identical, to the Python result.
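For reference, left truncation at H rescales the log-normal CDF F to the interval [H, ∞). Assuming ptrunc's usual definition (F(x) − F(a)) / (F(b) − F(a)) with a = H and b = Inf, the function being tested and the two-sided KS statistic over the sorted sample x_(1) ≤ … ≤ x_(n) are, with μ = meanlog, σ = sdlog and Φ the standard normal CDF:

```latex
F(x) = \Phi\!\left(\frac{\ln x - \mu}{\sigma}\right),
\qquad
F_H(x) =
\begin{cases}
0, & x < H,\\[4pt]
\dfrac{F(x) - F(H)}{1 - F(H)}, & x \ge H,
\end{cases}

D_n = \max_{1 \le i \le n} \max\!\left( \frac{i}{n} - F_H\big(x_{(i)}\big),\; F_H\big(x_{(i)}\big) - \frac{i-1}{n} \right)
```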

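library(truncdist)

# CDF of the log-normal distribution left-truncated at H
ptrunclnorm <- function(x, H, ...){
  truncdist::ptrunc(x, spec = "lnorm", a = H, b = Inf, ...)
}

input_test = c(2457.145, 878.081, 855.118, 1157.135, 1099.82, 880.0, 1399.0339999999999)
H <- 846.6572
meanlog <- 9.621626
sdlog <- 2.220691
parameters <- c(meanlog, sdlog)

# The question's truncgof call, for comparison (seeded because it simulates)
set.seed(2020)
truncgof::ks.test(input_test, 'plnorm', parameters, H = H, sim = 50)

# Base R KS test against the truncated CDF; extra arguments are passed on to ptrunclnorm
ks.test(input_test, "ptrunclnorm", H = H, meanlog = meanlog, sdlog = sdlog)
#
#   One-sample Kolmogorov-Smirnov test
#
#data:  input_test
#D = 0.8786, p-value = 7.771e-07
#alternative hypothesis: two-sided

The EnvStats package has a truncated log-normal distribution function, so you do not need to define one yourself.

The output is the same as above.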
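On the Python side of the question: scipy.stats.kstest has no threshold parameter, but it accepts any callable CDF, so one sketch is to pass a left-truncated CDF directly. The helper trunc_lnorm_cdf is hypothetical (not part of scipy); it mirrors ptrunclnorm above, assuming s = sdlog and scale = exp(meanlog). It also prints √n·D, because truncgof's reported KS = 2.3246 is numerically consistent with √7 × 0.8786.

```python
import numpy as np
import scipy.stats as st

input_test = np.array([2457.145, 878.081, 855.118, 1157.135, 1099.82, 880.0, 1399.0339999999999])
H = 846.6572                        # left-truncation threshold (R's H)
meanlog, sdlog = 9.621626, 2.220691

# Untruncated log-normal, equivalent to R's plnorm(meanlog, sdlog)
base = st.lognorm(s=sdlog, loc=0, scale=np.exp(meanlog))

def trunc_lnorm_cdf(x, H=H, base=base):
    """Hypothetical helper: log-normal CDF left-truncated at H, i.e. (F(x) - F(H)) / (1 - F(H)), and 0 below H."""
    x = np.asarray(x, dtype=float)
    FH = base.cdf(H)
    return np.clip((base.cdf(x) - FH) / (1.0 - FH), 0.0, 1.0)

res = st.kstest(input_test, trunc_lnorm_cdf, alternative='two-sided')
D = res.statistic
n = len(input_test)
print(f"D = {D:.4f}, p-value = {res.pvalue:.3e}")
print(f"sqrt(n) * D = {np.sqrt(n) * D:.4f}")
```

With the question's data this should give D ≈ 0.8786, as base R's ks.test does, and √n·D ≈ 2.3246, the KS value truncgof prints. The p-values are computed differently (truncgof reports "simulations: 50"), so they need not agree.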
