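How to create a function that tests each variable for normality

Problem

I am trying to build a function that iteratively returns i) the Jarque-Bera test statistic, ii) the Jarque-Bera p-value, iii) the slope, intercept, and coefficient of determination of the probplot, and iv) the probplot itself, all for one variable at a time.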

import matplotlib.pyplot as plt
import scipy.stats as ss

def normality(c):
    JB_test_stat = ss.jarque_bera(c)[0]
    JB_pval = ss.jarque_bera(c)[1]
    probplot_slope = ss.probplot(c,plot = plt)[1][0]
    probplot_interc = ss.probplot(c,plot = plt)[1][1]
    probplot_r = ss.probplot(c,plot = plt)[1][2]
    return(print("Skewness:",c.skew(),"\nExcess kurtosis:",c.kurt(),"\nJarque-Bera stat:",JB_test_stat," pvalue:",JB_pval,"\nSlope:",probplot_slope,"Intercept:",probplot_interc,"r:",probplot_r,"\n"))

Unfortunately, when I call the function on the columns of my DataFrame, with numeric_cols converted to a list,

for c in numeric_cols:    
    normality(df[c])

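I correctly get all the numeric results from the return statement, but at the bottom there is a single probplot with every variable drawn on top of the others in a messy way. What I expected was each variable's numeric results followed by its own probplot.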

Skewness: 0.1004187952160102 Excess kurtosis: -0.543819517693596 Jarque-Bera stat: 7.593972235734294 pvalue: 0.022438296430201454 Slope: 4.3135147782152465 Intercept: 25.5 r: 0.9947611456706487

Skewness: -0.1560130144763728 Excess kurtosis: -1.2824901951466612 Jarque-Bera stat: 38.56183464454786 pvalue: 4.23061985443951e-09 Slope: 11.492550446207257 Intercept: 19.535714285714285 r: 0.9668502992894236

Skewness: 0.2347601433103727 Excess kurtosis: -1.242639192300385 Jarque-Bera stat: 39.0662449724179 pvalue: 3.287552452491127e-09 Slope: 11.545683807955731 Intercept: 15.714285714285714 r: 0.9647448407831439

Skewness: 0.24353437856100904 Excess kurtosis: -1.1969521906230485 Jarque-Bera stat: 36.98912338336009 pvalue: 9.287822622106034e-09 Slope: 1013.985374629207 Intercept: 1411.4436090225563 r: 0.9682492605786011

Skewness: 2.837876986150242 Excess kurtosis: 9.516628330654008 Jarque-Bera stat: 2675.4455000782764 pvalue: 0.0 Slope: 2.6057664781688454 Intercept: 1.8533834586466167 r: 0.7776054895177505

Skewness: 2.406153102778617 Excess kurtosis: 7.002529753885085 Jarque-Bera stat: 1573.6596724989513 pvalue: 0.0 Slope: 1.714847443415902 Intercept: 1.287593984962406 r: 0.8152919114915671

Skewness: 0.9337529310147361 Excess kurtosis: 0.45862734243889847 Jarque-Bera stat: 81.22389376608798 pvalue: 0.0 Slope: 605.3354149443196 Intercept: 717.75 r: 0.9550404156079808

Skewness: -3.030640857636996 Excess kurtosis: 15.686541621050898 Jarque-Bera stat: 6154.761075129672 pvalue: 0.0 Slope: 11.37955609488042 Intercept: 77.82387218045113 r: 0.8711740556551902

Skewness: 6.398317104228115 Excess kurtosis: 49.10097819497357 Jarque-Bera stat: 56029.69126113364 pvalue: 0.0 Slope: 0.41431397013222515 Intercept: 0.1917293233082707 r: 0.48503363895959983

Skewness: 6.204252341215679 Excess kurtosis: 47.28662289867727 Jarque-Bera stat: 52010.755388690835 pvalue: 0.0 Slope: 0.4947086253584861 Intercept: 0.23496240601503762 r: 0.5050004904368586

Skewness: 2.06633193738682 Excess kurtosis: 5.770784034742405 Jarque-Bera stat: 1098.0175308306793 pvalue: 0.0 Slope: 0.12821997057404685 Intercept: 0.11328947368421052 r: 0.8619773533976459

Skewness: 2.9189857433086495 Excess kurtosis: 16.837230233306762 Jarque-Bera stat: 6909.724155123523 pvalue: 0.0 Slope: 0.07805612907589729 Intercept: 0.07265037593984962 r: 0.8632361803763113

Skewness: 1.2633082232077495 Excess kurtosis: 1.5265390704578943 Jarque-Bera stat: 190.6495836394772 pvalue: 0.0 Slope: 2.09821120102269 Intercept: 2.1146616541353382 r: 0.9211028014650718

Skewness: 3.091346622737553 Excess kurtosis: 8.530683362863476 Jarque-Bera stat: 2421.371001114453 pvalue: 0.0 Slope: 0.16657862407594715 Intercept: 0.09022556390977444 r: 0.5658043763386988


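How can I fix this? Thanks in advance.

Solution

Just add a plt.figure() call inside your function, so that every call to the function opens a new figure.
Separately, return(print('stuff')) is redundant: print returns None, so the function returns nothing useful. If you really want to print the results inside the function, just use print without return.
It is more Pythonic, and generally better practice, to return the values you are currently printing and then print them outside the function: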

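import matplotlib.pyplot as plt
import scipy.stats as ss

def normality(c):
    # Run each test once and unpack the results
    JB_test_stat, JB_pval = ss.jarque_bera(c)
    plt.figure()  # open a new figure so each variable gets its own probplot
    # probplot returns ((osm, osr), (slope, intercept, r)); calling it once draws once
    _, (probplot_slope, probplot_interc, probplot_r) = ss.probplot(c, plot=plt)
    return c.skew(), c.kurt(), JB_test_stat, JB_pval, probplot_slope, probplot_interc, probplot_r


for c in numeric_cols:
    # Unpack all seven returned values, then print them outside the function
    skew, kurt, JB_test_stat, JB_pval, probplot_slope, probplot_interc, probplot_r = normality(df[c])
    print("Skewness:", skew, "\nExcess kurtosis:", kurt, "\nJarque-Bera stat:", JB_test_stat, " pvalue:", JB_pval, "\nSlope:", probplot_slope, "Intercept:", probplot_interc, "r:", probplot_r, "\n")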
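
For anyone who wants to try the idea end to end, here is a minimal self-contained sketch of the same approach. It is not the asker's actual data or code: the DataFrame, the column names, and the helper name normality_report are made up for illustration, and the Agg backend is selected only so the sketch runs without a display. Instead of plt.figure(), it creates an explicit figure and Axes per call and passes the Axes to probplot, which is another way to keep each variable's plot separate.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import scipy.stats as ss


def normality_report(series):
    """Return a dict of normality statistics and a dedicated probplot figure.

    Hypothetical helper for illustration; the name is not from the question.
    """
    jb_stat, jb_pval = ss.jarque_bera(series)
    fig, ax = plt.subplots()  # one fresh figure per call
    _, (slope, intercept, r) = ss.probplot(series, plot=ax)
    ax.set_title(f"Probability plot: {series.name}")
    stats = {
        "skewness": series.skew(),
        "excess_kurtosis": series.kurt(),
        "jb_stat": jb_stat,
        "jb_pval": jb_pval,
        "slope": slope,
        "intercept": intercept,
        "r": r,
    }
    return stats, fig


# Made-up data standing in for df[numeric_cols]
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "normal_col": rng.normal(size=200),
    "skewed_col": rng.exponential(size=200),
})

results = {}
figures = {}
for col in df.columns:
    stats, fig = normality_report(df[col])
    results[col] = stats
    figures[col] = fig  # keep a handle; call fig.savefig(...) or plt.show() as needed

# One row per variable, one column per statistic
summary = pd.DataFrame(results).T
print(summary)
```

In a notebook with inline plotting, calling plt.show() right after printing each variable's numbers should place every plot directly under its own results rather than grouping all the figures at the end of the cell output.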