Getting a p-value of 0 on a normal distribution

Problem description

Here is my code:

import pandas as pd
import matplotlib.pyplot as plt

wine = pd.read_csv('red wine quality.csv')
wine = wine.dropna()
wine_density = wine[['density']]  # Isolate the 'density' column
wine_density = wine_density.squeeze()  # Convert the one-column DataFrame to a pd.Series
wine_density = pd.to_numeric(wine_density)  # Convert from str to numeric
from scipy import stats
stat,p1 = stats.shapiro(wine_density)
print('Statistics = {0:.3f},p = {1:.3f}'.format(stat,p1))
alpha = 0.05
if p1 > alpha:
    print('Fail to reject H0,sample is normal.')
else:
    print('Reject H0,sample is NOT normal.)')

When I run this code, I get:

runcell(9,'C:/Users/Adam/Desktop/DSCI 200/Week 8/Week8_Final_Project.py')
Statistics = 0.991,p = 0.000
Reject H0,sample is NOT normal.)
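
A note on the `p = 0.000` in the output above: the `{1:.3f}` format prints three decimal places, so any p-value below 0.0005 is displayed as `0.000` even when it is not exactly zero. The minimal sketch below uses a made-up p-value (`p_example` is hypothetical, not computed from the wine data) to compare fixed-point and scientific formatting.

```python
# Hypothetical p-value, chosen only for illustration (not from the wine data).
p_example = 3.2e-8

fixed = 'p = {0:.3f}'.format(p_example)       # fixed-point, three decimals
scientific = 'p = {0:.3e}'.format(p_example)  # scientific notation

print(fixed)
print(scientific)
```

Switching the format specifier in the original `print` call from `.3f` to `.3e` would likely show the actual magnitude of `p1`.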

However, when I run graphical normality checks separately (for example, a Q-Q plot or a histogram), I get:

from statsmodels.graphics.gofplots import qqplot
qqplot(wine_density,line = 's')
plt.show()

[Figure: normality Q-Q plot]

plt.hist(wine_density,bins=50)
plt.show()

[Figure: normality histogram]

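To me, both plots look like a normal distribution, but even if they are not, getting a p-value of exactly zero seems wrong. What am I misunderstanding and/or doing wrong?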
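
One possible explanation, offered as a sketch rather than a diagnosis of this dataset: the Shapiro-Wilk test becomes very sensitive as the sample grows, so data that looks bell-shaped in a histogram can still be rejected with a very small p-value. The example below uses synthetic data, not the wine file: a Student t distribution with 5 degrees of freedom, which is roughly bell-shaped but has heavier tails than a normal distribution. The seed, sample size, and distribution are all assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats

# Synthetic data, NOT the wine dataset: a Student t distribution with
# 5 degrees of freedom is bell-shaped but heavier-tailed than a normal.
rng = np.random.default_rng(0)
sample = rng.standard_t(df=5, size=2000)

# Shapiro-Wilk test: H0 is that the sample comes from a normal distribution.
w_stat, p_value = stats.shapiro(sample)

# Scientific notation keeps very small p-values visible.
print('W = {0:.3f}, p = {1:.3e}'.format(w_stat, p_value))
```

With a sample this large, the test should reject normality even though the W statistic stays close to 1, which is the same pattern as `Statistics = 0.991` paired with a tiny p-value in the output above. A graphical check and a formal test answer different questions, so the two disagreeing is not necessarily a sign of a coding error.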
