Problem description
Here is my code:
import pandas as pd
import matplotlib.pyplot as plt
wine = pd.read_csv('red wine quality.csv')
wine = wine.dropna()
wine_density = wine[['density']] # Isolating column 'density'
wine_density = wine_density.squeeze() # Reformatting as pd.Series
wine_density = pd.to_numeric(wine_density) # Converting from Str to numeric
from scipy import stats
stat,p1 = stats.shapiro(wine_density)
print('Statistics = {0:.3f},p = {1:.3f}'.format(stat,p1))
alpha = 0.05
if p1 > alpha:
    print('Fail to reject H0,sample is normal.')
else:
    print('Reject H0,sample is NOT normal.)')
When I run this code, I get:
runcell(9,'C:/Users/Adam/Desktop/DSCI 200/Week 8/Week8_Final_Project.py')
Statistics = 0.991,p = 0.000
Reject H0,sample is NOT normal.)
However, when I separately run graphical normality checks (e.g. a Q-Q plot or a histogram), I get:
from statsmodels.graphics.gofplots import qqplot
qqplot(wine_density,line = 's')
plt.show()
and
plt.hist(wine_density,bins=50)
plt.show()
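To me, both plots look like a normal distribution, but even if they are not, getting a p-value of exactly zero seems wrong. What am I misunderstanding and/or doing wrong?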
Solution
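One possible reading of the output in the question, offered as a sketch rather than a confirmed answer: the format specifier {1:.3f} rounds to three decimals, so any p-value below 0.0005 prints as 0.000 without being exactly zero. Separately, with a large sample the Shapiro-Wilk test can reject for deviations that are hard to see in a histogram or Q-Q plot. The snippet below illustrates both points. The value 2.5e-8, the helper format_p, and the synthetic Student-t sample are assumptions made for illustration; none of them come from the wine data.

```python
import numpy as np
from scipy import stats


def format_p(p):
    """Return a p-value as (fixed-point, scientific) strings. Hypothetical helper."""
    return '{0:.3f}'.format(p), '{0:.3e}'.format(p)


# A tiny but nonzero p-value (made-up number) rounds to 0.000 at three decimals.
fixed, sci = format_p(2.5e-8)
print(fixed, sci)

# Synthetic stand-in for the density column (NOT the wine data):
# a bell-shaped sample with slightly heavier tails than a normal distribution.
rng = np.random.default_rng(0)
sample = rng.standard_t(df=5, size=1600)

# Same test as in the question, but with p shown in scientific notation.
stat, p = stats.shapiro(sample)
print('Statistics = {0:.3f}, p = {1:.3e}'.format(stat, p))
```

Printing p1 with '{1:.3e}' in the original script would show how small the value actually is, which is more informative than 0.000.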