Run a scipy.stats ANOVA test on all unique values in a DataFrame column?

Problem description

I have a DataFrame made up of many cities and their corresponding temperatures:

               CurrentThermostatTemp
City                                
Cradley Heath                   20.0
Cradley Heath                   20.0
Cradley Heath                   18.0
Cradley Heath                   15.0
Cradley Heath                   19.0
...                              ...
Walsall                         16.0
Walsall                         22.0
Walsall                         20.0
Walsall                         20.0
Walsall                         20.0

[6249 rows x 1 columns]

The unique values are:

Index(['Cradley Heath','ROWLEY REGIS','Smethwick','Oldbury','West bromwich','Bradford','Bournemouth','Poole','Wareham','Wimborne',...
       'St. Helens','Altrincham','runcorn','Widnes','St Helens','Wakefield','Castleford','Pontefract','Walsall','Wednesbury'],dtype='object',name='City',length=137)

My goal is to run a one-way ANOVA, i.e.

from scipy.stats import f_oneway

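on all unique values in the DataFrame. That is, I would like to call something like

scipy.stats.f_oneway("all unique values")

and receive the output: "One-way ANOVA across all variables gives {} with p-value {}". This is what I have tried many times, without success: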

all = Tempvs.index.unique()
Tempvs.sort_index(inplace=True)
for n in range(len(all)):
    truncated = Tempvs.truncate(all[n],all[n])
    print(f_oneway(truncated))

Solution

IIUC, you want an ANOVA test in which each sample contains the Temp values for one unique City. If that is the case, you can do the following:

import numpy as np
import pandas as pd
import scipy.stats as sps

# I create a sample dataset
index = ['Cradley Heath','ROWLEY REGIS','Smethwick','Oldbury','West Bromwich','Bradford','Bournemouth','Poole','Wareham','Wimborne','St. Helens','Altrincham','Runcorn','Widnes','St Helens','Wakefield','Castleford','Pontefract','Walsall','Wednesbury']
np.random.seed(1)
df = pd.DataFrame({
    'City': np.random.choice(index, 500),
    'Temp': np.random.uniform(15, 25, 500),
})

# collect one array of Temp values
# per unique City
values = []
for city in df.City.unique():
    _df = df[df.City==city]
    values.append(_df.Temp.values)

# compute the ANOVA, unpacking the list
# with * so that each array is passed
# as a separate sample
sps.f_oneway(*values)

which, in this case, gives

F_onewayResult(statistic=0.4513685152123563,pvalue=0.9788508507035195)
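
The same grouping can probably be written more compactly with groupby. The sketch below assumes, as in the question, that City is the index and that the temperature column is named CurrentThermostatTemp. The four-city sample frame and the name tempvs are made up for illustration, so the printed numbers will not match your data:

```python
import numpy as np
import pandas as pd
from scipy.stats import f_oneway

# Hypothetical sample data shaped like the frame in the question:
# City is the index, CurrentThermostatTemp is the only column.
cities = ['Cradley Heath', 'Smethwick', 'Oldbury', 'Walsall']
rng = np.random.default_rng(1)
tempvs = pd.DataFrame(
    {'CurrentThermostatTemp': rng.uniform(15, 25, 200)},
    index=pd.Index(rng.choice(cities, 200), name='City'),
)

# One array of temperatures per unique city (grouping on the index level)
grouped = tempvs.groupby(level='City')['CurrentThermostatTemp']
samples = [temps.to_numpy() for _, temps in grouped]

# f_oneway takes each sample as a separate positional argument
result = f_oneway(*samples)
print('One-way ANOVA across all cities gives {:.4f} with p-value {:.4f}'.format(
    result.statistic, result.pvalue))
```

If City is an ordinary column rather than the index, groupby('City') should work in place of groupby(level='City').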

PS: don't use all as a variable name, because it is a Python built-in function; see https://docs.python.org/3/library/functions.html#all
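
A small illustration of why that matters. The function shadowed_call and the list contents are invented for this example; the shadowing is done inside a function only to keep the demonstration contained:

```python
def shadowed_call():
    # Assigning to the name `all` hides the built-in within this scope
    all = ['Walsall', 'Oldbury']
    try:
        return all([True, False])  # now tries to call a list
    except TypeError as exc:
        return 'TypeError: {}'.format(exc)

message = shadowed_call()
print(message)

# With a different variable name the built-in stays usable
unique_cities = ['Walsall', 'Oldbury']
print(all(len(city) > 0 for city in unique_cities))
```

A name such as unique_cities avoids the clash entirely.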