Calculating a confidence interval - Bootstrap

Problem description

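I'm trying to calculate the confidence interval for a list of 1000 numbers and return it as a tuple of two values. However, instead of a tuple of two numbers, I get a tuple of two arrays, each containing 1000 interval bounds. This is my code: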

import random

import numpy as np
from scipy import stats

def bootstrap(data):
    # Build 1,000 resamples, each the same size as the input (16 numbers), drawn at random with replacement
    randomize = [[random.choice(data) for _ in data] for _ in range(1000)]
    # Take the mean of each resample, giving one list with 1,000 means
    means = [np.mean(sublist) for sublist in randomize]
    # Intended: two single numbers that bound the interval
    ci_left, ci_right = stats.t.interval(0.95, df=len(means) - 1, loc=means)
    return (ci_left, ci_right)

But my output looks like this:

(array([-1.33077651,-1.30684806,-1.35418851,-1.32454884,-1.31485041,-1.28670879,-1.32344893,-1.38127905,-1.35198733,-1.33957749]),array([2.59390641,2.61783486,2.57049441,2.60013409,2.60983251,2.63797414,2.60123399,2.54340387,2.57269559,2.58510543,2.58198925,2.56551404,2.57899741,2.59180679,2.56566707,]))

An example of the output I would like to get:

(0.607898431,0.611159753)

Any help is appreciated!

Solution

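The problem was that I passed the means variable itself as loc instead of averaging the means (summing them and dividing by len). I also needed to add a scale. Here is the answer: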

import random

import numpy as np
from scipy import stats

def bootstrap(data):
    # Build 1,000 resamples, each the same size as the input (16 numbers), drawn at random with replacement
    randomize = [[random.choice(data) for _ in data] for _ in range(1000)]
    # Take the mean of each resample, giving one list with 1,000 means
    means = [np.mean(sublist) for sublist in randomize]
    # Pass a single number as loc (the average of the means) and the standard error of the means as scale
    ci_left, ci_right = stats.t.interval(0.95, df=len(means) - 1, loc=sum(means) / len(means), scale=stats.sem(means))
    return (ci_left, ci_right)
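
To see why the original call returned arrays, here is a minimal sketch comparing an array loc with a scalar loc. The means array is synthetic stand-in data (drawn from a normal distribution with a fixed seed), not the data from the question. The last lines show the percentile interval, a common alternative for bootstrap results that is not part of the answer above and is included only for comparison.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for the 1,000 bootstrap means (assumption, not the asker's data)
rng = np.random.default_rng(0)
means = rng.normal(loc=0.6, scale=0.01, size=1000)
df = len(means) - 1

# Array loc: the call is broadcast, so each bound has one entry per mean
low_arr, high_arr = stats.t.interval(0.95, df=df, loc=means)
print(low_arr.shape, high_arr.shape)

# Scalar loc and scale: each bound is a single number
low, high = stats.t.interval(0.95, df=df, loc=np.mean(means), scale=stats.sem(means))
print((low, high))

# Alternative (not from the answer): percentile interval over the bootstrap means
pct_low, pct_high = np.percentile(means, [2.5, 97.5])
print((pct_low, pct_high))
```

With the default scale=1 and df=999, each pair from the array call is about 3.92 wide, which matches the spread between the two arrays in the output shown in the question.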