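Generating a non-normal distribution from existing data

Problem description

Given a non-normal distribution stored in a Pandas DataFrame with sample size n, is there a reliable way to generate/simulate another distribution, with a user-specified sample size, that has the same moments as the original?

What I have...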

import pandas as pd

# Rows without a "Value" key become NaN in the DataFrame.
row_list = [
    {"Bin": 1, "Value": 0}, {"Bin": 2}, {"Bin": 3, "Value": 1}, {"Bin": 4, "Value": 12}, {"Bin": 5, "Value": 24},
    {"Bin": 6, "Value": 89}, {"Bin": 7, "Value": 203}, {"Bin": 8, "Value": 460}, {"Bin": 9, "Value": 884}, {"Bin": 10, "Value": 1539},
    {"Bin": 11, "Value": 2638}, {"Bin": 12, "Value": 4278}, {"Bin": 13, "Value": 6446}, {"Bin": 14, "Value": 8942}, {"Bin": 15, "Value": 11990},
    {"Bin": 16, "Value": 15484}, {"Bin": 17, "Value": 18791}, {"Bin": 18, "Value": 22449}, {"Bin": 19, "Value": 25985}, {"Bin": 20, "Value": 29209},
    {"Bin": 21, "Value": 32027}, {"Bin": 22, "Value": 33812}, {"Bin": 23, "Value": 35048}, {"Bin": 24, "Value": 36089}, {"Bin": 25, "Value": 36512},
    {"Bin": 26, "Value": 35993}, {"Bin": 27, "Value": 35890}, {"Bin": 28, "Value": 33990}, {"Bin": 29, "Value": 32915}, {"Bin": 30, "Value": 31471},
    {"Bin": 31, "Value": 29438}, {"Bin": 32, "Value": 27672}, {"Bin": 33, "Value": 25154}, {"Bin": 34, "Value": 23347}, {"Bin": 35, "Value": 21283},
    {"Bin": 36, "Value": 19520}, {"Bin": 37, "Value": 17730}, {"Bin": 38, "Value": 15732}, {"Bin": 39, "Value": 14380}, {"Bin": 40, "Value": 12665},
    {"Bin": 41, "Value": 11182}, {"Bin": 42, "Value": 9839}, {"Bin": 43, "Value": 8846}, {"Bin": 44, "Value": 7736}, {"Bin": 45, "Value": 6653},
    {"Bin": 46, "Value": 5829}, {"Bin": 47, "Value": 5153}, {"Bin": 48, "Value": 4368}, {"Bin": 49, "Value": 3752}, {"Bin": 50, "Value": 3397},
    {"Bin": 51, "Value": 2790}, {"Bin": 52, "Value": 2415}, {"Bin": 53, "Value": 2079}, {"Bin": 54, "Value": 1779}, {"Bin": 55, "Value": 1508},
    {"Bin": 56, "Value": 1302}, {"Bin": 57, "Value": 1087}, {"Bin": 58, "Value": 899}, {"Bin": 59, "Value": 790}, {"Bin": 60, "Value": 731},
    {"Bin": 61, "Value": 638}, {"Bin": 62, "Value": 486}, {"Bin": 63, "Value": 464}, {"Bin": 64, "Value": 415}, {"Bin": 65, "Value": 328},
    {"Bin": 66, "Value": 255}, {"Bin": 67, "Value": 227}, {"Bin": 68, "Value": 187}, {"Bin": 69, "Value": 182}, {"Bin": 70, "Value": 123},
    {"Bin": 71, "Value": 141}, {"Bin": 72, "Value": 112}, {"Bin": 73, "Value": 101}, {"Bin": 74, "Value": 84}, {"Bin": 75, "Value": 71},
    {"Bin": 76, "Value": 60}, {"Bin": 77, "Value": 38}, {"Bin": 78, "Value": 43}, {"Bin": 79, "Value": 51}, {"Bin": 80, "Value": 27},
    {"Bin": 81, "Value": 15}, {"Bin": 82, "Value": 25}, {"Bin": 83, "Value": 23}, {"Bin": 84, "Value": 14}, {"Bin": 85, "Value": 13},
    {"Bin": 86}, {"Bin": 87, "Value": 7}, {"Bin": 88}, {"Bin": 89}, {"Bin": 90, "Value": 8},
    {"Bin": 91, "Value": 11}, {"Bin": 92, "Value": 3}, {"Bin": 93, "Value": 5}, {"Bin": 94, "Value": 2}, {"Bin": 95},
    {"Bin": 96}, {"Bin": 97}, {"Bin": 98}, {"Bin": 99}, {"Bin": 100},
    {"Bin": 101}, {"Bin": 102}, {"Bin": 103}, {"Bin": 104}, {"Bin": 105},
]


df = pd.DataFrame(row_list)

# df.shape = (105,2)

[Figure: example plot of the input data, Bin vs Value]

What I want to generate...

# Gather the first four moments (mean, variance, skewness, kurtosis) of this data set
mean = df["Value"].mean()
var = df["Value"].var()
skew = df["Value"].skew()
kurt = df["Value"].kurt()

input_sample_size = df["Value"].sum()  # ~786K

desired_sample_size = 100_000_000  # 100 million


# This is where I seek help: generate_dist is a placeholder for the
# function I am looking for; it does not exist yet.
new_distribution = generate_dist(mean, var, skew, kurt, desired_sample_size)
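
One point worth checking before matching moments: if "Value" holds per-bin counts, as the ~786K total suggests, then df["Value"].mean() and the other calls describe the column of counts, not the distribution over bins. A minimal sketch of count-weighted moments follows, using only the standard library. The function name binned_moments and the toy histogram are hypothetical, and the formulas are the population (biased) versions, so they will differ slightly from pandas' bias-corrected .var(), .skew() and .kurt().

```python
def binned_moments(bins, counts):
    """Return (mean, variance, skewness, excess kurtosis) of a histogram.

    Each bin position is weighted by its count, so the result describes
    the distribution over bins rather than the list of counts itself.
    """
    total = sum(counts)
    mean = sum(b * c for b, c in zip(bins, counts)) / total

    def central(k):
        # k-th central moment, weighted by the bin counts
        return sum(c * (b - mean) ** k for b, c in zip(bins, counts)) / total

    var = central(2)                    # population variance
    skew = central(3) / var ** 1.5      # standardized third moment
    kurt = central(4) / var ** 2 - 3.0  # excess kurtosis
    return mean, var, skew, kurt


# Toy histogram (illustrative values, not the data from the question).
bins = [1, 2, 3, 4, 5]
counts = [10, 40, 30, 15, 5]
mean, var, skew, kurt = binned_moments(bins, counts)
print(mean, var, skew, kurt)
```

With the question's DataFrame, the inputs would presumably be df["Bin"].tolist() and df["Value"].tolist(), after deciding how to treat any missing counts.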



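Essentially, my goal is to simulate a larger distribution with the same characteristics as the original. The output plot should therefore look roughly like the input plot, only with more bins.

Side note: I expect all incoming data to be positively skewed with a long tail.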
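
One possible approach, offered as a sketch rather than a definitive answer, is to skip the moments entirely and resample from the observed histogram: draw bins with probability proportional to their counts, then add uniform jitter inside each bin so the result is continuous and can be re-binned more finely. The function name sample_from_histogram, the bin_width parameter and the toy histogram are hypothetical. The sketch assumes equally spaced bins centered on the "Bin" value. The jitter adds bin_width²/12 to the variance, so moments are matched only approximately.

```python
import random


def sample_from_histogram(bins, counts, size, bin_width=1.0, seed=None):
    """Draw `size` continuous values that follow the shape of a histogram."""
    rng = random.Random(seed)
    # Pick a bin for every draw, weighted by the observed counts.
    centers = rng.choices(bins, weights=counts, k=size)
    # Spread each draw uniformly across its bin (assumed width: bin_width).
    half = bin_width / 2.0
    return [c + rng.uniform(-half, half) for c in centers]


# Toy histogram (illustrative values, not the data from the question).
bins = [1, 2, 3, 4, 5]
counts = [10, 40, 30, 15, 5]
samples = sample_from_histogram(bins, counts, size=10_000, seed=0)
sample_mean = sum(samples) / len(samples)
print(len(samples), round(sample_mean, 2))
```

For a target of 100 million draws, a pure-Python list would be slow and memory-hungry; drawing in chunks and accumulating counts per output bin, or using a vectorized library, would likely be more practical.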
