在Python中将散点图变成直方图 2020-11-23来自答案评论的更新：

问题描述

我需要从另一个文件中的某些数据绘制直方图。

目前，我有绘制散点图并适合高斯分布的代码。

x值是它正在读取的数据文件中相应行上的数字（其他信息的前12行之后，即第13行是第一个事件），y值是数字线乘以一个值。

然后绘制并拟合散点图，但是我需要能够将其绘制为直方图，并能够更改bin的宽度/数字（即，将bin 1、2、3和4加在一起，使其具有1/4总箱数是事件数的4倍-因此我想将数据中的多行加在一起），这就是我遇到的问题。

如何将其制作成直方图并调整宽度/数字？

下面的代码，不知道如何使其漂亮。让我知道我是否可以使其更容易阅读。

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from numpy import exp,loadtxt,pi,sqrt,random,linspace
from lmfit import Model
import glob,os

## Define gaussian
def gaussian(x,amp,cen,wid):
    """1-d gaussian: gaussian(x,wid)"""
    return (amp / (sqrt(2*pi) * wid)) * exp(-(x-cen)**2 / (2*wid**2))

## Define constants
stderrThreshold = 10
minimumAmplitude = 0.1
approxcen = 780
MaestroT = 53

## Define paramaters
amps = []; ampserr = []; ts = []
folderToAnalyze = baseFolder + filetoRun + '\\'

## Generate the time array

for n in range(0,numfiles):
    
    ## Load text file
    x = np.linspace(0,8191,8192) 
    fullprefix = folderToAnalyze + prefix + str(n).zfill(3)
    y = loadtxt(fullprefix + ".Spe",skiprows= 12,max_rows = 8192) 

    ## Make figure
    fig,ax = plt.subplots(figsize=(15,8))
    fig.suptitle('Coincidence Detections',fontsize=20)
    plt.xlabel('Bins',fontsize=14)
    plt.ylabel('Counts',fontsize=14)

    ## Plot data
    ax.plot(x,y,'bo')
    ax.set_xlim(600,1000)

    ## Fit data to Gaussian
    gmodel = Model(gaussian)
    result = gmodel.fit(y,x=x,amp=8,cen=approxcen,wid=1)

    ## Plot results and save figure
    ax.plot(x,result.best_fit,'r-',label='best fit')
    ax.legend(loc='best')
    texttoplot = result.fit_report()
    ax.text(0.02,0.5,texttoplot,transform=ax.transAxes)
    plt.close()
    fig.savefig(fullprefix + ".png",pad_inches='0.5')

当前输出：散点图，确实显示了数据的预期分布和图（但是它们确实减少了chi ^ 2的cr脚，但一次出现一个问题）

预期输出：当将每个事件绘制为单独的仓时，具有相同数据，分布和拟合度的直方图图，并希望可以将这些仓加在一起以减少误差线

错误：N / A

数据：基本上是8192行以上的标准分布。 1个文件的完整数据为here。也是原始的.Spe 文件，分散的plot和完整版本的code

2020-11-23来自答案评论的更新：

嗨，我一直在尝试实现这一点，但一直没有解决。我试图严格按照您的示例进行操作，但是我仍然使用bin宽度为1（即未加在一起）的直方图。我还在打印输出中得到了第二张空白图形，并且报告仅在IDE中打印输出（尽管我正在处理该图形，并且我很快就会知道）。同样由于某种原因，它似乎在循环的50次迭代中停止了3次。
这是code的当前状态：
这是我得到的输出：

这是原始输出：
以防万一，它是raw data。我似乎无法复制您的最后2个数字
理想的方法是能够将第30行的常数更改为所需的bin宽度，并在该情况下以该bin宽度运行。

解决方法

在这种情况下，scatter plot是一个直方图，只是点而不是条形。
.Spe是每个事件的垃圾箱计数。
x = np.linspace(0,8191,8192)定义了垃圾箱，垃圾箱宽度为1。
构造条形图而不是散点图
- ax.bar(x,y)而非ax.plot(x,y,'bo')
根据现有数据，下图是分布非常广泛的直方图。
- 取值范围从321到1585
- ax.set_xlim(300,1800)

此数据的好处在于，可以很容易地基于x重新创建原始分布，bin大小为1，并且y是每个x的各自计数。 / li>
np.repeat可以创建包含重复元素的数组

import numpy
import matplotlib.pyplot as plt

# given x and y from the loop
# set the type as int
y = y.astype(int)
x = x.astype(int)

# create the data
data = np.repeat(x,y)

# determine the range of x
x_range = range(min(data),max(data)+1)

# determine the length of x
x_len = len(x_range)

# plot
fig,(ax1,ax2) = plt.subplots(nrows=2,figsize=(10,10))

ax1.hist(data,bins=x_len)  # outliers are not plotted
ax2.boxplot(data,vert=False)
plt.show()

鉴于data，您现在可以执行所需的任何分析。
SO: Fit gaussian to noisy data with lmfit
LMFIT Docs
Cross Validated可能是深入研究模型的更好网站
所有错误计算参数都来自模型result。如果您针对不同的纸槽宽度从x计算新的y和np.histogram，可能会影响错误。
- approxcen = 780也是result

# given x_len determine how many bins for a given bin width
width = 8
bins = int(np.round(x_len / width))

# determine new x and y for the histogram
y,x = np.histogram(data,bins=bins)

# Fit data to Gaussian
gmodel = Model(gaussian)
result = gmodel.fit(y,x=x[:-1],amp=8,cen=approxcen,wid=1)

# result
print(result.fit_report())

[out]:
[[Model]]
    Model(gaussian)
[[Fit Statistics]]
    # fitting method   = leastsq
    # function evals   = 314
    # data points      = 158
    # variables        = 3
    chi-square         = 397.702574
    reduced chi-square = 2.56582306
    Akaike info crit   = 151.851284
    Bayesian info crit = 161.039069
[[Variables]]
    amp:  1174.80608 +/- 37.1663147 (3.16%) (init = 8)
    cen:  775.535731 +/- 0.46232727 (0.06%) (init = 780)
    wid:  12.6563219 +/- 0.46232727 (3.65%) (init = 1)
[[Correlations]] (unreported correlations are < 0.100)
    C(amp,wid) =  0.577

# plot
plt.figure(figsize=(10,6))
plt.bar(x[:-1],y)
plt.plot(x[:-1],result.best_fit,'r-',label='best fit')

plt.figure(figsize=(20,8))
plt.bar(x[:-1],y)
plt.xlim(700,850)
plt.plot(x[:-1],label='best fit')
plt.grid()

从下一个代码块可以看到，该错误与以下参数有关
- stderrThreshold = 10
- minimumAmplitude = 0.1
- MaestroT = 53

    ## Append to list if error in amplitude and amplitude itself is within reasonable bounds
    if result.params['amp'].stderr < stderrThreshold and result.params['amp'] > minimumAmplitude:
        amps.append(result.params['amp'].value) 
        ampserr.append(result.params['amp'].stderr) 
        ts.append(MaestroT*n)

histogram matplotlib plot python scatter-plot