How to plot daily data as monthly averages for different years

Problem description

I am trying to plot a graph representing a monthly river discharge dataset that runs from 1980-01-01 to 2013-12-31.

Please check out this graph.

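The plan is to put "Jan Feb Mar Apr May...Dec" on the x-axis and discharge (m3/s) on the y-axis. The lines on the graph will represent the years. In other words, each line will show the monthly averages (from Jan to Dec) for one year, for every year from 1980 to 2013.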

  DAT = pd.read_excel('Modelled discharge_UIB_1980-2013_Daily.xlsx',sheet_name='Karhmong',header=None,skiprows=1,names=['year','month','day','flow'],parse_dates={ 'date': ['year','month','day'] },index_col='date')

The command above loads the data; this is what it looks like:


date        flow
1980-01-01  104.06
1980-01-02  103.81
1980-01-03  103.57
1980-01-04  103.34
1980-01-05  103.13
... ...
2013-12-27  105.65
2013-12-28  105.32
2013-12-29  105.00
2013-12-30  104.71
2013-12-31  104.42

Because I want to compare all the years with each other, I tried the following commands:

DAT1980 = DAT[DAT.index.year==1980]
DAT1980

DAT1981 = DAT[DAT.index.year==1981]
DAT1981

...etc.

To group the months for the x-axis, I tried this command:

datmonth = np.unique(DAT.index.month)

So far, none of these commands has raised an error.

But when I plot the graph, I get an error.

Plotting commands:

fig,ax = plt.subplots(nrows=1,ncols=1,figsize=(12,6))

ax.plot(datmonth,DAT1980,color='purple',linestyle='--',label='1980')
ax.grid()

plt.legend()

ax.set_title('Monthly River Indus discharge Comparison 1980-2013')
ax.set_ylabel('discharge (m3/s)')
ax.set_xlabel('Month')
axs.set_xlim(3,5)


axs.xaxis.set_major_formatter
fig.autofmt_xdate()
ax.legend(loc='upper left',bbox_to_anchor=(1,1))

The error I get is "ValueError: x and y must have same first dimension, but have shapes (12,) and (366,1)".

Then I tried:

fig,ax = plt.subplots(nrows=1,ncols=1,figsize=(12,6))

ax.plot(DAT.index.month,DAT.index.year==1980,label='1980')
ax.grid()

ax.plot(DAT.index.month,DAT.index.year==1981,color='black',marker='o',linestyle='-',label='C1981')
ax.grid()


plt.legend()

ax.set_title('Monthly River Indus discharge Comparison 1980-2013')
ax.set_ylabel('discharge (m3/s)')
ax.set_xlabel('Month')
#axs.set_xlim(1,12)


axs.xaxis.set_major_formatter
fig.autofmt_xdate()
ax.legend(loc='upper left',bbox_to_anchor=(1,1))

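This works better than the previous attempt, but it is still not what I want (please check out the graph here).

My goal is to create a graph similar to this one.

I would sincerely appreciate any suggestions! Thank you very much, and if you need any further information, please don't hesitate to ask; I will reply as soon as possible.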

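Solution

Welcome to SO! Nice job describing your problem clearly and showing plenty of code :)

There are a few syntax issues here and there, but the main problem I see is that you need to add a groupby/aggregation operation at some point. That is, you have daily data, but the plot you want has monthly resolution (per year). It sounds like you want the mean of the daily values for each month of each year (correct me if I'm wrong).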
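
To make the shape mismatch behind the ValueError concrete, here is a small sketch. It uses random numbers as a stand-in for the real flow data, and the names `daily`, `months` and `monthly_mean` are only illustrative. One leap year of daily values has 366 rows, while there are only 12 distinct months, so the two cannot be plotted against each other until the daily values are aggregated.

```python
import numpy as np
import pandas as pd

# Synthetic daily series for 1980 (a leap year); a stand-in for the real data
idx = pd.date_range('1980-01-01', '1980-12-31', freq='D')
daily = pd.DataFrame({'flow': np.random.rand(len(idx))}, index=idx)

# The x-values from the question: the 12 distinct month numbers
months = np.unique(daily.index.month)
print(months.shape)        # (12,)
print(daily.shape)         # (366, 1)

# Averaging the daily values within each month gives one value per month,
# which matches the length of the x-values
monthly_mean = daily.groupby(daily.index.month)['flow'].mean()
print(monthly_mean.shape)  # (12,)
```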

Here is some fake data:

import numpy as np
import pandas as pd

dr = pd.date_range('01-01-1980','12-31-2013',freq='1D')
flow = np.random.rand(len(dr))
df = pd.DataFrame(flow,columns=['flow'],index=dr)

Which looks like your example:

                flow
1980-01-01  0.751287
1980-01-02  0.411040
1980-01-03  0.134878
1980-01-04  0.692086
1980-01-05  0.671108
             ...
2013-12-27  0.683654
2013-12-28  0.772894
2013-12-29  0.380631
2013-12-30  0.957220
2013-12-31  0.864612

[12419 rows x 1 columns]

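You can get the mean for each month with groupby, using the same datetime attributes you used above (plus a few extra methods to make the data easier to work with):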

monthly = (df.groupby([df.index.year,df.index.month])
           .mean()
           .rename_axis(index=['year','month'],)
           .reset_index())

monthly has the flow data for every month of every year, i.e. what you want to plot:

     year  month      flow
0    1980      1  0.514496
1    1980      2  0.633738
2    1980      3  0.566166
3    1980      4  0.553763
4    1980      5  0.537686
..    ...    ...       ...
403  2013      8  0.402805
404  2013      9  0.479226
405  2013     10  0.446874
406  2013     11  0.526942
407  2013     12  0.599161

[408 rows x 3 columns]
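
As an optional alternative layout (not required for the approach described here), the long table can be reshaped so that each year becomes its own column. This is only a sketch on the same kind of fake data; the name `wide` is my own.

```python
import numpy as np
import pandas as pd

# Same kind of fake daily data as above
dr = pd.date_range('1980-01-01', '2013-12-31', freq='D')
df = pd.DataFrame({'flow': np.random.rand(len(dr))}, index=dr)

# Mean of the daily values for each (year, month) pair
monthly = (df.groupby([df.index.year, df.index.month])
             .mean()
             .rename_axis(index=['year', 'month'])
             .reset_index())

# Reshape: one row per month (1-12), one column per year (1980-2013)
wide = monthly.pivot(index='month', columns='year', values='flow')
print(wide.shape)  # (12, 34)
```

With the data in this shape, `wide[1980]` is the 12-value series for a single year, and calling `wide.plot()` should draw one line per year column.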

Now, to plot a single year, you can select it from monthly and plot the flow data. I used most of your axis formatting:

import matplotlib.pyplot as plt

# make figure
fig,ax = plt.subplots(nrows=1,ncols=1,figsize=(12,6))

# plotting for one year
sub = monthly[monthly['year'] == 1980]
ax.plot(sub['month'],sub['flow'],color='purple',linestyle='--',label='1980')

# some formatting
ax.set_title('Monthly River Indus Discharge Comparison 1980-2013')
ax.set_ylabel('Discharge (m3/s)')
ax.set_xlabel('Month')
ax.set_xticks(range(1,13))
ax.set_xticklabels(['J','F','M','A','M','J','J','A','S','O','N','D'])
ax.legend()
ax.grid()

Which produces the following:

You can instead use some kind of loop to plot several years:

years = [1980,1981,1982,...]
for year in years:
    sub = monthly[monthly['year'] == year]
    ax.plot(sub['month'],...)

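You will run into some other challenges here (like finding a way to style 30+ lines nicely, and doing so in a loop). If you can't figure out how to do something from other posts here, you can open a new post (building on this one). Good luck!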
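
One possible starting point for the loop and the styling, as a sketch only: it runs on fake data, takes one colour per year from a colormap (the choice of `viridis` is arbitrary), and uses the standard library's `calendar.month_abbr` for the "Jan ... Dec" tick labels. The `Agg` backend line is there so the script runs without a display; drop it for interactive use.

```python
import calendar

import matplotlib
matplotlib.use('Agg')  # non-interactive backend; remove for on-screen plots
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Fake daily data, aggregated to one mean value per (year, month)
dr = pd.date_range('1980-01-01', '2013-12-31', freq='D')
df = pd.DataFrame({'flow': np.random.rand(len(dr))}, index=dr)
monthly = (df.groupby([df.index.year, df.index.month])
             .mean()
             .rename_axis(index=['year', 'month'])
             .reset_index())

# One colour per year, spread evenly across the colormap
years = sorted(monthly['year'].unique())
colors = plt.cm.viridis(np.linspace(0, 1, len(years)))

fig, ax = plt.subplots(figsize=(12, 6))
for year, color in zip(years, colors):
    sub = monthly[monthly['year'] == year]
    ax.plot(sub['month'], sub['flow'], color=color, label=str(year))

# Month abbreviations as tick labels (index 0 of month_abbr is an empty string)
month_labels = list(calendar.month_abbr)[1:]
ax.set_xticks(range(1, 13))
ax.set_xticklabels(month_labels)
ax.set_xlabel('Month')
ax.set_ylabel('Discharge (m3/s)')
ax.grid()

# Put the long legend outside the axes, in two columns
ax.legend(loc='upper left', bbox_to_anchor=(1, 1), ncol=2, fontsize='small')
fig.tight_layout()
```

A sequential colormap makes earlier and later years distinguishable by shade, which tends to be easier to read than 34 unrelated colours, but that is a matter of taste.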
