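Problem description
The data source/file lives on the web at: https://www.newyorkfed.org/medialibrary/media/survey/empire/data/esms_seasonallyadjusted_diffusion.csv but because of connection problems I saved it locally ('esms_seasonallyadjusted_diffusion.csv'). To be safe, I also saved it to GitHub: https://github.com/me50/hlar65/blob/master/ESMS_SeasonallyAdjusted_Diffusion.csv
Two questions:
- When my code tries to access the web link (even though clicking the link downloads the file), I get a connection error that begins with "URLError:"
- My code (I'm a beginner!) looks clunky. Is there a cleaner, better way to write it?
Thanks, everyone, for your help.
'''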
import pandas as pd
import numpy as np
from pandas.plotting import scatter_matrix
import scipy as sp
from scipy import stats
import matplotlib.pyplot as plt
import seaborn as sn
df = pd.read_csv('https://www.newyorkfed.org/medialibrary/media/survey/empire/data/esms_seasonallyadjusted_diffusion.csv')
df = df.rename(columns={'surveyDate':'Date','GACDISA': 'IndexAll','NECDISA': 'NumberofEmployees','NOCDISA': 'NewOrders','PPCDISA': 'PricesPaid','PRCDISA': 'PricesReceived'})
headers = df.columns
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date',inplace=True)
IndexAll = df['IndexAll']
NumberofEmployees = df['NumberofEmployees']
NewOrders = df['NewOrders']
PricesReceived = df['PricesReceived']
data = df[['IndexAll','NumberofEmployees','NewOrders','PricesReceived']]
data2 = data.copy()
ds = data2
FS_A = 14
FS_L = 16
FS_T = 20
FS_MT = 25
fig,((ax0,ax1),(ax2,ax3)) = plt.subplots(nrows=2,ncols=2,figsize=(20,15))
# density=True : probability density i.e. prb of an outcome; False = actual # of frequency
ds['IndexAll'].plot(ax=ax0,color='red')
ax0.set_title('New York Empire Manufacturing Index',fontsize = FS_T)
ax0.set_xlabel('Date',fontsize = FS_A)
ax0.set_ylabel('Empire Index',fontsize = FS_L)
ax0.tick_params(labelsize=FS_A)
ds['NumberofEmployees'].plot(ax=ax1,color='blue')
ax1.set_title('Empire: Number of Employees',fontsize = FS_T)
ax1.set_xlabel('Date',fontsize = FS_L)
ax1.set_ylabel('Number of Employees',fontsize = FS_L)
ax1.tick_params(labelsize=FS_A)
ds['NewOrders'].plot(ax=ax2,color='green')
ax2.set_title('Empire: New Orders',fontsize = FS_T)
ax2.set_xlabel('Date',fontsize = FS_L)
ax2.set_ylabel('New Orders',fontsize = FS_L)
ax2.tick_params(labelsize=FS_A)
ds['PricesReceived'].plot(ax=ax3,color='black')
ax3.set_title('Empire: Prices Received',fontsize = FS_T)
ax3.set_xlabel('Date',fontsize = FS_L)
ax3.set_ylabel('Prices Received',fontsize = FS_L)
ax3.tick_params(labelsize=FS_A)
fig.tight_layout()
fig.suptitle('New York Manufacturing Index Main Components - Showing the Depths of COVID-19 in 2020',fontsize = FS_MT)
fig.tight_layout()
fig.subplots_adjust(top=0.88)
fig.subplots_adjust(bottom = -0.2)
fig.savefig("Empire.png")
plt.show()
'''
Solution
For the first problem: have you configured a proxy? I suspect the error comes from the proxy settings.
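To check that suspicion and keep the script usable while the connection is unreliable, something along these lines might help. It is only a sketch: read_with_fallback and detected_proxies are hypothetical helper names, and it assumes the failure surfaces as urllib's URLError (the error named in the question). urllib.request.getproxies() reports the proxy settings Python picks up from the environment or the operating system.

```python
import urllib.request
from urllib.error import URLError


def detected_proxies():
    """Return the proxy settings urllib detects (environment variables or OS settings)."""
    return urllib.request.getproxies()


def read_with_fallback(reader, url, local_path, **kwargs):
    """Call reader(url); if the connection fails, call reader(local_path) instead.

    `reader` is any callable that accepts a URL or a path, e.g. pandas.read_csv.
    """
    try:
        return reader(url, **kwargs)
    except URLError as exc:
        # HTTPError is a subclass of URLError, so HTTP failures land here too.
        print(f"Could not reach {url} ({exc.reason}); reading {local_path} instead")
        return reader(local_path, **kwargs)


# Possible usage with the files from the question (not executed here):
#   import pandas as pd
#   print(detected_proxies())
#   df = read_with_fallback(
#       pd.read_csv,
#       'https://www.newyorkfed.org/medialibrary/media/survey/empire/data/esms_seasonallyadjusted_diffusion.csv',
#       'esms_seasonallyadjusted_diffusion.csv',
#   )
```

If detected_proxies() returns an empty dict even though your network requires a proxy, setting the HTTPS_PROXY environment variable before starting Python is one thing to try.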
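As for the second, I can do some cleanup, but a lot of it comes down to the developer's coding style. The script could be written in several different ways.
Notes:
- You can parse the dates directly in the read_csv call
- You can wrap the expression in parentheses and chain every operation you want on df in one step
- You can put the parameters in lists and draw all the plots in a for loop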
'''
import pandas as pd
import matplotlib.pyplot as plt

# Read the file, parse the dates, rename the columns and set the index in one chain
df = (
    pd
    .read_csv('https://www.newyorkfed.org/medialibrary/media/survey/empire/data/esms_seasonallyadjusted_diffusion.csv',parse_dates=['surveyDate'])
    .rename(columns={'surveyDate':'Date','GACDISA': 'IndexAll','NECDISA': 'NumberofEmployees','NOCDISA': 'NewOrders','PPCDISA': 'PricesPaid','PRCDISA': 'PricesReceived'})
    .set_index('Date')
)

# Font sizes
FS_A = 14
FS_L = 16
FS_T = 20
FS_MT = 25

# One entry per subplot, in plotting order
titles = ['New York Empire Manufacturing Index','Empire: Number of Employees','Empire: New Orders','Empire: Prices Received']
ylabels = ['Empire Index','Number of Employees','New Orders','Prices Received']
colors = ['red','blue','green','black']
columns = ['IndexAll','NumberofEmployees','NewOrders','PricesReceived']
ds = df[columns]

k = 0  # index into the lists above
fig,axes = plt.subplots(nrows=2,ncols=2,figsize=(20,15))
for i in range(2):
    for j in range(2):
        ds[columns[k]].plot(ax=axes[i][j],color=colors[k])
        axes[i][j].set_title(titles[k],fontsize = FS_T)
        # The Date index is drawn on the x-axis; the series values are on the y-axis
        axes[i][j].set_xlabel('Date',fontsize = FS_A)
        axes[i][j].set_ylabel(ylabels[k],fontsize = FS_L)
        axes[i][j].tick_params(labelsize=FS_A)
        k += 1
'''
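As a variation on the loop above, the per-plot settings can be grouped into one record per panel and paired with the axes via zip, which removes the k counter and the nested indices. This is only a sketch: Panel, PANELS and draw_panels are names made up here, and 'Date' is put on the x-axis on the assumption that the date index is what pandas draws there.

```python
from typing import NamedTuple


class Panel(NamedTuple):
    """Settings for one subplot (hypothetical helper type)."""
    column: str
    title: str
    label: str
    color: str


PANELS = [
    Panel('IndexAll', 'New York Empire Manufacturing Index', 'Empire Index', 'red'),
    Panel('NumberofEmployees', 'Empire: Number of Employees', 'Number of Employees', 'blue'),
    Panel('NewOrders', 'Empire: New Orders', 'New Orders', 'green'),
    Panel('PricesReceived', 'Empire: Prices Received', 'Prices Received', 'black'),
]


def draw_panels(ds, axes, panels=PANELS, fs_title=20, fs_label=16, fs_tick=14):
    """Plot one column per axis; `axes` is any flat iterable of matplotlib Axes."""
    for ax, panel in zip(axes, panels):
        ds[panel.column].plot(ax=ax, color=panel.color)
        ax.set_title(panel.title, fontsize=fs_title)
        ax.set_xlabel('Date', fontsize=fs_label)
        ax.set_ylabel(panel.label, fontsize=fs_label)
        ax.tick_params(labelsize=fs_tick)
```

With the df built above, it could be called as fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(20, 15)) followed by draw_panels(df, axes.flat); axes.flat walks the 2x2 grid row by row, the same order as the nested loop.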