Structuring data for Pandas / Matplotlib

Problem description

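I have a Raspberry Pi collecting air quality data, and I would like to push it to a web server using Flask. My problem is the way I originally structured the text file that stores the data, shown below: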

> (Sun Aug  9 08:59:05 2020,PM1.0 ug/m3 (ultrafine particles):        
> 20 PM2.5 ug/m3 (combustion particles,organic compounds,Metals): 30
> PM10 ug/m3  (dust,pollen,mould spores):                      32
> PM1.0 ug/m3 (atmos env):                                       19
> PM2.5 ug/m3 (atmos env):                                       29 PM10
> ug/m3 (atmos env):                                        32
> >0.3um in 0.1L air:                                            3990
> >0.5um in 0.1L air:                                            1089
> >1.0um in 0.1L air:                                            180
> >2.5um in 0.1L air:                                            10
> >5.0um in 0.1L air:                                            2
> >10um in 0.1L air:                                             0 ),(Sun Aug  9 09:00:06 2020,PM1.0 ug/m3 (ultrafine particles):        
> 21 PM2.5 ug/m3 (combustion particles,Metals): 31
> PM10 ug/m3  (dust,mould spores):                      33
> PM1.0 ug/m3 (atmos env):                                       20
> PM2.5 ug/m3 (atmos env):                                       30 PM10
> ug/m3 (atmos env):                                        33
> >0.3um in 0.1L air:                                            3990
> >0.5um in 0.1L air:                                            1089
> >1.0um in 0.1L air:                                            180
> >2.5um in 0.1L air:                                            10
> >5.0um in 0.1L air:                                            2
> >10um in 0.1L air:                                             0 ),(Sun Aug  9 09:01:06 2020,

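As you can see, the data is probably not structured well for what I intend. Below is what I have done so far with Pandas, in the hope of visualizing the data with matplotlib.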

import pandas as pd
sample_data = pd.read_fwf('particulates.txt', header=None)

This returns a DataFrame that looks like this: [image: Dataframe]

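I have been trying to work out the best way to tidy the data so that:

The date, which appears in the first column of each new record and is delimited by a ',', can be used as my x-axis.
Every other data point, e.g. "PM1.0 ug/m3 (ultrafine particles):", can be grouped together to form plots of related data.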

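I am looking for pointers on the best way to achieve this rather than the solution itself. One approach I considered was heavy string manipulation and modulo arithmetic, since each record has exactly 14 lines of data, and then moving the result into a SQL database. However, I have no doubt this can be done with Pandas.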

Any pointers would be greatly appreciated, and thank you for your time.

Solution

Try this:

sample_data = pd.read_csv('particulates.txt',sep=':')

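Sorry, I am not able to comment, but gtomer answered your question, so please give him the credit. Your data is already structured well enough to be used in matplotlib.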
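One caveat: the timestamps in the sample also contain colons, so splitting on ':' may not give clean columns. As a rough alternative, here is a minimal sketch of the string-manipulation route using only the standard library. The names parse_records, TIMESTAMP and READING are made up for illustration, and the record layout is an assumption inferred from the sample in the question.

```python
import re
from datetime import datetime

# Assumed record layout, inferred from the sample in the question:
# "(<ctime-style timestamp>,<label>: <int> <label>: <int> ... ),(..."
TIMESTAMP = re.compile(r"\((\w{3} \w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2} \d{4}),")
READING = re.compile(r"([^:]+):\s*(\d+)")


def parse_records(text):
    """Return one dict per record: a 'date' key plus one key per reading label."""
    records = []
    starts = list(TIMESTAMP.finditer(text))
    for i, match in enumerate(starts):
        # A record's body runs from the end of its timestamp
        # to the start of the next timestamp (or the end of the text).
        end = starts[i + 1].start() if i + 1 < len(starts) else len(text)
        body = text[match.end():end]
        # Collapse whitespace in labels so wrapped lines still yield the same key.
        readings = {
            " ".join(label.split()): int(value)
            for label, value in READING.findall(body)
        }
        if not readings:
            continue  # skip a truncated record that has no readings yet
        stamp = " ".join(match.group(1).split())  # collapse the double space
        record = {"date": datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")}
        record.update(readings)
        records.append(record)
    return records
```

Passing the returned list to pd.DataFrame should give one row per timestamp, with a 'date' column and one column per reading label.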

If you want to use the date as the x dimension, use the following code:

sample_data = sample_data.set_index('date')  # set_index returns a new DataFrame, so assign it back

After that, just use:

import matplotlib.pyplot as plt
sample_data.plot(subplots=True)
plt.show()
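To group related readings into separate plots, as the question asks, one option is to select columns by label prefix once the date is the index. This is only a sketch: the rows below are hypothetical stand-ins for parsed records (values copied from the sample in the question), and the names rows, mass and counts are made up for illustration.

```python
from datetime import datetime

import pandas as pd

# Hypothetical rows in the shape a parsed record might take.
rows = [
    {"date": datetime(2020, 8, 9, 8, 59, 5), "PM1.0 ug/m3 (atmos env)": 19,
     "PM2.5 ug/m3 (atmos env)": 29, ">0.3um in 0.1L air": 3990},
    {"date": datetime(2020, 8, 9, 9, 0, 6), "PM1.0 ug/m3 (atmos env)": 20,
     "PM2.5 ug/m3 (atmos env)": 30, ">0.3um in 0.1L air": 3990},
]

# With real datetimes as the index, plots use a time axis instead of text labels.
sample_data = pd.DataFrame(rows).set_index("date").sort_index()

# Group related columns by label prefix so each group can get its own figure.
mass = sample_data.filter(regex=r"^PM")    # concentrations in ug/m3
counts = sample_data.filter(regex=r"^>")   # particle counts per 0.1 L of air
```

Each group can then be plotted on its own, for example with mass.plot(subplots=True) followed by plt.show(), so that concentrations and particle counts do not share a y-axis scale.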
