如何使用plotly-express组合两个数据框并将数据绘制为一条线

问题描述

我正在尝试绘制两个数据集的折线图,每个数据集包含值 'date''conversions''cpi' 作为一条线。该图表显示每天的值计数,因此我在处理同一天正确绘制每个数据集的值时遇到问题:

enter image description here

样本数据:

          date  conversions           cpi
0   2020-11-02          0.0  0.000000e+00
1   2020-11-03          0.0  0.000000e+00
2   2020-11-04          0.0  0.000000e+00
3   2020-11-05          0.0  0.000000e+00
4   2020-11-06          3.0  6.333000e-01
5   2020-11-07          0.0  0.000000e+00
6   2020-11-08          0.0  0.000000e+00
7   2020-11-09          0.0  0.000000e+00
8   2020-11-10          0.0  0.000000e+00
9   2020-11-11          2.0  1.695000e+00
10  2020-11-12          0.0  0.000000e+00
11  2020-11-13          2.0  2.170000e+00
12  2020-11-14          0.0  0.000000e+00
13  2020-11-15          1.0  2.590000e+00
14  2020-11-16          2.0  2.670000e+00
0   2020-11-02          5.0  2.039435e+06
1   2020-11-03          6.0  2.788452e+06
2   2020-11-04          8.0  1.720630e+06
3   2020-11-05          8.0  2.038703e+06
4   2020-11-06         11.0  1.775534e+06
5   2020-11-07         14.0  1.810215e+06
6   2020-11-08         30.0  1.617934e+06
7   2020-11-09         27.0  1.784663e+06
8   2020-11-10         32.0  1.368291e+06
9   2020-11-11          4.0  5.293594e+06
10  2020-11-12         17.0  1.524248e+06
11  2020-11-13         20.0  2.437085e+06
12  2020-11-14         24.0  2.272977e+06
13  2020-11-15         38.0  1.848160e+06
14  2020-11-16         22.0  2.415721e+06

我的代码是:

asa_installs_time = get_installs_time(start_date,end_date)
ga_installs_time = get_GAinstalls_time(start_date,end_date)
asa_installsTime_df = pd.DataFrame.from_dict(asa_installs_time[1])
ga_installsTime_df = pd.DataFrame.from_dict(ga_installs_time)
all_installsTime_df = pd.concat([ga_installsTime_df,asa_installsTime_df])
installs_time_series_chart = px.line( all_installsTime_df,x= all_installsTime_df['date'],all_installsTime_df['conversions'],title='Installs per Day')

return [all_installsTime_df]

如何解决绘制两个相同日期的问题?

编辑

使用 all_installsTime_df = all_installsTime_df.sort_values('date').reset_index(drop=True)

enter image description here

解决方法

  • 主要问题是需要 .groupby 'date'.sum() 来自两个数据帧的值。
import pandas as pd
import plotly.express as px

# sample data
data1 = {'date': ['2020-11-02','2020-11-03','2020-11-04','2020-11-05','2020-11-06','2020-11-07','2020-11-08','2020-11-09','2020-11-10','2020-11-11','2020-11-12','2020-11-13','2020-11-14','2020-11-15','2020-11-16'],'conversions': [0,3,2,1,2],'cpi': [0.0,0.0,0.6333,1.695,2.17,2.59,2.67]}
data2 = {'date': ['2020-11-02','conversions': [5.0,6.0,8.0,11.0,14.0,30.0,27.0,32.0,4.0,17.0,20.0,24.0,38.0,22.0],'cpi': [2039435.0,2788452.0,1720630.0,2038703.0,1775534.0,1810215.0,1617934.0,1784663.0,1368291.0,5293594.0,1524248.0,2437085.0,2272977.0,1848160.0,2415721.0]}

# create dataframes
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

# concat the dataframes
df = pd.concat([df1,df2]).reset_index(drop=True)

# set the date column as a datetime
df.date = pd.to_datetime(df.date)

# groupby date,aggregate sum on all columns and reset 
dfg = df.groupby('date').sum().reset_index()

# plot
fig = px.line(dfg,x=dfg['date'],y=dfg['conversions'],title='Installs per Day')
fig.show()

enter image description here

display(dfg)

         date  conversions           cpi
0  2020-11-02          5.0  2.039435e+06
1  2020-11-03          6.0  2.788452e+06
2  2020-11-04          8.0  1.720630e+06
3  2020-11-05          8.0  2.038703e+06
4  2020-11-06         14.0  1.775535e+06
5  2020-11-07         14.0  1.810215e+06
6  2020-11-08         30.0  1.617934e+06
7  2020-11-09         27.0  1.784663e+06
8  2020-11-10         32.0  1.368291e+06
9  2020-11-11          6.0  5.293596e+06
10 2020-11-12         17.0  1.524248e+06
11 2020-11-13         22.0  2.437087e+06
12 2020-11-14         24.0  2.272977e+06
13 2020-11-15         39.0  1.848163e+06
14 2020-11-16         24.0  2.415724e+06

相关问答

Selenium Web驱动程序和Java。元素在(x,y)点处不可单击。其...
Python-如何使用点“。” 访问字典成员?
Java 字符串是不可变的。到底是什么意思?
Java中的“ final”关键字如何工作?(我仍然可以修改对象。...
“loop:”在Java代码中。这是什么,为什么要编译?
java.lang.ClassNotFoundException:sun.jdbc.odbc.JdbcOdbc...