Line plot of grouped data

Problem description

My df looks like this:

  gender party      mean(salary)
1 female democrat       31833.33
2 female republican     27000.00
3 male   democrat       30250.00
4 male   republican     36166.67

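I need to draw a line plot of the means for male Democrats, female Democrats, male Republicans, and female Republicans, with all means shown in a single graph and error bars showing the 95% confidence intervals. Political affiliation goes on the x-axis and salary on the y-axis. The lines for males and females should be distinguishable by line weight, symbol shape, color, or dash style.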

Here is my attempt:

ggplot(df,aes(x=party,y='mean(salary)',group=gender)) +
  geom_line(aes(color=gender))+
  geom_point(aes(color=gender)+
  stat_summary(fun.data = mean_cl_normal,geom = "errorbar",position = position_dodge(width = 0.90),width=0.2))

Going from Democrats to Republicans, the result is just a straight, flat line.

Here is my raw data, as requested:

salary party gender
1 34000 republican male
2 31000 republican female
3 28000 democrat male
4 29000 democrat female
5 30000 republican male
6 23000 republican female
7 27500 democrat male
8 32000 democrat female
9 32000 republican male
10 28000 republican female
11 30000 democrat male
12 34000 democrat female
13 39000 republican male
14 27000 republican female
15 34000 democrat male
16 30000 democrat female
17 40000 republican male
18 26000 republican female
19 30000 democrat male
20 35000 democrat female
21 42000 republican male
22 27000 republican female
23 32000 democrat male
24 31000 democrat female

Solution

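Although you can create a line plot, perhaps you would be interested in a bar plot instead? For a nominal category such as political party, that may be more appealing.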

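Here is one approach using dplyr and ggplot2. First compute the mean and standard deviation for each combination of party and gender. Then you can plot with geom_bar and geom_errorbar. Note that df here is the raw data (it needs the salary column), and that the error bars drawn this way span one standard deviation on either side of the mean rather than a 95% confidence interval.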

library(ggplot2)
library(dplyr)

df %>%
  group_by(party,gender) %>%
  summarise(mean=mean(salary),sd=sd(salary)) %>%
  ggplot(aes(x=party,y=mean,fill=gender)) +
    geom_bar(position=position_dodge(width=.75),stat = "identity",width=.7) +
    geom_errorbar(aes(ymin=mean-sd,ymax=mean+sd),position=position_dodge(width=.75),width=.3)

[Figure: bar plot]
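If you do want the line plot with 95% confidence intervals, one possible sketch is to let stat_summary compute everything from the raw data. In the original attempt, y='mean(salary)' is a quoted string, so ggplot2 treats it as a constant rather than a column, which is probably why the line came out flat. The data frame name raw is made up for this illustration and simply retypes the 24 rows from the question. The sketch assumes ggplot2 3.3.0 or later (for the fun argument) and that the Hmisc package is installed, since mean_cl_normal relies on it.

```r
library(ggplot2)  # mean_cl_normal additionally needs the Hmisc package installed

# Raw data retyped from the question; the rows cycle through
# republican male, republican female, democrat male, democrat female.
raw <- data.frame(
  salary = c(34000, 31000, 28000, 29000, 30000, 23000, 27500, 32000,
             32000, 28000, 30000, 34000, 39000, 27000, 34000, 30000,
             40000, 26000, 30000, 35000, 42000, 27000, 32000, 31000),
  party  = rep(c("republican", "republican", "democrat", "democrat"), times = 6),
  gender = rep(c("male", "female"), times = 12)
)

# Small horizontal offset so the two genders' error bars do not overlap
pd <- position_dodge(width = 0.2)

p <- ggplot(raw, aes(x = party, y = salary, colour = gender, group = gender)) +
  stat_summary(fun = mean, geom = "line", position = pd) +       # line through the group means
  stat_summary(fun = mean, geom = "point", position = pd) +      # one point per group mean
  stat_summary(fun.data = mean_cl_normal, geom = "errorbar",     # t-based 95% CI of the mean
               width = 0.2, position = pd)

print(p)
```

Working from the raw data matters here: a confidence interval cannot be computed from the four pre-summarised means alone, because each group would then contain a single value.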


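Not to discourage you from learning to do it yourself, but I have a small package with a function that does what you need and offers many formatting options: CGPfunctions::Plot2WayANOVA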

library(CGPfunctions)
CGPfunctions::Plot2WayANOVA(salary ~ party * gender,salary_df)
#> 
#> Converting party to a factor --- check your results
#> 
#> Converting gender to a factor --- check your results
#> 
#> Blah blah lots of important info to the console ...
#> Interaction graph plotted...

Doing it by hand would look something like this...

library(dplyr)
library(ggplot2)

# summarise what we need
salary_summarised <- salary_df %>%
   group_by(party,gender) %>%
   summarise(
      AVG.salary = mean(salary, na.rm = TRUE),
      SD.salary = sd(salary, na.rm = TRUE),
      N.salary = length(salary),
      SE.salary = sd(salary, na.rm = TRUE) / sqrt(length(salary)),
      # t-distribution multiplier for a two-sided 95% interval
      CI95Muliplier = qt(.95/2 + .5, length(salary) - 1)
   )


# and plot it
ggplot(salary_summarised,aes(x=party,y=AVG.salary,colour=gender,group=gender)) +
   geom_errorbar(aes(ymin=AVG.salary - SE.salary*CI95Muliplier,ymax=AVG.salary + SE.salary*CI95Muliplier),width=.2,color = "purple") +
   geom_line() +
   geom_point(aes(y=AVG.salary)) +
   xlab("Party") +
   ylab("Salary") +
   ggtitle("Salary with 95% CI") +
   theme_bw()

Using your sample data:

library(readr)
salary_df <- readr::read_table2("salary party gender
34000 republican male
31000 republican female
28000 democrat male
29000 democrat female
30000 republican male
23000 republican female
27500 democrat male
32000 democrat female
32000 republican male
28000 republican female
30000 democrat male
34000 democrat female
39000 republican male
27000 republican female
34000 democrat male
30000 democrat female
40000 republican male
26000 republican female
30000 democrat male
35000 democrat female
42000 republican male
27000 republican female
32000 democrat male
31000 democrat female")
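
As a quick sanity check on the interval used in the summary step (mean plus or minus the t multiplier times the standard error), here is a small base-R sketch for a single group. The vector x holds the six female-democrat salaries copied from the raw data; the variable names are made up for this illustration. It compares the manual interval with the one reported by t.test.

```r
# Salaries of the six female democrats, copied from the raw data above
x <- c(29000, 32000, 34000, 30000, 35000, 31000)

n    <- length(x)
avg  <- mean(x)
se   <- sd(x) / sqrt(n)                  # standard error of the mean
mult <- qt(0.95 / 2 + 0.5, df = n - 1)   # same multiplier as in the summary step

# Manual two-sided 95% confidence interval for the mean
ci_manual <- c(avg - mult * se, avg + mult * se)

# t.test() computes a two-sided 95% interval for a single sample as well
ci_ttest <- as.numeric(t.test(x, conf.level = 0.95)$conf.int)

print(avg)
print(ci_manual)
print(all.equal(ci_manual, ci_ttest))
```

If the two intervals agree, the ymin and ymax expressions passed to geom_errorbar above are drawing the intended 95% confidence interval.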
