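I have a ggplot bar chart with 700 bars and want a Pareto line; the line works, but the bars' y scale is too small, so they don't show up in the chart

Problem description

Thanks to a lot of people, I've gotten started with charts in R.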

I have three charts

Random bars

Plot ordered by frequency

Bars ordered by frequency

Plot with Pareto overlay

Pareto overlay

If you look closely, you can see the ordered frequency plot squashed along the bottom.

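```{r}
library(dplyr)
# Drop rows whose end station is the literal string "NA"
df <- filter(df_clean_distances, end_station_name != "NA")
# Count rides per end station
d <- df %>%
  select(end_station_name) %>%
  group_by(end_station_name) %>%
  summarize(freq = n())
head(d$freq)
dput(head(d))
# Sort stations by descending frequency
d2 <- d[order(-d$freq), ]
d2
```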

Random plot

```{r}
library(ggplot2)
ggplot(d2, aes(x = end_station_name, y = freq)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_blank()) +
  ylim(c(0, 40000))
```

Plot ordered by frequency

```{r}
ggplot(d2, aes(x = reorder(end_station_name, -freq), y = freq)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_blank()) +
  ylim(c(0, 40000)) +
  labs(title = "end station by freq", x = "Station Name")
```

Plot with Pareto overlay

```{r}
ggplot(d2, aes(x = reorder(end_station_name, -freq), y = freq)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_blank()) +
  ggQC::stat_pareto(point.color = "red", point.size = 0.5) +
  labs(title = "end station by freq", x = "Station Name")
```

dput(head) output

```{r}
# Output of dput(head(d, n = 20))
structure(list(
  end_station_name = c(
    "2112 W Peterson Ave", "63rd St Beach", "900 W Harrison St",
    "Aberdeen St & Jackson Blvd", "Aberdeen St & Monroe St",
    "Aberdeen St & Randolph St", "Ada St & 113th St",
    "Ada St & Washington Blvd", "Adler Planetarium", "Albany Ave & 26th St",
    "Albany Ave & BloomingDale Ave", "Albany Ave & Montrose Ave",
    "Archer (damen) Ave & 37th St", "Artesian Ave & Hubbard St",
    "Ashland Ave & 13th St", "Ashland Ave & 50th St", "Ashland Ave & 63rd St",
    "Ashland Ave & 66th St", "Ashland Ave & 69th St", "Ashland Ave & 73rd St"
  ),
  freq = c(
    1032L, 2524L, 3836L, 8383L, 6587L, 6136L, 18L, 6281L, 12050L, 397L,
    2833L, 1875L, 710L, 1879L, 2659L, 151L, 112L, 102L, 78L, 8L
  )
), row.names = c(NA, -20L), class = c("tbl_df", "tbl", "data.frame"))
```

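As you can see, the Pareto overlay works against the right-hand scale, but the left-hand scale is way off. The data has 3 million rows, and the y-axis scaling has squashed the frequencies into a very small curve at the bottom that is hard to read against the left axis.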

How can I fix the left y-axis at roughly 40,000 so that the frequency curve displays properly?

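Solution

Here is a solution, though it does not use the ggQC package; it uses sec_axis instead.
The trick is to precompute max(freq) and then use it as a scaling factor to align the two axes. The data preparation code is inspired by this rstudio-pubs blog post.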

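```{r}
library(ggplot2)
library(dplyr)

# Scaling factor: the tallest bar, used to align the two y axes
M <- max(d$freq)

d %>%
  arrange(desc(freq)) %>%
  # Cumulative share of the total, from 0 to 1
  mutate(cum_freq = cumsum(freq / sum(freq))) %>%
  ggplot(aes(x = reorder(end_station_name, -freq), y = freq)) +
  geom_bar(stat = "identity") +
  # Rescale the cumulative share to count units so it shares the primary axis
  geom_line(mapping = aes(y = cum_freq * M, group = 1)) +
  geom_point(
    mapping = aes(y = cum_freq * M), color = "red", size = 0.5
  ) +
  scale_y_continuous(
    # The secondary axis undoes the rescaling and shows percentages
    sec.axis = sec_axis(~ . / M, labels = scales::percent, name = "Cumulative percentage")
  ) +
  labs(title = "end station by freq", x = "Station Name") +
  theme(axis.text.x = element_blank())
```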
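
To see why scaling by max(freq) keeps the bars visible, here is a small numeric sketch in base R. It uses five rows picked from the question's dput() sample purely for illustration; the names `d_sample`, `d_sorted`, `line_y` and `sec_axis_value` are made up for this example and are not part of the original code.

```r
# Five rows taken from the dput() sample in the question (illustration only)
d_sample <- data.frame(
  end_station_name = c("2112 W Peterson Ave", "63rd St Beach",
                       "900 W Harrison St", "Aberdeen St & Jackson Blvd",
                       "Adler Planetarium"),
  freq = c(1032L, 2524L, 3836L, 8383L, 12050L)
)

# Scaling factor: the height of the tallest bar
M <- max(d_sample$freq)

# Sort descending, then take the cumulative share of the total (0 to 1)
d_sorted <- d_sample[order(-d_sample$freq), ]
d_sorted$cum_freq <- cumsum(d_sorted$freq) / sum(d_sorted$freq)

# Values actually drawn for the Pareto line, in primary-axis (count) units
d_sorted$line_y <- d_sorted$cum_freq * M

# What the secondary axis reports after applying the inverse transform ~ . / M
d_sorted$sec_axis_value <- d_sorted$line_y / M

print(d_sorted)
```

Because the cumulative share never exceeds 1, the line tops out at M, the same height as the tallest bar, so the left axis no longer has to stretch to the grand total of all rides. Dividing by M on the secondary axis recovers the original share, which is what `scales::percent` then labels.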
