R: How to sort a character column by another column's factor or character labels in ggplot

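Problem description

I am trying to draw an alluvial plot with ggplot. Everything went well until I tried to clean up the plot.

As you can see in the plot, reading from left to right, the first stratum/column is the ID column, followed by a label column: disease risk. In the output plot, instead of the patient IDs appearing in a zigzag order, I would like them sorted by the disease risk column, so that all high-risk IDs come first, then low risk, then "Not filled". That would make it easier to see whether there is any relationship.

I looked into the arrange() and order() functions, and they seem to do the trick for my actual input data, but as soon as I pass that data frame to ggplot, the output plot is scrambled again.

I thought about turning the IDs into a factor and then using levels = ..., but that is not very practical if the list of patient IDs keeps growing.

Is there a smarter way? Please enlighten me. I have attached a link to the sample data: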

https://drive.google.com/file/d/16Pd8V3MCgEHmZEButVi2UjDiwZWklK-T/view?usp=sharing

My plotting code

library(tidyr)
library(ggplot2)
library(ggalluvial)
library(RColorBrewer)

# Define the number of colors you want
nb.cols <- 10
mycolor1 <- colorRampPalette(brewer.pal(8,"Set2"))(nb.cols)
mycolors <- c("Black")

 
#read the data
CLL3S.plusrec <- read.csv("xxxx.CSV",as.is = T)
CLL3S.plusrec$risk_by_DS <- factor(CLL3S.plusrec$risk_by_DS,levels = c("low_risk","high_risk","Not filled"))
CLL3S.plusrec$`Enriched response phenotype` <- factor(CLL3S.plusrec$`Enriched response phenotype`,levels = c("Live cells","Pre-dead","TN & PDB","PDB & Lenalidomide","TN & STsveN & Live cells","Mixed"))

#here I reorder the dataframe and it looks good 
#but the output ggplot changes the order of ID in the output graph
OR <- with(CLL3S.plusrec,CLL3S.plusrec[order(risk_by_DS),])


d <-ggplot(OR,aes(y = count,axis1= Patient.ID,axis2= risk_by_DS,axis3 = `Cluster assigned consensus`,axis4 = `Cluster assigned single drug`,axis5 = `Enriched response phenotype`
          
      )) +
  scale_x_discrete(limits = c("Patient ID","disease Risk","Consensus cluster","Single-drug cluster","Enriched drug response by Phenoptype")) +
  geom_alluvium(aes(fill=`Cluster assigned consensus`)) +
  geom_stratum(width = 1/3,fill = c(mycolor1[1:69],mycolor1[1:3],mycolor1[1:8],mycolor1[1:6]),color = "red") +
  #geom_stratum() +
  geom_text(stat = "stratum",aes(label = after_stat(stratum)),size=3) +
  theme(axis.title.x = element_text(size = 15,face="bold"))+
  theme(axis.title.y = element_text(size = 15,face="bold"))+
  theme(axis.text.x = element_text(size = 10,face="bold")) +
  theme(axis.text.y = element_text(size = 10,face="bold")) +
  labs(fill = "Consensus clusters")+
  guides(fill=guide_legend(override.aes = list(color=mycolors)))+
  ggtitle("Patient flow between the Consensus clusters and Single-drug treated clusters","3S stimulated patients")
  print(d)

My output figure: [figure not shown]

Solution

I'm not sure whether this is what you want, but try setting up the risk column in the following way:

library(tidyr)
library(ggplot2)
library(ggalluvial)
library(RColorBrewer)

# Define the number of colors you want
nb.cols <- 10
mycolor1 <- colorRampPalette(brewer.pal(8,"Set2"))(nb.cols)
mycolors <- c("Black")


#read the data
CLL3S.plusrec <- read.csv("test data.CSV",as.is = T)
CLL3S.plusrec$risk_by_DS <- factor(CLL3S.plusrec$risk_by_DS,levels = c("high_risk","low_risk","Not filled"),ordered = T)
CLL3S.plusrec$Enriched.response.phenotype <- factor(CLL3S.plusrec$Enriched.response.phenotype,levels = c("Live cells","Pre-dead","TN & PDB","PDB & Lenalidomide","TN & STSVEN & Live cells","Mixed"))

#here I reorder the dataframe and it looks good 
#but the output ggplot changes the order of ID in the output graph
OR <- with(CLL3S.plusrec,CLL3S.plusrec[order(risk_by_DS),])


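# reorder() needs numeric scores, so the risk factor is converted to its integer level codes
ggplot(OR,aes(y = count,
              axis1 = reorder(Patient.ID,as.integer(risk_by_DS)),
              axis2 = risk_by_DS,
              axis3 = reorder(Cluster.assigned.consensus,as.integer(risk_by_DS)),
              axis4 = reorder(Cluster.assigned.single.drug,as.integer(risk_by_DS)),
              axis5 = reorder(Enriched.response.phenotype,as.integer(risk_by_DS))
)) +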
  scale_x_discrete(limits = c("Patient ID","Disease Risk","Consensus cluster","Single-drug cluster","Enriched drug response by Phenoptype")) +
  geom_alluvium(aes(fill=Cluster.assigned.consensus)) +
  geom_stratum(width = 1/3,fill = c(mycolor1[1:69],mycolor1[1:3],mycolor1[1:8],mycolor1[1:6]),color = "red") +
  #geom_stratum() +
  geom_text(stat = "stratum",aes(label = after_stat(stratum)),size=3) +
  theme(axis.title.x = element_text(size = 15,face="bold"))+
  theme(axis.title.y = element_text(size = 15,face="bold"))+
  theme(axis.text.x = element_text(size = 10,face="bold")) +
  theme(axis.text.y = element_text(size = 10,face="bold")) +
  labs(fill = "Consensus clusters")+
  guides(fill=guide_legend(override.aes = list(color=mycolors)))+
  ggtitle("Patient flow between the Consensus clusters and Single-drug treated clusters","3S stimulated patients")

Output: [figure not shown]
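
As a side note on how reorder() behaves: it orders the levels of its first argument by a numeric summary (the mean by default) of its second argument. The sketch below uses made-up toy data, not the asker's file, and assumes the integer level codes of the risk factor are an acceptable score.

```r
# Toy data (hypothetical values, for illustration only)
df <- data.frame(
  Patient.ID = c("P01", "P02", "P03", "P04", "P05"),
  risk_by_DS = c("low_risk", "high_risk", "Not filled", "high_risk", "low_risk"),
  stringsAsFactors = FALSE
)
df$risk_by_DS <- factor(df$risk_by_DS,
                        levels = c("high_risk", "low_risk", "Not filled"))

# reorder() computes a numeric score per ID (mean by default),
# so the factor is converted to its integer level codes first
id_sorted <- reorder(df$Patient.ID, as.integer(df$risk_by_DS))
print(levels(id_sorted))
```

The high-risk IDs end up first in the level order because "high_risk" is the first level of risk_by_DS and therefore has the smallest integer code.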

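With my read.csv(), the quoted column names came in with dots in them, because read.csv() converts spaces in column names to dots by default. That is why the variables that were originally quoted with backticks now have dots. It may just be an issue with how the file was read.

Update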
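
The dots come from the check.names argument of read.csv(), which defaults to TRUE. Here is a small sketch with an inline, made-up CSV (the column names are borrowed from the question, the values are invented) showing both behaviours:

```r
# Inline CSV with spaces in the header (hypothetical rows)
csv_text <- "Patient ID,Enriched response phenotype\nP01,Mixed\nP02,Pre-dead\n"

# Default: check.names = TRUE turns the header into syntactic names
with_dots <- read.csv(text = csv_text)
print(names(with_dots))

# check.names = FALSE keeps the header as written, so backticks are needed
as_written <- read.csv(text = csv_text, check.names = FALSE)
print(names(as_written))
print(as_written$`Enriched response phenotype`)
```

Either style works, as long as the column names used in aes() match what read.csv() actually produced.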

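#Update
OR <- with(CLL3S.plusrec,CLL3S.plusrec[order(risk_by_DS),])
OR <- OR[order(OR$risk_by_DS,OR$Patient.ID),]
#Freeze the sorted row order into the factor levels of Patient.ID
OR$Patient.ID <- factor(OR$Patient.ID,levels = unique(OR$Patient.ID),ordered = T)
#Plot
#The body of this call is truncated in the source; the layers below are assumed to match the plot above, with Patient.ID passed as-is on axis1
ggplot(OR,aes(y = count,axis1 = Patient.ID,axis2 = risk_by_DS,axis3 = Cluster.assigned.consensus,axis4 = Cluster.assigned.single.drug,axis5 = Enriched.response.phenotype)) +
  geom_alluvium(aes(fill = Cluster.assigned.consensus)) +
  geom_stratum(width = 1/3) +
  geom_text(stat = "stratum",aes(label = after_stat(stratum)),size = 3) +
  ggtitle("Patient flow between the Consensus clusters and Single-drug treated clusters","3S stimulated patients")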

Output: [figure not shown]
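
The idea behind the update can be tried on its own without plotting. This is a minimal sketch on made-up toy data (IDs and risk values are hypothetical): sort the rows by risk, then derive the ID levels from the sorted rows, so no hand-written levels vector is needed when new patients are added.

```r
# Toy data (hypothetical values, for illustration only)
df <- data.frame(
  Patient.ID = c("P01", "P02", "P03", "P04", "P05"),
  risk_by_DS = factor(
    c("low_risk", "high_risk", "Not filled", "high_risk", "low_risk"),
    levels = c("high_risk", "low_risk", "Not filled")
  ),
  stringsAsFactors = FALSE
)

# Sort rows by risk (factor level order), then by ID within each risk group
df_sorted <- df[order(df$risk_by_DS, df$Patient.ID), ]

# unique() keeps first-appearance order, so the levels follow the sorted rows
df_sorted$Patient.ID <- factor(df_sorted$Patient.ID,
                               levels = unique(df_sorted$Patient.ID))
print(levels(df_sorted$Patient.ID))
```

Because the levels are computed from the data, rerunning these lines after the file grows picks up new patient IDs automatically.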
