R: How to sort a character column in ggplot by the factor or character labels of another column

Problem description

I'm trying to draw an alluvial plot with ggplot. Everything went fine until I tried to clean up the plot.

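As you can see in the plot, reading from left to right, the first stratum/axis is the ID column, followed by a column of labels: disease risk. In the output plot I don't want the patient IDs to come out in a zigzag order. Instead I want them sorted by the disease-risk column, so that all high-risk IDs come first, then low risk, then Not filled. That would make it easier to see whether there is any relationship.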

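I searched everywhere and tried the arrange() and order() functions. They seem to do the trick on my actual input data, but as soon as I pass that data frame to ggplot, the output plot is jumbled again.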
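A small base-R illustration of why sorting the rows has no visible effect. It assumes (as far as I know) that ggplot2's discrete scales and ggalluvial's strata follow a variable's factor levels rather than the row order of the data frame. The data frame and its values are hypothetical.

```r
# Hypothetical toy data: rows already sorted by risk, as order()/arrange() would leave them
df <- data.frame(
  id   = c("P3", "P1", "P2"),
  risk = factor(c("high_risk", "low_risk", "Not filled"),
                levels = c("high_risk", "low_risk", "Not filled"))
)
df$id                 # row order: "P3" "P1" "P2"
# A character column carries no order of its own; when it is turned into a
# factor the levels come out sorted, and that is the order a discrete axis uses
levels(factor(df$id)) # "P1" "P2" "P3"
```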

I considered converting the ID column to a factor and setting levels = ..., but that isn't very sensible if the list of patient IDs keeps growing.
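The levels do not have to be typed by hand; they can be computed from the data each time the script runs. A minimal sketch: `levels_by_group()` is a hypothetical helper and the IDs below are made up.

```r
# Hypothetical helper: order IDs by a grouping factor, then by the IDs themselves,
# and return a factor whose levels follow that order
levels_by_group <- function(id, group) {
  ord <- order(group, id)
  factor(id, levels = unique(id[ord]))
}
# New patients simply become new rows; no level list is written out by hand
ids  <- c("P7", "P2", "P9", "P4", "P1")
risk <- factor(c("low_risk", "high_risk", "Not filled", "high_risk", "low_risk"),
               levels = c("high_risk", "low_risk", "Not filled"))
levels(levels_by_group(ids, risk))  # "P2" "P4" "P1" "P7" "P9"
```

On the real data this would presumably become `CLL3S.plusrec$Patient.ID <- levels_by_group(CLL3S.plusrec$Patient.ID, CLL3S.plusrec$risk_by_DS)`, with the risk levels listed as high_risk, low_risk, Not filled.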

Is there a smarter way? Please enlighten me. I've attached a link to the sample data.

https://drive.google.com/file/d/16Pd8V3MCgEHmZEButVi2UjDiwZWklK-T/view?usp=sharing

My plotting code:

library(tidyr)
library(ggplot2)
library(ggalluvial)
library(RColorBrewer)

# Define the number of colors you want
nb.cols <- 10
mycolor1 <- colorRampPalette(brewer.pal(8,"Set2"))(nb.cols)
mycolors <- c("Black")

 
#read the data
CLL3S.plusrec <- read.csv("xxxx.CSV",as.is = T)
CLL3S.plusrec$risk_by_DS <- factor(CLL3S.plusrec$risk_by_DS,levels = c("low_risk","high_risk","Not filled"))
CLL3S.plusrec$`Enriched response phenotype` <- factor(CLL3S.plusrec$`Enriched response phenotype`,levels = c("Live cells","Pre-dead","TN & PDB","PDB & Lenalidomide","TN & STsveN & Live cells","Mixed"))

#here I reorder the dataframe and it looks good 
#but the output ggplot changes the order of ID in the output graph
OR <- with(CLL3S.plusrec,CLL3S.plusrec[order(risk_by_DS),])


d <-ggplot(OR,aes(y = count,axis1= Patient.ID,axis2= risk_by_DS,axis3 = `Cluster assigned consensus`,axis4 = `Cluster assigned single drug`,axis5 = `Enriched response phenotype`
          
      )) +
  scale_x_discrete(limits = c("Patient ID","disease Risk","Consensus cluster","Single-drug cluster","Enriched drug response by Phenoptype")) +
  geom_alluvium(aes(fill=`Cluster assigned consensus`)) +
  geom_stratum(width = 1/3,fill = c(mycolor1[1:69],mycolor1[1:3],mycolor1[1:8],mycolor1[1:6]),color = "red") +
  #geom_stratum() +
  geom_text(stat = "stratum",aes(label = after_stat(stratum)),size=3) +
  theme(axis.title.x = element_text(size = 15,face="bold"))+
  theme(axis.title.y = element_text(size = 15,face="bold"))+
  theme(axis.text.x = element_text(size = 10,face="bold")) +
  theme(axis.text.y = element_text(size = 10,face="bold")) +
  labs(fill = "Consensus clusters")+
  guides(fill=guide_legend(override.aes = list(color=mycolors)))+
  ggtitle("Patient flow between the Consensus clusters and Single-drug treated clusters","3S stimulated patients")
  print(d)
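A side observation on the `geom_stratum()` fill, separate from the ordering question: `mycolor1` holds only `nb.cols = 10` colours, so `mycolor1[1:69]` pads the vector with `NA`s. The sketch below shows this with base `colorRampPalette()`; the hex colours are arbitrary, and the count of 69 ID strata is an assumption read off the indices in the code.

```r
# colorRampPalette() returns exactly as many colours as requested
pal <- colorRampPalette(c("#66C2A5", "#FC8D62", "#8DA0CB"))
mycolor1 <- pal(10)
# Indexing past the end of a vector yields NA rather than recycling
padded <- mycolor1[1:69]
sum(is.na(padded))  # 59
# Hypothetical fix: request as many colours as there are ID strata
id_colors <- pal(69)
```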

Solution

Not sure if this is what you want, but try formatting the risk column like this:

library(tidyr)
library(ggplot2)
library(ggalluvial)
library(RColorBrewer)

# Define the number of colors you want
nb.cols <- 10
mycolor1 <- colorRampPalette(brewer.pal(8,"Set2"))(nb.cols)
mycolors <- c("Black")


#read the data
CLL3S.plusrec <- read.csv("test data.CSV",as.is = T)
CLL3S.plusrec$risk_by_DS <- factor(CLL3S.plusrec$risk_by_DS,levels = c("high_risk","low_risk","Not filled"),ordered = T)
CLL3S.plusrec$Enriched.response.phenotype <- factor(CLL3S.plusrec$Enriched.response.phenotype,levels = c("Live cells","Pre-dead","TN & PDB","PDB & Lenalidomide","TN & STSVEN & Live cells","Mixed"))

#Sort the rows by risk; row order alone does not decide the order of the strata
OR <- with(CLL3S.plusrec,CLL3S.plusrec[order(risk_by_DS),])


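# reorder() ranks levels by mean() of its second argument, so the risk factor is passed as its integer codes
ggplot(OR,aes(y = count,
              axis1 = reorder(Patient.ID,as.integer(risk_by_DS)),
              axis2 = risk_by_DS,
              axis3 = reorder(Cluster.assigned.consensus,as.integer(risk_by_DS)),
              axis4 = reorder(Cluster.assigned.single.drug,as.integer(risk_by_DS)),
              axis5 = reorder(Enriched.response.phenotype,as.integer(risk_by_DS))
)) +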
  scale_x_discrete(limits = c("Patient ID","Disease Risk","Consensus cluster","Single-drug cluster","Enriched drug response by Phenoptype")) +
  geom_alluvium(aes(fill=Cluster.assigned.consensus)) +
  geom_stratum(width = 1/3,fill = c(mycolor1[1:69],mycolor1[1:3],mycolor1[1:8],mycolor1[1:6]),color = "red") +
  #geom_stratum() +
  geom_text(stat = "stratum",aes(label = after_stat(stratum)),size=3) +
  theme(axis.title.x = element_text(size = 15,face="bold"))+
  theme(axis.title.y = element_text(size = 15,face="bold"))+
  theme(axis.text.x = element_text(size = 10,face="bold")) +
  theme(axis.text.y = element_text(size = 10,face="bold")) +
  labs(fill = "Consensus clusters")+
  guides(fill=guide_legend(override.aes = list(color=mycolors)))+
  ggtitle("Patient flow between the Consensus clusters and Single-drug treated clusters","3S stimulated patients")
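Why `as.integer()` matters in the `reorder()` calls: `reorder()` sorts the levels of its first argument by `FUN` (by default `mean`) of the second argument within each level, and `mean()` of a factor returns `NA` with a warning, so passing `risk_by_DS` directly does not yield a risk-based order. A toy sketch with made-up IDs:

```r
# Hypothetical toy data standing in for Patient.ID / risk_by_DS
risk_levels <- c("high_risk", "low_risk", "Not filled")
df <- data.frame(
  Patient.ID = c("P5", "P1", "P3", "P2"),
  risk_by_DS = factor(c("low_risk", "high_risk", "Not filled", "high_risk"),
                      levels = risk_levels, ordered = TRUE)
)
# as.integer() maps each risk label to its position in risk_levels (1, 2, 3),
# so the mean score per ID is that ID's risk rank
id_by_risk <- reorder(df$Patient.ID, as.integer(df$risk_by_DS))
levels(id_by_risk)  # high-risk IDs (P1, P2) first, then P5, then P3
```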


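When I read the file with read.csv(), the column names that you quoted with backticks came out with dots in place of the spaces. That is why the variables that were backtick-quoted in your code use dots here. This comes from how the file is read: read.csv() defaults to check.names = TRUE, which turns the headers into syntactically valid names.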
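A short sketch of the `check.names` behaviour, using an inline CSV with hypothetical column names in place of the real file:

```r
# Small inline CSV standing in for the real file (hypothetical columns)
csv <- "Patient ID,Cluster assigned consensus,count\nP1,C1,1\nP2,C2,1"
# Default check.names = TRUE: spaces become dots, e.g. "Patient.ID"
names(read.csv(text = csv))
# check.names = FALSE keeps the original headers, so backtick names work
raw <- read.csv(text = csv, check.names = FALSE)
names(raw)
raw$`Patient ID`
```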

Update:

#Update
OR <- with(CLL3S.plusrec,CLL3S.plusrec[order(risk_by_DS),])
OR <- OR[order(OR$risk_by_DS,OR$Patient.ID),]
OR$Patient.ID <- factor(OR$Patient.ID,levels = unique(OR$Patient.ID),ordered = T)
#Plot
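ggplot(OR,aes(y = count,axis1 = Patient.ID,axis2 = risk_by_DS,
              axis3 = Cluster.assigned.consensus,axis4 = Cluster.assigned.single.drug,
              axis5 = Enriched.response.phenotype)) +
  geom_alluvium(aes(fill = Cluster.assigned.consensus)) +
  geom_stratum(width = 1/3) +
  geom_text(stat = "stratum",aes(label = after_stat(stratum)),size = 3) +
  ggtitle("Patient flow between the Consensus clusters and Single-drug treated clusters","3S stimulated patients")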

