How can I add custom text to each column of a heatmap in R?

Problem description

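I have a dataset that I'm plotting as a heatmap to compare 7 groups. For each group I also have 2 columns of data that describe that group. I'm trying to create an interactive plot that shows each group's information, taken from its info columns.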

Below is a sample of the data, where each of the 7 groups has 2 corresponding info columns:

df <- structure(list(Group1 = c(9.420318259,5.801092847,4.890727291,4.589825753,4.836092781),Group2 = c(14.57805564,8.798453748,7.982599836,7.951599435,10.81418654),Group3 = c(14.49131554,7.975284646,8.258878348,7.922657108,13.3205827),Group4 = c(11.44447147,6.208332721,6.529806574,4.882623805,10.69676399),Group5 = c(22.86835197,10.94297858,7.197041788,9.237584441,12.70083108),Group6 = c(10.62687539,6.458410247,7.461916094,6.308454021,12.39464562),Group7 = c(11.09404106,6.420303272,6.821000583,5.0727153,11.13903127),Group1_Genes = c(46L,17L,23L,16L,27L),Group1_score = c(0.719,0.757,0.71,0.807,0.761),Group2_Genes = c(58L,22L,30L,40L),Group2_score = c(0.754,0.766,0.741,0.774),Group3_Genes = c(37L,14L,13L,22L),Group3_score = c(0.798,0.788,0.81,0.879,0.805),Group4_Genes = c(55L,20L,29L,21L,42L),Group4_score = c(0.774,0.768,0.822,0.781),Group5_Genes = c(71L,24L,37L,53L),Group5_score = c(0.766,0.767,0.765,0.811,0.771
    ),Group6_Genes = c(69L,Group6_score = c(0.772,0.771),Group7_Genes = c(58L,33L,48L),Group7_score = c(0.79,0.78,0.774,0.817,0.78
    )),row.names = c("Cardiac Hypertrophy","Cellular Effects of Adrenaline","Metastasis Signaling","Hormone Signaling","Estrogen Receptor Signaling"
),class = "data.frame")
# One row of this data looks like:
Pathway  Group1  Group2  Group3  Group4  Group5  Group6  Group7  Group1_score  Group1_Genes  Group2_score  Group2_Genes ...
Cardiac  0.7     0.8     0.5     0.7     0.3     0.6     0.6     0.6           34            0.4           65

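I'm trying to plot groups 1-7 (which are also columns 1:7) in the heatmap and then use the remaining columns as hover text, by adapting the answer to another question (How to create an interactive heatmaply plot with custom text in R?):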

library(dplyr)
library(heatmaply)

# Columns 1:7 hold the values to plot
groups <- as.matrix(df[,1:7]) 

# Hover text for Group1, built from its Genes and score columns
labels1 <- 
  df %>% 
  mutate(label1 = paste(
    "Gene Overlap:", Group1_Genes, "\nMean_GB_score:", Group1_score
  )) %>% 
  transmute(across(Group1, ~label1)) %>% 
  as.matrix()

# Same for Group2
labels2 <- 
  df %>% 
  mutate(label2 = paste(
    "Gene Overlap:", Group2_Genes, "\nMean_GB_score:", Group2_score
  )) %>% 
  transmute(across(Group2, ~label2)) %>% 
  as.matrix()

# I repeat this to make 7 label objects, then cbind them:
labels <- cbind(labels1, labels2, labels3, labels4, labels5, labels6, labels7)

heatmaply(groups, custom_hovertext = labels, file = "heatmaply_plot.html",
  scale_fill_gradient_fun = ggplot2::scale_fill_gradient2(
    low = "pink", high = "red"))

But trying to do this produces the error:

Error in custom_hovertext[rowInd,colInd,drop = FALSE] : 
  subscript out of bounds

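Is there a way to build the custom_hovertext for heatmaply() so that I specify the hover-text information for each column of the heatmap, rather than global information for every heatmap cell?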
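The error message shows that heatmaply subsets custom_hovertext with the same row and column indices (rowInd, colInd) it uses for the plotted matrix, so the hover-text matrix needs exactly the same dimensions as that matrix (here 5 rows by 7 groups). A minimal base R sketch of a pre-flight check, assuming that requirement; check_hovertext_dims is a hypothetical helper and the toy matrices are illustrative only:

```r
# Hypothetical helper: stop early if the hover-text matrix cannot be
# indexed with the same [row, column] positions as the plotted matrix.
check_hovertext_dims <- function(data_mat, hover_mat) {
  if (!identical(dim(data_mat), dim(hover_mat))) {
    stop(sprintf(
      "hover text is %d x %d but data is %d x %d",
      nrow(hover_mat), ncol(hover_mat), nrow(data_mat), ncol(data_mat)
    ))
  }
  invisible(TRUE)
}

# Toy example (illustrative values only): 2 rows x 3 groups
data_mat  <- matrix(1:6, nrow = 2)
hover_ok  <- matrix(letters[1:6], nrow = 2)
hover_bad <- matrix(letters[1:4], nrow = 2)

check_hovertext_dims(data_mat, hover_ok)        # passes silently
try(check_hovertext_dims(data_mat, hover_bad))  # reports the 2 x 2 vs 2 x 3 mismatch
```

Running such a check on groups and labels before calling heatmaply() turns the opaque "subscript out of bounds" into a message that says which dimension is off.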

Solution

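library(dplyr)
library(tidyr)
library(tibble)

labels_df <- 
  df %>% 
  select(ends_with("_score"), ends_with("_Genes")) %>% 
  rownames_to_column() %>%               # keep pathway names as a column
  pivot_longer(-rowname) %>%             # one row per pathway x info column
  separate(name, c("Group", "var")) %>%  # "Group1_score" -> "Group1", "score"
  pivot_wider(id_cols = c(rowname, Group), names_from = var, values_from = value) %>% 
  mutate(label = paste(
    "Gene Overlap:", Genes, "\nMean_GB_Score:", score
  )) %>% 
  pivot_wider(id_cols = rowname, names_from = Group, values_from = label)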

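You can see what happens at each step by breaking the chain at any point and running the code up to there. Basically, we just reshape the data into a tidier, more usable long format so that we don't have to write 7 near-identical expressions to build the labels, and then reshape it back into the wide format heatmaply needs.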
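For comparison, the same per-group labels can be built without any reshaping by looping over the group names. This base R sketch assumes every value column G has companion columns G_Genes and G_score, as in the question's data; make_hover_labels is a hypothetical name and the toy data frame is illustrative only.

```r
# Hypothetical helper: build one hover label per cell, assuming each
# group column "G" has companion columns "G_Genes" and "G_score".
make_hover_labels <- function(df, groups) {
  labels <- vapply(groups, function(g) {
    paste("Gene Overlap:", df[[paste0(g, "_Genes")]],
          "\nMean_GB_score:", df[[paste0(g, "_score")]])
  }, character(nrow(df)))
  # Rebuild as an explicit matrix so a single-row df also yields a matrix,
  # and carry the pathway names along as row names.
  matrix(labels, nrow = nrow(df),
         dimnames = list(rownames(df), groups))
}

# Toy data (illustrative values only)
toy <- data.frame(
  Group1 = c(1.5, 2.5), Group2 = c(3.5, 4.5),
  Group1_Genes = c(10L, 20L), Group1_score = c(0.7, 0.8),
  Group2_Genes = c(30L, 40L), Group2_score = c(0.75, 0.85),
  row.names = c("PathwayA", "PathwayB")
)
hover <- make_hover_labels(toy, c("Group1", "Group2"))
hover["PathwayB", "Group2"]  # "Gene Overlap: 40 \nMean_GB_score: 0.85"
```

With the question's data this would be called as make_hover_labels(df, colnames(groups)). Because the labels are looked up by column name, the row order is the data frame's own, so no reshaping step can shuffle it.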

One important point: after all this reshaping, the rows happen to come out in the same order they started in. That's convenient, but it's worth checking.
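Rather than relying on the order surviving the reshaping, one option is to keep the pathway names as row names (for example with tibble's column_to_rownames("rowname") before as.matrix()) and then index the labels by name. A base R sketch of that idea; align_hovertext is a hypothetical helper and the toy matrices are illustrative only:

```r
# Hypothetical helper: reorder a named hover-text matrix so its rows and
# columns follow the plotted matrix, instead of trusting that the reshaping
# preserved the original order.
align_hovertext <- function(hover_mat, data_mat) {
  stopifnot(
    setequal(rownames(hover_mat), rownames(data_mat)),
    setequal(colnames(hover_mat), colnames(data_mat))
  )
  hover_mat[rownames(data_mat), colnames(data_mat), drop = FALSE]
}

# Toy matrices: hover text deliberately stored in a different order
data_mat  <- matrix(1:4, nrow = 2,
                    dimnames = list(c("A", "B"), c("Group1", "Group2")))
hover_mat <- matrix(c("b2", "a2", "b1", "a1"), nrow = 2,
                    dimnames = list(c("B", "A"), c("Group2", "Group1")))
aligned <- align_hovertext(hover_mat, data_mat)
aligned["A", "Group1"]  # "a1"
```

Indexing by name also fails loudly (through the setequal checks) if a pathway or group is missing from the labels, instead of silently pairing hover text with the wrong cell.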

The labels in matrix form:

labels_mat <- 
  labels_df %>% 
  select(Group1:Group7) %>% 
  as.matrix()

Finally:

heatmaply(
  groups,custom_hovertext = labels_mat,scale_fill_gradient_fun = ggplot2::scale_fill_gradient2(low = "pink",high = "red")
)