How can I generate a heatmap in R from a table with heterogeneous data types (numeric, logical, character, NA and empty cells)?

Problem description

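I want to create a heatmap from a data frame that contains heterogeneous data: the table holds every kind of value, such as numeric, logical, character, NA and empty cells. Below is a sample dataset that matches the structure of my real one. I want to plot "Citizen" on the y-axis and all the other variables (columns) on the x-axis.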


df <- structure(list(ID = c("ID123","ID456","ID523","ID875","ID782","ID572","ID900"),Citizen = c("US","CN","MX","US","CA","CA"),Ht = c("6","NA","5","6",NA,"6"),Wt = c("200","140","160","175",NA),Age = c("NA","45","32","60","44","30"),income = c("60","50","30","20","40","20"),sex = c("M","F","M","F"),`Traffic vio` = c(TRUE,FALSE,TRUE,TRUE),Greets = c("Hello","Bonjour","Hola","Hi","Hello","Bonjour")),row.names = c(NA,-7L),class = c("tbl_df","tbl","data.frame"))

Solution

The first thing to do is turn the "NA" strings into real NA values.

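library(dplyr)
# na_if() works on one vector at a time, so apply it to each character column
df <- df %>%
  mutate(across(where(is.character), ~ na_if(.x, "NA")))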
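The question also mentions empty cells, and the same per-column approach can turn "" into NA as well. A minimal sketch on a small hypothetical data frame (toy, toy_clean and the column vio are made up for illustration and are not part of the question's dataset):

```r
library(dplyr)

# Hypothetical toy data: character columns use both the string "NA"
# and the empty string "" for missing values; vio is logical
toy <- tibble(
  Citizen = c("US", "", "MX"),
  Ht      = c("6", "NA", ""),
  vio     = c(TRUE, NA, FALSE)
)

# Only character columns are touched, so the logical column is left alone;
# the two na_if() calls replace "NA" and "" with a real NA
toy_clean <- toy %>%
  mutate(across(where(is.character), ~ na_if(na_if(.x, "NA"), "")))

toy_clean
```

Restricting across() to character columns with where(is.character) keeps the string comparison away from the logical column, where it would not make sense.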

Next, numeric data should not be stored as character, so convert those columns to numbers.

# Ht, Wt, Age and income hold numbers stored as text
df <- df %>%
  mutate(across(Ht:income, as.numeric))
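as.numeric() returns NA (with a warning) for any string it cannot parse, so it can be worth checking that the conversion did not create new missing values. A sketch on hypothetical data (toy, toy_num, na_before and na_after are illustrative names):

```r
library(dplyr)

# Hypothetical toy data: numbers stored as text, NA already cleaned up
toy <- tibble(
  Ht  = c("6", NA, "5"),
  Age = c(NA, "45", "32")
)

toy_num <- toy %>%
  mutate(across(Ht:Age, as.numeric))

# Count missing values per column before and after the conversion;
# any increase means some strings could not be parsed as numbers
na_before <- colSums(is.na(toy))
na_after  <- colSums(is.na(toy_num))
stopifnot(all(na_before == na_after))
```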

You will probably want the character columns to be factors, especially Citizen, sex and Greets.

df <- df %>%
  mutate(across(where(is.character), factor))
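A factor also controls the order of the tiles: ggplot2 draws a discrete axis in level order, and factor() sorts the levels alphabetically by default. A sketch on hypothetical data, assuming you want a custom order on the y-axis (the order chosen below is arbitrary):

```r
library(dplyr)

# Hypothetical toy data (not the question's dataset)
toy <- tibble(
  Citizen = c("US", "CN", "MX", "US"),
  sex     = c("M", "F", "M", "F")
)

# Default: every character column becomes a factor with sorted levels
toy_f <- toy %>%
  mutate(across(where(is.character), factor))
levels(toy_f$Citizen)  # [1] "CN" "MX" "US"

# Custom level order, which becomes the tile order on the axis
toy_f <- toy_f %>%
  mutate(Citizen = factor(Citizen, levels = c("US", "MX", "CN")))
```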

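You may also want to decide how to handle the NA values in Traffic vio: is a missing value more likely to mean TRUE or FALSE? If you would rather keep them as NA, you can.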

df <- df %>%
  mutate(`Traffic vio` = if_else(is.na(`Traffic vio`),FALSE,`Traffic vio`))
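The same replacement can also be written with dplyr's coalesce(), which fills each NA with the given value. A sketch on a hypothetical column:

```r
library(dplyr)

# Hypothetical toy data with missing traffic-violation flags
toy <- tibble(`Traffic vio` = c(TRUE, NA, FALSE, NA))

# coalesce() keeps the first non-missing value, so NA becomes FALSE
toy <- toy %>%
  mutate(`Traffic vio` = coalesce(`Traffic vio`, FALSE))
```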

You can now build the heatmap with geom_tile() from ggplot2. If you want to plot summary statistics (for example, the mean), summarize the data before plotting.

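library(ggplot2)

# Average Age per Citizen/sex combination, then draw one tile per combination
df %>%
  group_by(Citizen, sex) %>%
  summarize(Age = mean(Age, na.rm = TRUE), .groups = "drop") %>%
  ggplot() +
  geom_tile(aes(x = sex, y = Citizen, fill = Age))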
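The example above maps Age against sex only. To put Citizen on the y-axis and every other variable on the x-axis, as the question asks, one option is to reshape the data to long format with tidyr::pivot_longer() and average each variable per country. This sketch runs on hypothetical, already-cleaned data (toy, heat_data and p are illustrative names) and assumes that standardizing each column to mean 0 and standard deviation 1 is an acceptable way to put height, age, income and the 0/1 traffic flag on one colour scale; character columns such as sex or Greets are left out because they cannot be averaged.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical toy data, already cleaned: numeric columns are numeric and
# the logical column has no NAs (not the question's dataset)
toy <- tibble(
  Citizen       = c("US", "CN", "MX", "US", "CA", "CA"),
  Ht            = c(6, NA, 5, 6, NA, 6),
  Age           = c(NA, 45, 32, 60, 44, 30),
  income        = c(60, 50, 30, 20, 40, 20),
  `Traffic vio` = c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE)
)

heat_data <- toy %>%
  # Logical -> 0/1 so it can be averaged like the numeric columns
  mutate(across(where(is.logical), as.numeric)) %>%
  # Standardize each numeric column so different units share one scale
  mutate(across(where(is.numeric), ~ as.numeric(scale(.x)))) %>%
  # One row per Citizen/variable/value
  pivot_longer(-Citizen, names_to = "variable", values_to = "value") %>%
  group_by(Citizen, variable) %>%
  summarize(value = mean(value, na.rm = TRUE), .groups = "drop")

p <- ggplot(heat_data, aes(x = variable, y = Citizen, fill = value)) +
  geom_tile()
p
```

A country with no observed value for a variable (here CN for Ht) gets NaN and is drawn in ggplot2's grey NA colour.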
