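Problem description
I want to create a heatmap from a data frame that contains heterogeneous data (the table holds all kinds of data types, e.g. numeric, logical, character, NA, and empty cells). Here is a sample dataset that matches my actual dataset. I want to plot "Citizen" on the y-axis and all other variables (columns) on the x-axis.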
structure(list(
  ID = c("ID123", "ID456", "ID523", "ID875", "ID782", "ID572", "ID900"),
  Citizen = c("US", "CN", "MX", "US", "CA", "CA"),
  Ht = c("6", "NA", "5", "6", NA, "6"),
  Wt = c("200", "140", "160", "175", NA),
  Age = c("NA", "45", "32", "60", "44", "30"),
  income = c("60", "50", "30", "20", "40", "20"),
  sex = c("M", "F", "M", "F"),
  `Traffic vio` = c(TRUE, FALSE, TRUE, TRUE),
  Greets = c("Hello", "Bonjour", "Hola", "Hi", "Hello", "Bonjour")
), row.names = c(NA, -7L), class = c("tbl_df", "tbl", "data.frame"))
Solution
The first thing you need to do is convert the strings that contain "NA" into the NA constant.
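library(dplyr)
# Apply na_if() column by column; newer dplyr versions may reject a whole data frame here
df <- df %>%
  mutate(across(where(is.character), ~ na_if(.x, "NA")))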
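To see why this step matters, here is a minimal sketch on a single vector. The vector `ht` is a hypothetical toy column, not the asker's data; it only illustrates that the string "NA" is not treated as missing until `na_if()` converts it.

```r
library(dplyr)

# Hypothetical toy column (assumption for illustration only)
ht <- c("6", "NA", "5", NA)

# The string "NA" is an ordinary value; only the real NA counts as missing
is.na(ht)

# na_if() replaces every element equal to "NA" with a real missing value
ht_clean <- na_if(ht, "NA")
is.na(ht_clean)
```

After the conversion, functions such as `mean(..., na.rm = TRUE)` can skip these entries.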
Next, make sure the numeric data is not stored as character.
df <- df %>%
  mutate(across(Ht:income, as.numeric))
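The effect of this conversion can be checked on a small example. The tibble `toy` below is a hypothetical stand-in with made-up rows, assuming the same column layout as the question (numeric columns stored as text next to a genuine character column).

```r
library(dplyr)

# Hypothetical toy data (assumption): numbers stored as character strings
toy <- tibble(
  Ht  = c("6", NA, "5"),
  Wt  = c("200", "140", NA),
  sex = c("M", "F", "M")
)

# Convert only the columns from Ht through Wt; sex is left untouched
toy_num <- toy %>%
  mutate(across(Ht:Wt, as.numeric))

# Inspect the resulting column types
sapply(toy_num, class)
```

If any "NA" strings were still present at this point, `as.numeric()` would turn them into NA but emit a coercion warning, which is one reason to clean them up first.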
You probably want the character columns to be factors, especially Citizen, sex, and Greets.
df <- df %>%
  mutate(across(where(is.character), factor))
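One practical reason to use factors is that ggplot2 orders a discrete axis by the factor levels. The sketch below uses the Citizen values from the question; the custom ordering in `citizen_ordered` is an arbitrary choice made only for illustration.

```r
# Citizen values as posted in the question
citizen <- factor(c("US", "CN", "MX", "US", "CA", "CA"))

# By default the levels are the sorted unique values
levels(citizen)

# Supplying levels explicitly fixes the order used on a plot axis
# (this particular order is an assumption, chosen only as an example)
citizen_ordered <- factor(citizen, levels = c("US", "CA", "MX", "CN"))
levels(citizen_ordered)
```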
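You may want to decide how to handle the NA values in Traffic vio: is a missing value more likely to be TRUE or FALSE? You can also leave them as NA if you prefer.
df <- df %>%
  mutate(`Traffic vio` = if_else(is.na(`Traffic vio`), FALSE, `Traffic vio`))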
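As a possible alternative, `coalesce()` from dplyr expresses the same replacement more compactly. This is a sketch on a hypothetical logical vector `violation`, not the asker's column, and it assumes that treating missing values as FALSE is acceptable for the analysis.

```r
library(dplyr)

# Hypothetical logical column with one missing entry (assumption)
violation <- c(TRUE, FALSE, NA, TRUE)

# if_else() version, as in the step above
filled_if <- if_else(is.na(violation), FALSE, violation)

# coalesce() takes the first non-missing value at each position
filled_co <- coalesce(violation, FALSE)

# Both approaches give the same result
identical(filled_if, filled_co)
```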
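You can now make the heatmap using geom_tile from ggplot2. If you want to plot a summary statistic (for example, the mean), you should probably summarize the data beforehand.
library(ggplot2)

df %>%
  group_by(Citizen, sex) %>%
  summarize(Age = mean(Age, na.rm = TRUE), .groups = "drop") %>%
  ggplot() +
  geom_tile(aes(x = sex, y = Citizen, fill = Age))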
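The question asks for Citizen on the y-axis and every other variable on the x-axis. One way to approach that, shown here only as a sketch, is to reshape the numeric columns to long format with `pivot_longer()` from tidyr and rescale each variable to the 0 to 1 range so that the columns share one fill scale. The tibble `toy`, the rescaling, and the colour choices are all assumptions for illustration; non-numeric columns would need separate handling.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical toy data with one row per Citizen (assumption)
toy <- tibble(
  Citizen = c("US", "CN", "MX", "CA"),
  Ht      = c(6, NA, 5, 6),
  Wt      = c(200, 140, 160, 175),
  Age     = c(NA, 45, 32, 60)
)

# One row per Citizen/variable pair, with a min-max rescaled value per variable
long <- toy %>%
  pivot_longer(-Citizen, names_to = "variable", values_to = "value") %>%
  group_by(variable) %>%
  mutate(scaled = (value - min(value, na.rm = TRUE)) /
           (max(value, na.rm = TRUE) - min(value, na.rm = TRUE))) %>%
  ungroup()

# Tiles are coloured by the rescaled value and labelled with the raw value
p <- ggplot(long, aes(x = variable, y = Citizen, fill = scaled)) +
  geom_tile(colour = "white") +
  geom_text(aes(label = value), na.rm = TRUE) +
  scale_fill_gradient(low = "white", high = "steelblue", na.value = "grey80")

p
```

Rescaling within each variable keeps a large-valued column such as Wt from dominating the colour scale; missing cells are drawn in grey.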