How to split column names in R, drop part of each name, and convert the data from wide to long format

Problem description

My data looks like this:

dataset <- data.frame(taxa = c("taxa1","taxa2","taxa3"),
                      "11908.MM.0008.Inf.6m.Stool" = c(0,1760,0),
                      "11908.MM.01115.Inf.6m.Stool" = c(0,1517,0),
                      "11908.MM.0044.Inf.6m.Stool" = c(0,10815,0),
                      "11908.MM.0125.Mom.6m.Stool" = c(0,4719,0))
view(dataset)

I would like to convert it to the following format:

fix_dataset <- data.frame(study_id = c("0008","0115","0044","0125"),  # quoted so the leading zeros are kept
                          individual = c("Inf","Inf","Inf","Mom"),
                          taxa1 = c(0,0,0,0),
                          taxa2 = c(1760,1517,10815,4719),
                          taxa3 = c(0,0,0,0),
                          timept1 = c("6m","6m","6m","6m"))

view(fix_dataset)

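I am trying to strip the leading number sequence 11908 and the trailing "Stool" from each column name, split the remaining parts of the name into separate columns, and then reshape the data from wide to long format.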

Solution

You can do this with the following code:

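library(tidyverse)
dataset %>%
  # stack every sample column into name/value pairs
  pivot_longer(cols = -taxa) %>%
  # split each column name on "." into its six pieces
  separate(col = name,into = c("info1","info2","study_id","individual","timept1","info3"),sep = "[.]") %>%
  # spread the taxa back out into one column each
  pivot_wider(names_from = taxa,values_from = value) %>%
  # keep only the wanted columns, in the wanted order
  select(study_id,individual,starts_with("taxa"),timept1)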

This gives:

# A tibble: 4 x 6
  study_id individual taxa1 taxa2 taxa3 timept1
  <chr>    <chr>      <dbl> <dbl> <dbl> <chr>  
1 0008     Inf            0  1760     0 6m     
2 01115    Inf            0  1517     0 6m     
3 0044     Inf            0 10815     0 6m     
4 0125     Mom            0  4719     0 6m 
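
As a possible variation on the code above, the splitting can be folded into pivot_longer() itself through its names_to and names_sep arguments, where an NA entry in names_to discards that piece of the column name. This is only a sketch: it rebuilds the example data with check.names = FALSE (an assumption, so the column names stay exactly as written) and stores the output in a variable called result, a name chosen here for illustration.

```r
library(tidyr)
library(dplyr)

# Same example data as in the question (taxa1 and taxa3 are 0 for every sample).
dataset <- data.frame(
  taxa = c("taxa1", "taxa2", "taxa3"),
  "11908.MM.0008.Inf.6m.Stool"  = c(0, 1760, 0),
  "11908.MM.01115.Inf.6m.Stool" = c(0, 1517, 0),
  "11908.MM.0044.Inf.6m.Stool"  = c(0, 10815, 0),
  "11908.MM.0125.Mom.6m.Stool"  = c(0, 4719, 0),
  check.names = FALSE  # assumption: keep the names as written, without an "X" prefix
)

result <- dataset %>%
  pivot_longer(
    cols = -taxa,
    # Six pieces per column name; NA drops the pieces that are not needed.
    names_to = c(NA, NA, "study_id", "individual", "timept1", NA),
    names_sep = "[.]"
  ) %>%
  # One column per taxon, one row per sample.
  pivot_wider(names_from = taxa, values_from = value) %>%
  select(study_id, individual, starts_with("taxa"), timept1)

print(result)
```

This avoids the placeholder columns info1, info2 and info3, at the cost of a slightly denser pivot_longer() call.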

Note that your study IDs are inconsistent: one of the IDs in the original dataset is "01115", whereas your desired output shows "0115".
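
If the study IDs are meant to be four digits wide (an assumption; the question does not say so), a quick check along these lines may help spot entries like "01115" before reshaping. The names expected_width and suspicious are hypothetical, introduced only for this sketch.

```r
# Study IDs as they appear in the column names of the example data.
study_id <- c("0008", "01115", "0044", "0125")

# Assumption (not stated in the question): a valid ID has exactly four characters.
expected_width <- 4

# Collect every ID whose length differs from the expected width.
suspicious <- study_id[nchar(study_id) != expected_width]
print(suspicious)
```

Whether "01115" or "0115" is the correct value has to be settled against the source records; the check only flags the mismatch.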