Pivot, count, and group by a variable in a df with categorical observations

Problem description

I have this df with reproducible data:

structure(list(`Loperamida en diarrea` = structure(c(1L,1L,2L,4L,3L,1L),.Label = c("muy efectiva","algo efectiva","no efectiva","no se"),class = c("ordered","factor")),`Carbón en diarrea` = structure(c(2L,4L),`Bismuto en diarrea` = structure(c(2L,`Rifaximina en diarrea` = structure(c(2L,`Otros antibióticos en diarrea` = structure(c(2L,`Probióticos en diarrea` = structure(c(2L,2L),`Orientación dicotómica` = c("Neurogastro","Neurogastro","No neurogastro","No neurogastro")),row.names = c(NA,10L),class = "data.frame")

I then built a data frame that counts the categorical observations by pivoting df with the following code:

library(tidyverse)
library(janitor)

# Store the summary so the labels can be cleaned afterwards
df_tto <- df %>%
  pivot_longer(cols = everything()) %>%
  count(name,value) %>%
  pivot_wider(names_from = value,values_from = n,values_fill = 0) %>%
  mutate("efectiva" = `algo efectiva` + `muy efectiva`) %>%
  arrange(desc(`efectiva`)) %>%
  select(c(`name`,`efectiva`,`no efectiva`)) %>%
  adorn_percentages("row") %>%
  adorn_pct_formatting(digits = 1) %>%
  adorn_ns()

# Drop the " en diarrea" suffix from the row labels
df_tto$name <- str_remove(df_tto$name," en diarrea")

The result is as follows:

                  name    efectiva no efectiva
            Rifaximina 100.0% (10)    0.0% (0)
           Probióticos   80.0% (8)   20.0% (2)
               Bismuto   77.8% (7)   22.2% (2)
            Loperamida   87.5% (7)   12.5% (1)
    Otros antibióticos   77.8% (7)   22.2% (2)
                Carbón   50.0% (3)   50.0% (3)
Orientación dicotómica       - (0)       - (0)

I have been trying to split the columns by the variable Orientación dicotómica (Neurogastro vs No Neurogastro), but I have not been able to work it out. What I expect is a table like the one above, with the efectiva and no efectiva columns shown separately for each value of that variable.

Any suggestions?

Solution

This is a bit late, but:

library(janitor)
library(tidyverse)

df %>%
  pivot_longer(cols = 1:6) %>%
  count(`Orientación dicotómica`,name,value) %>%
  pivot_wider(id_cols = c(`Orientación dicotómica`,name),names_from = value,values_from = n,values_fill = 0,values_fn = sum) %>%
  mutate("efectiva" = `algo efectiva` + `muy efectiva`) %>%
  select(c(`Orientación dicotómica`,`name`,`efectiva`,`no efectiva`)) %>%
  adorn_percentages("row") %>%
  adorn_pct_formatting(digits = 1) %>%
  adorn_ns() -> out

merge(
  out %>%
    filter(`Orientación dicotómica` == 'Neurogastro') %>%
    select(name,`Neurogastro efectiva` = efectiva,`Neurogastro no efectiva` = `no efectiva`),
  out %>%
    filter(`Orientación dicotómica` == 'No neurogastro') %>%
    select(name,`No Neurogastro efectiva` = efectiva,`No Neurogastro no efectiva` = `no efectiva`),
  by = "name"
)

                           name Neurogastro efectiva Neurogastro no efectiva No Neurogastro efectiva No Neurogastro no efectiva
1            Bismuto en diarrea            66.7% (4)               33.3% (2)              100.0% (3)                   0.0% (0)
2             Carbón en diarrea            75.0% (3)               25.0% (1)                0.0% (0)                 100.0% (2)
3         Loperamida en diarrea            75.0% (3)               25.0% (1)              100.0% (4)                   0.0% (0)
4 Otros antibióticos en diarrea           100.0% (6)                0.0% (0)               33.3% (1)                  66.7% (2)
5        Probióticos en diarrea           100.0% (6)                0.0% (0)               50.0% (2)                  50.0% (2)
6         Rifaximina en diarrea           100.0% (6)                0.0% (0)              100.0% (4)                   0.0% (0)
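
The filter-and-merge step can probably also be written as a single pivot_wider() call. The following is only a sketch under stated assumptions: the small `out` data frame below is a hypothetical stand-in that copies two rows of the table above (it is not the real janitor output), and the resulting column names follow tidyr's default `value_group` pattern rather than the names used in the merge.

```r
library(tidyr)

# Hypothetical stand-in for `out`: formatted "percent (n)" strings per group,
# copied from the Bismuto and Carbón rows of the merged table above.
out <- data.frame(
  `Orientación dicotómica` = c("Neurogastro", "Neurogastro",
                               "No neurogastro", "No neurogastro"),
  name = c("Bismuto en diarrea", "Carbón en diarrea",
           "Bismuto en diarrea", "Carbón en diarrea"),
  efectiva = c("66.7% (4)", "75.0% (3)", "100.0% (3)", "0.0% (0)"),
  `no efectiva` = c("33.3% (2)", "25.0% (1)", "0.0% (0)", "100.0% (2)"),
  check.names = FALSE
)

# One row per treatment, one column per (value column, group) pair.
wide <- pivot_wider(
  out,
  id_cols = name,
  names_from = `Orientación dicotómica`,
  values_from = c(efectiva, `no efectiva`)
)

print(names(wide))
```

With the default settings the columns come out as `efectiva_Neurogastro`, `efectiva_No neurogastro`, and so on; they can be renamed afterwards if the "Neurogastro efectiva" style is preferred.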

This is not exactly what I wanted, but it is very close:

## first I create 2 df's with filter:
library(dplyr)
df1 <- df %>% filter(`Orientación dicotómica` == "Neurogastro")
df2 <- df %>% filter(`Orientación dicotómica` != "Neurogastro")

## then I bind the 2 df's
df3 <- cbind(df1,df2) 

## finally I drop the repeated column and create a new df with renamed columns
df4 <- as.data.frame(as.matrix(df3[-4]) %>%
  list(name = df_tto$name,Neurogastro = df3[,c(2,3)],No_neurogastro = df3[,c(5,6)])) 

df4 <- df4[-(16),-(2:6)]

There may well be a better answer with simpler code, but this is all I could come up with. In any case, it almost does the job...
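
As one possible simplification, the grouped table might be built in a single pipeline with dplyr and tidyr only, formatting the "percent (n)" cells by hand instead of using janitor. This is a sketch under assumptions: `toy` is hypothetical data invented for illustration (not the original survey), "no se" answers are dropped before computing percentages (mirroring the select() in the code above), and the `cell` column name is made up here.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data, not the original survey.
toy <- data.frame(
  `Loperamida en diarrea` = c("muy efectiva", "algo efectiva",
                              "no efectiva", "muy efectiva"),
  `Bismuto en diarrea` = c("no efectiva", "algo efectiva",
                           "no se", "no efectiva"),
  `Orientación dicotómica` = c("Neurogastro", "Neurogastro",
                               "No neurogastro", "No neurogastro"),
  check.names = FALSE
)

summary_wide <- toy %>%
  pivot_longer(cols = -`Orientación dicotómica`) %>%
  filter(value != "no se") %>%
  mutate(
    # Strip the suffix and collapse the two "effective" levels into one.
    name = sub(" en diarrea", "", name, fixed = TRUE),
    value = if_else(value == "no efectiva", "no efectiva", "efectiva")
  ) %>%
  count(`Orientación dicotómica`, name, value) %>%
  # Add explicit zero counts for combinations that never occur.
  complete(`Orientación dicotómica`, name, value, fill = list(n = 0L)) %>%
  group_by(`Orientación dicotómica`, name) %>%
  mutate(cell = sprintf("%.1f%% (%d)", 100 * n / sum(n), n)) %>%
  ungroup() %>%
  select(-n) %>%
  pivot_wider(
    names_from = c(`Orientación dicotómica`, value),
    values_from = cell,
    names_sep = " "
  )

print(summary_wide)
```

Note that a treatment with no usable answers in one group would give a 0/0 percentage, so real data may need an extra guard for that case.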