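Problem description
I have a dataset called "survey" with one row per individual ID and a column for each of many questions. I need to recode the values in one column to NA and move those observations to another column.
For example: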
ID Food vegetable
aaa NA NA
bbb NA lemon
ccc NA sprout
ddd fruit NA
eee fruit NA
fff NA watermelon
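I want to take the lemon and watermelon observations, which belong to IDs bbb and fff, move them into the Food column and rename them fruit (the survey respondents put them in the wrong column), leaving NA in the vegetable column.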
The result should look like this:
ID Food vegetable
aaa NA NA
bbb fruit NA
ccc NA sprout
ddd fruit NA
eee fruit NA
fff fruit NA
I used:
survey <- survey %>%
  mutate(food = if_else(str_detect(vegetable, "(lemon)|(watermelon)"), "fruit", Food))
It converts the NAs in the food column to fruit, but it does not set the vegetable column to NA, and it also turns all the other fruits in the food column into NA!
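Two behaviours explain this. str_detect() returns NA when the input string is NA, and dplyr's if_else() returns NA wherever its condition is NA, so every row with an empty vegetable cell also loses its Food value. The %in% operator, by contrast, returns FALSE for a missing value. The attempt also never assigns anything to vegetable, so that column is left unchanged; and because R names are case-sensitive, mutate(food = ...) creates a new food column if the existing one is really called Food. A small demonstration, assuming only that dplyr is installed (the vectors are made up for illustration):

```r
library(dplyr)

# if_else() yields NA where the condition is NA (its `missing` argument
# defaults to NULL, which means NA).
with_if_else <- if_else(c(TRUE, NA, FALSE), "fruit", c("x", "y", "z"))
with_if_else
# returns c("fruit", NA, "z")

# %in% never returns NA: a missing value on the left simply gives FALSE.
with_in <- c("lemon", NA, "sprout") %in% c("lemon", "watermelon")
with_in
# returns c(TRUE, FALSE, FALSE)
```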
P.S.: This is a follow-up to a previous question I asked, which has already been answered. It is not exactly the same as that question, which is why I started a new one.
dplyr version: 1.0.2
Solution
One option is to update Food and Vegetable based on whether the value of Vegetable is %in% a given list, not_vegetables:
library(dplyr)
not_vegetables <- c("grape", "tomato")
df %>%
  mutate(Food = if_else(Vegetable %in% not_vegetables, "fruit", Food),
         Vegetable = if_else(Vegetable %in% not_vegetables, NA_character_, Vegetable))
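Applied to the question's own columns (Food and a lowercase vegetable), the same %in% approach can be sketched as follows. The data frame is rebuilt by hand from the example table in the question (the real survey presumably has more columns), and misplaced is a hypothetical name for the list of fruits entered in the wrong column.

```r
library(dplyr)

# Example data rebuilt from the question's table (an assumption, not the
# original dataset).
survey <- data.frame(
  ID = c("aaa", "bbb", "ccc", "ddd", "eee", "fff"),
  Food = c(NA, NA, NA, "fruit", "fruit", NA),
  vegetable = c(NA, "lemon", "sprout", NA, NA, "watermelon"),
  stringsAsFactors = FALSE
)

# Hypothetical list of fruits that respondents put in the vegetable column.
misplaced <- c("lemon", "watermelon")

survey <- survey %>%
  mutate(
    # %in% is FALSE (not NA) for missing values, so rows without a
    # misplaced fruit keep their existing Food value.
    Food = if_else(vegetable %in% misplaced, "fruit", Food),
    # vegetable has not been modified yet at this point, so the same test
    # still sees the original values.
    vegetable = if_else(vegetable %in% misplaced, NA_character_, vegetable)
  )
survey
```

Printed, survey should match the desired table from the question: bbb and fff now have fruit in Food and NA in vegetable, while ddd and eee keep their existing fruit.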
Another way is to apply across() to both columns and do the replacement (with replace or if_else) inside it:
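The answer names across() without showing the code. One way it could look is sketched below, assuming dplyr 1.0.0 or later (for across() and cur_column()) and the same df and not_vegetables as elsewhere in this answer; is_fruit is a hypothetical helper name. cur_column() lets a single lambda give each column its own replacement value.

```r
library(dplyr)

df <- data.frame(
  ID = c("aaa", "bbb", "ccc", "ddd"),
  Food = c(NA, NA, "fruit", "fruit"),
  Vegetable = c("grape", "tomato", NA, NA),
  stringsAsFactors = FALSE
)
not_vegetables <- c("grape", "tomato")

# Evaluate the test once, on the original Vegetable values, before either
# column is modified.
is_fruit <- df$Vegetable %in% not_vegetables

res <- df %>%
  mutate(across(
    c(Food, Vegetable),
    # cur_column() is the name of the column being processed:
    # Food gets "fruit", Vegetable gets NA, other rows keep their value.
    ~ if_else(is_fruit,
              if (cur_column() == "Food") "fruit" else NA_character_,
              .x)
  ))
res
```

Computing is_fruit outside mutate() keeps the condition independent of the order in which across() rewrites the two columns.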
Or you can try base R:
#Conditional
values <- c('grape','tomato')
df$Food <- ifelse(df$Vegetable %in% values,'fruit',df$Food)
df$Vegetable <- ifelse(df$Vegetable %in% values,NA,df$Vegetable)
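In the two-ifelse version above, the Food line has to run before the Vegetable line, because the Vegetable line erases the values the test looks at. A variant that avoids this ordering constraint computes the logical index once and assigns through it, which is equivalent to calling replace() on each column. A minimal sketch using the answer's example data; misfiled is a hypothetical name:

```r
df <- data.frame(
  ID = c("aaa", "bbb", "ccc", "ddd"),
  Food = c(NA, NA, "fruit", "fruit"),
  Vegetable = c("grape", "tomato", NA, NA),
  stringsAsFactors = FALSE
)
values <- c("grape", "tomato")

# Logical index computed up front, so the two assignments below cannot
# influence each other, whichever order they run in.
misfiled <- df$Vegetable %in% values
df$Food[misfiled] <- "fruit"
df$Vegetable[misfiled] <- NA
df
```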
Output:
df
ID Food Vegetable
1 aaa fruit <NA>
2 bbb fruit <NA>
3 ccc fruit <NA>
4 ddd fruit <NA>
Data
df <- structure(list(ID = c("aaa","bbb","ccc","ddd"),Food = c(NA,NA,"fruit","fruit"),Vegetable = c("grape","tomato",NA,NA
)),class = "data.frame",row.names = c(NA,-4L))