问题描述
在保留每一行的唯一标识符的同时,我似乎无法将数据框的一列重塑为正确的形状。我有以下数据
id x y indicator
1 1 249.6 1.124985 1
2 1 250.9 1.124756 1
3 1 252.2 1.124125 1
4 1 253.5 1.124598 1
5 1 254.8 1.127745 1
6 1 256.1 1.129102 1
7 2 249.6 2.167348 0
8 2 250.9 2.165804 0
9 2 252.2 2.164578 0
10 2 253.5 2.163828 0
11 2 254.8 2.164260 0
12 2 256.1 2.166293 0
13 3 249.6 0.04647765 0
14 3 250.9 0.04932262 0
15 3 252.2 0.05245448 0
15 3 253.5 0.05692405 0
17 3 254.8 0.06184551 0
18 3 256.1 0.06751989 0
我想将y向量整形为一个矩阵,其中每行对应一个y向量,并且还有id和indicator的其他列,而变量列则由x值标记。像这样:
id indicator 249.6 250.9 252.2 ...
1 1 1.124985 1.124756 1.124125 ...
2 0 2.167348 2.165804 2.164578 ...
3 0 0.04647765 0.04932262 0.05245448 ...
我尝试过使用这样的重塑功能:
reshape(df[c('id','x','y')],direction = "wide",idvar = "id",timevar = "x")
在这种情况下,我只是忽略了指标变量,看它是否可以工作,但是我得到的数据框只有两列,第一列是ID,第二列是y.c(249.6,250.9,252.2,253.5,254.8,256.1,257.4,258.7,260,261.3,262.6,263.9,265.2,266.5,267.8,269.1,270.4,271.7,273,274.3,275.6,276.9,278.2,279.5,280.8,282.1,283.4,284.7,286,287.3,288.6,289.9,291.2,292.5,293.8,295.1,[etc]
。 / p>
我还尝试使用xtabs
函数:a = xtabs(formula = y ~ id + indicator+ x,data=df)
,但这只是返回了一个看起来与我输入的表非常相似的表。
解决方法
在拉各斯(Largo para largo)进行格式化。 (Veja no SO emInglêsaqui)。
python setup.py install
Dados
library(dplyr)
library(tidyr)
df1 %>%
pivot_wider(
id_cols = c('id','indicator'),names_from = 'x',values_from = 'y'
)
## A tibble: 3 x 8
# id indicator `249.6` `250.9` `252.2` `253.5` `254.8` `256.1`
# <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 1 1 1.12 1.12 1.12 1.12 1.13 1.13
#2 2 0 2.17 2.17 2.16 2.16 2.16 2.17
#3 3 0 0.0465 0.0493 0.0525 0.0569 0.0618 0.0675
,
关于问题中的代码,indicator
应该是idvar
的一部分。另外,如果df
中除了列出的4列之外没有其他列,则df[...]
可以缩短为df
。
reshape(df[c('id','x','y','indicator')],direction = "wide",idvar = c("id","indicator"),timevar = "x")
给予:
id indicator y.249.6 y.250.9 y.252.2 y.253.5 y.254.8 y.256.1
1 1 1 1.12498500 1.12475600 1.12412500 1.12459800 1.12774500 1.12910200
7 2 0 2.16734800 2.16580400 2.16457800 2.16382800 2.16426000 2.16629300
13 3 0 0.04647765 0.04932262 0.05245448 0.05692405 0.06184551 0.06751989
注意
可复制形式的输入:
df <- structure(list(id = c(1L,1L,2L,3L,3L),x = c(249.6,250.9,252.2,253.5,254.8,256.1,249.6,256.1),y = c(1.124985,1.124756,1.124125,1.124598,1.127745,1.129102,2.167348,2.165804,2.164578,2.163828,2.16426,2.166293,0.04647765,0.04932262,0.05245448,0.05692405,0.06184551,0.06751989),indicator = c(1L,0L,0L)),class = "data.frame",row.names = c("1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16","17","18"))
,
所以我很快就用tidyr
找出了答案:
z = pivot_wider(df,id_cols = c("id",names_from = "x",values_from = "y")
瑞·巴拉达斯(Rui Barradas)的答案是相同的,而且效果很好。