Problem description
Say we are given a data frame like this:
> dput(data)
structure(list(Location = structure(1:18,.Label = c("a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r"),class = "factor"),C1 = c(7L,NA,3L,7L,2L,NA),C2 = c(NA,8L,1L,9L,4L,1L),C3 = c(3L,5L,10L,C4 = c(NA,6L,15L,C5 = c(NA,8L)),class = "data.frame",row.names = c(NA,-18L))
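In the way the data is recorded, we have a Location column, which represents a known grouping variable with levels a:r. Then we have the columns C1:C5, which represent 5 clusters into which the samples from each Location were classified according to some arbitrary variable. So the sum of each row tells us how many samples each Location has. For example, Location == a has 10 samples, 7 of which were classified as C1 and 3 as C3.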
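To make the row-sum reading concrete, here is a minimal sketch on a small hypothetical data frame df (three locations and three clusters, not the full data above), assuming that an NA count means no samples in that cluster:

```r
# Hypothetical toy data in the same wide layout as the question
df <- data.frame(
  Location = factor(c("a", "b", "c")),
  C1 = c(7L, NA, 3L),
  C2 = c(NA, 8L, 1L),
  C3 = c(3L, 5L, 10L)
)

# Samples per Location = sum across the cluster columns, treating NA as 0
n <- rowSums(df[, -1], na.rm = TRUE)
names(n) <- as.character(df$Location)
n
```

Here location a gets 7 + 3 = 10 samples, matching the example in the text.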
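I want to create a contingency table to run a chi-square test of independence, to see whether Location and cluster assignment are independent. With the data recorded in this format, how can we reshape it to do that?
Update:
Unless there is a simpler way to get a contingency table straight from the current format based on the values in each row (one that the chi-square test could be run on directly), I expect we will have to convert it to a tidy format with two columns, Location and Cluster, and one observation per original sample, so the output would look like this: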
# there would be 10 observations for location a, 11 observations for b, and so on
Location Cluster
a C1
a C1
a C1
a C1
a C1
a C1
a C1
a C3
a C3
a C3
b C2
b C2
b C2
b C2
b C2
b C2
b C2
b C2
b C3
b C4
b C4
....
From this we can build a contingency table and run the chi-square test.
Solution
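If only the test is needed, the wide counts may already serve as the contingency table: each row is a Location, each C column is a cluster, and each cell is a count. A minimal base-R sketch, assuming an NA count means zero samples and using a hypothetical three-location data frame df in place of the full data:

```r
# Hypothetical toy data in the same wide layout as the question
df <- data.frame(
  Location = factor(c("a", "b", "c")),
  C1 = c(7L, NA, 3L),
  C2 = c(NA, 8L, 1L),
  C3 = c(3L, 5L, 10L)
)

# Rows = Location, columns = cluster, cells = counts
tab <- as.matrix(df[, -1])
tab[is.na(tab)] <- 0                      # NA read as "no samples in this cluster"
rownames(tab) <- as.character(df$Location)

# Chi-square test of independence on the count matrix
res <- chisq.test(tab)
res
```

With small counts, chisq.test() may warn that the chi-squared approximation may be incorrect; its simulate.p.value = TRUE argument is one way to get a Monte Carlo p-value instead.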
We can reshape it to 'long' format and use uncount (from tidyr) to replicate the rows.
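A minimal sketch of that approach with tidyr, again on a hypothetical three-location df rather than the full data: pivot_longer() turns the cluster columns into Cluster/Count pairs (dropping the NA cells), and uncount() repeats each row Count times, giving one row per original sample.

```r
library(tidyr)

# Hypothetical toy data in the same wide layout as the question
df <- data.frame(
  Location = factor(c("a", "b", "c")),
  C1 = c(7L, NA, 3L),
  C2 = c(NA, 8L, 1L),
  C3 = c(3L, 5L, 10L)
)

# Wide -> long: one row per (Location, Cluster) with its count; NA cells dropped
long <- pivot_longer(df, cols = -Location, names_to = "Cluster",
                     values_to = "Count", values_drop_na = TRUE)

# Repeat each row Count times (the Count column is removed afterwards)
long <- uncount(long, Count)

# Contingency table and chi-square test of independence
tab <- table(long$Location, long$Cluster)
res <- chisq.test(tab)
res
```

The resulting table has the same counts as the original wide columns, so this route and testing the count matrix directly should give the same statistic.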