How does the aggregate() function use cutree() values?

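Problem description

Normally, the grouping variable passed to aggregate() is a column of the same data frame that is being summarised, as in the following: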

aggregate(iris[,1:4],by = list(iris$Species),mean)

In hierarchical clustering, however, the value returned by cutree() is passed to aggregate() to create a summary of each cluster.

members <- cutree(c,k = 9)
aggregate(customer_sample[,2:4],by = list(members),mean)

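In my case, members contains the cluster numbers (1 to 9) and the unique IDs, whereas my data frame customer_sample contains only the unique IDs. What I don't understand is how aggregate() connects the unique IDs in the members variable to the unique IDs in the data frame customer_sample.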
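A minimal sketch of the mechanism, using a made-up five-row data frame (`toy`, `hc`, `grp`, `relabelled` and the `id_*` labels are hypothetical names chosen for illustration, not part of the data in this question). cutree() returns one cluster number per observation, in the same order as the rows that went into dist(), and aggregate() requires every element of `by` to have as many entries as the data has rows, pairing them by position. The names on the vector are only labels, so scrambling them should leave the summary unchanged:

```r
# Toy data: five hypothetical customers; row names stand in for customer IDs
toy <- data.frame(
  recency = c(10, 12, 300, 310, 11),
  amount  = c(50, 55, 20, 25, 60),
  row.names = c("id_a", "id_b", "id_c", "id_d", "id_e")
)

hc  <- hclust(dist(scale(toy)), method = "ward.D2")
grp <- cutree(hc, k = 2)

# One cluster number per row, in row order; the names come from the row labels
print(grp)

# aggregate() pairs row i of the data with element i of the grouping vector
by_position <- aggregate(toy, by = list(cluster = grp), FUN = mean)

# Scramble the labels: if aggregate() looked IDs up by name, the means would change
relabelled <- grp
names(relabelled) <- rev(names(grp))
by_scrambled <- aggregate(toy, by = list(cluster = relabelled), FUN = mean)

print(by_position)
print(isTRUE(all.equal(by_position$recency, by_scrambled$recency)))
```

The practical consequence is that the rows of the data frame given to aggregate() need to be in the same order as the rows that were clustered; no ID lookup takes place.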

Here is my full code.

library(dplyr)   # provides %>%, group_by(), summarize() and n()
data <- read.table("purchases.txt")
head(data)
colnames(data) = c('customer_id','purchase_amount','date_of_purchase')


#----------------Set Date and extract No of days elapsed ---------------------

data$date_of_purchase = as.Date(data$date_of_purchase,"%Y-%m-%d")
data$days_since = as.numeric(difftime(time1 = "2016-01-01",time2 = data$date_of_purchase,units = "days"))


#----------------Compute Recency,Frequency,Monetary Value-------------------
customers <- data %>% group_by(customer_id) %>% 
                                summarize(recency = min(days_since),freq = n(),amount = mean(purchase_amount))

#----------------Explore Recency,Monetary Value-------------------
head(customers)
summary(customers)
hist(customers$recency)
hist(customers$freq)
hist(customers$amount)
hist(customers$amount,breaks = 100)


#-------------------------Make a copy of customers df ------------------------
new_data <- customers
head(new_data)

#--------------Transform Data to compute similarity/dissimilarity-------------
new_data$amount <- log(new_data$amount)
hist(new_data$amount)


vec_id <- new_data$customer_id
new_data <- subset(new_data,select= -c(customer_id),drop = FALSE)

rownames(new_data) <- vec_id

head(new_data)
#---------------------------- Standardize Data -------------------------------
new_data = scale(new_data)
head(new_data)

#-------------------- Take small sample for efficiency -----------------------
sample = seq(1,18417,10)
head(sample)
customer_sample <- customers[sample,]
new_data_sample <- new_data[sample,]
#/////////////////////////////////////////////////////////////////////////////
#---------------------------- Hierarchical Clustering ------------------------
#/////////////////////////////////////////////////////////////////////////////

#-------------------------------- distance Matrix ----------------------------
d <- dist(new_data_sample)

#---------------------------------- Make Clusters ----------------------------
c = hclust(d,method = "ward.D2")

#------------------------------- Plot Dendrogram  ----------------------------
plot(c)

#------------------------------- Cut the Dendrogram --------------------------
members <- cutree(c,k = 9)   #k gives the number of clusters/segments

#--------------------------- Show first 30 customers -------------------------
members[1:30]

#----------------------  Compute frequency in each cluster -------------------
table(members)

#------------------------- Show profile of each cluster ----------------------
aggregate(customer_sample[,2:4],by = list(members),mean)

The file is purchases.txt
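If relying on row order feels fragile, one option is to make the link between cluster numbers and IDs explicit before summarising. The sketch below is only an illustration under assumptions: `customer_toy` and `members_toy` are hypothetical stand-ins for customer_sample and the cutree() result, and it assumes the cluster vector carries the customer IDs as names, which depends on the row names having survived the earlier steps of the script.

```r
# Hypothetical stand-ins for customer_sample and the cutree() result
customer_toy <- data.frame(
  customer_id = c(101, 102, 103, 104),
  recency     = c(5, 200, 7, 210),
  freq        = c(9, 1, 8, 2),
  amount      = c(40, 15, 45, 10)
)
members_toy <- c("101" = 1L, "102" = 2L, "103" = 1L, "104" = 2L)

# Sanity check: the labels on the cluster vector line up with the ID column
stopifnot(identical(names(members_toy), as.character(customer_toy$customer_id)))

# Turn the named vector into a lookup table and join on the ID explicitly
lookup <- data.frame(
  customer_id = as.numeric(names(members_toy)),
  cluster     = unname(members_toy)
)
joined <- merge(customer_toy, lookup, by = "customer_id")

# Formula interface: mean of each measure within each joined cluster
profile <- aggregate(cbind(recency, freq, amount) ~ cluster,
                     data = joined, FUN = mean)
print(profile)
```

With the join in place, the summary no longer depends on the two objects being sorted the same way, at the cost of one extra merge() step.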
