How do I apply hierarchical cluster analysis with the Hellinger metric to an LDA model?

Problem description

I am running an LDA analysis and already have topics, but I need to cluster those topics by Hellinger distance. Specifically, I need to group the 20 topics produced by the LDA model and present the grouping as a dendrogram. Part of my code is shown below.

library(dplyr)
library(tidytext)
library(tm)           # stopwords()
library(topicmodels)  # LDA()

# Keep the article, event and year columns, ordered by year
textos <- select(Base_Articulos, Articulo, Evento, Ano)
textorder <- textos[order(textos$Ano), ]

# Identify duplicates, then keep unique rows without missing values
bd_duplicados <- textos[duplicated(textos), ]
bd_unicos <- unique(textos)
bd_unicos <- na.omit(bd_unicos)

ap_td <- tibble(textos)
ap_td

# One row per word tokenised from the Evento text column
tidy_articulo <- ap_td %>% unnest_tokens(word, Evento)

# Spanish and English stop word lists
espstopwords <- tibble(word = stopwords(kind = "es"))
enpstopwords <- tibble(word = stopwords(kind = "en"))

# Additional domain-specific stop words
miastopwords <- tibble(word = c("colombia","study","bogota","colombian","colombiano","t","medellin","n","k","b","hom","cc","92","85","m","1","l","sp","50","155.000","155","59","64","70","80","18","ri","2","3","4","5","6","7","8","9"))

tidy_articulo <- tidy_articulo %>% anti_join(espstopwords)
tidy_articulo <- tidy_articulo %>% anti_join(enpstopwords)
tidy_articulo <- tidy_articulo %>% anti_join(miastopwords)

ap_td <- mutate(ap_td, Evento = as.character(Evento))

tidy_articulo %>% count(word, sort = TRUE)

word_counts <- tidy_articulo %>% count(Articulo, word, sort = TRUE) %>% ungroup()

word_counts

# cast_dtm() needs document, term and value columns
desc_dtm <- word_counts %>% cast_dtm(Articulo, word, n)

desc_dtm

# Fit a 20-topic LDA model
ap_lda <- LDA(desc_dtm, k = 20, control = list(seed = 1234))

ap_lda

# Per-topic word probabilities (beta) and per-document topic probabilities (gamma)
ap_topics <- tidy(ap_lda, matrix = "beta")

ap_documents <- tidy(ap_lda, matrix = "gamma")
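
(Not part of the shared code.) The tidied beta output has one row per topic-term pair, so before any distance can be computed it needs to be reshaped into a matrix with one row per topic and one column per term, each row being a probability distribution over the vocabulary. Below is a minimal sketch, assuming tidyr is installed; the names beta_wide and topic_matrix are illustrative, and topicmodels::posterior(ap_lda)$terms should give the same matrix directly.

library(tidyr)

# One row per topic, one column per term; each row sums to 1
beta_wide <- ap_topics %>%
  pivot_wider(names_from = term, values_from = beta, values_fill = 0)

topic_matrix <- as.matrix(beta_wide[, -1])
rownames(topic_matrix) <- paste0("Topic_", beta_wide$topic)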

Solution

No confirmed solution to this problem has been posted yet.
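
One possible approach (a sketch, not a confirmed answer): compute the pairwise Hellinger distances between the topic rows of topic_matrix from the snippet above, where H(p, q) = (1/sqrt(2)) * sqrt(sum_i (sqrt(p_i) - sqrt(q_i))^2), turn the result into a dist object, and pass it to hclust() to draw the dendrogram of the 20 topics. The hellinger() helper and the "average" linkage are illustrative choices; topicmodels' distHellinger(), if your version provides it, could replace the explicit loop.

# Hellinger distance between two discrete probability distributions
hellinger <- function(p, q) sqrt(sum((sqrt(p) - sqrt(q))^2)) / sqrt(2)

n_topics <- nrow(topic_matrix)
dist_mat <- matrix(0, nrow = n_topics, ncol = n_topics,
                   dimnames = list(rownames(topic_matrix), rownames(topic_matrix)))
for (i in seq_len(n_topics)) {
  for (j in seq_len(n_topics)) {
    dist_mat[i, j] <- hellinger(topic_matrix[i, ], topic_matrix[j, ])
  }
}

# Hierarchical clustering on the Hellinger distances, shown as a dendrogram
hc <- hclust(as.dist(dist_mat), method = "average")
plot(hc, main = "LDA topics clustered by Hellinger distance", xlab = "", sub = "")

If a flat grouping of the topics is also needed, cutree(hc, k = 5) would cut the tree into, say, 5 groups (the 5 is only an example).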
