Gensim Word2vec model does not update the embedding weights of previously seen words during incremental training

Problem description

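I want to train a previously trained word2vec model incrementally: if a word was already seen during the earlier training run, its weights should be updated, and for words that were not seen earlier, new weights should be created and trained. For example: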

from gensim.models import Word2Vec
# old corpus
corpus = [["0","1","2","3"],["2","3","1"]]
# first train on old corpus
# (gensim 3.x API; in gensim 4.0+ the `size` argument is named `vector_size`)
model = Word2Vec(sentences=corpus,size=2,min_count=0,window=2)
# check the embedding weights for word "1"
print(model.wv["1"])

# here comes a new corpus with new words "4" and "5"
newCorpus = [["4","1","5","2"]]

# update the previously trained model
model.build_vocab(newCorpus,update=True)
model.train(newCorpus,total_examples=model.corpus_count,epochs=1)

# check if the new word has embedding weights:
print(model.wv["4"])  # yes

# check if the previous word's embedding weights are updated
print(model.wv["1"])  # output the same as before

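Even though the context of the previously seen word has changed in the new corpus, its embedding does not appear to be updated. Can anyone tell me how to update the previous embedding weights?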

Solution

No confirmed solution to this problem has been found yet.
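
One thing worth checking, offered as an unconfirmed hypothesis rather than a verified fix, is frequent-word downsampling. Word2Vec's `sample` parameter defaults to 1e-3, and on a corpus of only a few tokens every word counts as extremely frequent, so most tokens may be randomly dropped before training and the vector for "1" may never receive an update. The sketch below reproduces the keep-probability rule from the original word2vec implementation in plain Python. The function name `keep_probability` is made up for illustration, and the way the counts are computed for the update corpus is an assumption, not something stated in the question.

```python
import math


def keep_probability(count, total_words, sample=1e-3):
    """Chance that one occurrence of a word survives downsampling.

    Follows the rule used by the original word2vec code:
    (sqrt(count / threshold) + 1) * (threshold / count), capped at 1.0,
    where threshold = sample * total_words.
    """
    if sample <= 0:
        return 1.0  # sample=0 disables downsampling
    threshold = sample * total_words
    prob = (math.sqrt(count / threshold) + 1) * (threshold / count)
    return min(prob, 1.0)


# Assumption: the update corpus is [["4", "1", "5", "2"]],
# so each word occurs once out of 4 tokens.
p_keep = keep_probability(count=1, total_words=4)

# "1" can only be trained if it survives AND at least one of the other
# three words in the sentence survives to act as its context.
p_trained = p_keep * (1 - (1 - p_keep) ** 3)

print(f"keep probability per token: {p_keep:.3f}")
print(f"chance that '1' is trained in one epoch: {p_trained:.3f}")
```

Under these assumptions a token is kept only about 7% of the time, so a single epoch over a four-word sentence will usually leave "1" untouched. If that is what is happening, constructing the model with `sample=0` and passing a larger `epochs` value to `train()` would be the first things to try. On a corpus this small the resulting vectors would still not be meaningful.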
