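Problem description
I want to train a previously trained word2vec model incrementally: if a word was already seen during the earlier training, update its weights; if a word is new and was not seen during the earlier training, create its weights and update them. For example: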
from gensim.models import Word2Vec
# old corpus
corpus = [["0","1","2","3"],["2","3","1"]]
# first train on the old corpus
# (size= is the gensim 3.x parameter name; gensim 4.x renamed it to vector_size=)
model = Word2Vec(sentences=corpus,size=2,min_count=0,window=2)
# check the embedding weights for word "1"
print(model.wv["1"])
# here comes a new corpus with new word "4" and "5"
newCorpus = [["4","1","5","2"]]
# update the previous trained model
model.build_vocab(newCorpus,update=True)
model.train(newCorpus,total_examples=model.corpus_count,epochs=1)
# check whether the new word has embedding weights:
print(model.wv["4"]) # yes
# check whether the previous word's embedding weights were updated
print(model.wv["1"]) # output is the same as before
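Even though the context of the previously seen word has changed in the new corpus, its embedding does not seem to have been updated. Can someone tell me how to update the previous embedding weights?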
Solution
No effective solution to this problem has been found yet.
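As a diagnostic aid rather than a fix, the before/after check in the question can be made numeric instead of relying on reading printed arrays by eye. The sketch below is plain Python with no gensim dependency. The helper names `max_abs_diff` and `has_changed` are hypothetical, and the sample numbers are made up for illustration; they are not output from the model above. It does not explain why the vectors look identical; it only measures whether they are.

```python
def max_abs_diff(before, after):
    """Return the largest absolute element-wise difference between two vectors."""
    if len(before) != len(after):
        raise ValueError("vectors must have the same length")
    return max(
        (abs(float(a) - float(b)) for a, b in zip(before, after)),
        default=0.0,
    )


def has_changed(before, after, tol=0.0):
    """Return True if any component differs by more than tol."""
    return max_abs_diff(before, after) > tol


# Made-up numbers standing in for two snapshots of the same word vector
before = [0.12345678, -0.2]
after = [0.12345679, -0.2]

print(max_abs_diff(before, after))
print(has_changed(before, after))
```

To use this with the question's code, one could store a copy of the vector for "1" (for example `list(model.wv["1"])`) before the second `train` call and pass it as `before`, with the vector read after training as `after`. That wiring is an assumption about the workflow, not something the original post shows.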