Problem description
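I am working on a Python project that uses Word2Vec to recommend products. The code works fine on a dataset of size 19401, but whenever I pass in a product's ID I get this error: KeyError: "word '1077' not in vocabulary". I don't know how to fix this because I know very little about the topic and am still learning. Please help me solve this!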
purchases_train = []
for i in tqdm(product_train):
    temp = train_df[train_df["Clothing ID"] == i]["Review Text"].tolist()
    purchases_train.append(temp)

purchases_val = []
for i in tqdm(validation_df['Clothing ID'].unique()):
    temp = validation_df[validation_df["Clothing ID"] == i]["Review Text"].tolist()
    purchases_val.append(temp)

model = Word2Vec(window=10, sg=1, hs=0,
                 negative=10,  # for negative sampling
                 alpha=0.03, min_count=1, min_alpha=0.0007, seed=14)
model.build_vocab(purchases_train, progress_per=200)
model.train(purchases_train, total_examples=model.corpus_count, epochs=10, report_delay=1)

# save word2vec model
model.save("word2vec_2.model")
model.init_sims(replace=True)

# extract all vectors
X = model[model.wv.vocab]

products = train_df[["Clothing ID", "Review Text"]]
# remove duplicates
products.drop_duplicates(inplace=True, subset='Clothing ID', keep="last")
# create product-ID and product-description dictionary
products_dict = products.groupby('Clothing ID')['Review Text'].apply(list).to_dict()

def similar_products(v, n=6):
    # extract most similar products for the input vector
    ms = model.similar_by_vector(v, topn=n+1)[1:]
    # extract name and similarity score of the similar products
    new_ms = []
    for j in ms:
        pair = (products_dict[j[0]][0], j[1])
        new_ms.append(pair)
    return new_ms

similar_products(model['1077'])
Solution
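If you get the error word '847' not in vocabulary, you can be sure of one thing: the token '847' was not present in your training data.
If you think it was there, look at the data; you will find that it is not.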
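One way to run that check is to count the tokens the model was actually given. The following is only a sketch: vocabulary_counts and toy_purchases are hypothetical names, and the toy data merely imitates the shape of purchases_train in the question, where each inner list appears to hold whole review strings rather than product IDs.

```python
from collections import Counter


def vocabulary_counts(sentences):
    """Count every token that appears in a list of token lists."""
    counts = Counter()
    for sentence in sentences:
        counts.update(sentence)
    return counts


# Hypothetical toy data shaped like purchases_train in the question:
# one list per product, each element a full review string.
toy_purchases = [
    ["Love this dress", "Runs small"],
    ["Great fit"],
]

counts = vocabulary_counts(toy_purchases)
print("1077" in counts)             # is the product ID a token?
print("Love this dress" in counts)  # is a whole review a token?
```

If the real purchases_train looks like this, every token is a review text, so an ID such as '1077' can never end up in the vocabulary.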
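If your code needs to do something useful with words that are not in the training data, you should extend it to either:
(1) Check whether the word is present before trying to fetch its vector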
if '847' in model:
    similar_products(model['847'])
else:
    # do something else
    ...
...or...
(2) Catch the KeyError and do something else when it is caught.
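Option (2) could look roughly like the sketch below. It is not tied to any particular gensim version: lookup_or_none and toy_vectors are hypothetical names, and a plain dict stands in for the trained model, on the assumption that the model raises KeyError for an unknown token just as the error message in the question suggests.

```python
def lookup_or_none(vectors, key):
    """Return vectors[key], or None when the key is not in the vocabulary."""
    try:
        return vectors[key]
    except KeyError:
        return None


# Stand-in for a trained model: a plain dict mapping tokens to vectors.
toy_vectors = {"1078": [0.1, 0.2, 0.3]}

vector = lookup_or_none(toy_vectors, "847")
if vector is None:
    # The token was never seen in training, so fall back to something else.
    print("'847' is not in the vocabulary; falling back")
else:
    print(vector)
```

Returning None keeps the decision about the fallback with the caller, for example showing popular products instead of similar ones.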