Problem description
I am trying to load a pretrained word2vec model in pkl format, taken from here.
The line of code I use to load it:
model = gensim.models.KeyedVectors.load('enwiki_20180420_500d.pkl')
However, I keep getting the following error (full traceback):
UnpicklingError Traceback (most recent call last)
<ipython-input-15-ebd5780b6636> in <module>
55
56 #Load pretrained word2vec
---> 57 model = gensim.models.KeyedVectors.load('enwiki_20180420_500d.pkl',mmap='r')
58
~/anaconda3/lib/python3.7/site-packages/gensim/models/keyedvectors.py in load(cls,fname_or_handle,**kwargs)
1551 @classmethod
1552 def load(cls,fname_or_handle,**kwargs):
-> 1553 model = super(WordEmbeddingsKeyedVectors,cls).load(fname_or_handle,**kwargs)
1554 if isinstance(model,FastTextKeyedVectors):
1555 if not hasattr(model,'compatible_hash'):
~/anaconda3/lib/python3.7/site-packages/gensim/models/keyedvectors.py in load(cls,fname_or_handle,**kwargs)
226 @classmethod
227 def load(cls,fname_or_handle,**kwargs):
--> 228 return super(BaseKeyedVectors,cls).load(fname_or_handle,**kwargs)
229
230 def similarity(self,entity1,entity2):
~/anaconda3/lib/python3.7/site-packages/gensim/utils.py in load(cls,fname,mmap)
433 compress,subname = SaveLoad._adapt_by_suffix(fname)
434
--> 435 obj = unpickle(fname)
436 obj._load_specials(fname,mmap,compress,subname)
437 logger.info("loaded %s",fname)
~/anaconda3/lib/python3.7/site-packages/gensim/utils.py in unpickle(fname)
1396 # Because of loading from S3 load can't be used (missing readline in smart_open)
1397 if sys.version_info > (3,0):
-> 1398 return _pickle.load(f,encoding='latin1')
1399 else:
1400 return _pickle.loads(f.read())
UnpicklingError: invalid load key,':'.
I also tried loading it with load_word2vec_format, but had no luck. Any idea what might be wrong?
Solution
Per the linked page https://wikipedia2vec.github.io/wikipedia2vec/pretrained/, these files are meant to be loaded with that library's Wikipedia2Vec.load() method.
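For illustration, here is a minimal sketch of loading with that library. It assumes the wikipedia2vec package is installed and that the .pkl file from the question sits in the working directory. The helper name load_wikipedia2vec_model is hypothetical, introduced only for this example.

```python
def load_wikipedia2vec_model(path):
    """Load a Wikipedia2Vec .pkl model with the library's own loader.

    Hypothetical helper for illustration; it only wraps Wikipedia2Vec.load().
    """
    # Imported lazily so the function can be defined even when
    # wikipedia2vec is not installed (assumption: `pip install wikipedia2vec`).
    from wikipedia2vec import Wikipedia2Vec

    return Wikipedia2Vec.load(path)


# Example usage (assumes the file from the question is present locally):
# wiki2vec = load_wikipedia2vec_model('enwiki_20180420_500d.pkl')
# vector = wiki2vec.get_word_vector('the')  # embedding for the word "the"
# print(vector.shape)
```

The returned object is a Wikipedia2Vec model, not a Gensim KeyedVectors, so Gensim-specific methods will not be available on it.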
Gensim's .load() method should only be used on files that were saved directly from Gensim model objects.
The Wikipedia2Vec project does say that its .txt file formats can be loaded with .load_word2vec_format(), so you could also try one of the .txt format files.
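If you would rather stay within Gensim, the .txt route might look roughly like this. This is a sketch under assumptions: the filename enwiki_20180420_500d.txt is a guess at the text counterpart of the file in the question, and load_txt_vectors is a hypothetical helper name.

```python
def load_txt_vectors(path):
    """Load word2vec-format text vectors into a Gensim KeyedVectors object.

    Hypothetical helper for illustration; it only wraps
    KeyedVectors.load_word2vec_format().
    """
    # Imported lazily so the function can be defined without gensim installed.
    from gensim.models import KeyedVectors

    # binary=False because the .txt files are plain text, not the binary format.
    return KeyedVectors.load_word2vec_format(path, binary=False)


# Example usage (the filename is an assumption, not taken from the question):
# model = load_txt_vectors('enwiki_20180420_500d.txt')
# print(model.most_similar('the', topn=5))
```

Note that this calls load_word2vec_format() on a .txt file, not on the .pkl file, which is a different format.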
Their full-model .pkl files will only load with their own class's load function.