没有这样的文件或目录：'google_news_vectors.bin' 选项A-手动解压缩选项B-让gensim解压缩

问题描述

我正在尝试加载google_news_vecotors.bin文件，但它提供了一个错误。下面是我的代码，它写在nlp_gen2.py文件中

import gensim
model = gensim.models.KeyedVectors.load_word2vec_format('google_news_vectors.bin',binary=True)

我得到的错误是：

FileNotFoundError Traceback (most recent call last) in 1 import gensim
----> 2 model = gensim.models.KeyedVectors.load_word2vec_format('google_news_vectors.bin',binary=True)

C:\Anaconda3\envs\DataScience\lib\site-packages\gensim\models\keyedvectors.py
in load_word2vec_format(cls,fname,fvocab,binary,encoding,unicode_errors,limit,datatype) 1547 return _load_word2vec_format(
1548 cls,fvocab=fvocab,binary=binary,encoding=encoding,unicode_errors=unicode_errors,-> 1549 limit=limit,datatype=datatype)
1550 1551 @classmethod

C:\Anaconda3\envs\DataScience\lib\site-packages\gensim\models\utils_any2vec.py
in _load_word2vec_format(cls,datatype,binary_chunk_size) 273 274
logger.info("loading projection weights from %s",fname) --> 275 with
utils.open(fname,'rb') as fin: 276 header =
utils.to_unicode(fin.readline(),encoding=encoding) 277 vocab_size,vector_size = (int(x) for x in header.split()) # throws for invalid
file format

C:\Anaconda3\envs\DataScience\lib\site-packages\smart_open\smart_open_lib.py
in open(uri,mode,buffering,errors,newline,closefd,opener,ignore_ext,transport_params) 185 encoding=encoding,186
errors=errors,--> 187 newline=newline,188 ) 189 if fobj is not None:

C:\Anaconda3\envs\DataScience\lib\site-packages\smart_open\smart_open_lib.py
in _shortcut_open(uri,newline) 285 open_kwargs['errors'] = errors 286 --> 287 return
_builtin_open(local_path,buffering=buffering,**open_kwargs) 288 289

FileNotFoundError: [Errno 2] No such file or directory:
'google_news_vectors.bin'

我的文件结构如下：

我该如何解决？

解决方法

文件名为“ GoogleNews-vectors-negative300.bin”，但是如您所见，该文件已损坏。再次下载并解压缩rar。

您的问题并未清楚显示文件的命名方式，因为资源管理器未显示文件扩展名。请参见this guide将其打开。

由于某种原因，您有一个名为GoogleNews-vectors-negative300.bin的文件夹。情况并非如此。

选项A-手动解压缩

下载GoogleNews-vectors-negative300.bin.gz。它应该恰好是1647046227字节，其MD5为1c892c4707a8a1a508b01a01735c0339。通过检查文件属性来确认文件大小。
解压缩文件。看来您已经安装了WinRAR，它应该能够执行gunzip操作。
您现在应该拥有3644258522字节的文件GoogleNews-vectors-negative300.bin，其MD5为023bfd73698638bdad5f84df53404c8b。

现在，以下代码应该可以工作了：

import gensim
filename = 'GoogleNews-vectors-negative300.bin'
model = gensim.models.KeyedVectors.load_word2vec_format(filename,binary=True)

选项B-让gensim解压缩

下载GoogleNews-vectors-negative300.bin.gz。它应该恰好是1647046227字节，其MD5为1c892c4707a8a1a508b01a01735c0339。通过检查文件属性来确认文件大小。

现在，以下代码应该可以工作了：

import gensim
filename = 'GoogleNews-vectors-negative300.bin.gz'
model = gensim.models.KeyedVectors.load_word2vec_format(filename,binary=True)

_{此答案基于此Colab。}

gensim python word2vec