Cannot reproduce the Gensim docs example for training FastText: TypeError: Either one of corpus_file or corpus_iterable value must be provided

Problem description

I am trying to create my own FastText embeddings, so I went to the official Gensim documentation and implemented the exact code below, using Gensim version 4.0:

from gensim.models import FastText
from gensim.test.utils import common_texts

model = FastText(vector_size=4, window=3, min_count=1)  # instantiate
model.build_vocab(sentences=common_texts)
model.train(sentences=common_texts, total_examples=len(common_texts), epochs=10)

To my surprise, it gave me the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-6b2d1de02d90> in <module>
      1 model = FastText(vector_size=4,min_count=1)  # instantiate
----> 2 model.build_vocab(sentences=common_texts)
      3 model.train(sentences=common_texts,epochs=10)

~/anaconda3/lib/python3.8/site-packages/gensim/models/word2vec.py in build_vocab(self,corpus_iterable,corpus_file,update,progress_per,keep_raw_vocab,trim_rule,**kwargs)
    477 
    478         """
--> 479         self._check_corpus_sanity(corpus_iterable=corpus_iterable,corpus_file=corpus_file,passes=1)
    480         total_words,corpus_count = self.scan_vocab(
    481             corpus_iterable=corpus_iterable,progress_per=progress_per,trim_rule=trim_rule)

~/anaconda3/lib/python3.8/site-packages/gensim/models/word2vec.py in _check_corpus_sanity(self,passes)
   1484         """Checks whether the corpus parameters make sense."""
   1485         if corpus_file is None and corpus_iterable is None:
-> 1486             raise TypeError("Either one of corpus_file or corpus_iterable value must be provided")
   1487         if corpus_file is not None and corpus_iterable is not None:
   1488             raise TypeError("Both corpus_file and corpus_iterable must not be provided at the same time")

TypeError: Either one of corpus_file or corpus_iterable value must be provided

Can anyone explain what is going on here?

Solution

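So I found the answer. The problem is the sentences keyword argument in both calls. In Gensim 4.0, build_vocab() and train() name their corpus parameter corpus_iterable (it was called sentences in Gensim 3.x). Because both methods also accept **kwargs, sentences=... is silently absorbed instead of being rejected, so corpus_iterable stays None and _check_corpus_sanity() raises the TypeError: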

model.build_vocab(sentences=common_texts)
model.train(sentences=common_texts, total_examples=len(common_texts), epochs=10)
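Why does this produce a TypeError instead of an "unexpected keyword argument" error? The plain-Python sketch below reproduces the mechanism. The function is a hypothetical, simplified stand-in, not Gensim's actual code: it only mirrors the parameter shape visible in the traceback (corpus_iterable, corpus_file, update, **kwargs) and the corpus sanity check, and its return value (a distinct-token count) is a toy placeholder for real vocabulary scanning.

```python
# Hypothetical, simplified stand-in for Gensim 4.0's build_vocab(); it only mirrors
# the parameter names from the traceback and the corpus sanity check.
def build_vocab(corpus_iterable=None, corpus_file=None, update=False, **kwargs):
    # Any unrecognised keyword (e.g. the old `sentences`) ends up in kwargs, unused.
    if corpus_file is None and corpus_iterable is None:
        raise TypeError("Either one of corpus_file or corpus_iterable value must be provided")
    if corpus_file is not None and corpus_iterable is not None:
        raise TypeError("Both corpus_file and corpus_iterable must not be provided at the same time")
    # Toy stand-in for vocabulary scanning: count the distinct tokens.
    return len({word for doc in corpus_iterable for word in doc})


texts = [["human", "interface", "computer"], ["graph", "trees", "computer"]]

# Old keyword: silently swallowed by **kwargs, so corpus_iterable stays None.
error_message = None
try:
    build_vocab(sentences=texts)
except TypeError as err:
    error_message = str(err)
print("TypeError:", error_message)

# Positional and correctly named calls both reach the corpus.
print(build_vocab(texts))                  # 5 distinct tokens
print(build_vocab(corpus_iterable=texts))  # 5 distinct tokens
```

The **kwargs catch-all is what hides the typo: Python never complains about sentences, so the only symptom is the downstream sanity check.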

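All you have to do is either drop the keyword and pass the corpus as the first positional argument, or pass it under its new name, corpus_iterable. Note that train() also needs total_examples (or total_words) for its learning-rate and progress calculations; if both are missing, Gensim 4 raises a ValueError:

# Option 1: pass the corpus positionally
model.build_vocab(common_texts)
model.train(common_texts, total_examples=len(common_texts), epochs=10)

# Option 2: use the keyword name corpus_iterable
model.build_vocab(corpus_iterable=common_texts)
model.train(corpus_iterable=common_texts, total_examples=len(common_texts), epochs=10)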