Error in the get_coherence function in Python when using LDA

Problem description

I am running into a problem when using the coherence model.

My code is:

import gensim
from gensim.models import CoherenceModel

def compute_coherence_values(dictionary, corpus, texts, limit, start, step):
    """Train one LDA model per topic count and record its c_v coherence."""
    coherence_values = []
    model_list = []
    # Iterate over start, start + step, ... up to (but excluding) limit
    for num_topics in range(start, limit, step):
        model = gensim.models.ldamodel.LdaModel(corpus=corpus, id2word=dictionary, num_topics=num_topics)
        model_list.append(model)

        coherencemodel = CoherenceModel(model=model, texts=texts, dictionary=dictionary, coherence="c_v")
        coherence_values.append(coherencemodel.get_coherence())

    return model_list, coherence_values

coherence_values = []
model_list = []

# Number of topics
nt = pre_nt

start_ = nt
limit_ = nt + 1
step_ = 1

model_list1, coherence_values1 = compute_coherence_values(dictionary=id2word, corpus=corpus, texts=texts_wi_new, start=start_, limit=limit_, step=step_)

The error is:

Traceback (most recent call last):
  File "<string>",line 1,in <module>
  File "C:\Users\lee96\AppData\Local\Programs\Python\python37\Lib\multiprocessing\spawn.py",line 105,in spawn_main
Traceback (most recent call last):
  File "<input>",line 3,in <module>
  File "<input>",line 92,in compute_coherence_values
  File "D:\All Python\venv\lib\site-packages\gensim\models\coherencemodel.py",line 609,in get_coherence
    confirmed_measures = self.get_coherence_per_topic()
  File "D:\All Python\venv\lib\site-packages\gensim\models\coherencemodel.py",line 569,in get_coherence_per_topic
    self.estimate_probabilities(segmented_topics)
  File "D:\All Python\venv\lib\site-packages\gensim\models\coherencemodel.py",line 541,in estimate_probabilities
    self._accumulator = self.measure.prob(**kwargs)
  File "D:\All Python\venv\lib\site-packages\gensim\topic_coherence\probability_estimation.py",line 156,in p_boolean_sliding_window
    return accumulator.accumulate(texts,window_size)
  File "D:\All Python\venv\lib\site-packages\gensim\topic_coherence\text_analysis.py",line 444,in accumulate
    workers,input_q,output_q = self.start_workers(window_size)
  File "D:\All Python\venv\lib\site-packages\gensim\topic_coherence\text_analysis.py",line 478,in start_workers
    worker.start()
  File "C:\Users\lee96\AppData\Local\Programs\Python\python37\Lib\multiprocessing\process.py",line 112,in start
    self._popen = self._Popen(self)
  File "C:\Users\lee96\AppData\Local\Programs\Python\python37\Lib\multiprocessing\context.py",line 223,in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\lee96\AppData\Local\Programs\Python\python37\Lib\multiprocessing\context.py",line 322,in _Popen
    return Popen(process_obj)
  File "C:\Users\lee96\AppData\Local\Programs\Python\python37\Lib\multiprocessing\popen_spawn_win32.py",line 89,in __init__
    reduction.dump(process_obj,to_child)
  File "C:\Users\lee96\AppData\Local\Programs\Python\python37\Lib\multiprocessing\reduction.py",line 60,in dump
    ForkingPickler(file,protocol).dump(obj)
BrokenPipeError: [Errno 32] Broken pipe
    exitcode = _main(fd)
  File "C:\Users\lee96\AppData\Local\Programs\Python\python37\Lib\multiprocessing\spawn.py",line 114,in _main
    prepare(preparation_data)
  File "C:\Users\lee96\AppData\Local\Programs\Python\python37\Lib\multiprocessing\spawn.py",line 225,in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\Users\lee96\AppData\Local\Programs\Python\python37\Lib\multiprocessing\spawn.py",line 277,in _fixup_main_from_path
    run_name="__mp_main__")
  File "C:\Users\lee96\AppData\Local\Programs\Python\python37\Lib\runpy.py",line 261,in run_path
    code,fname = _get_code_from_file(run_name,path_name)
  File "C:\Users\lee96\AppData\Local\Programs\Python\python37\Lib\runpy.py",line 231,in _get_code_from_file
    with open(fname,"rb") as f:
OSError: [Errno 22] Invalid argument: 'D:\\All Python\\<input>'

The error occurs at this line:

coherencemodel.get_coherence()

I am using PyCharm. How can I fix this?


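Solution

I ran into exactly the same problem with exactly the same code. When I run the code from the Spyder IDE it works fine, but when I plug it into Power BI it fails. So far I have taken it out of the function and the loop and reduced it to the basic lines below. The LDA and coherence models build fine, but for some reason the call to get_coherence() fails.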

model = gensim.models.ldamodel.LdaModel(corpus,num_topics=5,id2word=dictionary,passes=10)

coherencemodel = CoherenceModel(model=model,texts=texts,dictionary=dictionary,coherence='c_v')

test = coherencemodel.get_coherence()

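Here is part of the error message I received:

RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

Details: DataSourceKind=Python DataSourcePath=Python Message=Python script error.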

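I looked into this further and found a few other posts that helped me. In the end, the error appears to be related to how multiprocessing works on Windows.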

where to put freeze_support() in a Python script? https://docs.python.org/2/library/multiprocessing.html#windows
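To illustrate the Windows behaviour described above, here is a minimal standard-library sketch (not gensim code). The names `square` and `run_in_children` are hypothetical, and the "spawn" start method is requested explicitly so the sketch behaves the same way on other platforms. It assumes the code is saved to a real script file: a spawned child re-imports the main script by path, which is presumably why the console run in the question fails with `Invalid argument: 'D:\\All Python\\<input>'`.

```python
import multiprocessing as mp


def square(n):
    # Module-level function so that a spawned child can import and run it.
    return n * n


def run_in_children(numbers, timeout=60):
    # "spawn" is the start method Windows uses: each child launches a fresh
    # interpreter and re-imports the main script file.
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=2) as pool:
        # The timeout makes a misconfigured environment fail instead of hanging.
        return pool.map_async(square, numbers).get(timeout=timeout)


if __name__ == "__main__":
    # Everything that starts processes sits behind this guard, so the
    # re-import performed by each child does not start processes again.
    mp.freeze_support()  # only matters for frozen executables
    print(run_in_children([1, 2, 3]))
```

Without the guard, each child would reach the pool creation again while it is still starting up, which is the situation the RuntimeError above complains about.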

What worked for me was putting all of my code under the following lines:

from multiprocessing import freeze_support

if __name__ == '__main__':
    freeze_support()
    model_list, coherence_values = compute_coherence_values(dictionary=dictionary, corpus=corpus, texts=texts, start=start, limit=limit, step=step)
    # Pick the model with the highest coherence score
    max_value = max(coherence_values)
    max_index = coherence_values.index(max_value)

    best_model = model_list[max_index]

    ldamodel = best_model

I'm not the best Python developer out there, but I can get things working when I need to. If anyone has a better suggestion, I'm all ears :)

lda python python-3.x