Topic modeling with Gensim

Question

I have been trying to do topic modeling in Python with gensim. I have the following dataset:

Documents

"Sugar is bad to consume. My sister likes to have sugar, but not my father."
"My father spends a lot of time driving my sister around to dance practice."
"Doctors suggest that driving may cause increased stress and blood pressure."
"Sometimes I feel pressure to perform well at school, but my father never seems to drive my sister to do better."
"Health experts say that Sugar is not good for your lifestyle."

I tried to lemmatize it as follows:

texts = map(gensim.utils.lemmatize, Docs)

and then ran LDA:

dictionary = gensim.corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(doc) for doc in texts]
Lda = gensim.models.ldamodel.LdaModel
ldamodel = Lda(corpus, num_topics=3, id2word=dictionary, passes=50)
ldamodel.print_topics()

But I get an error. Do you know how to fix it?

Thanks

Error:

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-15-b36df3b5374b> in <module>
----> 1 import pattern
      2 
      3 dictionary = gensim.corpora.Dictionary(Docs)
      4 corpus = [dictionary.doc2bow(doc) for doc in Docs]
      5 Lda = gensim.models.ldamodel.LdaModel

ModuleNotFoundError: No module named 'pattern'

Full error message:

---> 3 dictionary = gensim.corpora.Dictionary(Docs)
      4 corpus = [dictionary.doc2bow(doc) for doc in Docs]
      5 Lda = gensim.models.ldamodel.LdaModel

/anaconda3/lib/python3.7/site-packages/gensim/corpora/dictionary.py in __init__(self,documents,prune_at)
     82 
     83         if documents is not None:
---> 84             self.add_documents(documents,prune_at=prune_at)
     85 
     86     def __getitem__(self,tokenid):

/anaconda3/lib/python3.7/site-packages/gensim/corpora/dictionary.py in add_documents(self,prune_at)
    195 
    196         """
--> 197         for docno,document in enumerate(documents):
    198             # log progress & run a regular check for pruning,once every 10k docs
    199             if docno % 10000 == 0:

/anaconda3/lib/python3.7/site-packages/gensim/utils.py in lemmatize(content,allowed_tags,light,stopwords,min_length,max_length)
   1676     if not has_pattern():
   1677         raise ImportError(
-> 1678             "Pattern library is not installed. Pattern library is needed in order to use lemmatize function"
   1679         )
   1680     from pattern.en import parse

ImportError: Pattern library is not installed. Pattern library is needed in order to use lemmatize function

Solution

Try installing the pattern package. gensim.utils.lemmatize needs it to be present, as the ImportError above says.

pip install pattern
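
After installing, it can help to confirm that the package is visible to the same interpreter (or notebook kernel) that raised the error, since pip may have installed into a different environment. A minimal sketch using only the standard library; `is_installed` is a hypothetical helper name, not part of gensim:

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found by the current interpreter."""
    # find_spec returns None when the module cannot be located on sys.path.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Run this in the same environment that raised the ImportError.
    print("pattern installed:", is_installed("pattern"))
```

If this prints False right after a successful pip install, the notebook is probably running on a different Python than the one pip installed into.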

Gensim's utils.py uses this function to check for it:

def has_pattern():
    """Check whether the `pattern <https://github.com/clips/pattern>`_ package is installed.
    Returns
    -------
    bool
        Is `pattern` installed?
    """
    try:
        from pattern.en import parse  # noqa:F401
        return True
    except ImportError:
        return False
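
If installing pattern is not an option, one possible workaround is to skip lemmatization and feed plain tokens to the dictionary instead. The sketch below is an assumption rather than anything gensim provides: `simple_tokens` and `preprocess` are hypothetical helpers, and the regex tokenizer is a crude stand-in that only lowercases and splits, so topics will be noisier than with real lemmas. To my knowledge `gensim.utils.lemmatize` was removed in Gensim 4.0, so something along these lines may also be needed after upgrading. The result is built as a list rather than a lazy `map`, so it can be iterated twice (once for the dictionary, once for the bag-of-words corpus).

```python
import re


def simple_tokens(text, min_length=2):
    """Lowercase the text and keep runs of letters; no lemmatization is performed."""
    return [tok for tok in re.findall(r"[a-z]+", text.lower()) if len(tok) >= min_length]


def preprocess(docs, lemmatizer=None):
    """Tokenize each document, using `lemmatizer` if one is supplied."""
    tokenize = lemmatizer if lemmatizer is not None else simple_tokens
    # A list (not map) so the tokens survive more than one pass.
    return [tokenize(doc) for doc in docs]


texts = preprocess([
    "Sugar is bad to consume.",
    "Health experts say that Sugar is not good for your lifestyle.",
])
```

The resulting `texts` can be passed to `gensim.corpora.Dictionary(texts)` and `dictionary.doc2bow(doc)` exactly as in the question.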

I did notice that pip install gensim does not check for or install this package, which makes the requirement easy to miss:

Collecting gensim
  Using cached https://files.pythonhosted.org/packages/70/cf/87b25b265d23498b2b70ce873495cf7ef91394c4baff240210e26f3bc18a/gensim-3.8.3-cp37-cp37m-macosx_10_9_x86_64.whl
Requirement already satisfied: numpy>=1.11.3 in /Users/username/opt/anaconda3/lib/python3.7/site-packages (from gensim) (1.17.2)
Requirement already satisfied: scipy>=0.18.1 in /Users/username/opt/anaconda3/lib/python3.7/site-packages (from gensim) (1.3.1)
Requirement already satisfied: six>=1.5.0 in /Users/username/opt/anaconda3/lib/python3.7/site-packages (from gensim) (1.12.0)
Collecting smart-open>=1.8.1 (from gensim)
Collecting boto3 (from smart-open>=1.8.1->gensim)
  Using cached https://files.pythonhosted.org/packages/c4/24/b9facc760789cf844880c178b64d26d9f4a0ef06af3e99473f38fba94657/boto3-1.14.56-py2.py3-none-any.whl
Requirement already satisfied: requests in /Users/username/opt/anaconda3/lib/python3.7/site-packages (from smart-open>=1.8.1->gensim) (2.22.0)
Requirement already satisfied: boto in /Users/username/opt/anaconda3/lib/python3.7/site-packages (from smart-open>=1.8.1->gensim) (2.49.0)
Collecting jmespath<1.0.0,>=0.7.1 (from boto3->smart-open>=1.8.1->gensim)
  Using cached https://files.pythonhosted.org/packages/07/cb/5f001272b6faeb23c1c9e0acc04d48eaaf5c862c17709d20e3469c6e0139/jmespath-0.10.0-py2.py3-none-any.whl
Collecting s3transfer<0.4.0,>=0.3.0 (from boto3->smart-open>=1.8.1->gensim)
  Using cached https://files.pythonhosted.org/packages/69/79/e6afb3d8b0b4e96cefbdc690f741d7dd24547ff1f94240c997a26fa908d3/s3transfer-0.3.3-py2.py3-none-any.whl
Collecting botocore<1.18.0,>=1.17.56 (from boto3->smart-open>=1.8.1->gensim)
  Using cached https://files.pythonhosted.org/packages/b1/82/499909b818bddde1a4fc1228389d9d29cc2ede766a2a7370aed033dd07f9/botocore-1.17.56-py2.py3-none-any.whl
Requirement already satisfied: certifi>=2017.4.17 in /Users/username/opt/anaconda3/lib/python3.7/site-packages (from requests->smart-open>=1.8.1->gensim) (2019.9.11)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /Users/username/opt/anaconda3/lib/python3.7/site-packages (from requests->smart-open>=1.8.1->gensim) (1.24.2)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /Users/username/opt/anaconda3/lib/python3.7/site-packages (from requests->smart-open>=1.8.1->gensim) (3.0.4)
Requirement already satisfied: idna<2.9,>=2.5 in /Users/username/opt/anaconda3/lib/python3.7/site-packages (from requests->smart-open>=1.8.1->gensim) (2.8)
Requirement already satisfied: docutils<0.16,>=0.10 in /Users/username/opt/anaconda3/lib/python3.7/site-packages (from botocore<1.18.0,>=1.17.56->boto3->smart-open>=1.8.1->gensim) (0.15.2)
Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /Users/username/opt/anaconda3/lib/python3.7/site-packages (from botocore<1.18.0,>=1.17.56->boto3->smart-open>=1.8.1->gensim) (2.8.0)
Installing collected packages: jmespath,botocore,s3transfer,boto3,smart-open,gensim
Successfully installed boto3-1.14.56 botocore-1.17.56 gensim-3.8.3 jmespath-0.10.0 s3transfer-0.3.3 smart-open-2.1.1
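
Rather than reading the install log, the declared dependencies of an installed distribution can be inspected directly. This is a rough sketch assuming Python 3.8+ (for `importlib.metadata`); `requirement_name` and `unconditional_requirements` are hypothetical helper names, and the name parsing is deliberately simplistic:

```python
import re
from importlib import metadata


def requirement_name(requirement):
    """Extract the lowercased project name from a requirement string."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    return match.group(1).lower() if match else ""


def unconditional_requirements(dist_name):
    """Names of requirements a distribution declares without an `extra` marker."""
    try:
        declared = metadata.requires(dist_name) or []
    except metadata.PackageNotFoundError:
        # The distribution is not installed in this environment.
        return set()
    return {
        requirement_name(req)
        for req in declared
        if not re.search(r"\bextra\s*==", req)
    }
```

With gensim installed, `"pattern" in unconditional_requirements("gensim")` shows whether pattern is a hard dependency. Given the log above, I would expect it not to be, but check against your own installation.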
