TypeError: 'PDF' object is not iterable

Problem description

I extracted text data from a PDF file using the pdfplumber library, and I am trying to run topic modeling on it with LDA. How do I get past the error "'PDF' object is not iterable"?

from sklearn.feature_extraction.text import CountVectorizer

from sklearn.decomposition import LatentDirichletAllocation

def topic_modeling(tata):
    print('Cleaned tata TXT')
    Infos = [tata]
    #print(type(tata))
    #print(infos[:50])
    cv = CountVectorizer(stop_words='english')
    dtm = cv.fit_transform(tata)
    #print('DTM')
    #dtm
    LDA = LatentDirichletAllocation(n_components=1,random_state=42)
    LDA.fit(dtm)
    single_topic = LDA.components_[0]
    #print('singletopic')

    topics_list = []
    print('list Created ')
    #topics1 = []
    #print('for loop')
    for index,topic in enumerate(LDA.components_):
        #print('index')
        #print(f'THE TOP 15 WORDS FOR TOPIC #{index}')
        for i in topic.argsort()[-30:]:
            print(i)
            #print(i)
            topics_list.append(cv.get_feature_names()[i])
            #print('topics append')
            topics = ','.join(topics_list)
            #print('Topics__')
    return topics

topicextract = topic_modeling(tata)

Cleaned tata TXT
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-24-31281b9b8614> in <module>()
----> 1 topicextract = topic_modeling(tata)

2 frames
/usr/local/lib/python3.6/dist-packages/sklearn/feature_extraction/text.py in _count_vocab(self,raw_documents,fixed_vocab)
   1127         values = _make_int_array()
   1128         indptr.append(0)
-> 1129         for doc in raw_documents:
   1130             feature_counter = {}
   1131             for feature in analyze(doc):

TypeError: 'PDF' object is not iterable

Solution
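
The traceback points at the loop `for doc in raw_documents` inside CountVectorizer, which suggests that `fit_transform` received the pdfplumber PDF object itself rather than text. CountVectorizer expects an iterable of strings, one per document. The line `Infos = [tata]` does not help either: the list is never used, and it would still hold a PDF object instead of strings. A likely fix is to pull the text out of each page first and pass the resulting list of strings to `topic_modeling`. The sketch below assumes `tata` is the object returned by `pdfplumber.open(...)`, which exposes a `pages` list whose items have an `extract_text()` method. The helper name `pdf_to_documents` and the file name `report.pdf` are made up for illustration.

```python
def pdf_to_documents(pdf):
    """Return one text string per non-empty page of a pdfplumber-style PDF object."""
    documents = []
    for page in pdf.pages:
        # extract_text() may give None or an empty string for pages without text
        text = page.extract_text()
        if text and text.strip():
            documents.append(text.strip())
    return documents


# Hypothetical usage ("report.pdf" is a placeholder path):
# import pdfplumber
# with pdfplumber.open("report.pdf") as tata:
#     documents = pdf_to_documents(tata)
# topicextract = topic_modeling(documents)
```

With a list of strings as input, `cv.fit_transform(tata)` inside `topic_modeling` can iterate over the documents as intended. Treating each page as one document is only one possible choice; joining all pages into a single string and passing `[full_text]` should also work. Separately, on scikit-learn 1.2 and later `cv.get_feature_names()` no longer exists, so the call would need to become `cv.get_feature_names_out()`.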


lda python text-extraction topic-modeling