Problem description
I extracted text data from a PDF file using the pdfplumber library, and I am using LDA for topic modeling. How do I get past the error: 'PDF' object is not iterable?
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

def topic_modeling(tata):
    print('Cleaned tata TXT')
    Infos = [tata]
    #print(type(tata))
    #print(infos[:50])
    cv = CountVectorizer(stop_words='english')
    dtm = cv.fit_transform(tata)
    #print('DTM')
    #dtm
    LDA = LatentDirichletAllocation(n_components=1,random_state=42)
    LDA.fit(dtm)
    single_topic = LDA.components_[0]
    #print('singletopic')
    topics_list = []
    print('list Created ')
    #topics1 = []
    #print('for loop')
    for index,topic in enumerate(LDA.components_):
        #print('index')
        #print(f'THE TOP 15 WORDS FOR TOPIC #{index}')
        for i in topic.argsort()[-30:]:
            print(i)
            #print(i)
            topics_list.append(cv.get_feature_names()[i])
            #print('topics append')
    topics = ','.join(topics_list)
    #print('Topics__')
    return topics

topicextract = topic_modeling(tata)
Cleaned tata TXT
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-24-31281b9b8614> in <module>()
----> 1 topicextract = topic_modeling(tata)

2 frames
/usr/local/lib/python3.6/dist-packages/sklearn/feature_extraction/text.py in _count_vocab(self, raw_documents, fixed_vocab)
   1127         values = _make_int_array()
   1128         indptr.append(0)
-> 1129         for doc in raw_documents:
   1130             feature_counter = {}
   1131             for feature in analyze(doc):

TypeError: 'PDF' object is not iterable
Solution
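The traceback shows CountVectorizer looping over raw_documents, so fit_transform expects an iterable of text strings. The error message suggests that tata is the pdfplumber PDF object itself rather than text extracted from it. Note also that the function builds Infos = [tata] but never uses it. A minimal sketch of one possible fix follows, assuming the PDF was opened with pdfplumber.open and that treating each page as one document is acceptable. The helper name pages_to_documents and the n_top_words parameter are illustrative and not part of the original code.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer


def pages_to_documents(pdf):
    """Collect one text string per page from a pdfplumber-style PDF object.

    Only `pdf.pages` and `page.extract_text()` are used. Pages with no
    extractable text (None or whitespace only) are skipped.
    """
    documents = []
    for page in pdf.pages:
        text = page.extract_text()
        if text and text.strip():
            documents.append(text)
    return documents


def topic_modeling(documents, n_top_words=30):
    """Fit a single-topic LDA model and return its top words, comma-separated."""
    # CountVectorizer needs an iterable of strings, not one bare string.
    if isinstance(documents, str):
        documents = [documents]

    cv = CountVectorizer(stop_words='english')
    dtm = cv.fit_transform(documents)

    lda = LatentDirichletAllocation(n_components=1, random_state=42)
    lda.fit(dtm)

    # get_feature_names() was removed in newer scikit-learn releases;
    # fall back to it only when get_feature_names_out() is unavailable.
    if hasattr(cv, 'get_feature_names_out'):
        feature_names = cv.get_feature_names_out()
    else:
        feature_names = cv.get_feature_names()

    topics_list = []
    for topic in lda.components_:
        # argsort is ascending, so the last entries carry the largest weights.
        for i in topic.argsort()[-n_top_words:]:
            topics_list.append(feature_names[i])
    return ','.join(topics_list)
```

With a real file, the call would look roughly like this (the filename report.pdf is hypothetical): open the file with `with pdfplumber.open('report.pdf') as pdf:`, build `documents = pages_to_documents(pdf)` inside that block, then call `topic_modeling(documents)`. If the PDF is scanned images with no text layer, extract_text will likely return nothing and the document list will be empty, in which case CountVectorizer raises its own error about an empty vocabulary.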