How to get a relevance score for each classified article, Python NLP

Problem Description

Here is a piece of code that classifies text into 10 categories and, at the end, prints the overall accuracy of the algorithm:

import numpy as np
import pandas as pd
from collections import Counter
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

df = pd.read_csv('data/wine_data.csv')

# Keep only the 10 most common grape varieties and map each to an integer label.
counter = Counter(df['variety'].tolist())
top_10_varieties = {i[0]: idx for idx, i in enumerate(counter.most_common(10))}
df = df[df['variety'].map(lambda x: x in top_10_varieties)]

# Text inputs and their integer class labels.
description_list = df['description'].tolist()
varietal_list = np.array([top_10_varieties[i] for i in df['variety'].tolist()])

# Bag-of-words counts, then TF-IDF weighting.
count_vect = CountVectorizer()
x_train_counts = count_vect.fit_transform(description_list)

tfidf_transformer = TfidfTransformer()
x_train_tfidf = tfidf_transformer.fit_transform(x_train_counts)

train_x, test_x, train_y, test_y = train_test_split(x_train_tfidf, varietal_list, test_size=0.3)

# Train Naive Bayes and predict on the held-out set.
clf = MultinomialNB().fit(train_x, train_y)
y_score = clf.predict(test_x)

# Count correct predictions and report overall accuracy.
n_right = 0
for i in range(len(y_score)):
    if y_score[i] == test_y[i]:
        n_right += 1

print("Accuracy: %.2f%%" % (n_right / float(len(test_y)) * 100))

My question: how can I obtain a relevance score for each article in the dataset, as shown below:

[Image: relevance scores]

Solution

No effective solution to this problem has been found yet; the editor is still searching and compiling!

If you have already found a good solution, please send it to the editor together with a link to this page.

Editor's email: dio#foxmail.com (replace # with @)
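
In the meantime, one plausible direction: scikit-learn's MultinomialNB exposes predict_proba (and predict_log_proba), which returns one probability per class for every document, and these per-class probabilities can be read as per-article relevance scores. A minimal sketch reusing clf, test_x, and top_10_varieties from the code above; idx_to_variety is a helper introduced here for display only, not part of the original code:

# predict_proba returns one row per test document, with one probability
# per class; the columns follow the order of clf.classes_.
probabilities = clf.predict_proba(test_x)

# Invert the label mapping so scores can be shown by variety name.
# (idx_to_variety is not in the original code; it is added for readability.)
idx_to_variety = {idx: name for name, idx in top_10_varieties.items()}

# Show ranked per-class scores for the first few test documents.
for doc_idx in range(3):
    row = probabilities[doc_idx]
    scores = {idx_to_variety[c]: p for c, p in zip(clf.classes_, row)}
    for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        print("%-30s %.4f" % (name, score))
    print()

Each printed block ranks the 10 varieties by score for one test document; predict simply returns the class with the highest of these scores.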