Problem description
I am working on a multi-class text classification problem with a large number of classes (more than 15). I have trained a Logistic Regression classifier (the method is just an example). I have figured out how to get the top two classes as predictions, and I would also like to know the probabilities of those two classes.
The sample code I am using:
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
import pickle
import numpy as np
import pandas as pd

filename = 'CR11sep.pkl'
with open(filename, 'rb') as f:
    # the pickle holds the fitted vectorizer, tf-idf transformer and classifier
    movieVzer, movieTfmer, clf = pickle.load(f)

# df is an existing DataFrame with a 'text' column (loaded elsewhere)
# class probabilities for every row of df.text, shape (n_samples, n_classes)
probab = clf.predict_proba(movieTfmer.transform(movieVzer.transform(df.text)))

# identify the indexes of the top two predictions (argsort is ascending,
# so the last column holds the most probable class)
top_n_predictions = np.argsort(probab, axis=1)[:, -2:]

# then find the associated SOC code for each prediction
top_class = clf.classes_[top_n_predictions]
reasons = pd.DataFrame(top_class)
df['reason1'] = reasons[1]  # most probable class
df['reason2'] = reasons[0]  # second most probable class
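For reference, np.argsort sorts in ascending order, so after slicing the last two columns the most probable class index sits in column 1 and the runner-up in column 0; that is why reason1 is taken from reasons[1] above. A quick standalone check (the probability row p is made up purely for illustration):

import numpy as np

p = np.array([[0.1, 0.6, 0.05, 0.25]])   # hypothetical probabilities over 4 classes
idx = np.argsort(p, axis=1)[:, -2:]
print(idx)  # [[3 1]] -> column 0 = second best (0.25), column 1 = best (0.6)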
Current output:
  source    user  time  text            reason1  reason2
0 hi        neha  0     0:neha:hi       1        3
1 there     ram   1     1:ram:there     1        5
2 ball      neha  2     2:neha:ball     3        7
3 item      neha  3     3:neha:item     6        1
4 go there  ram   4     4:ram:go there  7        8
5 kk        ram   5     5:ram:kk        1        3
6 hshs      neha  6     6:neha:hshs     2        6
7 ggsgs     neha  7     7:neha:ggsgs    15       9
Desired output:
  source    user  time  text            reason1  reason2  reason1_prob  reason2_prob
0 hi        neha  0     0:neha:hi       1        2        .8            .1
1 there     ram   1     1:ram:there     1        6        .7            .2
2 ball      neha  2     2:neha:ball     3        7        ..            ..
3 item      neha  3     3:neha:item     6        4
4 go there  ram   4     4:ram:go there  7        9
5 kk        ram   5     5:ram:kk        1        2
6 hshs      neha  6     6:neha:hshs     2        3
7 ggsgs     neha  7     7:neha:ggsgs    15       1
What I have tried:
[sorted(np.round(probab, 3)[li])[::-1][:2] for li in range(len(probab))]
but I am looking for a way to sort and index so that the probabilities stay aligned with the predicted classes.
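One way that might work (a sketch only, assuming probab, clf and df are the objects from the code above, and NumPy 1.15+ for np.take_along_axis): reuse the same top_n_predictions indices to pull the matching probabilities out of probab, so the probability columns line up with reason1/reason2.

import numpy as np

# indices of the top two classes per row, ascending by probability
top_n_predictions = np.argsort(probab, axis=1)[:, -2:]

# pick the same positions out of the probability matrix, row by row
top_probs = np.take_along_axis(probab, top_n_predictions, axis=1)

df['reason1'] = clf.classes_[top_n_predictions[:, 1]]   # most probable class
df['reason2'] = clf.classes_[top_n_predictions[:, 0]]   # second most probable class
df['reason1_prob'] = np.round(top_probs[:, 1], 3)       # probability of reason1
df['reason2_prob'] = np.round(top_probs[:, 0], 3)       # probability of reason2

Because the class labels and the probabilities are taken from the same index array, each reasonN_prob column stays paired with its reasonN column, which matches the desired output layout shown above.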