Multi-class classification: top two classes and their probabilities in a DataFrame

Problem description

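I'm working on a multi-class text classification problem with many different classes (more than 15). I have trained a logistic regression classifier (the choice of method is just an example). I've worked out how to get the top two classes as predictions, but I would also like to know the probability of each of those two classes.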

Sample code I'm using

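import pickle

import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Load the fitted vectorizer, TF-IDF transformer and classifier
filename = 'CR11sep.pkl'
with open(filename, 'rb') as f:
    movieVzer, movieTfmer, clf = pickle.load(f)

# df is an existing DataFrame with a 'text' column.
# probab has one row per document and one column per class, ordered as clf.classes_
probab = clf.predict_proba(movieTfmer.transform(movieVzer.transform(df.text)))

# Identify the column indexes of the top two predictions
# (argsort is ascending, so the last column holds the most likely class)
top_n_predictions = np.argsort(probab, axis=1)[:, -2:]

# Then map each index to its class label
top_class = clf.classes_[top_n_predictions]
# Reuse df's index so the assignment below lines up row by row
reasons = pd.DataFrame(top_class, index=df.index)

df['reason1'] = reasons[1]  # most likely class
df['reason2'] = reasons[0]  # second most likely class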
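To see why `reasons[1]` holds the most likely class and `reasons[0]` the runner-up, here is a toy run of the same `argsort` indexing on a small hand-written probability matrix. The probabilities and the labels in `classes` are made up. In scikit-learn, the columns of `predict_proba` follow the order of `clf.classes_`, which is what `classes` stands in for here.

```python
import numpy as np

# Toy stand-in for clf.predict_proba(...) output: 3 samples, 4 classes.
probab = np.array([
    [0.10, 0.60, 0.05, 0.25],
    [0.40, 0.10, 0.30, 0.20],
    [0.05, 0.15, 0.70, 0.10],
])
classes = np.array([1, 3, 5, 7])  # hypothetical labels, standing in for clf.classes_

# argsort sorts each row ascending, so the last two columns hold
# the second-best index (column 0) and the best index (column 1).
top2_idx = np.argsort(probab, axis=1)[:, -2:]
top2_labels = classes[top2_idx]

print(top2_idx)
print(top2_labels)
```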

Current output

    source  user   time    text         reason1  reason2
0   hi      neha    0      0:neha:hi       1        3
1   there   ram     1      1:ram:there     1        5
2   ball    neha    2      2:neha:ball     3        7
3   item    neha    3      3:neha:item     6        1
4   go there ram    4      4:ram:go there  7        8
5   kk       ram    5      5:ram:kk        1        3
6   hshs    neha    6      6:neha:hshs     2        6
7   ggsgs   neha    7      7:neha:ggsgs    15       9

Desired output

    source  user   time    text         reason1 reason2 reason1_prob reason2_prob
0   hi      neha    0      0:neha:hi       1      2        .8           .1
1   there   ram     1      1:ram:there     1      6        .7           .2
2   ball    neha    2      2:neha:ball     3      7         ..          ..
3   item    neha    3      3:neha:item     6      4
4   go there ram    4      4:ram:go there  7      9
5   kk       ram    5      5:ram:kk        1      2
6   hshs    neha    6      6:neha:hshs     2      3
7   ggsgs   neha    7      7:neha:ggsgs    15     1

What I've tried

# Top two probabilities per row, rounded to 3 decimals (the class labels are lost)
[sorted(np.round(probab, 3)[li])[::-1][:2] for li in range(len(probab))]

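This gives the two highest probabilities for each row, but not the classes they belong to. I'm looking for a way to sort and index at the same time, so that each probability stays paired with its class.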
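One way to sort and index at once is sketched below with NumPy on toy data, not the real model output. First take the indices from `argsort` and reverse them so the best class comes first. Then reuse the same indices with `np.take_along_axis` (available since NumPy 1.15) to pull out the matching probabilities, so each label stays paired with its own probability.

```python
import numpy as np

# Toy stand-in for clf.predict_proba(...) output: 2 samples, 4 classes.
probab = np.array([
    [0.10, 0.60, 0.05, 0.25],
    [0.40, 0.10, 0.30, 0.20],
])
classes = np.array([1, 3, 5, 7])  # hypothetical labels, standing in for clf.classes_

# Column indices of the two largest probabilities per row, best first.
top2_idx = np.argsort(probab, axis=1)[:, ::-1][:, :2]
# Gather the probabilities at exactly those indices.
top2_prob = np.take_along_axis(probab, top2_idx, axis=1)
# Map the same indices to class labels.
top2_labels = classes[top2_idx]
```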

Solution

No working solution has been found for this problem yet.
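Below is a minimal sketch of one possible approach. It assumes `probab` comes from `clf.predict_proba` in the same row order as `df` and that `classes` is `clf.classes_`. The helper `add_top2_columns` is hypothetical. The toy DataFrame and probabilities only mimic the first two rows of the desired output and are not real model output.

```python
import numpy as np
import pandas as pd


def add_top2_columns(df, probab, classes):
    """Hypothetical helper: return a copy of df with the two most likely
    classes and their probabilities. Assumes probab rows follow df's row
    order and probab columns follow the order of classes (clf.classes_)."""
    # Column indices sorted by probability, highest first; keep two.
    top2_idx = np.argsort(probab, axis=1)[:, ::-1][:, :2]
    top2_prob = np.take_along_axis(probab, top2_idx, axis=1)
    top2_labels = np.asarray(classes)[top2_idx]

    out = df.copy()
    # Assigning NumPy arrays is positional, so df's index does not matter.
    out["reason1"] = top2_labels[:, 0]
    out["reason2"] = top2_labels[:, 1]
    out["reason1_prob"] = top2_prob[:, 0].round(3)
    out["reason2_prob"] = top2_prob[:, 1].round(3)
    return out


# Toy data standing in for the real DataFrame and model output.
df = pd.DataFrame({"source": ["hi", "there"], "user": ["neha", "ram"]})
probab = np.array([
    [0.80, 0.10, 0.06, 0.04],
    [0.70, 0.04, 0.06, 0.20],
])
classes = np.array([1, 2, 3, 6])

result = add_top2_columns(df, probab, classes)
print(result)
```

Assigning NumPy arrays, rather than a separately built DataFrame, sidesteps pandas index alignment. Alignment would otherwise produce NaN when `df` does not have a default 0..n-1 index.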