Problem description
I am working on a multi-class text classification problem with a large number of classes (more than 15). I have trained a Logistic Regression classifier (the method is just an example). I have figured out how to get the top two classes as predictions, and I would also like to know the probabilities of those two classes.
The sample code I am using:
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
import pickle
import numpy as np
import pandas as pd

filename = 'CR11sep.pkl'
with open(filename, 'rb') as f:
    # the pickle holds the fitted vectorizer, tf-idf transformer and classifier
    movieVzer, movieTfmer, clf = pickle.load(f)

# df is an existing DataFrame with a 'text' column (loaded elsewhere)
# class probabilities for every row of df.text, shape (n_samples, n_classes)
probab = clf.predict_proba(movieTfmer.transform(movieVzer.transform(df.text)))

# identify the indexes of the top two predictions (argsort is ascending,
# so the last column holds the most probable class)
top_n_predictions = np.argsort(probab, axis=1)[:, -2:]

# then find the associated SOC code for each prediction
top_class = clf.classes_[top_n_predictions]
reasons = pd.DataFrame(top_class)
df['reason1'] = reasons[1]  # most probable class
df['reason2'] = reasons[0]  # second most probable class
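For reference, np.argsort sorts in ascending order, so after slicing the last two columns the most probable class index sits in column 1 and the runner-up in column 0; that is why reason1 is taken from reasons[1] above. A quick standalone check (the probability row p is made up purely for illustration):

import numpy as np

p = np.array([[0.1, 0.6, 0.05, 0.25]])   # hypothetical probabilities over 4 classes
idx = np.argsort(p, axis=1)[:, -2:]
print(idx)  # [[3 1]] -> column 0 = second best (0.25), column 1 = best (0.6)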
Current output:
  source    user  time  text            reason1  reason2
0 hi        neha  0     0:neha:hi       1        3
1 there     ram   1     1:ram:there     1        5
2 ball      neha  2     2:neha:ball     3        7
3 item      neha  3     3:neha:item     6        1
4 go there  ram   4     4:ram:go there  7        8
5 kk        ram   5     5:ram:kk        1        3
6 hshs      neha  6     6:neha:hshs     2        6
7 ggsgs     neha  7     7:neha:ggsgs    15       9
Desired output:
  source    user  time  text            reason1  reason2  reason1_prob  reason2_prob
0 hi        neha  0     0:neha:hi       1        2        .8            .1
1 there     ram   1     1:ram:there     1        6        .7            .2
2 ball      neha  2     2:neha:ball     3        7        ..            ..
3 item      neha  3     3:neha:item     6        4
4 go there  ram   4     4:ram:go there  7        9
5 kk        ram   5     5:ram:kk        1        2
6 hshs      neha  6     6:neha:hshs     2        3
7 ggsgs     neha  7     7:neha:ggsgs    15       1
What I have tried:
[sorted(np.round(probab, 3)[li])[::-1][:2] for li in range(len(probab))]
but I am looking for a way to sort and index so that the probabilities stay aligned with the predicted classes.
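One way that might work (a sketch only, assuming probab, clf and df are the objects from the code above, and NumPy 1.15+ for np.take_along_axis): reuse the same top_n_predictions indices to pull the matching probabilities out of probab, so the probability columns line up with reason1/reason2.

import numpy as np

# indices of the top two classes per row, ascending by probability
top_n_predictions = np.argsort(probab, axis=1)[:, -2:]

# pick the same positions out of the probability matrix, row by row
top_probs = np.take_along_axis(probab, top_n_predictions, axis=1)

df['reason1'] = clf.classes_[top_n_predictions[:, 1]]   # most probable class
df['reason2'] = clf.classes_[top_n_predictions[:, 0]]   # second most probable class
df['reason1_prob'] = np.round(top_probs[:, 1], 3)       # probability of reason1
df['reason2_prob'] = np.round(top_probs[:, 0], 3)       # probability of reason2

Because the class labels and the probabilities are taken from the same index array, each reasonN_prob column stays paired with its reasonN column, which matches the desired output layout shown above.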