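Problem description
I have a dataset with two columns: one indicates whether an email was classified as spam, and the other contains the email's text. I have been trying to implement Naive Bayes with PSO and ABC, but I get the error TypeError: string indices must be integers.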
email_train,email_test,spam_train,spam_test = train_test_split(dfTotal.Email,dfTotal.Spam,test_size=0.3,random_state=0)
email_test_dtm = cv.transform(email_test)
# convert to TFIDF form
email_test_tf = tf.fit_transform(email_test_dtm)
email_test_tf
Artificial Bee Colony
from Hive import Hive
from Hive import Utilities
from sklearn.metrics import log_loss
# ---- SOLVE TEST CASE WITH ARTIFICIAL BEE COLONY ALGORITHM
def run(lowBounds,upBounds,evaluator):
    model = Hive.BeeHive(lower = lowBounds,# MUST BE A LIST !
                         upper = upBounds,# MUST BE A LIST !
                         fun = evaluator,numb_bees = 100,max_itrs = 2,)
    # runs model
    cost,sol = model.run()
    # plots convergence
    Utilities.ConvergencePlot(cost)
    # prints out best solution
    print("fitness Value ABC: {0}".format(model.best))
    ABC_model = MultinomialNB(alpha=10**sol[0]).fit(email_train_tf,spam_train) # Create the optimized model with best parameter
    result = ABC_model.predict(email_test_tf) # predict with the ABC_model
    return sol,result
Import optunity
import optunity
import optunity.metrics
Naive Bayes
from sklearn.naive_bayes import MultinomialNB
nb = MultinomialNB()
nb.get_params()
# fit tf-idf representation to NB model
nb.fit(email_train_tf,spam_train)
# class predictions for testing set
result1 = nb.predict(email_test_tf)
def evaluator(params):
    nBayes = MultinomialNB(alpha=10**params[0]).fit(email_train_tf,spam_train)
    pred_proba = nBayes.predict_proba(email_test_tf)
    return log_loss(spam_test,pred_proba)
sol,result3 = run([-2],[1],evaluator)
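The evaluator above searches the smoothing parameter in log space: the optimiser proposes an exponent between -2 and 1, and the model receives alpha = 10**exponent. A minimal sketch of that mapping follows; the helper name `alpha_from_exponent` is hypothetical and used only for illustration.

```python
def alpha_from_exponent(exponent):
    """Map a search-space exponent to a smoothing value, mirroring alpha=10**params[0]."""
    return 10.0 ** exponent

# The bounds passed to run() in the question: [-2] and [1].
lower_bound, upper_bound = -2, 1

alpha_min = alpha_from_exponent(lower_bound)
alpha_max = alpha_from_exponent(upper_bound)
print(alpha_min, alpha_max)
```

Searching over the exponent rather than alpha itself spreads the candidates evenly across orders of magnitude, which is a common choice for smoothing and regularisation parameters.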
The traceback I get is:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-44-006c296828ed> in <module>
8
9
---> 10 sol,result3 = run([-2],[1],evaluator)
<ipython-input-32-6852d973eb15> in run(lowBounds,upBounds,evaluator)
19
20 # plots convergence
---> 21 Utilities.ConvergencePlot(cost)
22
23 # prints out best solution
c:\users\lidak\article\src\hive\Hive\Utilities.py in ConvergencePlot(cost)
55 labels = ["Best Cost Function","Mean Cost Function"]
56 plt.figure(figsize=(12.5,4));
---> 57 plt.plot(range(len(cost["best"])),cost["best"],label=labels[0]);
58 plt.scatter(range(len(cost["mean"])),cost["mean"],color='red',label=labels[1]);
59 plt.xlabel("Iteration #");
TypeError: string indices must be integers
<figure size 900x288 with 0 Axes>
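One likely cause, judging only from the traceback: ConvergencePlot indexes its argument with "best" and "mean", so it expects a dict with those two keys. If model.run() returns that dict alone (an assumption, since the Hive source is not shown here), then `cost,sol = model.run()` unpacks the dict's two keys, and cost ends up as the string "best". The sketch below reproduces the effect with a hypothetical stand-in, `fake_run`; it does not use Hive at all.

```python
def fake_run():
    """Hypothetical stand-in for model.run(); ASSUMED to return a single dict."""
    return {"best": [0.9, 0.5, 0.4], "mean": [1.2, 0.8, 0.6]}

# Unpacking a two-key dict iterates over its keys, not its values.
cost, sol = fake_run()
print(repr(cost), repr(sol))  # both names now hold key strings

# Indexing a string with a string raises the TypeError from the traceback.
try:
    cost["best"]  # the same indexing ConvergencePlot performs
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# Keeping the return value whole preserves the dict and its curves.
history = fake_run()
best_curve = history["best"]
print(best_curve)
```

If the assumption holds, assigning the return value to a single name (`cost = model.run()`) should let ConvergencePlot work. The best parameters would then have to be read from whatever attribute the library exposes for them, which should be checked against the Hive source.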