SMS spam detection: ValueError: shapes (1,8667) and (7764,2) not aligned: 8667 (dim 1) != 7764 (dim 0)

Problem description

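I'm not an ML expert, but I'm using a Naive Bayes classifier to predict SMS spam. The model trains and scores well, but when I try to predict from the pickle files, I keep getting the error above.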

import numpy as np
import pandas as pd
import string
from nltk.corpus import stopwords
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report,confusion_matrix
from sklearn.naive_bayes import MultinomialNB
import pickle as pk
from nltk.stem import SnowballStemmer
import time

def textPreProcess(text):
    # Strip punctuation, then drop English stop words
    punctuationsToNone = str.maketrans('','',string.punctuation)
    text = text.translate(punctuationsToNone)
    text = [word for word in text.split() if word.lower() not in stopwords.words("english")]
    return " ".join(text)

def stemming(text):
    # Reduce every word to its stem
    words = map(lambda t: SnowballStemmer("english").stem(t),text.split())
    return " ".join(words)
sms = pd.read_csv('spam_dataset.csv')
texts = sms['text'].copy()
vectorizer = TfidfVectorizer("english")
features = vectorizer.fit_transform(texts)
# And apply the pre-processing methods to the new DataFrame
texts = texts.apply(textPreProcess)
texts = texts.apply(stemming)
sms['length'] = sms['text'].apply(len)
lengths = sms['length'].values
features = np.hstack((features.todense(),lengths[:,None]))
features_train,features_test,labels_train,labels_test = train_test_split(features,sms['type'],test_size = 0.2,random_state = int(time.time()))
model = MultinomialNB(alpha=0.2)
model.fit(features_train,labels_train)
predicted = model.predict(features_test)
accuracyscore = accuracy_score(labels_test,predicted)
print(accuracyscore)
pk.dump(vectorizer,open("vectorizer.pkl","wb"))
pk.dump(model,open("model.pkl","wb"))
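
Because the vectorizer and the model are written to two separate files, nothing ties a given `vectorizer.pkl` to the model that was trained on its output. One way to keep the pair in sync, sketched here as an assumption rather than as part of the original script, is to pickle both objects into a single file. The helper names `save_bundle` and `load_bundle` are hypothetical, and so is any file name passed to them.

```python
import pickle


def save_bundle(path, vectorizer, model):
    """Write the vectorizer and the model fitted on its output to one file."""
    with open(path, "wb") as fh:
        pickle.dump({"vectorizer": vectorizer, "model": model}, fh)


def load_bundle(path):
    """Return the (vectorizer, model) pair written by save_bundle."""
    with open(path, "rb") as fh:
        bundle = pickle.load(fh)
    return bundle["vectorizer"], bundle["model"]
```

With this layout the prediction side would call `load_bundle` once and use both returned objects, so a model from one training run cannot be combined with a vectorizer from another.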

My code for predicting from the pickled model files:

from flask import Flask,render_template,request,session,flash
from datetime import date
import nexmo
import numpy as np

import pickle as pk
import pandas as pd
import sqlite3 as sql
from nltk.corpus import stopwords
from nltk.stem import SnowballStemmer
import string
from flask import redirect

@app.route("/send-sms",methods=['POST'])                   # handles POST requests to /send-sms
def send_sms_post():

    def createDataFrame(message):
        return pd.DataFrame({
            'message': [message],'length': [len(message)]
        })

    def textPreProcess(text):
        # Strip punctuation, then drop English stop words
        text = text.translate(str.maketrans('','',string.punctuation))
        text = [word for word in text.split() if word.lower() not in stopwords.words("english")]
        return " ".join(text)

    def stemming(text):
        # Reduce every word to its stem
        words = map(lambda t: SnowballStemmer('english').stem(t),text.split())
        return " ".join(words)

    def extractFreatures(sms):
        texts = sms['message'].copy()
        lengths = sms['length'].values
        texts = texts.apply(textPreProcess)
        texts = texts.apply(stemming)
        vectorizer = pk.load(open('vectorizer.pkl','rb'))
        features = vectorizer.transform(texts)
        features = np.hstack((features.todense(),lengths[:,None]))
        return features

    def predict(features):
        model = pk.load(open('spam_predictor.pkl','rb'))
        label = model.predict(features)
        return label

    message = str(request.form["message"].strip())
    sms = createDataFrame(message)
    features = extractFreatures(sms)
    model = pk.load(open('spam_predictor.pkl','rb'))
    result = model.predict(features)[0]

**Error message**

Traceback (most recent call last):
  File "C:\Users\Richiehortiz\AppData\Roaming\Python\python36\site-packages\flask\app.py", line 2463, in __call__
    return self.wsgi_app(environ, start_response)
  File "C:\Users\Richiehortiz\AppData\Roaming\Python\python36\site-packages\flask\app.py", line 2449, in wsgi_app
    response = self.handle_exception(e)
  File "C:\Users\Richiehortiz\AppData\Roaming\Python\python36\site-packages\flask\app.py", line 1866, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "C:\Users\Richiehortiz\AppData\Roaming\Python\python36\site-packages\flask\_compat.py", line 39, in reraise
    raise value
  File "C:\Users\Richiehortiz\AppData\Roaming\Python\python36\site-packages\flask\app.py", line 2446, in wsgi_app
    response = self.full_dispatch_request()
  File "C:\Users\Richiehortiz\AppData\Roaming\Python\python36\site-packages\flask\app.py", line 1951, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "C:\Users\Richiehortiz\AppData\Roaming\Python\python36\site-packages\flask\app.py", line 1820, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "C:\Users\Richiehortiz\AppData\Roaming\Python\python36\site-packages\flask\_compat.py", line 39, in reraise
    raise value
  File "C:\Users\Richiehortiz\AppData\Roaming\Python\python36\site-packages\flask\app.py", line 1949, in full_dispatch_request
    rv = self.dispatch_request()
  File "C:\Users\Richiehortiz\AppData\Roaming\Python\python36\site-packages\flask\app.py", line 1935, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "C:\Users\Richiehortiz\PycharmProjects\sms_spam\starter.py", line 119, in send_sms_post
    result = model.predict(features)[0]
  File "C:\Users\Richiehortiz\AppData\Roaming\Python\python36\site-packages\sklearn\naive_bayes.py", line 65, in predict
    jll = self._joint_log_likelihood(X)
  File "C:\Users\Richiehortiz\AppData\Roaming\Python\python36\site-packages\sklearn\naive_bayes.py", line 737, in _joint_log_likelihood
    return (safe_sparse_dot(X, self.feature_log_prob_.T) +
  File "C:\Users\Richiehortiz\AppData\Roaming\Python\python36\site-packages\sklearn\utils\extmath.py", line 142, in safe_sparse_dot
    return np.dot(a, b)
ValueError: shapes (1,8667) and (7764,2) not aligned: 8667 (dim 1) != 7764 (dim 0)
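
The numbers in the error can be read directly from the traceback: the matrix passed to `predict` has 8667 columns (presumably the vectorizer's vocabulary plus the appended length column), while the loaded model's `feature_log_prob_` was fitted on 7764 columns. One possible cause, which is only an assumption based on the code shown, is that `spam_predictor.pkl` comes from a different training run than `vectorizer.pkl`, since the training script above writes `model.pkl`. A minimal sketch of a guard that surfaces the mismatch before `predict` is called (the helper name `check_feature_width` is hypothetical):

```python
def check_feature_width(model, features):
    """Raise a readable error when the feature matrix does not fit the model.

    Assumes `model` is a fitted MultinomialNB-like object exposing
    `feature_log_prob_` with shape (n_classes, n_features), and that
    `features` has a 2-D `shape` (NumPy array or SciPy sparse matrix).
    """
    expected = model.feature_log_prob_.shape[1]
    actual = features.shape[1]
    if actual != expected:
        raise ValueError(
            f"model was trained on {expected} features but received {actual}; "
            "the vectorizer and model pickles may come from different runs"
        )
    return features
```

If the guard fires, retraining and saving the vectorizer and the model from the same run, then loading exactly those two files in the Flask handler, would be the first thing to try.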
