Error: 'int' object has no attribute 'lower' with CountVectorizer and Pandas

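Problem description

I can't apply CountVectorizer to a dataset imported from Excel. I tried converting every integer in the data to a string, but CountVectorizer still picks up integers.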

import numpy as np
import sklearn
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer as cv
from sklearn.linear_model import Perceptron
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split


pos = pd.read_excel("/content/drive/My Drive/Polarity_pos.xlsx",header = None,names=None)

neg = pos = pd.read_excel("/content/drive/My Drive/Polarity_neg.xlsx",names=None)


merged_train = pd.merge(pos,neg)


string = merged_train.astype('str')

train=pd.DataFrame(data=string).replace('\d+','NUM',regex=True)


print(train.loc[19,:])


#analyzer='word',stop_words=None,analyzer = 'word' 
vectorizer = cv()
count_vector = vectorizer.fit_transform(train)

This raises the following error:

AttributeError                            Traceback (most recent call last)
<ipython-input-116-adcd263d8e89> in <module>()
     26 #analyzer='word',analyzer = 'word'
     27 vectorizer = cv()
---> 28 count_vector = vectorizer.fit_transform(train)
     29 
     30 

3 frames
/usr/local/lib/python3.6/dist-packages/sklearn/feature_extraction/text.py in _preprocess(doc,accent_function,lower)
     66     """
     67     if lower:
---> 68         doc = doc.lower()
     69     if accent_function is not None:
     70         doc = accent_function(doc)

AttributeError: 'int' object has no attribute 'lower'

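Solution

You are probably passing the wrong kind of input to CountVectorizer's fit_transform. It does not expect a DataFrame but "an iterable of raw text documents" (see the docs). Iterating over a DataFrame yields its column labels rather than its cell values, so converting the cells with astype('str') does not help if the labels themselves are integers, which is likely what triggers the error here. You can try flattening the DataFrame first and then running the vectorizer, but make sure that treating every cell as its own document still suits your problem. Try this: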

count_vector = vectorizer.fit_transform(train.stack())

train.stack() converts your DataFrame into a Series, so each cell is passed to the vectorizer as a separate document.
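To make the difference concrete, here is a minimal sketch that uses a small in-memory DataFrame instead of the Excel files. The frame `df` and its contents are made up for illustration; only the integer column labels are meant to mirror what read_excel produces with header=None.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-in for the Excel data: integer column labels, mixed cell types.
df = pd.DataFrame({0: ["good movie", "bad plot"], 1: ["great cast 10", 7]})

# Iterating over a DataFrame yields its column labels, not its cells.
# These integers are what CountVectorizer would try to call .lower() on.
labels = list(df)

# Cast every cell to str, then flatten into a one-dimensional Series of documents.
docs = df.astype(str).stack()

vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(docs)
vocab = sorted(vectorizer.vocabulary_)

print(labels)
print(counts.shape)
print(vocab)
```

Note that stacking turns every cell into its own row of the resulting matrix. If each row of the sheet should count as one document, passing a single text column to fit_transform may be a better fit.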

countvectorizer pandas python scikit-learn
