Error: 'int' object has no attribute 'lower' with CountVectorizer and Pandas

Problem description

I can't apply CountVectorizer to a dataset imported from Excel. I tried converting all the integers in the data to strings, but CountVectorizer still receives integers.

import numpy as np
import sklearn
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer as cv
from sklearn.linear_model import Perceptron
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split


pos = pd.read_excel("/content/drive/My Drive/Polarity_pos.xlsx", header=None)

neg = pd.read_excel("/content/drive/My Drive/Polarity_neg.xlsx", header=None)


# Stack the positive and negative rows into one DataFrame
merged_train = pd.concat([pos, neg], ignore_index=True)


string = merged_train.astype('str')

# Replace every run of digits with the token NUM
train = string.replace(r'\d+', 'NUM', regex=True)


print(train.loc[19,:])


#analyzer='word',stop_words=None,analyzer = 'word' 
vectorizer = cv()
count_vector = vectorizer.fit_transform(train)

The error:

AttributeError                            Traceback (most recent call last)
<ipython-input-116-adcd263d8e89> in <module>()
     26 #analyzer='word',analyzer = 'word'
     27 vectorizer = cv()
---> 28 count_vector = vectorizer.fit_transform(train)
     29 
     30 

3 frames
/usr/local/lib/python3.6/dist-packages/sklearn/feature_extraction/text.py in _preprocess(doc,accent_function,lower)
     66     """
     67     if lower:
---> 68         doc = doc.lower()
     69     if accent_function is not None:
     70         doc = accent_function(doc)

AttributeError: 'int' object has no attribute 'lower'
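The cause can be reproduced without the Excel files. The sketch below assumes a small, hypothetical two-column DataFrame (df, with made-up text) standing in for a sheet read with header=None. It shows that astype('str') changes the cell values but leaves the integer column labels alone, and that iterating over a DataFrame yields those labels, which is what CountVectorizer ends up calling .lower() on.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-in for a sheet read with header=None: column labels are 0 and 1
df = pd.DataFrame([["good movie", "great acting"],
                   ["bad plot", "boring"]])

# astype('str') converts the cell values, but the column labels stay integers
df_str = df.astype('str')
print(list(df_str))  # iterating a DataFrame yields its column labels: [0, 1]

# fit_transform iterates over its input, so it receives 0 and 1 as "documents"
error = None
try:
    CountVectorizer().fit_transform(df_str)
except AttributeError as exc:
    error = exc
    print(exc)  # 'int' object has no attribute 'lower'
```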

Solution

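You are probably passing the wrong kind of input to CountVectorizer's fit_transform. It does not expect a DataFrame but an iterable of raw text documents (see the docs). Iterating over a DataFrame yields its column labels rather than its cell values, and when a sheet is read with header=None those labels are the integers 0, 1, 2 and so on, which is why converting the values to strings did not help. Flatten the DataFrame first and then apply the vectorizer, but make sure that treating each cell as a separate document still suits your problem. Try this: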

count_vector = vectorizer.fit_transform(train.stack())

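train.stack() reshapes your DataFrame into a Series with one entry per cell. Iterating over that Series yields the cell strings themselves, so each cell becomes one document.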
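Here is a self-contained sketch of the fix, using a hypothetical 2x2 DataFrame in place of the real data. Option 1 is the answer's train.stack() approach (one document per cell). Option 2 is an alternative that is not part of the original answer: assuming each spreadsheet row should be a single document, it joins the cells of each row with DataFrame.apply(" ".join, axis=1).

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-in for the string-converted training data (2 rows x 2 columns)
train = pd.DataFrame([["good movie", "great acting"],
                      ["bad plot", "boring"]]).astype('str')

# Option 1 (the answer's approach): one document per cell
per_cell = train.stack()
cell_counts = CountVectorizer().fit_transform(per_cell)
print(cell_counts.shape)  # (4, 7): 4 cells, 7 distinct words

# Option 2 (assumption: each row is one example): join a row's cells into one document
per_row = train.apply(" ".join, axis=1)
row_counts = CountVectorizer().fit_transform(per_row)
print(row_counts.shape)   # (2, 7): 2 rows, the same 7 distinct words
```

Which variant fits depends on what a row of the spreadsheet represents; both give CountVectorizer an iterable of strings instead of a DataFrame.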
