"AttributeError" when splitting the text in a column into tokens

Problem description

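I am reading a source JSON file and then trying to split its 'Content' column into individual tokens on the space character ' '. However, this step raises the error "AttributeError: 'list' object has no attribute 'split'". Could you help me understand why this happens and what a possible fix would be? My code snippet is below: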

import configparser
import numpy as np
import pandas as pd

#Config file load
config = configparser.ConfigParser()
config.read('capstone_config.parm')

#Loading of data 
Source_File_Path=config['DEFAULT']['Data_file_path']
Source_Filename=config['Intent.LDA']['TgtFilename']
File=Source_File_Path + '\\' + Source_Filename
dataset=pd.read_json(File,orient='index',encoding='UTF-8')
dataset= dataset.replace(np.nan,'',regex=True)
print(dataset.shape)
dataset.head(2)

#Extracting the bigrams and trigrams and recreating the content to improve sentiment scoring
data = dataset.copy()
#Tokenize the content
corpus_token = [doc.split(' ') for doc in list(data.Content)]

The last line raises the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-12-426623e24865> in <module>
      2 data = dataset.copy()
      3 #Tokenize the content
----> 4 corpus_token = [doc.split(' ') for doc in list(data.Content)]

<ipython-input-12-426623e24865> in <listcomp>(.0)
      2 data = dataset.copy()
      3 #Tokenize the content
----> 4 corpus_token = [doc.split(' ') for doc in list(data.Content)]

AttributeError: 'list' object has no attribute 'split'

Solution

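The error message means that at least one value in data.Content is already a Python list rather than a string. Lists have no split method, so doc.split(' ') fails on the first such row. A likely cause is that the corresponding "Content" field in the JSON file holds an array instead of a single string. Note that dataset.replace(np.nan,'',regex=True) only replaces missing values and leaves list values untouched. Such values need to be either joined into one string or tokenized element by element before splitting.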
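
Since the traceback shows that some "Content" values are lists, one possible approach is to check which types the column holds and then tokenize each value according to its type. The following is only a minimal sketch under stated assumptions: the `contents` list is made-up sample data standing in for `list(data.Content)`, and `tokenize` is a hypothetical helper, not part of pandas or of the original code.

```python
from collections import Counter

# Hypothetical sample values standing in for list(data.Content):
# a plain string, a list of strings, and an empty string.
contents = [
    "book a flight to Paris",
    ["cancel my order", "refund please"],
    "",
]

# Step 1: count the Python types present, to confirm the diagnosis.
type_counts = Counter(type(value).__name__ for value in contents)
print(type_counts)


def tokenize(value):
    """Return a flat list of tokens for a string or a (nested) list of strings.

    Hypothetical helper. Note that str.split() without an argument splits on
    any run of whitespace, so unlike split(' ') it yields no empty tokens.
    """
    if isinstance(value, str):
        return value.split()
    if isinstance(value, (list, tuple)):
        tokens = []
        for item in value:
            tokens.extend(tokenize(item))
        return tokens
    # Anything else (None, numbers, ...) is treated as having no tokens.
    return []


# Step 2: tokenize every entry, whatever its type.
corpus_token = [tokenize(doc) for doc in contents]
print(corpus_token)
```

With the original DataFrame, the last step would presumably become corpus_token = [tokenize(doc) for doc in data.Content]. Whether list entries should be flattened into one token list per row, as done here, or kept as separate documents depends on what the later bigram and trigram step expects.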


json python python-3.x tokenize