Problem description
I have a txt file that contains sentences and labels written in columns, like this:
O are
O there
O any
O good
B-GENRE romantic
I-GENRE comedies
O out
B-YEAR right
I-YEAR now

O show
O me
O a
O movie
O about
B-PLOT cars
I-PLOT that
I-PLOT talk
I want to read the data from this txt file into two nested lists.
The desired output should look like this:
labels = [['O','O','O','O','B-GENRE','I-GENRE','O','B-YEAR','I-YEAR'],['O','O','O','O','O','B-PLOT','I-PLOT','I-PLOT']]
sentences = [['are','there','any','good','romantic','comedies','out','right','now'],['show','me','a','movie','about','cars','that','talk']]
I tried the following:
with open("engtrain.bio.txt","r") as f:
    lsta = []
    for line in f:
        lsta.append([x for x in line.replace("\n","").split()])
But I get the following output:
[['O','are'],['O','there'],['O','any'],['O','good'],['B-GENRE','romantic'],['I-GENRE','comedies'],['O','out'],['B-YEAR','right'],['I-YEAR','now'],[],['O','show'],['O','me'],['O','a'],['O','movie'],['O','about'],['B-PLOT','cars'],['I-PLOT','that'],['I-PLOT','talk']]
Update: I also tried the following:
with open("engtest.bio.txt","r") as f:
    lines = f.readlines()
    labels = []
    sentences = []
    for l in lines:
        as_list = l.split("\t")
        labels.append(as_list[0])
        sentences.append(as_list[1].replace("\n",""))
Unfortunately, this still raises an error:
IndexError Traceback (most recent call last)
<ipython-input-66-63c266df6ace> in <module>()
6 as_list = l.strip().split("\t")
7 labels.append(as_list[0])
----> 8 sentences.append(as_list[1].replace("\n",""))
IndexError: list index out of range
The original data comes from this link (engtest.bio or engtrain.bio): https://groups.csail.mit.edu/sls/downloads/movie/
Can you help me?
Thanks in advance.
Solution
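One possible approach, offered as a minimal sketch rather than a definitive answer. The empty list in the first attempt's output suggests that sentences in the file are separated by blank lines. A blank line split on "\t" yields a single-element list, so as_list[1] raises IndexError; the same happens on any line whose fields are not separated by a tab. The sketch below splits on any whitespace and starts a new sentence at each blank line. The function name read_bio is hypothetical, and the code assumes every non-blank line holds exactly one label followed by one token. The inline sample mirrors the excerpt from the question, with a blank line assumed between the two sentences.

```python
from typing import Iterable, List, Tuple


def read_bio(lines: Iterable[str]) -> Tuple[List[List[str]], List[List[str]]]:
    """Group 'LABEL TOKEN' lines into per-sentence label and token lists.

    Assumption: a blank line (or the end of the input) closes a sentence.
    """
    labels: List[List[str]] = []
    sentences: List[List[str]] = []
    cur_labels: List[str] = []
    cur_tokens: List[str] = []

    for line in lines:
        parts = line.split()  # splits on tabs or spaces, drops the newline
        if not parts:
            # Blank line: close the current sentence if it has any tokens.
            if cur_tokens:
                labels.append(cur_labels)
                sentences.append(cur_tokens)
                cur_labels, cur_tokens = [], []
            continue
        if len(parts) != 2:
            raise ValueError(f"expected 'LABEL TOKEN', got: {line!r}")
        label, token = parts
        cur_labels.append(label)
        cur_tokens.append(token)

    # The file may not end with a blank line, so flush the last sentence.
    if cur_tokens:
        labels.append(cur_labels)
        sentences.append(cur_tokens)
    return labels, sentences


# Sample mirroring the excerpt in the question (blank line is an assumption).
sample = """O are
O there
O any
O good
B-GENRE romantic
I-GENRE comedies
O out
B-YEAR right
I-YEAR now

O show
O me
O a
O movie
O about
B-PLOT cars
I-PLOT that
I-PLOT talk
"""

labels, sentences = read_bio(sample.splitlines())
print(labels)
print(sentences)
```

To run it on the real data, an open file object can be passed directly, since iterating over a file yields its lines: with open("engtrain.bio.txt", "r") as f: labels, sentences = read_bio(f). If some lines in the actual file turn out to have more or fewer than two fields, the ValueError will point at the first such line so the parsing rule can be adjusted.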