How do I convert column data in a text file into nested lists in Python?

Problem description

I have a txt file that contains sentences and their labels written in columns, like this:

O   are
O   there
O   any
O   good
B-GENRE romantic
I-GENRE comedies
O   out
B-YEAR  right
I-YEAR  now

O   show
O   me
O   a
O   movie
O   about
B-PLOT  cars
I-PLOT  that
I-PLOT  talk

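I want to read the data from this txt file into two nested lists, with one inner list per sentence. The desired output should look like this: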

labels = [['O','O','O','O','B-GENRE','I-GENRE','O','B-YEAR','I-YEAR'],['O','O','O','O','O','B-PLOT','I-PLOT','I-PLOT']]
sentences = [['are','there','any','good','romantic','comedies','out','right','now'],['show','me','a','movie','about','cars','that','talk']]
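A minimal sketch of one way to get that output, assuming each non-blank line holds a label and a token separated by whitespace (tabs or spaces) and that sentences are separated by blank lines. The function name `read_bio` is hypothetical, and the sample data is inlined through `io.StringIO` only so the sketch runs without the real file.

```python
import io
from typing import Iterable, List, Tuple


def read_bio(lines: Iterable[str]) -> Tuple[List[List[str]], List[List[str]]]:
    """Hypothetical helper: group label/token lines into sentences.

    A blank line ends the current sentence.
    """
    labels: List[List[str]] = []
    sentences: List[List[str]] = []
    cur_labels: List[str] = []
    cur_tokens: List[str] = []
    for line in lines:
        parts = line.split()  # splits on any run of spaces/tabs and drops '\n'
        if not parts:  # blank line: close the current sentence, if any
            if cur_tokens:
                labels.append(cur_labels)
                sentences.append(cur_tokens)
                cur_labels, cur_tokens = [], []
            continue
        label, token = parts[0], parts[1]
        cur_labels.append(label)
        cur_tokens.append(token)
    if cur_tokens:  # the file may not end with a blank line
        labels.append(cur_labels)
        sentences.append(cur_tokens)
    return labels, sentences


# The data from the question, tab-separated, standing in for the real file.
SAMPLE = (
    "O\tare\nO\tthere\nO\tany\nO\tgood\n"
    "B-GENRE\tromantic\nI-GENRE\tcomedies\nO\tout\n"
    "B-YEAR\tright\nI-YEAR\tnow\n"
    "\n"
    "O\tshow\nO\tme\nO\ta\nO\tmovie\nO\tabout\n"
    "B-PLOT\tcars\nI-PLOT\tthat\nI-PLOT\ttalk\n"
)

labels, sentences = read_bio(io.StringIO(SAMPLE))
print(labels)
print(sentences)
```

With the real data you would pass the open file object instead, e.g. `with open("engtrain.bio.txt") as f: labels, sentences = read_bio(f)`. Splitting with no argument, rather than on `"\t"`, is a deliberate choice here so the sketch does not depend on which whitespace the file uses.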

I tried the following:

with open("engtrain.bio.txt","r") as f:
  lsta = []
  for line in f:
    lsta.append([x for x in line.replace("\n","").split()])

But I got the following output:

[['O','are'],['O','there'],['O','any'],['O','good'],['B-GENRE','romantic'],['I-GENRE','comedies'],['O','out'],['B-YEAR','right'],['I-YEAR','now'],[],['O','show'],['O','me'],['O','a'],['O','movie'],['O','about'],['B-PLOT','cars'],['I-PLOT','that'],['I-PLOT','talk']]
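The first attempt already splits each line correctly; the blank line between sentences simply comes through as an empty list. A sketch that post-processes such a flat list, assuming `lsta` has exactly that shape (one `[label, token]` pair per line, `[]` per blank line): `itertools.groupby` with `key=bool` separates runs of non-empty rows from runs of empty rows, and `zip(*rows)` transposes each run into a label list and a token list.

```python
from itertools import groupby

# Shape of `lsta` produced by the first attempt: one [label, token] pair
# per line and an empty list for each blank separator line.
lsta = [
    ['O', 'are'], ['O', 'there'], ['O', 'any'], ['O', 'good'],
    ['B-GENRE', 'romantic'], ['I-GENRE', 'comedies'], ['O', 'out'],
    ['B-YEAR', 'right'], ['I-YEAR', 'now'], [],
    ['O', 'show'], ['O', 'me'], ['O', 'a'], ['O', 'movie'], ['O', 'about'],
    ['B-PLOT', 'cars'], ['I-PLOT', 'that'], ['I-PLOT', 'talk'],
]

labels = []
sentences = []
# bool([]) is False and bool(['O', 'are']) is True, so groupby yields
# alternating runs of sentence rows (True) and separator rows (False).
for non_empty, rows in groupby(lsta, key=bool):
    if not non_empty:
        continue  # skip the blank-line separators
    sent_labels, sent_tokens = zip(*rows)  # transpose the [label, token] pairs
    labels.append(list(sent_labels))
    sentences.append(list(sent_tokens))

print(labels)
print(sentences)
```

Note that `zip` silently truncates to the shortest row, so a malformed line with a single field would drop data rather than raise; the line-by-line approach that indexes `parts[1]` would fail loudly instead.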

Update: I also tried the following:

with open("engtest.bio.txt","r") as f:
  lines = f.readlines()
  labels = []
  sentences = []
  for l in lines:
    as_list = l.strip().split("\t")
    labels.append(as_list[0])
    sentences.append(as_list[1].replace("\n",""))

Unfortunately, I still get an error:

IndexError                                Traceback (most recent call last)
<ipython-input-66-63c266df6ace> in <module>()
      6     as_list = l.strip().split("\t")
      7     labels.append(as_list[0])
----> 8     sentences.append(as_list[1].replace("\n",""))

IndexError: list index out of range
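The traceback points at `as_list[1]`: a blank separator line contains no tab, so `split("\t")` returns a one-element list and index 1 does not exist. A sketch of an alternative that never indexes into blank lines, assuming blank (or whitespace-only) lines separate sentences: split the whole text into blocks first with `re.split`, then parse each block. The helper name `parse_blocks` is hypothetical, and the sample is a shortened excerpt of the data above.

```python
import re

# Why the second attempt fails: a blank line has no tab to split on.
print("\n".strip().split("\t"))  # [''], so index 1 is out of range


def parse_blocks(text):
    """Hypothetical helper: split text on blank lines, then parse each block."""
    labels, sentences = [], []
    # \n\s*\n matches one or more blank or whitespace-only lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        if not block.strip():
            continue  # empty input produces one empty block; skip it
        pairs = [line.split() for line in block.splitlines()]
        labels.append([p[0] for p in pairs])
        sentences.append([p[1] for p in pairs])
    return labels, sentences


# Shortened excerpt of the question's data; with the real file you could use
# `with open("engtest.bio.txt") as f: labels, sentences = parse_blocks(f.read())`.
sample = "B-GENRE\tromantic\nI-GENRE\tcomedies\n\nB-PLOT\tcars\nI-PLOT\tthat\nI-PLOT\ttalk\n"
labels, sentences = parse_blocks(sample)
print(labels)
print(sentences)
```

Reading the whole file at once is simple but holds it all in memory; for very large files a line-by-line loop that closes a sentence on each blank line avoids that.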

The original data comes from this link (engtest.bio or engtrain.bio): https://groups.csail.mit.edu/sls/downloads/movie/

Can you help me?

Thanks in advance.


named-entity-recognition nested-lists python type-conversion