Problem description
I have NER data in list format.
Sample data:
[[('Silica','NN','_','B-Material'),('nanoparticles','NNS','I-Material'),('possessing','VBG','O'),('three','CD','B-Data'),('different','JJ','I-Data'),('diameters',('(','(',('23',(',',('74',('and','CC',('170',('nm',(')',')',('were','VBD',('used','VBN',('to','TO',('modify','B-Process'),('a','DT',('piperidine',('-',':',('cured',('epoxy',('polymer',('.','.','O')],[('Fracture',('tests','I-Process'),('performed',('values',('of','IN',('the',('toughness',('increased',('steadily','RB',('as',('concentration',('silica',('was','O')]]
I need to convert it to the CoNLL-2003 NER data format and save it to a text file. My code does not work as expected. My implementation:
name= 'coll2003_train_com.txt'
def data_format(name,seq):
    test = []
    for i in seq:
        for j in i:
            test.append(j)
    with open(name,'w',encoding="utf-8") as f1:
        for i in test:
            ii='\t'.join(i)
            f1.writelines(ii + '/n')
            #f1.writelines('/n')
    return test
m=data_format(name,cc1)
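The result is saved in the text file as one long line instead of one token per line.
Solution
The code writes '/n' (a slash followed by the letter n) rather than the newline escape '\n', so nothing ever ends a line. Try this, where data is the nested list (cc1 in the question):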
In [9]: fp = open(name,'w')
In [10]: for i in data:
    ...:     for j in i:
    ...:         fp.write('\t'.join(list(j))+'\n')
    ...:
In [11]: fp.close()
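CoNLL-2003 files also separate sentences with an empty line, which the loop above does not write. The following is a minimal sketch of one way to add that, not a definitive implementation. The function name write_conll, the sep parameter, the output file name, and the two-sentence sample are all hypothetical and chosen only for illustration. It keeps the tab separator from the question, although the original CoNLL-2003 files use single spaces between columns.

```python
import os
import tempfile


def write_conll(path, sentences, sep="\t"):
    """Write one token per line and an empty line after each sentence."""
    # The with-statement closes the file even if a write fails.
    with open(path, "w", encoding="utf-8") as f:
        for sentence in sentences:
            for token in sentence:
                # str() guards against non-string fields in a tuple.
                f.write(sep.join(str(field) for field in token) + "\n")
            # The empty line marks the sentence boundary.
            f.write("\n")


# Hypothetical sample: two short sentences of (word, POS, tag) tuples.
sample = [
    [("Silica", "NN", "B-Material"), ("nanoparticles", "NNS", "I-Material")],
    [("tests", "NNS", "O")],
]

# Hypothetical output location in the system temp directory.
out_path = os.path.join(tempfile.gettempdir(), "conll_sample.txt")
write_conll(out_path, sample)

with open(out_path, encoding="utf-8") as f:
    print(f.read())
```

Passing sep=" " would produce space-separated columns instead, if the downstream reader expects that layout.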