Tokenizing sentences in a special way

Problem description

from os import listdir
from os.path import isfile, join

from datasets import load_dataset
from transformers import BertTokenizer

# Collect every regular file in ./test (the JSON files to load).
test_files = [join('./test', f) for f in listdir('./test') if isfile(join('./test', f))]

# Load those files as the "test" split, caching under ./.cache_dir.
dataset = load_dataset('json', data_files={"test": test_files}, cache_dir="./.cache_dir")

After running the code, dataset["test"]["abstract"] gives the following output:

[['eleven politicians from 7 parties made comments in letter to a newspaper .',"said dpp alison saunders had ` damaged public confidence ' in justice .",'ms saunders ruled lord janner unfit to stand trial over child abuse claims .','the cps has pursued at least 19 suspected paedophiles with dementia .'],['an increasing number of surveys claim to reveal what makes us happiest .','but are these generic lists really of any use to us ?','janet street-porter makes her own list - of things making her unhappy !'],["author of ` into the wild ' spoke to five rape victims in missoula,montana .","` missoula : rape and the justice system in a college town ' was released april 21 .","three of five victims profiled in the book sat down with abc 's nightline wednesday night .",'kelsey belnap,allison huguet and hillary mclaughlin said they had been raped by university of montana football '
  'players .',"huguet and mclaughlin 's attacker,beau donaldson,pleaded guilty to rape in 2012 and was sentenced to 10 years .",'belnap claimed four players gang-raped her in 2010,but prosecutors never charged them citing lack of probable '
  'cause .','mr krakauer wrote book after realizing close friend was a rape victim .'],['tesco announced a record annual loss of £ 6.38 billion yesterday .','drop in sales,one-off costs and pensions blamed for financial loss .','supermarket giant now under pressure to close 200 stores nationwide .','here,retail industry veterans,plus mail writers,identify what went wrong .'],...,['snp leader said alex salmond did not field questions over his family .',"said she was not ` moaning ' but also attacked criticism of women 's looks .",'she made the remarks in latest programme profiling the main party leaders .','ms sturgeon also revealed her tv habits and recent image makeover .','she said she relaxed by eating steak and chips on a saturday night .']]

[Image: the desired tokenization structure (not preserved)]

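I would like every sentence to have the tokenization structure shown in the image. How can I do this with Hugging Face? My current thinking is that I have to flatten each inner list of the list above to get a plain list of strings, and then tokenize each string.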
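The flatten-then-tokenize idea described above can be sketched as follows. This is only a minimal sketch, not a confirmed solution: the exact target structure is not reproduced in the text, `flatten` and `tokenize_sentences` are hypothetical helper names, and `str.split` stands in for a real tokenizer so that the example runs without downloading a model.

```python
from itertools import chain
from typing import Callable, List


def flatten(abstracts: List[List[str]]) -> List[str]:
    """Turn a list of abstracts (each a list of sentences) into one flat list of sentences."""
    return list(chain.from_iterable(abstracts))


def tokenize_sentences(
    abstracts: List[List[str]],
    tokenize: Callable[[str], List[str]],
) -> List[List[List[str]]]:
    """Tokenize every sentence while keeping the abstract -> sentence nesting."""
    return [[tokenize(sentence) for sentence in abstract] for abstract in abstracts]


# Two short abstracts copied from the output shown in the question.
abstracts = [
    ['an increasing number of surveys claim to reveal what makes us happiest .',
     'but are these generic lists really of any use to us ?'],
    ['tesco announced a record annual loss of £ 6.38 billion yesterday .'],
]

# Option 1: flatten first, which loses the grouping by abstract.
sentences = flatten(abstracts)

# Option 2: tokenize in place, which keeps the grouping by abstract.
# str.split is a stand-in tokenizer (an assumption made so the sketch runs offline).
tokens = tokenize_sentences(abstracts, str.split)

print(len(sentences))
print(tokens[1][0][:3])
```

With `transformers` installed, the `tokenize` method of a loaded `BertTokenizer` (for example one created with `BertTokenizer.from_pretrained("bert-base-uncased")`, where the checkpoint name is an assumption) could be passed in place of `str.split`. Keeping the nesting rather than flattening may be preferable if the sentences of one abstract need to stay together later.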

