Loading Wikidata latest-truthy.nt with tdb2.tdbloader fails with Code: 58/PROHIBITED_COMPONENT_PRESENT in USER

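Problem description

Using Apache Jena Fuseki, I am trying to load the latest-truthy.nt dataset from Wikidata, but the import fails with the error below. I was inspired by Bitplan's successful import of the same data.

Error log: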

14:36:16 INFO  loader          :: Add: 198.500.000 latest-truthy.nt (Batch: 453.309 / Avg: 213.382)
14:36:17 ERROR riot            :: [line: 198884173,col: 87] Bad IRI: <https://abertillerymuseum@btconnect.com> Code: 58/PROHIBITED_COMPONENT_PRESENT in USER: A component that is prohibited by the scheme is present.
org.apache.jena.riot.RiotException: [line: 198884173,col: 87] Bad IRI: <https://abertillerymuseum@btconnect.com> Code: 58/PROHIBITED_COMPONENT_PRESENT in USER: A component that is prohibited by the scheme is present.
    at org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerStd.error(ErrorHandlerFactory.java:146)
    at org.apache.jena.riot.system.ParserProfileStd.internalMakeIRI(ParserProfileStd.java:112)
    at org.apache.jena.riot.system.ParserProfileStd.resolveIRI(ParserProfileStd.java:85)
    at org.apache.jena.riot.system.ParserProfileStd.createURI(ParserProfileStd.java:187)
    at org.apache.jena.riot.system.ParserProfileStd.create(ParserProfileStd.java:259)
    at org.apache.jena.riot.lang.LangNTriples.tokenAsNode(LangNTriples.java:70)
    at org.apache.jena.riot.lang.LangNTuple.parseTriple(LangNTuple.java:109)
    at org.apache.jena.riot.lang.LangNTriples.parseOne(LangNTriples.java:61)
    at org.apache.jena.riot.lang.LangNTriples.runParser(LangNTriples.java:53)
    at org.apache.jena.riot.lang.LangBase.parse(LangBase.java:43)
    at org.apache.jena.riot.RDFParserRegistry$ReaderRIOTLang.read(RDFParserRegistry.java:184)
    at org.apache.jena.riot.RDFParser.read(RDFParser.java:357)
    at org.apache.jena.riot.RDFParser.parseURI(RDFParser.java:323)
    at org.apache.jena.riot.RDFParser.parse(RDFParser.java:298)
    at org.apache.jena.riot.RDFParserBuilder.parse(RDFParserBuilder.java:550)
    at org.apache.jena.tdb2.loader.base.LoaderOps.inputFile(LoaderOps.java:107)
    at org.apache.jena.tdb2.loader.base.LoaderBase.loadOne(LoaderBase.java:125)
    at org.apache.jena.tdb2.loader.base.LoaderBase.lambda$load$0(LoaderBase.java:102)
    at java.base/java.util.ArrayList.forEach(ArrayList.java:1541)
    at org.apache.jena.tdb2.loader.base.LoaderBase.load(LoaderBase.java:99)
    at tdb2.tdbloader.lambda$execBulkLoad$4(tdbloader.java:196)
    at org.apache.jena.atlas.lib.Timer.time(Timer.java:85)
    at tdb2.tdbloader.execBulkLoad(tdbloader.java:194)
    at tdb2.tdbloader.loadQuads(tdbloader.java:175)
    at tdb2.tdbloader.exec(tdbloader.java:136)
    at org.apache.jena.cmd.CmdMain.mainMethod(CmdMain.java:92)
    at org.apache.jena.cmd.CmdMain.mainRun(CmdMain.java:58)
    at org.apache.jena.cmd.CmdMain.mainRun(CmdMain.java:45)
    at tdb2.tdbloader.main(tdbloader.java:64)

Import script:

@ECHO off
cd apache-jena-4.0.0
echo start import on %DATE% %TIME%

tdb2_tdbloader --loader=parallel --loc "C:\fuseki\data" "F:\latest-truthy.nt" > tdb2-out.log 2> tdb2-err.log

echo finish import on %DATE% %TIME%
pause

File structure:

- C:/fuseki/
-- apache-jena-4.0.0/
-- apache-jena-fuseki-4.0.0/
-- data/
-- startfusekidb.bat
-- wikidata2fuseki.bat

- F:/
-- latest-truthy.nt

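Is this a problem with Fuseki? I cannot open the .nt file myself to remove the offending line. Is there a flag I can use so that tdbloader skips validation for this import?

I have also asked in Wikidata's IRC channel to see whether they can help.

Update: Someone on IRC replied that the dataset contains many errors (see Errors in Wikidata), so I now know I need a way to skip the lines that cause errors and continue loading. However, the Fuseki TDB2 Commands documentation does not show anything that helps.

I also tried --help, which prints the following and seems to indicate that no skip option exists?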

c:\fuseki\apache-jena-4.0.0\bin>tdb2_tdbloader -h
tdbloader--loader= [--desc DATASET | --loc DIR] FILE ...
  Location
      --loc=DIR              Location (a directory)
      --tdb=                 Assembler description file
      --graph=IRI            Act on a named graph
      --loader=              Loader to use: 'basic','phased' (default),'sequential','parallel' or 'light'
      --syntax=LANG          Syntax of data from stdin
  Symbol definition
      --set                  Set a configuration symbol to a value
      --mem=FILE             Execute on an in-memory TDB database (for testing)
      --desc=                Assembler description file
  General
      -v   --verbose         Verbose
      -q   --quiet           Run with minimal output
      --debug                Output information for debugging
      --help
      --version              Version information
      --strict               Operate in strict SPARQL mode (no extensions of any kind)
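
Since the help output lists no option for skipping bad lines, one possible workaround (not taken from the original thread) is to pre-filter the N-Triples file before handing it to the loader. The following is a minimal Python sketch under several assumptions: the regular expression only targets the pattern seen in the log above (an http(s) IRI with a "user@" part in its authority), the function names and file names are hypothetical, and the file may well contain other kinds of bad IRIs that this does not catch. It is a line-based heuristic, so a literal that happens to contain such text in angle brackets would be dropped as well.

```python
import re
import sys

# Heuristic: an http(s) IRI in angle brackets whose authority contains "user@".
# This mirrors the single IRI reported in the log; it is not a full IRI validator.
USERINFO_IRI = re.compile(r"<https?://[^/?#>@\s]*@[^>\s]*>")


def has_userinfo_iri(line):
    """Return True if the line contains an http(s) IRI with a user-info part."""
    return USERINFO_IRI.search(line) is not None


def filter_ntriples(src, dst, rejects):
    """Copy lines from src to dst, diverting suspect lines to rejects.

    Works line by line, so the whole file is never held in memory.
    Returns a (kept, dropped) tuple of line counts.
    """
    kept = dropped = 0
    for line in src:
        if has_userinfo_iri(line):
            rejects.write(line)
            dropped += 1
        else:
            dst.write(line)
            kept += 1
    return kept, dropped


# Hypothetical command line: python filter_nt.py IN.nt OUT.nt REJECTS.nt
if __name__ == "__main__" and len(sys.argv) == 4:
    in_path, out_path, rej_path = sys.argv[1:4]
    with open(in_path, encoding="utf-8", newline="") as src, \
         open(out_path, "w", encoding="utf-8", newline="") as dst, \
         open(rej_path, "w", encoding="utf-8", newline="") as rej:
        kept, dropped = filter_ntriples(src, dst, rej)
    print(f"kept {kept} lines, dropped {dropped} lines", file=sys.stderr)
```

Under these assumptions, the script would be run once against F:\latest-truthy.nt, and tdb2_tdbloader would then be pointed at the filtered output file instead of the original. Keeping the rejected lines in a separate file makes it possible to inspect what was dropped.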
