How do I get the Fuseki API via SPARQLWrapper to properly report a detailed error message?

Problem description

Summary: in

https://github.com/WolfgangFahl/DgraphAndWeaviateTest/blob/master/storage/sparql.py

https://github.com/WolfgangFahl/DgraphAndWeaviateTest/blob/master/tests/testSPARQL.py

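I am trying to allow a "round-trip" operation between a Python list of dicts and Jena/SPARQL-based storage.

The approach works well for my use cases, and after trying it out for a while I am getting into more details that need to be addressed.

The Stack Overflow question "listOfDict to RDF conversion in python targeting Apache Jena Fuseki" addressed the initial problem, and issues 2-5 at https://github.com/WolfgangFahl/DgraphAndWeaviateTest/issues?q=is%3Aissue+is%3Aclosed show some detail problems that have already been solved.

Now I am working on about 180,000 records that I would like to import from 6 different sources, and each of the sources seems to have new exotic records that make the approach fail.

For example, one batch of records gave me the following log: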

read 45601 events in   0.6 s
storing 45601 events to sparql
  batch for         1 -      2000 of     45601 cr:Event in    0.6 s ->    0.6 s
  batch for      2001 -      4000 of     45601 cr:Event in    0.5 s ->    1.1 s
  batch for      4001 -      6000 of     45601 cr:Event in    0.5 s ->    1.6 s
  batch for      6001 -      8000 of     45601 cr:Event in    0.5 s ->    2.1 s
  batch for      8001 -     10000 of     45601 cr:Event in    0.5 s ->    2.6 s
  batch for     10001 -     12000 of     45601 cr:Event in    0.7 s ->    3.2 s
======================================================================
ERROR: testCrossref (tests.test_Crossref.TestCrossref)
test loading crossref data
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/wf/Library/Python/3.8/lib/python/site-packages/SPARQLWrapper/Wrapper.py", line 1073, in _query
    response = urlopener(request)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 640, in http_response
    response = self.parent.error(
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 400: Bad Request

SPARQLWrapper.SPARQLExceptions.QueryBadFormed: QueryBadFormed: a bad request has been sent to the endpoint, probably the sparql query is bad formed.

Response:
b'Error 400: Bad Request\n'

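Since I have no details about the problem, I am using binary search. From the error above I only know that the problematic record has a batchIndex between 12000 and 14000, so I set the limit to 14000 and the batchSize to 100. With that,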

 batch for     13301 -     13400 of     14000 cr:Event in    0.0 s ->    4.3 s

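is now the last successful batch. So I keep bisecting on the limit: 13450 fails, 13425 fails, 13412 ok, 13418 ok, 13422 fails, 13420 ok, 13421 ok. So record 13422 is the culprit, and I turned on debug mode to see the INSERT DATA created for that record: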

  cr:Event__102140gtm20003 cr:Event_name "Higher local fields".
  cr:Event__102140gtm20003 cr:Event_location "M\\"unster,Germany".
  cr:Event__102140gtm20003 cr:Event_source "crossref".
  cr:Event__102140gtm20003 cr:Event_eventId "10.2140/gtm.2000.3".
  cr:Event__102140gtm20003 cr:Event_title "Invitation to higher local fields".
  cr:Event__102140gtm20003 cr:Event_startDate "1999-08-29"^^<http://www.w3.org/2001/XMLSchema#date>.
  cr:Event__102140gtm20003 cr:Event_year 1999.
  cr:Event__102140gtm20003 cr:Event_month 9.
  cr:Event__102140gtm20003 cr:Event_endDate "1999-09-05"^^<http://www.w3.org/2001/XMLSchema#date>.
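
The manual narrowing-down described above could be automated. The following is only a minimal sketch: `try_insert` is a hypothetical callable (it is not part of storage/sparql.py) that sends one batch to the endpoint and returns True if it was accepted and False if it was rejected. The sketch assumes that a batch is rejected exactly when it contains at least one bad record.

```python
from typing import Callable, Sequence


def find_first_bad_record(records: Sequence[dict],
                          try_insert: Callable[[Sequence[dict]], bool]) -> int:
    """Return the index of the first record that makes try_insert fail, or -1.

    Assumption: try_insert(batch) is False exactly when the batch
    contains at least one bad record.
    """
    lo, hi = 0, len(records)
    if hi == 0 or try_insert(records):
        return -1
    # Invariant: records[lo:hi] contains the first bad record.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if try_insert(records[lo:mid]):
            lo = mid  # first half is fine, so the bad record is in the second half
        else:
            hi = mid  # first half already fails, keep searching there
    return lo
```

This needs roughly log2(n) calls instead of re-running the whole import with a different limit each time. Note that each accepted batch really is inserted, so the sketch is best run against a scratch dataset.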

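So the umlaut encoding \"u in the location "Münster" is the culprit here. I will work around this. The real question is:

How do I get the Fuseki API via SPARQLWrapper to properly report a detailed error message?

For example, something like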

error in line # cr:Event__102140gtm20003 cr:Event_location "M\\"unster,Germany". is  not a valid triple?
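
The QueryBadFormed message above already prints the response body under "Response:", and that body is only "Error 400: Bad Request". One way to check whether the server sends anything more detailed is to POST the update directly and look at the status code, reason phrase and body. This is a minimal sketch using only the Python standard library. The function name `post_update` is made up for illustration, and whether Fuseki includes a parser message at all may depend on the server version and configuration.

```python
import urllib.error
import urllib.request


def post_update(endpoint_url: str, update: str, timeout: float = 30.0):
    """POST a SPARQL update; return (status, reason, body) even on HTTP errors."""
    request = urllib.request.Request(
        endpoint_url,
        data=update.encode("utf-8"),
        # Media type defined by the SPARQL 1.1 Protocol for direct update POSTs.
        headers={"Content-Type": "application/sparql-update; charset=utf-8"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = response.read().decode("utf-8", "replace")
            return response.status, response.reason, body
    except urllib.error.HTTPError as err:
        # The error object is also a response: keep everything the server sent.
        body = err.read().decode("utf-8", "replace")
        return err.code, err.reason, body
```

A call could look like `status, reason, body = post_update("http://localhost:3030/example/update", insertCommand)`, where both the URL and the variable `insertCommand` are placeholders. If `reason` and `body` contain nothing beyond "Bad Request", the detail is missing on the server side and not lost inside SPARQLWrapper.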
