Problem description
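Overview: in
https://github.com/WolfgangFahl/DgraphAndWeaviateTest/blob/master/storage/sparql.py
and
https://github.com/WolfgangFahl/DgraphAndWeaviateTest/blob/master/tests/testSPARQL.py
I am trying to support a "round trip" between Python lists of dicts and a Jena/SPARQL-based store.
The approach works well for my use case. After experimenting with it for a while, I am now getting into more details that need to be resolved.
The Stack Overflow question "listOfDict to RDF conversion in python targeting Apache Jena Fuseki" solved the initial problem, and issues 2-5 at https://github.com/WolfgangFahl/DgraphAndWeaviateTest/issues?q=is%3Aissue+is%3Aclosed show some detail problems that have already been solved.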
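The read direction of such a round trip can be sketched with the standard SPARQL 1.1 Query Results JSON format, which a SELECT query returns when JSON is requested. This is a minimal sketch and not the code in sparql.py. The function names are hypothetical. Only xsd:integer and xsd:date values are converted back to Python types, on the assumption that these match the kinds of values stored in this post; everything else stays a string.

```python
from datetime import date
from typing import Any, Dict, List

XSD = "http://www.w3.org/2001/XMLSchema#"


def convert_term(term: Dict[str, str]) -> Any:
    """Turn one SPARQL JSON result term into a Python value (hypothetical helper)."""
    value = term["value"]
    datatype = term.get("datatype")
    if datatype == XSD + "integer":
        return int(value)
    if datatype == XSD + "date":
        return date.fromisoformat(value)
    # plain literals, language-tagged literals and IRIs are kept as strings
    return value


def bindings_to_list_of_dicts(result_json: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Convert a SPARQL 1.1 JSON SELECT result into a list of dicts."""
    rows = []
    for binding in result_json["results"]["bindings"]:
        # unbound variables are simply absent from a binding, so they are absent from the dict
        rows.append({var: convert_term(term) for var, term in binding.items()})
    return rows


result = {
    "head": {"vars": ["eventId", "year", "startDate"]},
    "results": {"bindings": [{
        "eventId": {"type": "literal", "value": "10.2140/gtm.2000.3"},
        "year": {"type": "literal", "datatype": XSD + "integer", "value": "1999"},
        "startDate": {"type": "literal", "datatype": XSD + "date", "value": "1999-08-29"},
    }]},
}
print(bindings_to_list_of_dicts(result))
# [{'eventId': '10.2140/gtm.2000.3', 'year': 1999, 'startDate': datetime.date(1999, 8, 29)}]
```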
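I am now working with about 180,000 records that I want to import from 6 different data sources. Each data source seems to bring new peculiar records that make the approach fail.
For example, one set of records gave me the following log: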
read 45601 events in 0.6 s
storing 45601 events to sparql
batch for 1 - 2000 of 45601 cr:Event in 0.6 s -> 0.6 s
batch for 2001 - 4000 of 45601 cr:Event in 0.5 s -> 1.1 s
batch for 4001 - 6000 of 45601 cr:Event in 0.5 s -> 1.6 s
batch for 6001 - 8000 of 45601 cr:Event in 0.5 s -> 2.1 s
batch for 8001 - 10000 of 45601 cr:Event in 0.5 s -> 2.6 s
batch for 10001 - 12000 of 45601 cr:Event in 0.7 s -> 3.2 s
======================================================================
ERROR: testCrossref (tests.test_Crossref.TestCrossref)
test loading crossref data
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/wf/Library/Python/3.8/lib/python/site-packages/SPARQLWrapper/Wrapper.py", line 1073, in _query
    response = urlopener(request)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 640, in http_response
    response = self.parent.error(
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 400: Bad Request
SPARQLWrapper.SPARQLExceptions.QueryBadFormed: QueryBadFormed: a bad request has been sent to the endpoint, probably the sparql query is bad formed.
Response:
b'Error 400: Bad Request\n'
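The log above shows one progress line per batch of 2000 records. The following is a minimal sketch of such a batching loop. It assumes a hypothetical `store_batch` callable that sends one batch to the endpoint. The `limit` parameter caps how many records are stored, which is how the search range is narrowed below. The output format only imitates the log and is not taken from sparql.py.

```python
import time
from typing import Callable, Iterator, Optional, Sequence, Tuple


def batch_ranges(total: int, batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield 0-based (start, end) slice bounds covering `total` items."""
    for start in range(0, total, batch_size):
        yield start, min(start + batch_size, total)


def store_in_batches(records: Sequence[dict],
                     store_batch: Callable[[Sequence[dict]], None],
                     batch_size: int = 2000,
                     limit: Optional[int] = None,
                     entity: str = "cr:Event") -> None:
    """Store records batch by batch (hypothetical helper, not sparql.py)."""
    total = len(records) if limit is None else min(limit, len(records))
    t0 = time.time()
    for start, end in batch_ranges(total, batch_size):
        t_batch = time.time()
        store_batch(records[start:end])  # assumed: one INSERT DATA request per batch
        now = time.time()
        # 1-based, inclusive numbering as in the log lines above
        print(f"batch for {start + 1} - {end} of {total} {entity} "
              f"in {now - t_batch:.1f} s -> {now - t0:.1f} s")
```

With `batch_size=100` and `limit=14000`, the last line printed before a failure pins the bad record to the following 100 records.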
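Since I don't get any details about the problem, I am now using binary search. From the error above I only know that the batchIndex of the problematic record is between 12000 and 14000, so I set the limit to 14000 and the batchSize to 100: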
batch for 13301 - 13400 of 14000 cr:Event in 0.0 s -> 4.3 s
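This is now the last successful batch. So I continue with the binary search by hand: 13450 fails, 13425 fails, 13412 ok, 13418 ok, 13422 fails, 13420 ok, 13421 ok. Record 13422 is therefore the culprit. I switched on debug mode to see the INSERT DATA created for this record: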
cr:Event__102140gtm20003 cr:Event_name "Higher local fields".
cr:Event__102140gtm20003 cr:Event_location "M\\"unster,Germany".
cr:Event__102140gtm20003 cr:Event_source "crossref".
cr:Event__102140gtm20003 cr:Event_eventId "10.2140/gtm.2000.3".
cr:Event__102140gtm20003 cr:Event_title "Invitation to higher local fields".
cr:Event__102140gtm20003 cr:Event_startDate "1999-08-29"^^<http://www.w3.org/2001/XMLSchema#date>.
cr:Event__102140gtm20003 cr:Event_year 1999.
cr:Event__102140gtm20003 cr:Event_month 9.
cr:Event__102140gtm20003 cr:Event_endDate "1999-09-05"^^<http://www.w3.org/2001/XMLSchema#date>.
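The following is a minimal sketch of turning a Python string into a double-quoted SPARQL/Turtle literal, so that a value like M\"unster becomes one valid literal. The escape set follows the SPARQL 1.1 grammar (STRING_LITERAL2 with ECHAR escapes). The name `sparql_string_literal` is hypothetical and not part of sparql.py.

```python
# Characters that must not appear raw inside a "..." literal (SPARQL 1.1 ECHAR escapes)
_ESCAPES = {
    "\\": "\\\\",
    '"': '\\"',
    "\n": "\\n",
    "\r": "\\r",
    "\t": "\\t",
}


def sparql_string_literal(value: str) -> str:
    """Return value as a double-quoted SPARQL/Turtle string literal (hypothetical helper)."""
    # mapping character by character avoids double-escaping the backslashes we add
    escaped = "".join(_ESCAPES.get(ch, ch) for ch in value)
    return f'"{escaped}"'


value = 'M\\"unster,Germany'  # the Python value is M\"unster,Germany
print(f"cr:Event__102140gtm20003 cr:Event_location {sparql_string_literal(value)}.")
# cr:Event__102140gtm20003 cr:Event_location "M\\\"unster,Germany".
```

Both the backslash and the quote are escaped, so the literal only ends at the final quote.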
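So the umlaut encoding \"u in the location "Münster" is the culprit here. The backslash was escaped as \\, but the double quote after it was not. The literal therefore ends right after M\\, and the rest of the line is no longer a valid triple. I will fix that. The real question is: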
How can I get the Fuseki API, via SPARQLWrapper, to properly report a detailed error message,
e.g. something like:
error in line # cr:Event__102140gtm20003 cr:Event_location "M\\"unster,Germany". is not a valid triple?
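As long as the endpoint only answers "Error 400: Bad Request", the manual binary search described above can at least be automated on the client side. This does not make Fuseki report more details. The sketch assumes a hypothetical `store_ok` callable that sends a batch and returns False when it is rejected. For instance, it could wrap SPARQLWrapper's `query()` and catch `QueryBadFormed`. The sketch also assumes that a batch fails exactly when it contains at least one bad record, and that re-sending accepted records is harmless.

```python
from typing import Callable, Optional, Sequence


def find_first_bad(records: Sequence[dict],
                   store_ok: Callable[[Sequence[dict]], bool]) -> Optional[int]:
    """Return the 0-based index of the first rejected record, or None (hypothetical helper)."""
    if not records or store_ok(records):
        return None
    lo, hi = 0, len(records)  # invariant: records[lo:hi] holds the first bad record
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if store_ok(records[lo:mid]):
            lo = mid  # left half accepted: culprit is in records[mid:hi]
        else:
            hi = mid  # left half rejected: culprit is in records[lo:mid]
    return lo


# Stand-in for the endpoint: reject any batch whose location contains a double quote
def fake_store_ok(batch: Sequence[dict]) -> bool:
    return all('"' not in r["location"] for r in batch)


records = [{"location": "Berlin"}] * 5 + [{"location": 'M\\"unster'}] + [{"location": "Paris"}] * 4
print(find_first_bad(records, fake_store_ok))  # 5
```

Once the index is known, the triples generated for that single record can be logged, which gives roughly the "error in line ..." message asked for. Each search takes about log2(batch size) extra requests.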