Problem with the Elasticsearch Python streaming_bulk helper

Problem description

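I am trying to use the Elasticsearch streaming_bulk helper in Python, but it breaks if I upgrade to the Elasticsearch 6.8.1 client (the latest version compatible with my ES 6.x server). The last client version that seems to work for me is 6.4.0, but I don't want to stop updating the client. Here is an excerpt of the problematic code: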

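from elasticsearch import Elasticsearch, RequestsHttpConnection
from elasticsearch.helpers import streaming_bulk

es_client = Elasticsearch(
    hosts=[{'host': host, 'port': 443}],
    http_auth=awsauth, use_ssl=True, verify_certs=True, timeout=600,
    connection_class=RequestsHttpConnection, http_compress=True,
    dead_timeout=1, retry_on_timeout=True,
)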

streaming_bulk(
    es_client, gen_actions(), max_retries=3, raise_on_exception=True,
    index=index, doc_type=doc_type, chunk_size=50)
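
Because `streaming_bulk` is a generator, no request is sent until it is iterated, and each iteration yields an `(ok, item)` tuple. The following is a minimal sketch of one way to consume such results and collect failures. `summarize_bulk_results` is a hypothetical helper name (it is not part of the elasticsearch package), and the sample tuples only imitate the shape the helper yields.

```python
def summarize_bulk_results(results):
    """Count successes and collect failed items from (ok, item) tuples.

    `results` is any iterable shaped like the output of
    elasticsearch.helpers.streaming_bulk. Hypothetical helper, for
    illustration only.
    """
    succeeded = 0
    failed = []
    for ok, item in results:
        if ok:
            succeeded += 1
        else:
            failed.append(item)
    return succeeded, failed


# Made-up sample data imitating the yielded shape.
sample_results = [
    (True, {"update": {"_id": "a", "status": 200}}),
    (False, {"update": {"_id": "b", "status": 400, "error": "..."}}),
]
succeeded, failed = summarize_bulk_results(sample_results)
print(succeeded, len(failed))
```

With a real client this could be called as `summarize_bulk_results(streaming_bulk(es_client, gen_actions(), raise_on_error=False))`. With the default `raise_on_error=True`, the helper raises `BulkIndexError` instead of yielding failed items.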

gen_actions returns dictionaries of the following form:

{
    '_id': 'iiiii', '_op_type': 'update', 'doc_as_upsert': True, 'doc': { .. some fields ... }
}
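
For illustration, a generator producing actions of this shape might look like the sketch below. The question does not show `gen_actions`, so the name `gen_update_actions`, its `docs` parameter, and the sample document are all assumptions.

```python
def gen_update_actions(docs):
    """Yield one partial-update (upsert) action per (doc_id, fields) pair.

    Assumption: `docs` is an iterable of (id, dict) pairs. The original
    gen_actions is not shown in the question.
    """
    for doc_id, fields in docs:
        yield {
            "_id": doc_id,
            "_op_type": "update",
            "doc_as_upsert": True,
            "doc": fields,
        }


# Made-up sample document.
actions = list(gen_update_actions([("1", {"name": "a", "count": 3})]))
print(actions[0])
```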

For some actions, the following error occurs:

File "/usr/local/lib/python3.8/dist-packages/elasticsearch/helpers/actions.py", line 218, in streaming_bulk
    for data, (ok, info) in zip(
File "/usr/local/lib/python3.8/dist-packages/elasticsearch/helpers/actions.py", line 113, in _process_bulk_chunk
    raise e
File "/usr/local/lib/python3.8/dist-packages/elasticsearch/helpers/actions.py", line 109, in _process_bulk_chunk
    resp = client.bulk("\n".join(bulk_actions) + "\n", *args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/elasticsearch/client/utils.py", line 84, in _wrapped
    return func(*args, params=params, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/elasticsearch/client/__init__.py", line 1556, in bulk
    return self.transport.perform_request(
File "/usr/local/lib/python3.8/dist-packages/elasticsearch/transport.py", line 351, in perform_request
    status, headers_response, data = connection.perform_request(
File "/usr/local/lib/python3.8/dist-packages/elasticsearch/connection/http_requests.py", line 161, in perform_request
    self._raise_error(response.status_code, raw_data)
File "/usr/local/lib/python3.8/dist-packages/elasticsearch/connection/base.py", line 229, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(
elasticsearch.exceptions.TransportError: TransportError(500,'json_parse_exception','Illegal character ((CTRL-CHAR,code 31)): only regular white space (\\r,\\n,\\t) is allowed between tokens\n at [Source: org.elasticsearch.common.bytes.BytesReference$MarkSupportingStreamInputWrapper@45e0489b; line: 1,column: 2]')

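At first I thought the problem was in part of the doc, but it is not consistent. The error seems to occur for some documents, even trivial ones. The documents contain no special characters, only strings, integers, and floats. Tested on Python 3.6.9 and 3.8. It only happens when I update the ES client to a version higher than 6.4.0.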
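
One detail in the error message may help narrow this down: character code 31 is 0x1f, which is the first byte of the gzip magic number (0x1f 0x8b). Since the client is created with `http_compress=True`, one possible explanation, which the question does not confirm, is that the newer client sends a gzip-compressed body that the server then tries to parse as plain JSON. The sketch below uses only the standard library to show that a gzip-compressed bulk body starts with byte 31. The sample action is made up.

```python
import gzip
import json

# A made-up bulk body in the newline-delimited JSON format used by the
# bulk API: an action line followed by a payload line, each ending in "\n".
lines = [
    json.dumps({"update": {"_id": "iiiii"}}),
    json.dumps({"doc": {"field": 1}, "doc_as_upsert": True}),
]
body = ("\n".join(lines) + "\n").encode("utf-8")

compressed = gzip.compress(body)

first_plain = body[0]             # ord("{") for an uncompressed body
first_compressed = compressed[0]  # first byte of the gzip magic number

print(first_plain, first_compressed)  # prints: 123 31
```

If this hypothesis applies, retrying with `http_compress=False`, or checking whether the server is configured to accept compressed requests, should confirm it or rule it out.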

Any ideas?

Solution

No working solution to this problem has been found yet.
