Elasticsearch: a particular shard keeps initializing on different data nodes

Problem description

I received an ElasticsearchStatusWarning saying that the cluster status is yellow. After running the health check API, I see the following:

curl -X GET http://localhost:9200/_cluster/health/

{"cluster_name":"my-elasticsearch","status":"yellow","timed_out":false,"number_of_nodes":8,"number_of_data_nodes":3,"active_primary_shards":220,"active_shards":438,"relocating_shards":0,"initializing_shards":2,"unassigned_shards":0,"delayed_unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0,"task_max_waiting_in_queue_millis":0,"active_shards_percent_as_number":99.54545454545455}

initializing_shards is 2. So I dug further and ran the following call:

curl -X GET "http://localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason" | grep INIT

graph_vertex_24_18549 0 r INITIALIZING ALLOCATION_FAILED

curl -X GET http://localhost:9200/_cat/shards/graph_vertex_24_18549

graph_vertex_24_18549 0 p STARTED      8373375 8.4gb IP1   elasticsearch-data-1
graph_vertex_24_18549 0 r INITIALIZING               IP2 elasticsearch-data-2

Re-running the same command a few minutes later shows that it is now initializing on elasticsearch-data-0. See below:

graph_vertex_24_18549 0 p STARTED      8373375 8.4gb IP1   elasticsearch-data-1
graph_vertex_24_18549 0 r INITIALIZING               IP0   elasticsearch-data-0

If I run it again a few minutes later, I can see it initializing on elasticsearch-data-2 again. But it never reaches the STARTED state.
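To tell whether the replica recovery is making progress or restarting from scratch each time, the recovery of that index can be watched; a diagnostic sketch (active_only limits the output to in-flight recoveries):

curl -s -X GET "http://localhost:9200/_cat/recovery/graph_vertex_24_18549?v&active_only=true&h=index,shard,stage,source_node,target_node,files_percent,bytes_percent"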

curl -X GET http://localhost:9200/_cat/allocation?v

shards disk.indices disk.used disk.avail disk.total disk.percent host          ip            node
   147      162.2gb   183.8gb    308.1gb      492gb           37 IP1 IP1 elasticsearch-data-2
   146      217.3gb   234.2gb    257.7gb      492gb           47 IP2   IP2   elasticsearch-data-1
   147      216.6gb   231.2gb    260.7gb      492gb           47 IP3  IP3  elasticsearch-data-0
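Disk usage peaks at 47%, well below the default disk-based allocation watermarks (85% low, 90% high), so disk pressure should not be what keeps relocating the replica. The live watermark settings can be confirmed with a sketch like:

curl -s -X GET "http://localhost:9200/_cluster/settings?include_defaults=true&flat_settings=true&pretty" | grep disk.watermark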

curl -X GET http://localhost:9200/_cat/nodes?v

ip            heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
IP1            7          77  20    4.17    4.57     4.88 mi        -      elasticsearch-master-2
IP2          72          59   7    2.59    2.38     2.19 i         -      elasticsearch-5f4bd5b88f-4lvxz
IP3           57          49   3    0.75    1.13     1.09 di        -      elasticsearch-data-2
IP4           63          57  21    2.69    3.58     4.11 di        -      elasticsearch-data-0
IP5            5          59   7    2.59    2.38     2.19 mi        -      elasticsearch-master-0
IP6            69          53  13    4.67    4.60     4.66 di        -      elasticsearch-data-1
IP7           8          70  14    2.86    3.20     3.09 mi        *      elasticsearch-master-1
IP8           30          77  20    4.17    4.57     4.88 i         -      elasticsearch-5f4bd5b88f-wnrl4
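heap.percent alone hides the absolute heap size, which matters for the circuit breaker error below; _cat/nodes can report it directly (heap.current and heap.max are standard columns), for example:

curl -s -X GET "http://localhost:9200/_cat/nodes?v&h=name,node.role,heap.current,heap.percent,heap.max"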

curl -s -XGET http://localhost:9200/_cluster/allocation/explain -d '{"index":"graph_vertex_24_18549","shard":0,"primary":false}' -H 'Content-Type: application/json'

{"index":"graph_vertex_24_18549","shard":0,"primary":false,"current_state":"initializing","unassigned_info":{"reason":"ALLOCATION_Failed","at":"2020-11-04T08:21:45.756Z","Failed_allocation_attempts":1,"details":"Failed shard on node [1XEXS92jTK-wwanNgQrxsA]: Failed to perform indices:data/write/bulk[s] on replica [graph_vertex_24_18549][0],node[1XEXS92jTK-wwanNgQrxsA],[R],s[STARTED],a[id=RnTOlfQuQkOumVuw_NeuTw],failure RemoteTransportException[[elasticsearch-data-2][IP:9300][indices:data/write/bulk[s][r]]]; nested: CircuitBreakingException[[parent] Data too large,data for [<transport_request>] would be [4322682690/4gb],which is larger than the limit of [4005632409/3.7gb],real usage: [3646987112/3.3gb],new bytes reserved: [675695578/644.3mb]]; ","last_allocation_status":"no_attempt"},"current_node":{"id":"o_9jyrmOSca9T12J4bY0Nw","name":"elasticsearch-data-0","transport_address":"IP:9300"},"explanation":"the shard is in the process of initializing on node [elasticsearch-data-0],wait until initialization has completed"}

I was alerted about unassigned shards earlier due to the same exception as above: "CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [4322682690/4gb], which is larger than the limit of [4005632409/3.7gb]".

But at that time the heap was only 2G. I increased it to 4G. Now I am seeing the same error, but this time about initializing shards rather than unassigned shards.
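To see how close each node's parent circuit breaker is to its limit (the [4005632409/3.7gb] figure in the error above), the breaker section of the node stats API can be checked; a diagnostic sketch:

curl -s -X GET "http://localhost:9200/_nodes/stats/breaker?pretty&filter_path=nodes.*.name,nodes.*.breakers.parent"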

How can I remediate this?
